Newline conventions
PCRE supports five different conventions for indicating line breaks in strings: a single CR (carriage return) character, a single LF (linefeed) character, the two-character sequence CRLF, any of the three preceding, or any Unicode newline sequence. The pcreapi page has further discussion about newlines, and shows how to set the newline convention in the options arguments for the compiling and matching functions.
It is also possible to specify a newline convention by starting a pattern string with one of the following five sequences:
(*CR) carriage return
(*LF) linefeed
(*CRLF) carriage return, followed by linefeed
(*ANYCRLF) any of the three above
(*ANY) all Unicode newline sequences
These override the default and the options given to the compiling function. For example, on a Unix system where LF is the default newline sequence, the pattern
(*CR)a.b
changes the convention to CR. That pattern matches "a\nb" because LF is no longer a newline. If more than one of these settings is present, the last one is used.
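The effect of a leading newline sequence on the dot can be approximated outside PCRE. The following is a minimal sketch in Python, whose re module does not itself understand (*CR) and friends. The names DOT_FOR, split_newline_prefix and emulate are hypothetical helpers invented for this illustration; they are not part of PCRE or of Python. The sketch handles only the five sequences listed above, and it naively assumes that every "." in the pattern is an unescaped dot outside a character class.

```python
import re

# Character class that stands in for "." under each convention.
# Assumption: this follows PCRE's documented dot behaviour when
# PCRE_DOTALL is not set.
DOT_FOR = {
    "CR": r"[^\r]",
    "LF": r"[^\n]",
    # Under CRLF, dot refuses only a CR that is immediately followed by LF.
    "CRLF": r"(?:(?!\r\n)[\s\S])",
    "ANYCRLF": r"[^\r\n]",
    "ANY": r"[^\n\x0b\x0c\r\x85\u2028\u2029]",
}

PREFIX = re.compile(r"\(\*(CRLF|CR|LF|ANYCRLF|ANY)\)")


def split_newline_prefix(pattern, default="LF"):
    """Strip leading newline sequences; the last one present wins."""
    convention = default
    pos = 0
    while True:
        m = PREFIX.match(pattern, pos)
        if m is None:
            break
        convention = m.group(1)
        pos = m.end()
    return convention, pattern[pos:]


def emulate(pattern, default="LF"):
    """Return (convention, compiled regex) approximating the PCRE pattern."""
    convention, body = split_newline_prefix(pattern, default)
    # Naive substitution: only valid for simple patterns such as "a.b".
    return convention, re.compile(body.replace(".", DOT_FOR[convention]))


if __name__ == "__main__":
    conv, rx = emulate("(*CR)a.b")
    print(conv, bool(rx.fullmatch("a\nb")), bool(rx.fullmatch("a\rb")))
```

With the default LF convention, emulate("a.b") yields a pattern that rejects "a\nb", while emulate("(*CR)a.b") accepts it, mirroring the example above.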
The newline convention affects where the circumflex and dollar assertions are true. It also affects the interpretation of the dot metacharacter when PCRE_DOTALL is not set, and the behaviour of \N. However, it does not affect what the \R escape sequence matches. By default, this is any Unicode newline sequence, for Perl compatibility. However, this can be changed; see the description of \R in the section entitled "Newline sequences" below. A change of \R setting can be combined with a change of newline convention.
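The contrast between the dot and \R can be illustrated with a small sketch, again in Python rather than PCRE. Python's re module has no \R, so R_DEFAULT below is a hypothetical stand-in that approximates the default "any Unicode newline sequence" behaviour with a plain (non-atomic) alternation. DOT_CR is likewise an assumed equivalent of the dot under the CR convention.

```python
import re

# Approximation of PCRE's default \R: CRLF as a unit, or any single
# Unicode newline character. (Assumption: non-atomic, unlike real \R.)
R_DEFAULT = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"

# Approximation of "." under the CR convention: anything except CR.
DOT_CR = r"[^\r]"

rx_R = re.compile("a" + R_DEFAULT + "b")
rx_dot = re.compile("a" + DOT_CR + "b")

if __name__ == "__main__":
    for subject in ("a\nb", "a\rb", "a\r\nb"):
        # \R is independent of the newline convention; the dot is not.
        print(repr(subject),
              bool(rx_R.fullmatch(subject)),
              bool(rx_dot.fullmatch(subject)))
```

Under the CR convention the dot accepts LF but rejects CR, whereas the \R stand-in accepts LF, CR and CRLF alike, because the newline convention does not alter what \R matches.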
NOTE!
The editor always uses CRLF internally, even if the file is a Unix file that uses LF as its newline character. To match a newline, enter "\r\n" in the search field.
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
Last updated: 12 November 2013
Copyright © 1997-2013 University of Cambridge.