macOS NSRegularExpression

From Lazarus wiki
Revision as of 09:54, 14 June 2021 by Trev (talk | contribs) (→‎Code: Tweak code)

This article applies to macOS only.

See also: Multiplatform Programming Guide


Regular expressions

Regular expressions are patterns used to match character combinations in the string data being searched.

Each character in a regular expression (that is, each character in the string describing its pattern) is either a metacharacter or operator, having a special meaning, or a regular character that has a literal meaning.

Regular expressions can be incredibly complex. Indeed, whole books have been written about them! For a gentle introduction to regular expressions, see this O'Reilly article.

NSRegularExpression Overview

The NSRegularExpression class has convenience methods for returning all the matches as an array, the total number of matches, the first match, and the range of the first match.

An individual match is represented by an instance of the NSTextCheckingResult class, which carries information about the overall matched range (via its range property), and the range of each individual capture group (via the rangeAtIndex method).

NSRegularExpression conforms to the International Components for Unicode (ICU) specification for regular expressions.
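The convenience methods for the first match and for the range of the first match can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it assumes the Objective Pascal bindings expose firstMatchInString_options_range() and rangeOfFirstMatchInString_options_range() under the same naming scheme as the other methods used in this article, and the program and variable names are made up for illustration. The search range is built from the NSString length, which counts UTF-16 units.

```pascal
program regex_first;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

uses
  MacOSAll, CocoaAll, SysUtils;

var
  subject : NSString;              // illustrative name
  regex   : NSRegularExpression;
  error   : NSError;
  whole   : NSRange;
  first   : NSTextCheckingResult;
  firstRg : NSRange;

begin
  error   := nil;
  subject := NSStr('I have 43 bags of 60 marbles.');
  whole   := NSMakeRange(0, subject.length);

  regex := NSRegularExpression.regularExpressionWithPattern_options_error(
             NSStr('\d+'), NSRegularExpressionOptions(0), @error);
  if regex = nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // First match as an NSTextCheckingResult (nil if nothing matched)
  first := regex.firstMatchInString_options_range(subject, 0, whole);
  if first <> nil then
    NSLog(NSStr('First match: %@'), subject.substringWithRange(first.range));

  // Range of the first match only; location is NSNotFound if nothing matched
  firstRg := regex.rangeOfFirstMatchInString_options_range(subject, 0, whole);
  if firstRg.location <> NSNotFound then
    NSLog(NSStr('First match range: {%lu, %lu}'), firstRg.location, firstRg.length);
end.
```

For this sample string the first match should be 43, at location 7 with a length of 2.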

Metacharacters

For a comprehensive list of characters used by the NSRegularExpression class that have a special meaning in regular expression patterns, see the ICU listing.

Operators

For a comprehensive list of operators used by the NSRegularExpression class, see the ICU listing.

Example 1 - matching a pattern

In this fairly trivial and contrived code example, we use the \d metacharacter, which matches a decimal digit, and the + operator, which matches the preceding item one or more times. The resulting pattern \d+ matches every run of digits in the search string, which we then output using NSLog(). The example uses the NSRegularExpression convenience methods that return all of the matches in the search string as an array and the total number of matches.

Code

Program regex_ex1;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : String;
  patnStr : String;
  myRegex : NSRegularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := 'I have 43 bags of 60 marbles.';
  patnStr := '\d+';

  // Create a regular expression with given string and options.
  // @error passes the address of our NSError variable so that Cocoa can fill it in on failure.
  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr(patnStr), NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive), @error);

  // Check creation of regular expression: a Nil result signals failure
  if(myRegex = Nil) then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Save any matches in the given string in the matches array
  matches := myRegex.matchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, srchStr.Length));

  // Output
  NSLog(NSStr('Search string: %@'), NSStr(srchStr));
  NSLog(NSStr('Pattern string: %@'), NSStr(patnStr));
  NSLog(NSStr('Number of matches: %lu'), myRegex.numberOfMatchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, srchStr.Length)));

  for match in matches do
    NSLog(NSStr('match: %@'), NSStr(srchStr).substringWithRange(match.rangeAtIndex(0)));
End.

Output

The output from running the above code example is:

2021-06-12 21:05:46.335 regex_ex1[26138:232243] Search string: I have 43 bags of 60 marbles.
2021-06-12 21:05:46.336 regex_ex1[26138:232243] Pattern string: \d+
2021-06-12 21:05:46.336 regex_ex1[26138:232243] Number of matches: 2
2021-06-12 21:05:46.336 regex_ex1[26138:232243] match: 43
2021-06-12 21:05:46.336 regex_ex1[26138:232243] match: 60

Code explanation

The call to regularExpressionWithPattern_options_error() creates an NSRegularExpression object instance (myRegex) with the specified regular expression pattern and options.

Options are passed as an NSRegularExpressionOptions value. Note that by default NSRegularExpression performs case-sensitive searches, so we specified the NSRegularExpressionCaseInsensitive option for case-insensitive searches. Because we are only matching digits above, this option has no effect and we might as well have specified 0 (no options) in this example.
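To see the option make a difference, the pattern has to contain letters. The following is a minimal sketch under stated assumptions: CountMatches is a hypothetical helper written for this illustration (it is not part of Cocoa), and the pattern BAGS is chosen only because it differs in case from the text being searched.

```pascal
program regex_options;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

uses
  MacOSAll, CocoaAll, SysUtils;

// Hypothetical helper (not part of Cocoa): counts the matches of pattern in subject
function CountMatches(pattern, subject: NSString; options: NSRegularExpressionOptions): NSUInteger;
var
  regex : NSRegularExpression;
  error : NSError;
begin
  Result := 0;
  error  := nil;
  regex  := NSRegularExpression.regularExpressionWithPattern_options_error(pattern, options, @error);
  if regex = nil then
    NSLog(NSStr('Regex creation error: %@'), error)
  else
    Result := regex.numberOfMatchesInString_options_range(subject, 0, NSMakeRange(0, subject.length));
end;

var
  srch : NSString;

begin
  srch := NSStr('I have 43 bags of 60 marbles.');

  // No options: matching is case-sensitive, so BAGS does not match bags
  NSLog(NSStr('Case-sensitive matches: %lu'),
        CountMatches(NSStr('BAGS'), srch, NSRegularExpressionOptions(0)));

  // Case-insensitive option: BAGS now matches bags
  NSLog(NSStr('Case-insensitive matches: %lu'),
        CountMatches(NSStr('BAGS'), srch, NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive)));
end.
```

The first call should report 0 matches and the second should report 1.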

Once we have the NSRegularExpression object, we can then use it for matching text among other operations.

After checking that the creation of the regular expression did not fail with an error, we call the matchesInString_options_range() method to search for any matches and store them in our NSArray (matches). This method takes our search string, any options (there are none here) and the range to search. The range is specified by giving NSMakeRange() the starting location to search in the string (0 = the beginning of the string) and the length of the search string. Note that srchStr.Length counts bytes, whereas NSString ranges count UTF-16 units; the two agree for ASCII text such as our search string, but not necessarily for text containing other characters.

Next, we output the search string and pattern string, and then call the numberOfMatchesInString_options_range() method to determine the number of matches and output it.

Finally, we iterate through the matches NSArray and output the matches individually. The call to rangeAtIndex(0) returns the range of the full match. The code for this looks a little obscure. Let me try to unpack it for you.

If you just output the content of the matches NSArray you get this:

"<NSSimpleRegularExpressionCheckingResult: 0x14d617570>{7, 2}{<NSRegularExpression: 0x14d614fd0> \\d+ 0x1}",
"<NSSimpleRegularExpressionCheckingResult: 0x14d617610>{18, 2}{<NSRegularExpression: 0x14d614fd0> \\d+ 0x1}"

Notice the {7, 2} and {18, 2} ranges which locate the first number at position 7 (counting from zero) in the search string with a length of 2 and the second number at position 18 with a length of 2. Knowing those ranges, you could use:

NSLog(NSStr('match: ''%@'''), NSStr(srchStr).substringWithRange(NSMakeRange(7, 2)));

to output the first number. The substringWithRange() method extracts from our search string the substring that matches the specified range (7, 2). Clearer than mud? I hope so.
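Rather than hard-coding the ranges, each range can be read back from its NSTextCheckingResult. The sketch below is only an illustration: the program name and the variables subject and r are made up here, and it relies on the range method of NSTextCheckingResult, which returns the same range as rangeAtIndex(0).

```pascal
program regex_ranges;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

uses
  MacOSAll, CocoaAll, SysUtils;

var
  subject : NSString;              // illustrative name
  regex   : NSRegularExpression;
  error   : NSError;
  matches : NSArray;
  match   : NSTextCheckingResult;
  r       : NSRange;

begin
  error   := nil;
  subject := NSStr('I have 43 bags of 60 marbles.');

  regex := NSRegularExpression.regularExpressionWithPattern_options_error(
             NSStr('\d+'), NSRegularExpressionOptions(0), @error);
  if regex = nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  matches := regex.matchesInString_options_range(subject, 0, NSMakeRange(0, subject.length));

  for match in matches do
    begin
      // The overall matched range: location and length within the search string
      r := match.range;
      NSLog(NSStr('match ''%@'' at location %lu, length %lu'),
            subject.substringWithRange(r), r.location, r.length);
    end;
end.
```

For the sample string this should report 43 at location 7 and 60 at location 18, each with a length of 2.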

Example 2 - matching pattern groups

This is similar to Example 1 above, except that this time we match groups of characters. Our search string is the same as before, but our pattern string has some added complexity. The pattern matches a decimal digit one or more times as before, but this time as a group, which is delineated by using parentheses: (\d+). Next, we use a dot . to match any character and ? to match it zero or one times following the digit(s). Finally, we match the set of characters from a to z one or more times as a second group: ([a-z]+). In the code, rangeAtIndex(0) is again the range of the full match, while rangeAtIndex(1) and rangeAtIndex(2) are the ranges of the first and second groups.

Code

Program regex_ex2;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

Uses
  MacOSAll, CocoaAll, SysUtils;

Var
  srchStr : String;
  patnStr : String;
  myRegex : NSRegularExpression;
  matches : NSArray;
  match   : NSTextCheckingResult;
  error   : NSError;

Begin
  error   := Nil;
  srchStr := 'I have 43 bags of 60 marbles.';
  patnStr := '(\d+).?([a-z]+)';

  // Create a regular expression with given string and options.
  // @error passes the address of our NSError variable so that Cocoa can fill it in on failure.
  myRegex := NSRegularExpression.regularExpressionWithPattern_options_error(NSStr(patnStr), NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive), @error);

  // Check creation of regular expression: a Nil result signals failure
  if(myRegex = Nil) then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  // Save any matches in the given string in the matches array
  matches := myRegex.matchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, srchStr.Length));

  // Output
  NSLog(NSStr('Search string: %@'), NSStr(srchStr));
  NSLog(NSStr('Pattern string: %@'), NSStr(patnStr));
  NSLog(NSStr('Number of matches: %lu'), myRegex.numberOfMatchesInString_options_range(NSStr(srchStr), 0, NSMakeRange(0, srchStr.Length)));

  for match in matches do
    begin
      NSLog(NSStr('match(0): ''%@'''), NSStr(srchStr).substringWithRange(match.rangeAtIndex(0)));
      NSLog(NSStr('match(1): ''%@'''), NSStr(srchStr).substringWithRange(match.rangeAtIndex(1)));
      NSLog(NSStr('match(2): ''%@'''), NSStr(srchStr).substringWithRange(match.rangeAtIndex(2)));
    end;
End.

Output

2021-06-14 17:39:51.255 program2[7376:149163] Search string: I have 43 bags of 60 marbles.
2021-06-14 17:39:51.255 program2[7376:149163] Pattern string: (\d+).?([a-z]+)
2021-06-14 17:39:51.255 program2[7376:149163] Number of matches: 2
2021-06-14 17:39:51.255 program2[7376:149163] match(0): '43 bags'
2021-06-14 17:39:51.255 program2[7376:149163] match(1): '43'
2021-06-14 17:39:51.255 program2[7376:149163] match(2): 'bags'
2021-06-14 17:39:51.255 program2[7376:149163] match(0): '60 marbles'
2021-06-14 17:39:51.255 program2[7376:149163] match(1): '60'
2021-06-14 17:39:51.255 program2[7376:149163] match(2): 'marbles'
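
Example 2 hard-codes the group indexes 0, 1 and 2. When the number of groups in a pattern is not known in advance, the groups can be walked in a loop instead. This is a minimal sketch, assuming the bindings expose the numberOfRanges method of NSTextCheckingResult; the program and variable names are illustrative only.

```pascal
program regex_groups;

{$mode objfpc}{$H+}
{$modeswitch objectivec2}

uses
  MacOSAll, CocoaAll, SysUtils;

var
  subject : NSString;              // illustrative name
  regex   : NSRegularExpression;
  error   : NSError;
  matches : NSArray;
  match   : NSTextCheckingResult;
  r       : NSRange;
  i       : NSUInteger;

begin
  error   := nil;
  subject := NSStr('I have 43 bags of 60 marbles.');

  regex := NSRegularExpression.regularExpressionWithPattern_options_error(
             NSStr('(\d+).?([a-z]+)'),
             NSRegularExpressionOptions(NSRegularExpressionCaseInsensitive), @error);
  if regex = nil then
    begin
      NSLog(NSStr('Regex creation error: %@'), error);
      Exit;
    end;

  matches := regex.matchesInString_options_range(subject, 0, NSMakeRange(0, subject.length));

  for match in matches do
    begin
      // Index 0 is the full match; indexes 1 and up are the capture groups
      i := 0;
      while i < match.numberOfRanges do
        begin
          r := match.rangeAtIndex(i);
          // A group that did not take part in the match has location NSNotFound
          if r.location <> NSNotFound then
            NSLog(NSStr('group %lu: ''%@'''), i, subject.substringWithRange(r));
          Inc(i);
        end;
    end;
end.
```

The NSNotFound check is not needed for this particular pattern, since both groups always take part in a match, but it guards against optional groups in other patterns.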
