Type: Package
Title: Build Regular Expressions in a Human Readable Way
Version: 0.1-3
Date: 2017-04-25
Author: Richard Cotton [aut, cre]
Maintainer: Richard Cotton <richierocks@gmail.com>
Description: Build regular expressions piece by piece using human readable code. This package is designed for interactive use. For package development, use the rebus.* dependencies.
Depends: R (>= 3.1.0)
Imports: rebus.base (>= 0.0-3), rebus.datetimes, rebus.numbers, rebus.unicode (>= 0.0-2)
Suggests: testthat
License: Unlimited
LazyLoad: yes
LazyData: yes
Acknowledgments: Development of this package was partially funded by the Proteomics Core at Weill Cornell Medical College in Qatar <http://qatar-weill.cornell.edu>. The Core is supported by 'Biomedical Research Program' funds, a program funded by Qatar Foundation.
RoxygenNote: 6.0.1
Collate: 'export-base.R' 'export-datetimes.R' 'export-numbers.R' 'export-unicode.R' 'imports.R' 'regex-package.R'
NeedsCompilation: no
Packaged: 2017-04-25 16:46:25 UTC; richierocks
Repository: CRAN
Date/Publication: 2017-04-25 21:42:46 UTC

rebus: Regular Expression Builder, Um, Something

Description

Build regular expressions in a human readable way.

Details

Regular expressions are a powerful tool, but their syntax is terse enough to be difficult to read, which makes bugs easy to introduce and hard to find. This package contains functions to make building regular expressions easier.

Author(s)

Richard Cotton richierocks@gmail.com

See Also

regex and regexpr.
The 'stringr' and 'stringi' packages provide tools for matching regular expressions and complement this package well.
http://www.regular-expressions.info has good advice on using regular expressions in R. In particular, see http://www.regular-expressions.info/rlanguage.html and http://www.regular-expressions.info/examples.html.
https://www.debuggex.com is a visual regex debugging and testing site.

Examples

### Match a hex colour, like `"#99af01"`
# This reads *Match a hash, followed by six hexadecimal digits.*
  
"#" %R% hex_digit(6)    

# To match only a hex colour and nothing else, you can add anchors to the 
# start and end of the expression.

START %R% "#" %R% hex_digit(6) %R% END
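
# A minimal sketch of how the anchored pattern might be applied, assuming 
# base R's grepl is used for matching. The `candidates` vector is hypothetical 
# test input chosen for illustration, and the regex is passed through 
# as.character as a conservative choice.

```r
library(rebus)

# Anchored pattern from the example above
hex_colour <- START %R% "#" %R% hex_digit(6) %R% END

# Hypothetical inputs: one valid colour, one too short, one with no hash,
# and one too long
candidates <- c("#99af01", "#99af0", "99af01", "#99af01ff")

# TRUE only where the whole string is a hash followed by six hex digits
is_hex <- grepl(as.character(hex_colour), candidates, perl = TRUE)
```

# Without START and END, the last candidate would also match, because the 
# unanchored pattern only needs to occur somewhere inside the string.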

### Simple email address matching. 
# This reads *Match one or more letters, numbers, dots, underscores, percents, 
# plusses or hyphens. Then match an 'at' symbol. Then match one or more letters, 
# numbers, dots, or hyphens. Then match a dot. Then match two to four letters.*
  
one_or_more(char_class(ASCII_ALNUM %R% "._%+-")) %R%
  "@" %R%
  one_or_more(char_class(ASCII_ALNUM %R% ".-")) %R%
  DOT %R%
  ascii_alpha(2, 4)
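
# A sketch of how the email pattern might be tested, assuming base R's grepl 
# and using exactly() to anchor the pattern to the whole string. The addresses 
# in `addresses` are hypothetical inputs for illustration only.

```r
library(rebus)

# Email pattern from the example above
email <- one_or_more(char_class(ASCII_ALNUM %R% "._%+-")) %R%
  "@" %R%
  one_or_more(char_class(ASCII_ALNUM %R% ".-")) %R%
  DOT %R%
  ascii_alpha(2, 4)

# Hypothetical inputs: two plausible addresses, one with no dot after the
# 'at' symbol, and one whose final part is longer than four letters
addresses <- c(
  "user@example.com",
  "first.last+tag@mail.example.org",
  "user@localhost",
  "user@example.museum"
)

# exactly() adds START and END so that the whole string must match
is_email <- grepl(as.character(exactly(email)), addresses, perl = TRUE)
```

# The last address is rejected because the pattern allows only two to four 
# letters after the final dot, which is one reason this is a simple matcher 
# rather than a complete one.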

### IP address matching. 
# First we need an expression to match numbers between 0 and 255.  Both of the 
# following syntaxes read *Match two then five then a digit between zero and 
# five.  Or match two then a digit between zero and four then a digit. Or match 
# an optional zero or one followed by an optional digit followed by a compulsory 
# digit.  Make this a single token, but don't capture it.*

# Using the %|% operator
ip_element <- group(
  "25" %R% char_range(0, 5) %|%
  "2" %R% char_range(0, 4) %R% ascii_digit() %|%
  optional(char_class("01")) %R% optional(ascii_digit()) %R% ascii_digit()
)

# The same again, this time using the or function
ip_element <- or(
  "25" %R% char_range(0, 5),
  "2" %R% char_range(0, 4) %R% ascii_digit(),
  optional(char_class("01")) %R% optional(ascii_digit()) %R% ascii_digit()
)

# It's easier to write using number_range, though the generated regex isn't 
# quite as optimal as a handcrafted one.
number_range(0, 255, allow_leading_zeroes = TRUE)

# Now an IP address consists of 4 of these numbers separated by dots. This 
# reads *Match a word boundary. Then create a token from an `ip_element` 
# followed by a dot, and repeat it three times.  Then match another `ip_element`
# followed by a word boundary.*

BOUNDARY %R% 
  repeated(group(ip_element %R% DOT), 3) %R% 
  ip_element %R%
  BOUNDARY    
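
# A sketch of how the finished IP address pattern might be checked, assuming 
# base R's grepl with perl = TRUE. It rebuilds `ip_element` with or() so that 
# it runs on its own. The strings in `inputs` are hypothetical test values.

```r
library(rebus)

# A number between 0 and 255, as a single non-capturing token
ip_element <- or(
  "25" %R% char_range(0, 5),
  "2" %R% char_range(0, 4) %R% ascii_digit(),
  optional(char_class("01")) %R% optional(ascii_digit()) %R% ascii_digit()
)

# Four elements separated by dots, bounded by word boundaries
ip_address <- BOUNDARY %R%
  repeated(group(ip_element %R% DOT), 3) %R%
  ip_element %R%
  BOUNDARY

# Hypothetical inputs: a valid address inside a sentence, an element that is
# out of range, and an address with only three elements
inputs <- c("Server at 192.168.0.1 is up", "256.1.1.1", "10.0.0")

has_ip <- grepl(as.character(ip_address), inputs, perl = TRUE)
```

# Because the pattern uses word boundaries rather than START and END, it finds 
# an address embedded in longer text, as in the first input.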

The start or end of a string

Description

See Anchors.


Backreferences

Description

See Backreferences.


Class Constants

Description

See CharacterClasses.


Character classes

Description

See ClassGroups.


Combine strings together

Description

See Concatenation.


Date-time regexes

Description

See DateTime.


ISO 8601 date-time classes

Description

See IsoClasses.


Force the case of replacement values

Description

See ReplacementCase.


Special characters

Description

See SpecialCharacters.


Unicode classes

Description

See Unicode.


Unicode General Categories

Description

See UnicodeGeneralCategory.


Unicode Operators

Description

See UnicodeOperators.


Unicode Properties

Description

See UnicodeProperty.


Word boundaries

Description

See WordBoundaries.


Convert or test for regex objects

Description

See as.regex.


Capture a token, or not

Description

See capture.


A range or char_class of characters

Description

See char_class.


Escape special characters

Description

See escape_special.


Make a regex exact

Description

See exactly.


Print or format regex objects

Description

See format.regex.


Get the days of the week or months of the year

Description

See get_weekdays.


Treat part of a regular expression literally

Description

See literal.


Lookaround

Description

See lookahead.


Apply mode modifiers

Description

See modify_mode.


Generate a regular expression for a number range

Description

See number_range.


Alternation

Description

See or.


Make the regular expression recursive

Description

See recursive.


Create a regex

Description

See regex.


Repeat values

Description

See repeated.


Roman numerals

Description

See roman.


Match a whole word

Description

See whole_word.