Generate a random string from a given regular expression, strictly abiding by the JavaScript RegExp rules.

suchjs/reregexp

reregexp

Generate a string that matches a given regular expression. This is useful when you want to mock strings from a regexp rule. The library strictly follows the standard JavaScript regex rules, but you still need to pay attention to the Special cases.

Goals

  • Support named capture groups, e.g. (?<named>\w)\k<named>, and allow overriding them through the config field namedGroupConf.

  • Support the unicode property class \p{Lu} by setting the static UPCFactory handler; see the example for more details.

  • Support the u flag, so you can use unicode ranges.

  • Allow you to get the capture group values.

Installation

# npm
npm install --save reregexp
# or yarn
yarn add reregexp

Usage

// Commonjs module
const ReRegExp = require('reregexp').default;

// ESM module
// since v1.6.1
import ReRegExp from 'reregexp';

// before v1.6.1
import re from 'reregexp';
const ReRegExp = re.default;

// The first parameter of the constructor
// can be a regex literal or a RegExp string.
// If you need features that are not well supported by all browsers,
// such as named groups, always use a RegExp string.

// Example 1:  use group reference
const r1 = new ReRegExp(/([a-z0-9]{3})_\1/);
r1.build(); // => 'a2z_a2z' '13d_13d'

// Example 2:  use named group
const r2 = new ReRegExp(/(?<named>\w{1,2})_\1_\k<named>/);
r2.build(); // => 'b5_b5_b5' '9_9_9'

// Example 3: use a named group with the `namedGroupConf` config
// the strings in the config are used instead of the string the named group would generate
// an error is thrown if a string in the config does not match the rule of the named group.
const r3 = new ReRegExp('/(a)\\1(?<named>b)\\k<named>(?<override>\\w+)/', {
  namedGroupConf: {
    override: ['cc', 'dd'],
  },
});
r3.build(); // => "aabbcc" "aabbdd"

// Example 4: use a character set
const r4 = new ReRegExp(/[^\w\W]+/);
r4.build(); // throws an error, because [^\w\W] matches nothing.

// Example 5: another character set with the negation operator
const r5 = new ReRegExp(/[^a-zA-Z0-9_\W]/);
r5.build(); // throws an error, this is the same as [^\w\W]

// Example 6: with the `i` flag, ignore the case.
const r6 = new ReRegExp(/[a-z]{3}/i);
r6.build(); // => 'bZD' 'Poe'

// Example 7: with the `u` flag, e.g. generate some Chinese characters.
const r7 = new ReRegExp('/[\\u{4e00}-\\u{9fcc}]{5,10}/u');
r7.build(); // => '偤豄酌菵呑', '孜垟与醽奚衜踆猠'

// Example 8: set a global `maxRepeat` for quantifiers such as '*' and '+'.
ReRegExp.maxRepeat = 10;
const r8 = new ReRegExp(/a*/);
r8.build(); // => 'aaaaaaa', 'a' is repeated at most 10 times.

// Example 9: set `maxRepeat` in the constructor config, it overrides the global `maxRepeat`.
const r9 = new ReRegExp(/a*/, {
  maxRepeat: 20,
});
r9.build(); // => 'aaaaaaaaaaaaaa', 'a' is repeated at most 20 times

// Example 10: use the `extractSetAverage` config for character sets.
const r10 = new ReRegExp(/[\Wa-z]/, {
  // \W is expanded into all the characters that match \W, so a-z no longer has the same chance as \W
  extractSetAverage: true,
});

// Example 11: use the `capture` config if you need the capture data
const r11 = new ReRegExp(/(aa?)b(?<named>\w)/, {
  capture: true, // set `capture` to true to record the group capture data
});
r11.build(); // => 'abc'
console.log(r11.$1); // => 'a'
console.log(r11.$2); // => 'c'
console.log(r11.groups); // => {named: 'c'}

// Example 12: use the unicode property class by setting the `UPCFactory`
ReRegExp.UPCFactory = (data: UPCData) => {
  /*
  UPCData: {
    negate: boolean; // if the symbol is 'P'
    short: boolean; // take '\pL' as a short for '\p{Letter}'
    key?: string; // if has a unicode property name, such as `Script`
    value: string; // unicode property value, binary or non-binary
  }
  */
  return {
    generate(){
      return 'x'; // return an object that has a `generate` method.
    }
  }
};
const r12 = new ReRegExp('/\\p{Lu}/u');
console.log(r12.build()); // => 'x', produced by the `UPCFactory` handler.

Config

// The meaning of the config fields can be seen in the examples.
{
 maxRepeat?: number;
 namedGroupConf?: {
   [index: string]: string[]|boolean;
 };
 extractSetAverage?: boolean;
 capture?: boolean;
}

Supported flags

  • i ignore case, /[a-z]/i is the same as /[a-zA-Z]/

  • u unicode flag

  • s dot all flag

The flags g, m and y are ignored.
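
As a quick sanity check of what the three supported flags mean, the sketch below uses only the native JavaScript RegExp engine, not reregexp itself. The name flagChecks is illustrative and is not part of the library.

```javascript
// Native RegExp behaviour for the supported flags (no reregexp calls here).
const flagChecks = {
  // `i`: /[a-z]/i accepts the same characters as /[a-zA-Z]/
  ignoreCase: /^[a-z]$/i.test('Q') && /^[a-zA-Z]$/.test('Q'),
  // `u`: enables \u{...} code point escapes, so ranges can be written with them
  // ('\u4e2d' is a character inside the range used in Example 7)
  unicode: /^[\u{4e00}-\u{9fcc}]$/u.test('\u4e2d'),
  // `s`: the dot also matches line terminators
  dotAll: /^.$/s.test('\n') && !/^.$/.test('\n'),
};

console.log(flagChecks);
```

Every property should come out as true in an engine that supports the u and s flags.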

Methods

.build()

Builds a string that matches the regexp.

.info()

Returns the parsed information of the regexp: the queues, the flags, and lastRule (the rule after the named captures are removed).

{
  rule: '',
  context: '',
  flags: [],
  lastRule: '',
  queues: [],
}

Build precautions: do not use any regexp anchors.

  1. ^ $ the start and end anchors are ignored.

  2. (?=) (?!) (?<=) (?<!) lookahead and lookbehind assertions throw an error when build() runs.

  3. \b \B are ignored.
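
Because the anchors are ignored during the build, one way to check a generated string is to re-add them and test the candidate with the native RegExp engine. The helper below is a minimal sketch under that assumption; fullMatch is a hypothetical name and is not part of reregexp.

```javascript
// Hypothetical helper (not part of reregexp): test a candidate against the
// whole pattern by re-adding the anchors that build() ignores.
function fullMatch(re, candidate) {
  // Drop `g` and `y` so that test() does not depend on lastIndex.
  const flags = re.flags.replace(/[gy]/g, '');
  // The non-capturing wrapper keeps group numbering intact and keeps
  // top-level alternations inside the anchors.
  const anchored = new RegExp('^(?:' + re.source + ')$', flags);
  return anchored.test(candidate);
}

// Sample outputs taken from the usage examples above.
console.log(fullMatch(/([a-z0-9]{3})_\1/, 'a2z_a2z')); // true
console.log(fullMatch(/[a-z]{3}/i, 'bZD')); // true
console.log(fullMatch(/([a-z0-9]{3})_\1/, 'a2z_a2y')); // false
```

In a test suite, the string returned by build() would take the place of the hard-coded candidates.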

Special cases

  1. /\1(o)/ the reference \1 appears before its group, so it matches the empty string. build() just outputs o, and /^\1(o)$/.test('o') === true

  2. /(o)\1\2/ there is no second capture group, so \2 is treated as the code point U+0002 (a legacy octal escape). build() outputs oo\u0002, and /^(o)\1\2$/.test('oo\u0002') === true

  3. /(o\1)/ the reference \1 sits inside its own group, so it matches the empty string. build() outputs o, and /^(o\1)$/.test('o') === true

  4. /[]/ empty character class. build() throws an error, because no character can match it.

  5. /[^]/ negated empty character class. build() outputs any character.

  6. /[^\w\W]/ a negated character set that excludes every character. build() throws an error. The same applies to /[^a-zA-Z0-9_\W]/, /[^\s\S]/ and similar patterns.
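
The native RegExp side of these special cases can be checked directly. The sketch below does not call reregexp; it only confirms how the JavaScript engine treats each pattern in non-unicode mode. The names specialCases and checkSpecialCases are illustrative.

```javascript
// Native RegExp checks for the special cases listed above (no reregexp calls).
// Each entry: [description, regex, candidate string, expected test() result]
const specialCases = [
  ['forward reference matches the empty string', /^\1(o)$/, 'o', true],
  ['\\2 without a second group is U+0002', /^(o)\1\2$/, 'oo\u0002', true],
  ['reference inside its own group matches the empty string', /^(o\1)$/, 'o', true],
  ['empty class never matches', /[]/, 'a', false],
  ['negated empty class matches any character', /^[^]$/, '\n', true],
  ['[^\\w\\W] never matches', /[^\w\W]/, 'a_ 1', false],
];

// Run every case and record whether the engine agrees with the expectation.
function checkSpecialCases(cases) {
  return cases.map(([label, re, input, expected]) => ({
    label,
    ok: re.test(input) === expected,
  }));
}

const results = checkSpecialCases(specialCases);
console.log(results.every((r) => r.ok)); // true
```

Cases 4 and 6 are the ones where no string can match at all, which is why build() has to throw for them instead of returning a value.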

Questions & Bugs?

If you run into a question or a bug, you are welcome to report it by opening an issue.

License

MIT License.