feat: add unicode property class support
ganmin committed May 19, 2021
1 parent 3e287d1 commit 7300149
Showing 5 changed files with 63 additions and 43 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,12 @@

The changelog of the reregexp library.

## [1.5.0] - 2021-05-19

### Added

- Support unicode property class syntax, e.g. `\p{Letter}`, more details have shown in README.

## [1.4.0] - 2021-05-16

### Added
@@ -13,4 +19,4 @@ The changelog of the reregexp library.
- Optimize some regexp rules of the parser.
- Change the default export library name from 'RegexpParser' to `ReRegExp` in browser.
- Make the readme more clearly.
- Upgrade the typescript and other tools dependencies versions.
- Upgrade the typescript and other tools dependencies versions.
50 changes: 35 additions & 15 deletions README.md
@@ -7,11 +7,14 @@ Generate a matched string with a given regular expression, it's useful if you wa

## Goals

- Support named capture group, and set a configed string to override which will generated by the group.
- Support named capture group, e.g. `(?<named>\w)\k<named>`, and also allowing to override it by expose a config field `namedGroupConf`.

- Support unicode property class `\p{Lu}` by setting the static `UPCFactory` handle, see the example for more details.

- Support `u` flag, so you can use unicode ranges.

- Allow you get the capture group values.

## Installation

```bash
@@ -26,11 +29,11 @@ yarn add reregexp
```javascript
const ReRegExp = require('reregexp').default;

// You can use either a regex literal
// You can use either a regex literal
// or a RegExp string
// if you need use regex rules the browser not supported yet
// such as named group, you need choose a RegExp string
// for the first constructor parameter.
// such as named group, you need choose a RegExp string
// for the first constructor parameter.

// Example 1: use group reference
const r1 = new ReRegExp(/([a-z0-9]{3})_\1/);
@@ -40,9 +43,9 @@ r1.build(); // => 'a2z_a2z' '13d_13d'
const r2 = new ReRegExp(/(?<named>\w{1,2})_\1_\k<named>/);
r2.build(); // => 'b5_b5_b5' '9_9_9'

// Example 3: use named group and with `namedGroupConf` config
// Example 3: use named group and with `namedGroupConf` config
// it will use the string in the config insteadof the string that will generated by the named group
// of course, it will trigger an error if the string in config not match the rule of named group.
// of course, it will trigger an error if the string in config not match the rule of named group.
const r3 = new ReRegExp('/(a)\\1(?<named>b)\\k<named>(?<override>\\w+)/', {
namedGroupConf: {
override: ['cc', 'dd'],
@@ -64,7 +67,7 @@ r6.build(); // => 'bZD' 'Poe'

// Example 7: with the `u` flag, e.g. make some chinese characters.
const r7 = new ReRegExp('/[\\u{4e00}-\\u{9fcc}]{5,10}/u');
r7.build(); // => '偤豄酌菵呑', '孜垟与醽奚衜踆猠'
r7.build(); // => '偤豄酌菵呑', '孜垟与醽奚衜踆猠'

// Example 8: set a global `maxRepeat` when use quantifier such as '*' and '+'.
ReRegExp.maxRepeat = 10;
@@ -74,23 +77,42 @@ r8.build(); // => 'aaaaaaa', 'a' will repeated at most 10 times.
// Example 9: use a `maxRepeat` in constructor config, it will override `maxRepeat` of the global.
const r9 = new ReRegExp(/a*/, {
maxRepeat: 20,
});
});
r9.build(); // => 'aaaaaaaaaaaaaa', 'a' will repeated at most 20 times

// Examples 10: use a `extractSetAverage` config for character sets.
// Example 10: use a `extractSetAverage` config for character sets.
const r10 = new ReRegExp(/[\Wa-z]/, {
// \W will extract as all the characters match \W, a-z now doesn't have the same chance as \W
extractSetAverage: true,
// \W will extract as all the characters match \W, a-z now doesn't have the same chance as \W
extractSetAverage: true,
});

// Examples 11: use a `capture` config if cared about the capture data
// Example 11: use a `capture` config if cared about the capture data
const r11 = new ReRegExp(/(aa?)b(?<named>\w)/, {
capture: true, // if you cared about the group capture data, set the `capture` config true
});
r11.build(); // => 'abc'
console.log(r11.$1); // => 'a'
console.log(r11.$2); // => 'c'
console.log(r11.groups); // => {named: 'c'}

// Example 12: use the unicode property class by setting the `UPCFactory`
ReRegExp.UPCFactory = (data: UPCData) => {
/*
UPCData: {
negate: boolean; // if the symbol is 'P'
short: boolean; // take '\pL' as a short for '\p{Letter}'
key?: string; // if has a unicode property name, such as `Script`
value: string; // unicode property value, binary or non-binary
}
*/
return {
generate(){
return 'x'; // return an object that has a `generate` method.
}
}
};
const r12 = new ReRegExp('/\\p{Lu}/u');
console.log(r12.build()); // => 'x', should handle in the `UPCFactory` method.
```

## Config
@@ -140,7 +162,6 @@ get a regexp parsed queues, flags, lastRule after remove named captures.
## Build precautions,do not use any regexp anchors.

1. `^` `$` the start,end anchors will be ignored.

2. `(?=)` `(?!)` `(?<=)` `(?<!)` the regexp lookhead,lookbehind will throw an error when run `build()`.

3. `\b` `\B` will be ignored.
@@ -159,10 +180,9 @@ get a regexp parsed queues, flags, lastRule after remove named captures.

6. `/[^\w\W]/` for the negative charsets, if all the characters are eliminated, the `build()` will throw an error. the same such as `/[^a-zA-Z0-9_\W]/``/[^\s\S]/`...


## Questions & Bugs?

Welcome to report to us with [issue](https://github.com/suchjs/reregexp/issues) if you meet any question or bug.
Welcome to report to us with [issue](https://github.com/suchjs/reregexp/issues) if you meet any question or bug.

## License

12 changes: 4 additions & 8 deletions __tests__/index.test.ts
@@ -1,8 +1,4 @@
import ReRegExp, {
ParserConf,
CharsetHelper,
UnicodeCategoryData,
} from '../src/index';
import ReRegExp, { ParserConf, CharsetHelper, UPCData } from '../src/index';
type Rule = RegExp | string;
const validParser = (rule: Rule) => {
return () => {
@@ -477,8 +473,8 @@ describe('Test regexp parser', () => {
}).toThrowError();
// set the factory
expect(() => {
ReRegExp.unicodeCategoryFactory = function (data: UnicodeCategoryData) {
if (data.reverse) {
ReRegExp.UPCFactory = function (data: UPCData) {
if (data.negate) {
return {
generate() {
return '_';
@@ -511,7 +507,7 @@ describe('Test regexp parser', () => {
const r5 = new ReRegExp('/\\P{Letter}{2}/u');
expect(r5.build()).toEqual('__');
// delete the factory
delete ReRegExp.unicodeCategoryFactory;
delete ReRegExp.UPCFactory;
}).not.toThrowError();
});
// test last info
4 changes: 2 additions & 2 deletions package.json
@@ -1,7 +1,7 @@
{
"name": "reregexp",
"version": "1.4.0",
"description": "Generate a random string match a given regular expression, useful for mocking strings.",
"version": "1.5.0",
"description": "Generate a random string match a given regular expression, suitable for mocking strings.",
"main": "./lib/index.js",
"typings": "./lib/index.d.ts",
"author": "[email protected]",
32 changes: 15 additions & 17 deletions src/index.ts
@@ -83,18 +83,16 @@ export type Result = Pick<
queues: RegexpPart[];
};

export type UnicodeCategoryData = {
reverse: boolean;
export type UPCData = {
negate: boolean;
short: boolean;
key?: string;
value?: string;
value: string;
};

export type UnicodeCategoryFactory = (
data: UnicodeCategoryData,
) => UnicodeCategoryInstance | never;
export type UPCFactory = (data: UPCData) => UPCInstance | never;

export interface UnicodeCategoryInstance {
export interface UPCInstance {
generate(): string;
}
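
Note on the renamed types: the hunk above defines the contract a `UPCFactory` has to satisfy (take a `UPCData`, return an object with `generate()`). The following is a minimal sketch of one such factory, not the library's own implementation. The type shapes are copied locally from this diff so the block compiles on its own; `POOLS`, `ALIASES`, `pick`, and `sampleUPCFactory` are hypothetical names, and the ASCII-only pools are an assumption made to keep the example small.

```typescript
// Local copies of the type shapes shown in this diff (src/index.ts).
// In real code they would be imported from 'reregexp' instead.
type UPCData = {
  negate: boolean; // true when the class was written as `\P{...}`
  short: boolean; // true for the short form, e.g. `\pL`
  key?: string; // property name, e.g. `Script` in `\p{Script=Greek}`
  value: string; // property value, e.g. `Lu`
};
interface UPCInstance {
  generate(): string;
}

// Hypothetical, deliberately tiny ASCII-only pools (an assumption of this sketch).
const POOLS: Record<string, string> = {
  Lu: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
  Ll: 'abcdefghijklmnopqrstuvwxyz',
  Nd: '0123456789',
};
// Hypothetical long-name aliases mapped onto the pools above.
const ALIASES: Record<string, string> = {
  Uppercase_Letter: 'Lu',
  Lowercase_Letter: 'Ll',
  Decimal_Number: 'Nd',
};

// Pick one random character from a non-empty string.
const pick = (chars: string): string =>
  chars.charAt(Math.floor(Math.random() * chars.length));

const sampleUPCFactory = (data: UPCData): UPCInstance => {
  // This sketch only handles general categories, so `key=value` forms are rejected.
  if (data.key !== undefined) {
    throw new Error(`Unsupported property key: ${data.key}`);
  }
  const name = ALIASES[data.value] || data.value;
  const pool = POOLS[name];
  if (!pool) {
    throw new Error(`Unsupported property value: ${data.value}`);
  }
  // For the negated form, draw from every other pool known to this sketch.
  const chars = data.negate
    ? Object.keys(POOLS)
        .filter((k) => k !== name)
        .map((k) => POOLS[k])
        .join('')
    : pool;
  return {
    generate(): string {
      return pick(chars);
    },
  };
};
```

Following the README example in this commit, such a function would be assigned with `ReRegExp.UPCFactory = sampleUPCFactory;` before constructing `new ReRegExp('/\\p{Lu}/u')`. A real factory would likely draw from full Unicode data rather than these ASCII pools.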

@@ -306,7 +304,7 @@ export default class ReRegExp {
// static maxRepeat config
public static maxRepeat = 5;
// static handle for unicode categories
public static unicodeCategoryFactory?: UnicodeCategoryFactory;
public static UPCFactory?: UPCFactory;
// regexp input, without flags
public readonly context: string = '';
// flags
@@ -553,9 +551,9 @@ export default class ReRegExp {
// unicode categories/script/block
if (hasFlagU) {
// must have `u` flag
if (typeof ReRegExp.unicodeCategoryFactory !== 'function') {
if (typeof ReRegExp.UPCFactory !== 'function') {
throw new Error(
`You must set the ReRegExp.unicodeCategoryFactory before you use the unicode category.`,
`You must set the ReRegExp.UPCFactory before you use the unicode category.`,
);
}
target = new RegexpUnicodeCategory(next);
@@ -1637,9 +1635,9 @@ export class RegexpASCII extends RegexpHexCode {

export class RegexpUnicodeCategory extends RegexpPart {
public type = 'unicode-category';
protected data: UnicodeCategoryData;
protected data: UPCData;
protected rule = /^([A-Z]|\{(?:(?:([a-zA-Z_]+)=)?([A-Za-z_]+))})/;
protected generator: UnicodeCategoryInstance;
protected generator: UPCInstance;
// constructor
public constructor(private readonly symbol: string) {
super();
@@ -1649,12 +1647,12 @@ export class RegexpUnicodeCategory extends RegexpPart {
if (this.rule.test(context)) {
const { $1: all, $2: key, $3: value } = RegExp;
const { symbol } = this;
const reverse = symbol === 'P';
let data: UnicodeCategoryData;
const negate = symbol === 'P';
let data: UPCData;
if (value) {
data = {
short: false,
reverse,
negate,
value,
};
if (key) {
@@ -1663,12 +1661,12 @@ export class RegexpUnicodeCategory extends RegexpPart {
} else {
data = {
short: true,
reverse,
negate,
value: all,
};
}
this.data = data;
const factory = ReRegExp.unicodeCategoryFactory;
const factory = ReRegExp.UPCFactory;
this.generator = factory(data);
this.input = `\\${symbol}${all}`;
return all.length;
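
Note on the parsing hunks above: they show how the text after `\p` or `\P` is turned into a `UPCData` value. The standalone sketch below mirrors that logic so the three input shapes (`\pL`, `\p{Lu}`, `\p{Script=Greek}`) can be tried in isolation. It is an illustration under assumptions, not the library's code: `parseUPC` and `UPC_RULE` are hypothetical names, the pattern is the `rule` from the diff with its closing brace escaped, and `exec` is used in place of the legacy `RegExp.$1` statics.

```typescript
// Local copy of the `UPCData` shape from this diff.
type UPCData = {
  negate: boolean;
  short: boolean;
  key?: string;
  value: string;
};

// The `rule` pattern from the diff, with the closing brace escaped:
// group 1 = whole match, group 2 = optional key, group 3 = value.
const UPC_RULE = /^([A-Z]|\{(?:(?:([a-zA-Z_]+)=)?([A-Za-z_]+))\})/;

// `symbol` is the letter after the backslash ('p' or 'P');
// `rest` is the pattern text that follows it.
function parseUPC(
  symbol: 'p' | 'P',
  rest: string,
): { data: UPCData; consumed: number } | null {
  const m = UPC_RULE.exec(rest);
  if (m === null) {
    return null; // not a well-formed property class
  }
  const all = m[1];
  const key = m[2];
  const value = m[3];
  const negate = symbol === 'P';
  let data: UPCData;
  if (value) {
    // Braced form: `{Lu}` or `{Script=Greek}`.
    data = { short: false, negate, value };
    if (key) {
      data.key = key;
    }
  } else {
    // Short form: a single uppercase letter such as `L`.
    data = { short: true, negate, value: all };
  }
  // As in the diff, the number of consumed characters is the match length.
  return { data, consumed: all.length };
}
```

For example, `parseUPC('p', '{Script=Greek}abc')` yields `key: 'Script'`, `value: 'Greek'`, and `consumed: 14`, leaving `abc` for the rest of the parser.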
