obscenity
- BlacklistedTerm
- BoundaryAssertionNode
- CollapseDuplicatesTransformerOptions
- LiteralNode
- MatchPayload
- Matcher
- OptionalNode
- ParsedPattern
- PhraseContainer
- ProcessedCollapseDuplicatesTransformerOptions
- RegExpMatcherOptions
- WildcardNode
- CensorContext
- CharacterMapping
- EnglishProfaneWord
- MatchPayloadWithPhraseMetadata
- Node
- TextCensorStrategy
- englishDataset
- englishRecommendedBlacklistMatcherTransformers
- englishRecommendedTransformers
- englishRecommendedWhitelistMatcherTransformers
- assignIncrementingIds
- asteriskCensorStrategy
- collapseDuplicatesTransformer
- compareMatchByPositionAndId
- fixedCharCensorStrategy
- fixedPhraseCensorStrategy
- grawlixCensorStrategy
- keepEndCensorStrategy
- keepStartCensorStrategy
- parseRawPattern
- pattern
- randomCharFromSetCensorStrategy
- remapCharactersTransformer
- resolveConfusablesTransformer
- resolveLeetSpeakTransformer
- skipNonAlphabeticTransformer
- toAsciiLowerCaseTransformer
Ƭ CensorContext: `MatchPayload & { input: string; overlapsAtEnd: boolean; overlapsAtStart: boolean }`
Context passed to [[TextCensorStrategy | text censoring strategies]].
Ƭ CharacterMapping: `Map<string, string> | Record<string, string>`
Maps characters to other characters. The key of the map/object should be the transformed character, while the value should be a set of characters that map to the transformed character.
src/transformer/remap-characters/index.ts:60
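As an illustrative sketch (not the library's internal implementation), a `CharacterMapping` in object form can be applied like this, with each character of the value string mapping back to the key:

```typescript
// Hypothetical helper showing how a CharacterMapping (object form) is
// interpreted: the key is the transformed character, and each character
// in the value string maps to that key.
function applyMapping(text: string, mapping: Record<string, string>): string {
  const lookup = new Map<string, string>();
  for (const [transformed, originals] of Object.entries(mapping)) {
    for (const original of originals) lookup.set(original, transformed);
  }
  return [...text].map((c) => lookup.get(c) ?? c).join('');
}
```

Under this model, `applyMapping('l33t', { e: '3' })` yields `'leet'`, while characters with no entry pass through unchanged.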
Ƭ EnglishProfaneWord: "abeed"
| "abo"
| "africoon"
| "anal"
| "anus"
| "arabush"
| "arse"
| "ass"
| "bastard"
| "bestiality"
| "bitch"
| "blowjob"
| "bollocks"
| "boob"
| "boonga"
| "buttplug"
| "chingchong"
| "chink"
| "cock"
| "cuck"
| "cum"
| "cunt"
| "deepthroat"
| "dick"
| "dildo"
| "doggystyle"
| "double penetration"
| "dyke"
| "ejaculate"
| "fag"
| "felch"
| "fellatio"
| "finger bang"
| "fisting"
| "fuck"
| "gangbang"
| "handjob"
| "hentai"
| "hooker"
| "incest"
| "jerk off"
| "jizz"
| "kike"
| "lubejob"
| "masturbate"
| "negro"
| "nigger"
| "orgasm"
| "orgy"
| "penis"
| "piss"
| "porn"
| "prick"
| "pussy"
| "rape"
| "retard"
| "scat"
| "semen"
| "sex"
| "shit"
| "slut"
| "spastic"
| "tit"
| "tranny"
| "turd"
| "twat"
| "vagina"
| "wank"
| "whore"
All the profane words that are included in the [[englishDataset | english dataset]] by default.
Ƭ MatchPayloadWithPhraseMetadata<MetadataType>: `MatchPayload & { phraseMetadata?: MetadataType }`
Extends the default match payload by adding phrase metadata.
Name |
---|
MetadataType |
Ƭ Node: `LiteralNode | OptionalNode | WildcardNode`
All the possible kinds of nodes.
Ƭ TextCensorStrategy: `(ctx: CensorContext) => string`
A text censoring strategy, which receives a [[CensorContext]] and returns a replacement string.
Name | Type |
---|---|
`ctx` | `CensorContext` |
Returns: `string`
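To illustrate the shape, here is a sketch of a custom strategy. The context type is re-declared locally (with field names taken from the [[CensorContext]] documentation above) so the snippet stands alone; in real code it would come from the library, and the strategy name is hypothetical.

```typescript
// Local stand-in for CensorContext, using the documented fields
// (input, overlapsAtStart, overlapsAtEnd) plus the match's position
// and length from the underlying match payload.
type CensorContextLike = {
  input: string;
  startIndex: number;
  matchLength: number;
  overlapsAtStart: boolean;
  overlapsAtEnd: boolean;
};

// A hypothetical strategy: replace every matched character with '#',
// keeping the first character unless this region overlaps a previous one.
const hashKeepStartStrategy = (ctx: CensorContextLike): string => {
  if (ctx.overlapsAtStart) return '#'.repeat(ctx.matchLength);
  return ctx.input[ctx.startIndex] + '#'.repeat(ctx.matchLength - 1);
};
```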
• Const englishDataset: `DataSet<{ originalWord: EnglishProfaneWord }>`
A dataset of profane English words.
Example
const matcher = new RegExpMatcher({
...englishDataset.build(),
...englishRecommendedTransformers,
});
Example
// Extending the dataset by adding a new word and removing an existing one.
const myDataset = new DataSet()
.addAll(englishDataset)
.removePhrasesIf((phrase) => phrase.metadata.originalWord === 'vagina')
.addPhrase((phrase) => phrase.addPattern(pattern`|balls|`));
Copyright
The words are taken from the cuss project, with some modifications.
(The MIT License)
Copyright (c) 2016 Titus Wormer <[email protected]>
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
• Const englishRecommendedBlacklistMatcherTransformers: `(SimpleTransformerContainer | StatefulTransformerContainer)[]`
A set of transformers to be used when matching blacklisted patterns with the [[englishDataset | english word dataset]].
• Const englishRecommendedTransformers: `Pick<RegExpMatcherOptions, "blacklistMatcherTransformers" | "whitelistMatcherTransformers">`
Recommended transformers to be used with the [[englishDataset | english word dataset]] and the [[RegExpMatcher]].
• Const englishRecommendedWhitelistMatcherTransformers: `(SimpleTransformerContainer | StatefulTransformerContainer)[]`
A set of transformers to be used when matching whitelisted terms with the [[englishDataset | english word dataset]].
▸ assignIncrementingIds(patterns): `BlacklistedTerm[]`
Assigns incrementing IDs to the patterns provided, starting with 0. It is useful if you have a list of patterns to match against but don't care about identifying which pattern matched.
Example
const matcher = new RegExpMatcher({
...,
blacklistedTerms: assignIncrementingIds([
pattern`f?uck`,
pattern`|shit|`,
]),
});
Name | Type | Description |
---|---|---|
`patterns` | `ParsedPattern[]` | List of parsed patterns. |
A list of blacklisted terms with valid IDs which can then be passed to the [[RegExpMatcher]].
src/matcher/BlacklistedTerm.ts:37
▸ asteriskCensorStrategy(): `TextCensorStrategy`

A text censoring strategy that generates strings made up of asterisks (`*`).
Example
const strategy = asteriskCensorStrategy();
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '**** you'
A [[TextCensorStrategy]] for use with the [[TextCensor]].
src/censor/BuiltinStrategies.ts:71
▸ collapseDuplicatesTransformer(options?): `StatefulTransformerContainer`
Creates a transformer that collapses duplicate characters. This is useful for detecting variants of patterns in which a character is repeated to bypass detection.
As an example, the pattern `hi` does not match `hhiii` by default, as the frequency of the characters does not match. With this transformer, `hhiii` would become `hi`, and would therefore match the pattern.
Application order
It is recommended that this transformer be applied after all other transformers. Using it before other transformers may have the effect of not catching duplicates of certain characters that were originally different but became the same after a series of transformations.
Warning

This transformer should be used with caution: while it can make certain patterns match text that wouldn't have been matched before, it can also go the other way. For example, the pattern `hello` clearly matches `hello`, but with this transformer, by default, `hello` would become `helo`, which does not match. In these cases, the `customThresholds` option can be used to allow two `l`s in a row, leaving `hello` unchanged.
Example
// Collapse runs of the same character.
const transformer = collapseDuplicatesTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
Example
// Collapse runs of characters other than 'a'.
const transformer = collapseDuplicatesTransformer({ customThresholds: new Map([['a', Infinity]]) });
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
Name | Type | Description |
---|---|---|
`options` | `CollapseDuplicatesTransformerOptions` | Options for the transformer. |
Returns a `StatefulTransformerContainer` holding the transformer, which can then be passed to the [[RegExpMatcher]].
src/transformer/collapse-duplicates/index.ts:46
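The collapsing behavior can be sketched as a plain string transform. This is a simplified model, not the library's source: the real transformer operates statefully on code points inside the matcher, but the threshold logic is analogous.

```typescript
// Simplified model of duplicate collapsing: runs of the same character are
// truncated to at most the character's threshold (default 1), mirroring
// the customThresholds option described above.
function collapseDuplicates(
  text: string,
  thresholds: Map<string, number> = new Map(),
): string {
  let result = '';
  let previous = '';
  let runLength = 0;
  for (const char of text) {
    runLength = char === previous ? runLength + 1 : 1;
    previous = char;
    if (runLength <= (thresholds.get(char) ?? 1)) result += char;
  }
  return result;
}
```

Under this model, `collapseDuplicates('hhiii')` gives `'hi'`, while `collapseDuplicates('hello', new Map([['l', 2]]))` leaves `'hello'` intact, matching the `customThresholds` behavior described above.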
▸ compareMatchByPositionAndId(a, b): `0 | 1 | -1`
Compares two match payloads:

- If the first match payload's start index is less than the second's, `-1` is returned;
- If the second match payload's start index is less than the first's, `1` is returned;
- If the first match payload's end index is less than the second's, `-1` is returned;
- If the second match payload's end index is less than the first's, `1` is returned;
- If the first match payload's term ID is less than the second's, `-1` is returned;
- If the first match payload's term ID is equal to the second's, `0` is returned;
- Otherwise, `1` is returned.
Name | Type | Description |
---|---|---|
`a` | `MatchPayload` | First match payload. |
`b` | `MatchPayload` | Second match payload. |
Returns `0 | 1 | -1`: -1 if the first should sort lower than the second, 0 if they are the same, and 1 if the second should sort lower than the first.
src/matcher/MatchPayload.ts:57
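The comparison rules above can be written out directly. This is a standalone sketch with a minimal local match type, not the library's source:

```typescript
// Minimal match shape for illustration; the real MatchPayload has more fields.
type MatchLike = { termId: number; startIndex: number; endIndex: number };

// Implements the documented ordering: start index, then end index, then term ID.
function compareByPositionAndId(a: MatchLike, b: MatchLike): -1 | 0 | 1 {
  if (a.startIndex < b.startIndex) return -1;
  if (b.startIndex < a.startIndex) return 1;
  if (a.endIndex < b.endIndex) return -1;
  if (b.endIndex < a.endIndex) return 1;
  if (a.termId < b.termId) return -1;
  if (a.termId === b.termId) return 0;
  return 1;
}

// Sorting with this comparator orders matches by position, then term ID.
const sorted: MatchLike[] = [
  { termId: 2, startIndex: 5, endIndex: 9 },
  { termId: 1, startIndex: 0, endIndex: 4 },
  { termId: 0, startIndex: 0, endIndex: 4 },
].sort(compareByPositionAndId);
```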
▸ fixedCharCensorStrategy(char): `TextCensorStrategy`
A text censoring strategy that generates replacement strings that are made up of the character given, repeated as many times as needed.
Example
const strategy = fixedCharCensorStrategy('*');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '**** you'
Name | Type | Description |
---|---|---|
`char` | `string` | String that represents the code point which should be used when generating the replacement string. Must be exactly one code point in length. |
A [[TextCensorStrategy]] for use with the [[TextCensor]].
src/censor/BuiltinStrategies.ts:134
▸ fixedPhraseCensorStrategy(phrase): `TextCensorStrategy`
A text censoring strategy that returns a fixed string.
Example
// The replacement phrase '' effectively removes all matched regions
// from the string.
const strategy = fixedPhraseCensorStrategy('');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: ' you'
Example
const strategy = fixedPhraseCensorStrategy('fudge');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: 'fudge you'
Name | Type | Description |
---|---|---|
`phrase` | `string` | Replacement phrase to use. |
A [[TextCensorStrategy]] for use with the [[TextCensor]].
src/censor/BuiltinStrategies.ts:115
▸ grawlixCensorStrategy(): `TextCensorStrategy`

A text censoring strategy that generates grawlix, i.e. strings that contain the characters `%`, `@`, `$`, `&`, and `*`.
Example
const strategy = grawlixCensorStrategy();
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '%@&* you'
A [[TextCensorStrategy]] for use with the [[TextCensor]].
src/censor/BuiltinStrategies.ts:89
▸ keepEndCensorStrategy(baseStrategy): `TextCensorStrategy`
A text censoring strategy that extends another strategy, adding the last character matched at the end of the generated string.
Example
const strategy = keepEndCensorStrategy(asteriskCensorStrategy());
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: '***k you'
Name | Type | Description |
---|---|---|
`baseStrategy` | `TextCensorStrategy` | Strategy to extend. It will be used to produce the start of the generated string. |
A [[TextCensorStrategy]] for use with the [[TextCensor]].
src/censor/BuiltinStrategies.ts:51
▸ keepStartCensorStrategy(baseStrategy): `TextCensorStrategy`
A text censoring strategy that extends another strategy, adding the first character matched at the start of the generated string.
Example
const strategy = keepStartCensorStrategy(grawlixCensorStrategy());
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: 'f$@* you'
Example
// Since keepEndCensorStrategy() returns another text censoring strategy, you can use it
// as the base strategy to pass to keepStartCensorStrategy().
const strategy = keepStartCensorStrategy(keepEndCensorStrategy(asteriskCensorStrategy()));
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you'
// After: 'f**k you'
Name | Type | Description |
---|---|---|
`baseStrategy` | `TextCensorStrategy` | Strategy to extend. It will be used to produce the end of the generated string. |
A [[TextCensorStrategy]] for use with the [[TextCensor]].
src/censor/BuiltinStrategies.ts:28
▸ parseRawPattern(pattern): `ParsedPattern`
Parses a string as a pattern directly.
Note
It is recommended to use the [[pattern | pattern template tag]] instead of this function for literal patterns (i.e. ones without dynamic content).
Throws
[[ParserError]] if a syntactical error was detected while parsing the pattern.
Name | Type | Description |
---|---|---|
`pattern` | `string` | The string to parse. |
The parsed pattern, which can then be used with the [[RegExpMatcher]].
▸ pattern(strings, ...expressions): `ParsedPattern`
Parses a pattern, which matches a set of strings; see the Syntax section for details. This function is intended to be called as a template tag.
Syntax
Generally speaking, in patterns, characters are interpreted literally. That is, they match exactly what they are: `a` matches an `a`, `b` matches a `b`, `;` matches a `;`, and so on.
However, there are several constructs that have special meaning:

- `[expr]` matches either the empty string or `expr` (an optional expression). `expr` may be a sequence of literal characters or a wildcard (see below).
- `?` matches any character (a wildcard).
- A `|` at the start or end of the pattern asserts position at a word boundary (a word boundary assertion). If `|` is at the start, it ensures that the match either starts at the start of the string or is preceded by a non-word character; if it is at the end, it ensures that the match either ends at the end of the string or is followed by a non-word character. A word character is a lower-case or upper-case ASCII alphabet character or an ASCII digit.
- In a literal, a backslash may be used to escape one of the meta-characters mentioned above so that it matches literally: `\[` matches `[`, and does not mark the start of an optional expression.

Note about escapes

As this function operates on raw strings, double-escaping backslashes is not necessary:

// Use this:
const parsed = pattern`hello \[`;
// Don't use this:
const parsed = pattern`hello \\[`;
Examples

- `baz` matches `baz` exactly.
- `b\[ar` matches `b[ar` exactly.
- `d?ude` matches `d`, then any character, then `ude`. All of the following strings are matched by this pattern: `dyude`, `d;ude`, `d!ude`.
- `h[?]ello` matches either `h`, any character, then `ello`, or the literal string `hello`. The set of strings it matches is equal to the union of the set of strings that the two patterns `hello` and `h?ello` match. All of the following strings are matched by this pattern: `hello`, `h!ello`, `h;ello`.
- `|foobar|` asserts position at a word boundary, matches the literal string `foobar`, then asserts position at a word boundary again: `foobar` matches, as the start and end of the string count as word boundaries; `yofoobar` does not match, as `f` is immediately preceded by a word character; `hello foobar bye` matches, as `f` is immediately preceded by a non-word character, and `r` is immediately followed by a non-word character.
Grammar

Pattern           ::= '|'? Atom* '|'?
Atom              ::= Literal | Wildcard | Optional
Optional          ::= '[' (Literal | Wildcard) ']'
Literal           ::= (NON_SPECIAL | '\' SUPPORTS_ESCAPING)+
NON_SPECIAL       ::= any character other than '\', '?', '[', ']', or '|'
SUPPORTS_ESCAPING ::= '\' | '[' | ']' | '?' | '|'
Example
const parsed = pattern`hello?`; // match "hello", then any character
Example
const parsed = pattern`w[o]rld`; // match "wrld" or "world"
Example
const parsed = pattern`my initials are \[??\]`; // match "my initials are [", then any two characters, then a "]"
Throws
[[ParserError]] if a syntactical error was detected while parsing the pattern.
See
[[parseRawPattern]] if you want to parse a string into a pattern without using a template tag.
Name | Type |
---|---|
`strings` | `TemplateStringsArray` |
`...expressions` | `unknown[]` |
The parsed pattern, which can then be used with the [[RegExpMatcher]].
▸ randomCharFromSetCensorStrategy(charset): `TextCensorStrategy`
A text censoring strategy that generates replacement strings made up of random characters from the set of characters provided.
Example
const strategy = randomCharFromSetCensorStrategy('$#!');
const censor = new TextCensor().setStrategy(strategy);
// Before: 'fuck you!'
// After: '!##$ you!'
Name | Type | Description |
---|---|---|
`charset` | `string` | Set of characters from which the replacement string should be constructed. Must not be empty. |
A [[TextCensorStrategy]] for use with the [[TextCensor]].
src/censor/BuiltinStrategies.ts:155
▸ remapCharactersTransformer(mapping): `SimpleTransformerContainer`
Maps certain characters to other characters, leaving other characters unchanged.
Application order
It is recommended that this transformer be applied near the start of the transformer chain.
Example
// Transform 'a' to 'b'.
const transformer = remapCharactersTransformer({ 'b': 'a' });
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
Example
// Transform '🅱️' to 'b', and use a map instead of an object as the argument.
const transformer = remapCharactersTransformer(new Map([['b', '🅱️']]));
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
Example
// Transform '🇴' and '0' to 'o'.
const transformer = remapCharactersTransformer({ o: '🇴0' });
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
See
- [[resolveConfusablesTransformer | Transformer that handles confusable Unicode characters]]
- [[resolveLeetSpeakTransformer | Transformer that handles leet-speak]]
Name | Type | Description |
---|---|---|
`mapping` | `CharacterMapping` | A map/object mapping certain characters to others. |
Returns a `SimpleTransformerContainer` holding the transformer, which can then be passed to the [[RegExpMatcher]].
src/transformer/remap-characters/index.ts:38
▸ resolveConfusablesTransformer(): `SimpleTransformerContainer`

Creates a transformer that maps confusable Unicode characters to their normalized equivalent. For example, `⓵`, `➊`, and `⑴` become `1` when using this transformer.
Application order
It is recommended that this transformer be applied near the start of the transformer chain.
Example
const transformer = resolveConfusablesTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
Returns a `SimpleTransformerContainer` holding the transformer, which can then be passed to the [[RegExpMatcher]].
src/transformer/resolve-confusables/index.ts:22
▸ resolveLeetSpeakTransformer(): `SimpleTransformerContainer`

Creates a transformer that maps leet-speak characters to their normalized equivalent. For example, `$` becomes `s` when using this transformer.
Application order
It is recommended that this transformer be applied near the start of the transformer chain, but after similar transformers that map characters to other characters, such as the [[resolveConfusablesTransformer | transformer that resolves confusable Unicode characters]].
Example
const transformer = resolveLeetSpeakTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
Returns a `SimpleTransformerContainer` holding the transformer, which can then be passed to the [[RegExpMatcher]].
src/transformer/resolve-leetspeak/index.ts:23
▸ skipNonAlphabeticTransformer(): `SimpleTransformerContainer`

Creates a transformer that skips non-alphabetic characters (`a`-`z`, `A`-`Z`). This is useful when matching text on patterns that are solely comprised of alphabetic characters (the pattern `hello` does not match `h.e.l.l.o` by default, but does with this transformer).
Warning
This transformation is not part of the default set of transformations, as there are some known rough edges with false negatives; see #23 and #46 on the GitHub issue tracker.
Application order
It is recommended that this transformer be applied near the end of the transformer chain, if at all.
Example
const transformer = skipNonAlphabeticTransformer();
const matcher = new RegExpMatcher({ ..., blacklistMatcherTransformers: [transformer] });
Returns a `SimpleTransformerContainer` holding the transformer, which can then be passed to the [[RegExpMatcher]].
src/transformer/skip-non-alphabetic/index.ts:31
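As a rough model of the effect (the actual transformer skips characters statefully during matching rather than rewriting the string), skipping non-alphabetic characters amounts to:

```typescript
// Simplified model: drop every character outside a-z and A-Z, so that
// patterns made solely of letters can match text with punctuation interleaved.
function skipNonAlphabetic(text: string): string {
  return [...text].filter((c) => /[a-zA-Z]/.test(c)).join('');
}
```

Under this model, `'h.e.l.l.o'` reduces to `'hello'`, which the pattern `hello` then matches.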
▸ toAsciiLowerCaseTransformer(): `SimpleTransformerContainer`
Creates a transformer that changes all ASCII alphabet characters to lower-case, leaving other characters unchanged.
Application order
It is recommended that this transformer be applied near the end of the transformer chain. Using it before other transformers may have the effect of making its changes useless as transformers applied after produce characters of varying cases.
Returns a `SimpleTransformerContainer` holding the transformer, which can then be passed to the [[RegExpMatcher]].
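The per-character transform can be sketched as follows. This is a simplified model; the real transformer maps code points within the matcher pipeline:

```typescript
// Map 'A'-'Z' (0x41-0x5A) to 'a'-'z' by adding 0x20; leave every other
// code point, including non-ASCII letters, unchanged.
function toAsciiLowerCase(text: string): string {
  return [...text]
    .map((c) => {
      const cp = c.codePointAt(0)!;
      return cp >= 0x41 && cp <= 0x5a ? String.fromCodePoint(cp + 0x20) : c;
    })
    .join('');
}
```

Under this model, `toAsciiLowerCase('HeLLo Ü!')` yields `'hello Ü!'`; the non-ASCII `Ü` is deliberately left alone.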