3.8 Regular Expressions

Table of Contents

  1. Patterns & Flags
  2. Methods Of RegExp And String
  3. Classes and Special Symbols
  4. Sets & Ranges [...]
  5. Quantifiers +, *, ?, {n}
  6. The Good, The Bad, The Quantifier

Patterns & Flags

Regular expressions are a search-and-replace tool for strings.

In JS they are implemented by the built-in RegExp object and are integrated into the string methods.

Regexps

A regexp consists of a pattern and optional flags.

Syntax:

var regexp = new RegExp("pattern","flags");

Or a shorter version:

var regexp = /pattern/; // with no flags
var regexp = /pattern/gmi; //with gmi flags

The slashes / tell JS that this is a regexp, just as the quotes "" mark a string.
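A minimal sketch of the two syntaxes side by side. It uses console.log instead of alert so that it also runs outside a browser, and the variable `word` is a hypothetical stand-in for a value that is only known at run time:

```javascript
// Two equivalent ways to build the same case-insensitive regexp.
const literal = /yu/i;
const constructed = new RegExp("yu", "i");

console.log(literal.source, literal.flags);         // yu i
console.log(constructed.source, constructed.flags); // yu i

// The constructor form is handy when the pattern comes from a variable.
// `word` is a hypothetical value, e.g. something a user typed.
const word = "capital";
const dynamic = new RegExp(word, "i");
console.log(dynamic.test("Yura is the CAPITAL of Great Britain")); // true
```

The literal form is shorter, so it is usually the better choice when the pattern is fixed.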

Usage

A pattern can include special symbols that make the search more powerful. The simplest case, though, is a plain substring search with no special symbols at all:

var str = "Yura is the capital of Great Britain";

var regexp = /Yu/;
alert(str.search(regexp)); // 0, the match starts at the very beginning

Flags

To refine the search, you can use flags.

The three most common ones are:

  • i: case-insensitive; A and a are treated as the same
  • g: global; finds all matches, without it only the first
  • m: multi-line mode

i Example:

var str = "Yura is the capital of Great Britain";

alert(str.search(/yu/)); // -1, nothing found
alert(str.search(/yu/i)); // 0, the position of the match
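The other two flags can be sketched in the same way. This example assumes the ^ symbol, which matches the start of the input and is only listed among the special characters in these notes; the sample string is made up:

```javascript
const text = "one 1\ntwo 2";

// Without g: only the first match.
const first = text.match(/\d/);
console.log(first[0]); // 1

// With g: every match.
const all = text.match(/\d/g);
console.log(all); // [ '1', '2' ]

// With m, ^ matches at the start of every line,
// not only at the start of the whole string.
const lineStarts = text.match(/^\w/gm);
console.log(lineStarts); // [ 'o', 't' ]

// Without m, only the first line counts.
const stringStart = text.match(/^\w/g);
console.log(stringStart); // [ 'o' ]
```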

Methods Of RegExp And String

Unfortunately, the overall structure of the built-in methods is a bit confusing, so we will first look at them one by one, and then at recipes for solving standard tasks with them.

str.search(reg)

Returns the position of the first match, or -1 if nothing is found.

var str = "Yura is the capital of Great Britain";

var regexp = /Yu/;
alert(str.search(regexp)); // 0

search never finds more than the first match.

str.match(reg) without g

Without g, it looks for the first match only.

Returns an array holding the match, with extra index and input properties, or null if nothing is found:

var str = "It's not more than university";

var result = str.match(/o/);
alert(result[0]); // o
alert(result.index); // 6, the position of the match
alert(result.input); // "It's not more than university"

If part of the pattern is in parentheses, its contents become the next array element:

var str = "It's not more than university";

var result = str.match(/o(t)/i);
alert(result[0]); // ot
alert(result[1]); // t
alert(result.index); // 6, the position of the match
alert(result.input); // "It's not more than university"

str.match(reg) with g

Returns a plain array of all matches (no index or input properties), or null if there are none.

var str = "Ho-ho-hi-di-ho";

var result = str.match(/ho/ig);
alert(result); // Ho,ho,ho
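Because the result is null rather than an empty array when nothing matches, reading .length directly can throw. A small sketch of one common way to guard against that (the `|| []` fallback is just one option, and console.log stands in for alert):

```javascript
const str = "Ho-ho-hi-di-ho";

const found = str.match(/ho/gi);
console.log(found.length); // 3
console.log(found.index);  // undefined: with g there is no index property

const missing = str.match(/ha/gi);
console.log(missing); // null, not an empty array

// Fall back to an empty array so that .length is always safe to read.
const safe = str.match(/ha/gi) || [];
console.log(safe.length); // 0
```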

str.split(reg|substr, limit)

We could split strings, using regexps:

Usual way:

alert('12-34-56'.split('-')); // [12, 34, 56]

Fancy way, same result:

alert('12-34-56'.split(/-/)); // [12, 34, 56]
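Where a regexp pays off is when there are several possible separators. A short sketch, which also shows the optional limit argument from the signature above (the input strings are invented for illustration):

```javascript
// A regexp can split on several different separators at once.
const parts = "12-34:56 78".split(/[-: ]/);
console.log(parts); // [ '12', '34', '56', '78' ]

// The optional second argument caps the number of returned items.
const firstTwo = "12-34-56".split(/-/, 2);
console.log(firstTwo); // [ '12', '34' ]
```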

str.replace(reg, str|func)

Ultimate tool. The simplest way to use:

alert('12-34-56'.replace("-", ":")); // 12:34-56

//Or with a regexp and flag:
alert( '12-34-56'.replace( /-/g, ":" ));  // 12:34:56

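Special symbols in the replacement string:

  • $$: inserts a literal $
  • $&: inserts the whole match
  • $`: inserts the part of the string before the match
  • $': inserts the part of the string after the match
  • $n: inserts the contents of the n-th pair of parentheses (), where n is a number
// $n
var str = "Regina Rabotaet";
alert(str.replace(/(Regina) (Rabotaet)/, '$2 $1 ?')); // Rabotaet Regina ?

// $'
var str = "Peter The Great";
alert(str.replace(/Peter/i, "Keksik $'!")); // "Keksik  The Great! The Great": $' inserts " The Great", and the original tail stays in place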
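The remaining symbols from the list can be tried out the same way. A minimal sketch with made-up replacement strings, using console.log in place of alert:

```javascript
const str = "Peter The Great";

// $& is the whole match: wrap it in brackets.
const wrapped = str.replace(/Great/, "[$&]");
console.log(wrapped); // Peter The [Great]

// $` is everything before the match.
const before = str.replace(/Great/, "<$`>");
console.log(before); // Peter The <Peter The >

// $$ produces a single literal dollar sign.
const price = "Price: 10".replace(/\d+/, "$$$&");
console.log(price); // Price: $10
```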

If you want to show off, you can pass a function as the second argument:

var str = "Ho-ho-HO";
var i = 1; // counter used by the callback
alert(str.replace(/ho/ig, function() {
  return i++;
})); // 1-2-3

That function receives several arguments:

  • str: the match
  • p1, p2...: the contents of the parentheses (), if the pattern has any
  • offset: the position of the match
  • s: the original string
function replacer(str, offset, s) { // no parentheses in the pattern, so no p1, p2
  alert( "Found: " + str + " position: " + offset + " in line: " + s );
  return str.toUpperCase();
}

var result = "Om-nom-nom".replace(/om/gi, replacer);
alert( 'Result: ' + result ); // Result: OM-nOM-nOM
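When the pattern does contain parentheses, their contents arrive between the match and the offset. A sketch under that assumption; the function name `swap` and its output format are hypothetical:

```javascript
// With two groups in the pattern the callback receives:
// (match, p1, p2, offset, wholeString)
function swap(match, p1, p2, offset, s) {
  return p2 + " " + p1 + " (found at " + offset + ")";
}

const result = "Regina Rabotaet".replace(/(Regina) (Rabotaet)/, swap);
console.log(result); // Rabotaet Regina (found at 0)
```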

regexp.test(str)

Returns true if the string contains a match and false otherwise. It gives the same answer as str.search(reg) != -1.

/L/.test('Life'); // true
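A quick check of how test relates to search, written as a sketch with console.log instead of alert:

```javascript
const str = "Life";

// test answers yes or no; search gives a position or -1.
const viaTest = /L/.test(str);
const viaSearch = str.search(/L/) !== -1;
console.log(viaTest, viaSearch); // true true

const missTest = /x/.test(str);
const missSearch = str.search(/x/) !== -1;
console.log(missTest, missSearch); // false false
```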

regexp.exec(str)

Without g it works just like str.match. With g it finds one match per call and sets regexp.lastIndex to the position right after that match, so repeated calls walk through all the matches.
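The usual way to use exec with g is a loop. A minimal sketch, assuming we want every match together with its position (the `positions` array is only there for illustration):

```javascript
const str = "Ho-ho-hi-di-ho";
const regexp = /ho/gi;

const positions = [];
let match;
// Each call continues from regexp.lastIndex.
// When nothing more is found, exec returns null and lastIndex goes back to 0.
while ((match = regexp.exec(str)) !== null) {
  positions.push(match[0] + "@" + match.index);
  console.log(match[0], match.index, regexp.lastIndex);
}

console.log(positions); // [ 'Ho@0', 'ho@3', 'ho@12' ]
```

Unlike str.match with g, each result here still carries its index.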

Classes and Special Symbols

Let's take +7 (800) 555-35-35 and try to extract the digits.

There are classes, which specialise on certain types of data, for example, class \d stands for 'any digit'.

var str = '+7 (800) 555-35-35';
var reg = /\d/g;

alert (str.match(reg)); //7,8,0,0,5,5,5,3,5,3,5

Classes: \d \s \w

  • \d: a digit, 0-9
  • \s: a whitespace character, including tabs and line breaks
  • \w: a word character: a letter of the English alphabet, a digit, or _

For example, \d\s\w stands for a digit, then a whitespace character, then a word character. CSS\d finds the string 'CSS' followed by any digit.
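Both patterns from the paragraph above can be tried on a sample string. The string itself is made up for this sketch:

```javascript
const str = "I love HTML5 and CSS3!";

// CSS followed by any digit.
const css = str.match(/CSS\d/);
console.log(css[0]); // CSS3

// A digit, then a whitespace character, then a word character.
const combo = str.match(/\d\s\w/);
console.log(combo[0]); // 5 a
```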

Word boundaries: \b

\b matches a position at the edge of a word, so /\bYur\b/ finds a match in Man, Yur a mad!, but fails in Man, Yura s mad!:

alert( "Man, Yur a mad!".match(/\bYur\b/) ); // Yur
alert( "Man, Yura s mad!".match(/\bYur\b/) ); // null

Reverse classes

For each class there is an inverse class:

  • \D for non-digit
  • \S for non-space
  • \W for non-word
  • \B for non-boundary

So another easy way to extract all the digits from the phone number would be:

var str = '+7 (800) 555-35-35';

alert (str.replace(/\D/g, "")); //78005553535

Spaces are symbols too

Consider that

alert("1 - 5".match(/\d-\d/)); //null, spaces are symbols too

alert("1 - 5".match(/\d - \d/)); // 1 - 5

alert("1-5".match(/\d - \d/)); // null

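Dot: any symbol

A dot matches any single character except a line break.

var reg = /Re.ct/;

alert("Re ct".match(reg)); // Re ct
alert("React".match(reg)); // React
alert("Re4ct".match(reg)); // Re4ct
// all three match
alert("Rect".match(reg)); // null: the dot needs a character to match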

Escape Characters

Some characters have a special meaning in a regexp: [ \ ^ $ . | ? * + ( ).

To find any of them as a literal symbol, escape it with \

alert( "0.0".match(/\d\.\d/) ); // 0.0
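Escaping needs extra care when the pattern is passed to new RegExp as a string, because the backslash is also special inside string literals. A short sketch of the difference (the test strings are invented):

```javascript
// In a regexp literal one backslash is enough.
const literal = /\d\.\d/;

// In a string the backslash itself must be escaped, so it is doubled.
const fromString = new RegExp("\\d\\.\\d");

console.log(literal.source === fromString.source); // true
console.log("v1.5".match(fromString)[0]);          // 1.5

// Without escaping, the dot matches any character.
const loose = /\d.\d/.test("1x5");
const strict = /\d\.\d/.test("1x5");
console.log(loose, strict); // true false
```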

Sets & Ranges [...]

If several characters are placed inside [], JS searches for any one of them.

Set

For example, [eao] means 'any of e, a or o'.

alert("A Van with Ian".match(/[iv]an/gi)); // "Van", "Ian"

A set stands for exactly one character in the match.

Ranges

  • [a-z]: any lowercase Latin letter
  • [0-5]: any digit from 0 to 5
alert( "Exception 0xAF".match(/x[0-9A-F][0-9A-F]/g) ); // xAF

Character classes are shorthand for ranges:

  • \d == [0-9]
  • \w == [a-zA-Z0-9_]
  • \s == [\t\n\v\f\r ] (plus a few rarer Unicode space characters)
var str = "Radius, whose life I have spared?";

alert(str.match(/\w+/g)); // 'Radius', 'whose', 'life' ,'I' ,'have', 'spared'

\w+ means we look for one or more word characters in a row.

For Russian letters use [а-яё], for both alphabets [\wа-яё].

Reverse Ranges

For every range there must be... you guessed it.

It is written as [^..]

alert("user@example.com".match(/[^a-z\d\s]/gi)); // "@", "." (user@example.com is a placeholder address)

Most special characters don't need escaping within [].

There are a few nuances. These can be used unescaped:

  • .
  • +
  • ( )
  • - at the start or the end, where it cannot define a range
  • ^ anywhere except the beginning
  • [
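A sketch that puts several of these characters into one set without a single backslash. The input string is made up, and the hyphen is placed first so that it cannot be read as a range:

```javascript
const str = "1 + 2 - 3^(0.1)";

// Hyphen first, caret last: neither gets its special meaning here.
const symbols = str.match(/[-+().^]/g);
console.log(symbols); // [ '+', '-', '^', '(', '.', ')' ]
```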

Quantifiers +, *, ?, {n}

Let's take that mystical phone number again, +7 (800) 555-35-35, but now we want to extract whole numbers: 7, 800, 555, 35, 35.

\d works for single digits, but for numbers we need to say how many digits to take.

Amount {n}

Nobody wants to write /\d\d\d\d\d/ when /\d{5}/ does the same.

It also works with ranges: from 3 to 5 digits is /\d{3,5}/

And 3 or more, /\d\d\d+/, is /\d{3,}/

To find the numbers in the phone number you can do:

'+7 (800) 555-35-35'.match(/\d{1,}/g); // 7,800,555,35,35
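To see how the different forms of {n} behave, here is a small sketch that runs three of them over the same phone number with the g flag (console.log stands in for alert):

```javascript
const phone = "+7 (800) 555-35-35";

// Exactly three digits.
const exact = phone.match(/\d{3}/g);
console.log(exact); // [ '800', '555' ]

// From two to three digits.
const ranged = phone.match(/\d{2,3}/g);
console.log(ranged); // [ '800', '555', '35', '35' ]

// One or more digits.
const open = phone.match(/\d{1,}/g);
console.log(open); // [ '7', '800', '555', '35', '35' ]
```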

Other shorties

  • +: means one or more, same as {1,}
  • ?: means zero or one, same as {0,1}; it makes a character optional, so /ou?r/ finds or in color and our in colour
  • *: means zero or more, same as {0,}
alert( "100 10 1".match(/\d0*/g) ); // 100, 10, 1
alert( "100 10 1".match(/\d0+/g) ); // 100, 10
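The three short forms can be compared on the color/colour case from the list above. A sketch with one extra invented spelling to show where they differ:

```javascript
const text = "color colour colouur";

// ? allows zero or one u.
const optional = text.match(/colou?r/g);
console.log(optional); // [ 'color', 'colour' ]

// * allows any number of u, including none.
const anyCount = text.match(/colou*r/g);
console.log(anyCount); // [ 'color', 'colour', 'colouur' ]

// + requires at least one u.
const atLeastOne = text.match(/colou+r/g);
console.log(atLeastOne); // [ 'colour', 'colouur' ]
```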

Examples of component regexp

Decimal number

alert( "0 1 12.345 7890".match(/\d+\.\d+/g) ); // 12.345

Opening HTML-tag

alert( "<h1>Still Reading!</h1>".match(/<[a-z][a-z0-9]*>/gi) ); // <h1>

Opening or closing HTML-tag

The / is escaped as \/ so that JS doesn't treat it as the end of the pattern; the ? after it makes the slash optional.

alert( "<h1>No?</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>

Shorties need to be escaped

Coming soon