Regular expressions are a tool for searching, replacing, and validating text. They form a special language for describing patterns and then finding or checking the text that matches those patterns.
Regular expressions consist of a special set of characters that denote certain patterns. Here are the main elements of regular expressions:
. for any character, \d for a digit.
* for zero or more times, + for one or more times.
[A-Za-z] for any Latin letter.
A list of some commonly used special characters in regular expressions:
\d matches any digit from 0 to 9.
\w matches any Latin letter, digit, or underscore (in ASCII mode).
\s matches any whitespace character (space, tab, line break).
\b matches a word boundary.
^ matches the beginning of the string.
$ matches the end of the string.
. matches any character except a line feed.
* repeats the preceding element 0 or more times.
+ repeats the preceding element 1 or more times.
? means the preceding element occurs 0 or 1 times.
{n} is the exact number of repetitions of the preceding element, where n is a number.
Character escaping makes special characters in regular expressions (for example, ., *, +) stop working as commands and behave as ordinary characters. For the engine to treat them as literal characters rather than instructions, they are preceded by a backslash (\).
Example of escaping:
If you need to find a period in the text, escape the dot with a backslash and write \. instead of a bare dot. The regular expression \. searches for a literal period, not for any character.
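A minimal Python sketch of the difference between an escaped and an unescaped dot, using the standard re module. The sample strings are invented for illustration, and re.escape is shown only as one convenient way to escape literal text; the text above does not mention it.

```python
import re

text = "Version 2.5 costs $10"

# Unescaped dot: matches any character except a newline,
# so both "2.5" and "2x5" are found.
any_char = re.findall(r"2.5", "2.5 2x5")
print(any_char)      # ['2.5', '2x5']

# Escaped dot: matches only a literal period.
literal_dot = re.findall(r"2\.5", "2.5 2x5")
print(literal_dot)   # ['2.5']

# re.escape() adds the backslashes when the literal text comes
# from elsewhere (for example, user input). "$" is special, so it is escaped.
pattern = re.escape("$10")
found = re.search(pattern, text) is not None
print(found)         # True
```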
If the phone number must be entered with the +1 prefix, use a regular expression that checks for this prefix followed by a sequence of digits. An example of a suitable regular expression:
^\+1\d{10}$
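One way to check a value against this pattern in Python is sketched here. The function name is_valid_us_phone and the sample numbers are hypothetical; only the pattern itself comes from the text.

```python
import re

# Pattern from the text: "+1" followed by exactly 10 digits.
US_PHONE = re.compile(r"^\+1\d{10}$")

def is_valid_us_phone(number: str) -> bool:
    """Return True if number is '+1' followed by exactly ten digits."""
    # fullmatch() requires the whole string to match the pattern.
    return US_PHONE.fullmatch(number) is not None

print(is_valid_us_phone("+12025550123"))  # True
print(is_valid_us_phone("12025550123"))   # False: the "+" is missing
print(is_valid_us_phone("+1202555012"))   # False: only 9 digits after +1
```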
Where:
^ is the beginning of the line.
\+1 indicates that the number must start with +1. The + character is special and means one or more repetitions of the preceding element. To use it literally, as part of the phone code +1, you need to escape it: \+. In the expression ^+1\d{10}$, the + is treated as a quantifier, so the expression will not work as intended (some engines reject it as invalid).
\d{10} means that exactly 10 digits must follow +1.
$ is the end of the line.
More complex phone number requirements, such as a country code or various separators, call for a more elaborate pattern. Example for checking a phone number with a country code:
^\+?\d{1,3}[-.\s]?\(?\d{1,4}?\)?\d{1,4}\d{1,9}$
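The following Python sketch runs this pattern, copied unchanged, against a few invented sample numbers. It is meant only to show what the expression accepts as written, not as a complete phone validator.

```python
import re

# Pattern copied from the text above.
INTL_PHONE = re.compile(r"^\+?\d{1,3}[-.\s]?\(?\d{1,4}?\)?\d{1,4}\d{1,9}$")

def is_valid_intl_phone(number: str) -> bool:
    return INTL_PHONE.fullmatch(number) is not None

samples = [
    "+7 (495)1234567",  # country code, separator, region code in parentheses
    "74951234567",      # digits only
    "+1-800-555-0199",  # separators between the later digit groups
]

for s in samples:
    print(f"{s!r}: {is_valid_intl_phone(s)}")
```

The first two samples are accepted. The third is rejected, because the pattern allows only one separator, directly after the country code; numbers with separators between the later groups would need additional [-.\s]? parts.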
Where:
^\+? is the beginning of the number with an optional "+" symbol.
\d{1,3} is a country code of 1-3 digits.
[-.\s]? is an optional separator: dash, dot, or whitespace.
\(?\d{1,4}?\)? is a region code of 1-4 digits, optionally in parentheses.
\d{1,4} is 1-4 digits.
\d{1,9} is the main part of the number, 1-9 digits.
To check a name, use a regular expression that allows only letters and a few special characters, such as a hyphen or an apostrophe. An example of a simple regular expression for name verification:
^[A-Za-zА-Яа-яЁё]+([-'][A-Za-zА-Яа-яЁё]+)*$
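A short Python sketch of a name check built on this pattern. It assumes the character class is meant to cover Latin letters plus the Cyrillic ranges А-Я, а-я and Ё/ё; the function name is_valid_name and the sample names are hypothetical.

```python
import re

# Assumed intent: Latin letters plus Cyrillic А-Я, а-я and Ё/ё, with single
# hyphens or apostrophes allowed between groups of letters.
NAME = re.compile(r"^[A-Za-zА-Яа-яЁё]+([-'][A-Za-zА-Яа-яЁё]+)*$")

def is_valid_name(name: str) -> bool:
    return NAME.fullmatch(name) is not None

for candidate in ["Anna", "Jean-Pierre", "O'Neil", "Алёна", "Anna-", "R2D2"]:
    print(candidate, is_valid_name(candidate))
```

The first four names are accepted; "Anna-" fails because a hyphen must be followed by at least one letter, and "R2D2" fails because digits are not in the character class. Letters outside the listed ranges, such as "é", are rejected as well.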
Where:
^ is the beginning of the line.
[A-Za-zА-Яа-яЁё]+ indicates that the name must begin with one or more letters (Latin or Cyrillic, including the letter "ё").
([-'][A-Za-zА-Яа-яЁё]+)* indicates that the name may contain a hyphen or an apostrophe followed by one or more letters, and that this group may repeat any number of times.
$ is the end of the line.
Text validation may include checking the format, content, and length of the text. Below are some common text validation scenarios and an example regular expression for each of them.
To make sure that the text matches a certain format, for example that it contains only letters and spaces (punctuation marks would have to be added to the character class), use:
^[A-Za-zА-Яа-яЁё\s]+$
Where:
^ is the beginning of the line.
[A-Za-zА-Яа-яЁё\s]+ is one or more characters, each of them a letter or a whitespace character.
$ is the end of the line.
To check that the text is from 1 to 100 characters long, use the following pattern:
^.{1,100}$
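A minimal Python sketch of this length check. The function name has_valid_length is hypothetical, and fullmatch() is a choice made here, not something the text prescribes.

```python
import re

LENGTH = re.compile(r"^.{1,100}$")

def has_valid_length(text: str) -> bool:
    # fullmatch() also rejects a trailing newline, which "$" alone
    # would tolerate in Python.
    return LENGTH.fullmatch(text) is not None

print(has_valid_length("hello"))       # True
print(has_valid_length(""))            # False: fewer than 1 character
print(has_valid_length("x" * 101))     # False: more than 100 characters
print(has_valid_length("two\nlines"))  # False: "." does not match a newline
```

If multi-line text should count as valid, compiling the pattern with the re.DOTALL flag makes the dot match line breaks too.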
Where:
^ is the beginning of the line.
.{1,100} is any character repeated from 1 to 100 times. Specify your own range as needed.
$ is the end of the line.
To make sure that the text contains a certain word or phrase, use:
\bgood\b
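A sketch of a whole-word search in Python, assuming the word being looked for is "good". The helper contains_word is hypothetical; it wraps any word in \b boundaries and uses re.search, since the word may appear anywhere in the text.

```python
import re

def contains_word(text: str, word: str) -> bool:
    """Return True if word occurs in text as a whole word."""
    # re.escape() keeps any special characters in word literal.
    pattern = rf"\b{re.escape(word)}\b"
    # search() scans the whole text instead of anchoring at the start.
    return re.search(pattern, text) is not None

print(contains_word("This is good news", "good"))  # True
print(contains_word("goodness gracious", "good"))  # False: part of a longer word
print(contains_word("Good news", "good"))          # False: the search is case-sensitive
```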
Where:
\b is the word boundary.
good is the word being searched for (replace it with your own word or phrase).
\b is the word boundary.
To verify that the URL format is correct, use:
^(https?:\/\/)?([\w\-]+\.)+[\w\-]+(\/[\w\-\.]*)*\/?$
Where:
^ is the beginning of the line.
(https?:\/\/)? is an optional http or https followed by "://".
([\w\-]+\.)+ is one or more groups of word characters or hyphens, each followed by a period.
[\w\-]+ is one or more word characters or hyphens.
(\/[\w\-\.]*)* is zero or more path segments: the "/" character followed by word characters, hyphens, or dots.
\/? is an optional "/" character at the end of the line.
$ is the end of the line.
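The URL pattern can be tried out with a short Python sketch like the one below. The function name looks_like_url and the sample addresses are hypothetical; the pattern is copied from the text.

```python
import re

URL = re.compile(r"^(https?:\/\/)?([\w\-]+\.)+[\w\-]+(\/[\w\-\.]*)*\/?$")

def looks_like_url(value: str) -> bool:
    return URL.fullmatch(value) is not None

print(looks_like_url("https://example.com/docs/index.html"))  # True
print(looks_like_url("example.com"))                          # True: scheme is optional
print(looks_like_url("https://example"))                      # False: no dot in the host
print(looks_like_url("https://example.com/search?q=regex"))   # False: "?" is not allowed
```

As the last sample shows, the pattern does not cover query strings, ports, or fragments. It works as a quick format check; for taking a URL apart, Python's standard urllib.parse module is an alternative.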