Regular Expression

From BC$ MobileTV Wiki

A Regular Expression (RegEx) is a sequence of characters that, under the syntax of a particular regular expression language, defines a pattern. The pattern describes a set of strings, and a piece of text is said to match when it belongs to that set.

Regular Expressions are used for a wide array of functions, but the most common uses are:

- Filters
- Information Retrieval patterns
- Automated location of a pattern in a text or document
- Validation of adherence to a pattern
- etc...
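
As a rough illustration of the uses listed above, the following Python sketch (standard re module) filters lines, locates a pattern in a text, and validates a string. The sample data and variable names are made up for the example.

```python
import re

# Hypothetical sample data, used only for illustration.
lines = ["error: disk full", "ok: backup done", "error: timeout"]
text = "Order 1234 shipped, order 98 pending."

# Filter: keep only the lines that start with "error".
errors = [line for line in lines if re.match(r"error", line)]

# Retrieval / location: collect every run of digits and find where the first one starts.
numbers = re.findall(r"\d+", text)
first = re.search(r"\d+", text)

# Validation: the whole string must fit the pattern.
is_valid = re.fullmatch(r"\d{4}", "1234") is not None

print(errors)                   # ['error: disk full', 'error: timeout']
print(numbers, first.start())   # ['1234', '98'] 6
print(is_valid)                 # True
```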


Commonly matched patterns

People's Names

^[a-zA-Z\.\s\-]{1,35}[a-zA-Z\.\s\-]{1,35}$

[1]

Allow French Characters in Names

^[a-zA-ZàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ\.\s\-]{1,35}[a-zA-ZàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ\.\s\-]{1,35}$

[2]
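
A minimal Python sketch of how the two name patterns above might be applied. The helper name is_name and the sample names are assumptions made for this example.

```python
import re

# The two name patterns from this section, copied as written.
NAME = re.compile(r"^[a-zA-Z\.\s\-]{1,35}[a-zA-Z\.\s\-]{1,35}$")
NAME_FR = re.compile(
    r"^[a-zA-ZàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ\.\s\-]{1,35}"
    r"[a-zA-ZàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ\.\s\-]{1,35}$"
)

def is_name(value, pattern=NAME):
    """Return True when value fits the given name pattern (hypothetical helper)."""
    return pattern.match(value) is not None

print(is_name("Mary-Jane O. Smith"))   # True
print(is_name("Zoé"))                  # False: é is outside a-zA-Z
print(is_name("Zoé", NAME_FR))         # True
print(is_name("X"))                    # False: the two quantifiers require at least two characters
```

Because each half of the pattern allows 1 to 35 characters, the accepted length is 2 to 70 characters overall.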

Username

Username patterns are typically more restrictive than name patterns in order to meet data security and/or data storage standards: rather than allowing any character, they often permit only a limited list of special characters.

^[a-zA-Z][a-zA-Z0-9_.-]{1,20}$
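
The username rule can be tried out with a short Python sketch. The helper name is_username is an assumption for this example, and the character class lists the hyphen last, which keeps it a literal character in every engine while allowing the same set of characters.

```python
import re

# A letter first, then 1 to 20 letters, digits, underscores, dots or hyphens.
USERNAME = re.compile(r"^[a-zA-Z][a-zA-Z0-9_.-]{1,20}$")

def is_username(value):
    """Return True when value is an acceptable username (hypothetical helper)."""
    return USERNAME.fullmatch(value) is not None

print(is_username("alice_01"))     # True
print(is_username("bob.smith-2"))  # True
print(is_username("1alice"))       # False: must start with a letter
print(is_username("a"))            # False: at least two characters are required
print(is_username("bad name"))     # False: spaces are not allowed
```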


Location

Typically, you are better off letting users select from a "known valid list" where possible, or select their location on a map and geocode it (or, better yet, allow "Geolocation detection" and then reverse geocode to a place name). However, there are some approaches that can be taken to validate free-text input if needed.

Cities/Towns

Sub-Regions

Sub-Regions include any naming or categorization system that a geopolitically subdivided region may use, such as States, Provinces, Prefectures, Counties, etc.

Countries

Communications

Postal Code

US Zip Codes:

(\d{5}([\-]\d{4})?) 

Canadian Postal Codes:

[A-Za-z][0-9][A-Za-z] [0-9][A-Za-z][0-9]

UK Postal Code:

[A-Za-z]{1,2}[0-9Rr][0-9A-Za-z]? [0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}

Spanish Postal Code:

((0[1-9]|5[0-2])|[1-4][0-9])[0-9]{3}

Japanese Postal Codes:

\d{3}-\d{4}

[3]
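
The postal code patterns above carry no ^ or $ anchors, so a plain search would also accept a code buried inside a longer string. The following Python sketch shows one possible way to use them for validation, with fullmatch so that the whole input must fit. The dictionary keys and the helper name are assumptions for this example.

```python
import re

# Patterns copied from this section, keyed by a hypothetical country label.
POSTAL = {
    "US": r"(\d{5}([\-]\d{4})?)",
    "CA": r"[A-Za-z][0-9][A-Za-z] [0-9][A-Za-z][0-9]",
    "UK": r"[A-Za-z]{1,2}[0-9Rr][0-9A-Za-z]? [0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}",
    "ES": r"((0[1-9]|5[0-2])|[1-4][0-9])[0-9]{3}",
    "JP": r"\d{3}-\d{4}",
}

def is_postal_code(country, value):
    """Return True when the whole value fits the pattern for country."""
    return re.fullmatch(POSTAL[country], value) is not None

print(is_postal_code("US", "90210-1234"))  # True
print(is_postal_code("CA", "K1A 0B1"))     # True
print(is_postal_code("UK", "SW1A 1AA"))    # True
print(is_postal_code("ES", "53000"))       # False: province prefixes stop at 52
print(is_postal_code("JP", "100-0001"))    # True

# Without anchoring, a seven-digit string still "contains" a US zip code.
print(re.search(POSTAL["US"], "1234567") is not None)  # True
print(is_postal_code("US", "1234567"))                 # False
```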

Phone Numbers

International Phone Numbers:

[\+]\d{2}[\(]\d{2}[\)]\d{4}[\-]\d{4}

US & Canada phone numbers (omitting the "+1" country code that the two nations share):

\d{3}[\-]\d{3}[\-]\d{4}

[4] [5] [6] [7]
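
A short Python sketch of the two patterns above, used once to pull numbers out of running text and once to validate a single value. The sample text and numbers are invented for the example.

```python
import re

# Patterns copied from this section.
INTERNATIONAL = re.compile(r"[\+]\d{2}[\(]\d{2}[\)]\d{4}[\-]\d{4}")
US_CANADA = re.compile(r"\d{3}[\-]\d{3}[\-]\d{4}")

# Location of a pattern in a text: findall returns every match.
text = "Call 613-555-0142 or 416-555-0199 before noon."
found = US_CANADA.findall(text)
print(found)  # ['613-555-0142', '416-555-0199']

# Validation: fullmatch requires the whole value to fit.
print(US_CANADA.fullmatch("613-555-0142") is not None)          # True
print(US_CANADA.fullmatch("(613) 555-0142") is not None)        # False: only the dashed form is covered
print(INTERNATIONAL.fullmatch("+55(11)5555-1234") is not None)  # True
```

Note that the international pattern only covers a two-digit country code followed by a two-digit area code in parentheses, so other layouts would need their own patterns.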

Mexico phone numbers:

^52-([1-9]\d{1}-\d{4}-\d{4}|[1-9]\d{2}-\d{3}-\d{4})$

[8]
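
The Mexico pattern can be checked with the Python sketch below. It assumes that every digit group is written as \d{4} (without the backslash, the engine would look for four literal letters "d"), and the sample numbers are invented.

```python
import re

# Country code 52, then either a 2-digit area code with a 4-4 number
# or a 3-digit area code with a 3-4 number.
MEXICO = re.compile(r"^52-([1-9]\d{1}-\d{4}-\d{4}|[1-9]\d{2}-\d{3}-\d{4})$")

def is_mexico_number(value):
    """Return True when value fits the pattern (hypothetical helper)."""
    return MEXICO.fullmatch(value) is not None

print(is_mexico_number("52-55-1234-5678"))  # True: 2-digit area code
print(is_mexico_number("52-656-123-4567"))  # True: 3-digit area code
print(is_mexico_number("52-05-1234-5678"))  # False: area code cannot start with 0
print(is_mexico_number("55-1234-5678"))     # False: country code is required
```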

UK phone numbers:

^\s*\(?(?:(020[78]\)?[ ]?[1-9][0-9]{2}[ ]?[0-9]{4})|(0[1-8][0-9]{3}\)?[ ]?[1-9][0-9]{2}[ ]?[0-9]{3}))\s*$

France phone numbers:


Japan phone numbers:

^\d{2}(?:-\d{4}-\d{4}|\d{8}|\d-\d{3,4}-\d{4})$

[9] [10]
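
The Japan pattern accepts several layouts of the same number. The Python sketch below validates an input and, as one possible follow-up step, strips the hyphens so that all accepted layouts are stored the same way. The helper name normalize is an assumption for this example.

```python
import re

# Pattern copied from this section.
JAPAN = re.compile(r"^\d{2}(?:-\d{4}-\d{4}|\d{8}|\d-\d{3,4}-\d{4})$")

def normalize(number):
    """Return the digits only when number fits the pattern, else None."""
    if JAPAN.fullmatch(number) is None:
        return None
    return number.replace("-", "")

print(normalize("03-1234-5678"))   # 0312345678
print(normalize("0312345678"))     # 0312345678
print(normalize("090-1234-5678"))  # 09012345678
print(normalize("03-123-4567"))    # None
```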


Email Addresses

The email address standard, in its current state, makes validation imperfect: it is practically impossible to write a pattern that accepts every legal address and nothing else. However, there are some patterns that can be used to validate the vast majority of normal, expected email addresses:

((\w*)@([a-zA-Z0-9]*\.)*([a-zA-Z0-9]*))

[11]

More thorough, but missing French characters and some other legal special characters:

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

Very thorough with French character support, but still lacking support for very long yet legal domains:

^[_A-Za-zàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ0-9+-]+(\.[_A-Za-zàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ0-9-]+)*@[A-Za-zàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ0-9-]+(\.[A-Za-zàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ0-9]+)*(\.[A-Za-zàâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ]{2,})$

[12] [13]
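
To see how the three email patterns differ in practice, the Python sketch below runs a few invented addresses through each of them. Two assumptions are made: the dots that separate labels in the third pattern are written as \. (an unescaped dot would match any character), and the constant names BASIC, THOROUGH and FRENCH are labels chosen for this example.

```python
import re

ACCENTS = "àâèêéëîïôùûüçÀÂÈÊÉËÎÏÔÙÛÜÇ"
WORD = "A-Za-z" + ACCENTS

BASIC = r"((\w*)@([a-zA-Z0-9]*\.)*([a-zA-Z0-9]*))"
THOROUGH = (
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)
# Third pattern, assembled from parts to keep the lines short.
FRENCH = (
    r"^[_" + WORD + r"0-9+-]+(\.[_" + WORD + r"0-9-]+)*"
    + r"@[" + WORD + r"0-9-]+(\.[" + WORD + r"0-9]+)*"
    + r"(\.[" + WORD + r"]{2,})$"
)

def accepts(pattern, address):
    """Return True when the whole address fits the pattern."""
    return re.fullmatch(pattern, address) is not None

for address in ["user@example.com", "first.last@example.com", "françois@exemple.fr", "@"]:
    print(address, accepts(BASIC, address), accepts(THOROUGH, address), accepts(FRENCH, address))
```

In this run the basic pattern rejects the dotted local part and accepts a bare "@", while only the basic and French variants accept the accented address.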

IP Addresses

URL

The following RegEx does a great job of matching most URL patterns:

^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$
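
A quick way to get a feel for what this pattern accepts is a small Python check such as the one below. The helper name looks_like_url and the sample inputs are assumptions for this example.

```python
import re

# URL pattern copied from this section.
URL = re.compile(
    r"^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$"
)

def looks_like_url(value):
    """Return True when the whole value fits the URL pattern."""
    return URL.fullmatch(value) is not None

print(looks_like_url("https://example.com/path?x=1"))  # True
print(looks_like_url("example.com"))                   # True: the scheme is optional
print(looks_like_url("localhost"))                     # False: at least one dot is required
print(looks_like_url("not a url"))                     # False: spaces are not allowed
```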

In Notepad++, a variant of this pattern can be used to swap a URL at the start of a line with the other data on the same line, by searching for:

^(http(s)?:\/\/[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+ )(.+)

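For instance, to reformat "LINK other stuff" into "other stuff: LINK" use the following "Replace" field (group 1 is the URL with its trailing space, group 2 is the optional "s" of "https", and group 3 is the rest of the line):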

\3: \1  

[14]
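
The same swap can be reproduced outside Notepad++. The sketch below uses Python's re.sub with the same search pattern and the same backreferences; the sample line is invented, and the final rstrip is an addition that drops the trailing space captured as part of group 1.

```python
import re

# Search pattern copied from this section: group 1 is the URL plus the space
# after it, group 2 the optional "s", group 3 the rest of the line.
SWAP = re.compile(
    r"^(http(s)?:\/\/[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+ )(.+)"
)

line = "https://www.example.com/page other stuff"
swapped = SWAP.sub(r"\3: \1", line).rstrip()
print(swapped)  # other stuff: https://www.example.com/page
```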


Tools


Resources

Patterns


Tutorials

[18]


External Links


References

  1. What is a reasonable length limit on person “Name” fields?: https://stackoverflow.com/questions/30485/what-is-a-reasonable-length-limit-on-person-name-fields
  2. Understanding (the rules for) Diacritical Marks in French: https://www.thoughtco.com/understanding-french-accents-1369540
  3. HTML5 input -- pattern attribute - Postal/Zip Code regex examples: http://html5pattern.com/Postal_Codes
  4. US Area Codes By State: https://www.worldatlas.com/na/us/area-codes.html
  5. US States - all area codes: https://www.50states.com/areacodes/
  6. Canada area codes: https://www.areacodehelp.com/canada/canada_area_codes.shtml
  7. How to call Canada from the USA (area code guide): https://www.howtocallabroad.com/canada/
  8. Wikipedia: Area codes in Mexico by code
  9. Regular Expression for Japan phone number: https://stackoverflow.com/questions/40801779/regular-expression-for-japan-phone-number
  10. HTML5 input -- pattern attribute - Phone regex examples: http://html5pattern.com/Phones
  11. Basic Email RegEx (mkyong): https://www.regextester.com/96927
  12. RegEx Tester -- Email example: https://www.regextester.com/19
  13. How to Find or Validate an Email Address (an explanation on the challenges of trying to find an email validation that covers all cases): https://www.regular-expressions.info/email.html
  14. URL Validation Regex: https://www.regextester.com/94502
  15. Rubular - Ruby regular expression editor: http://www.rubular.com/r/0TP2Dcx71H
  16. Try RegEx: http://tryregex.com/ (JavaScript interactive RegEx teaching tool)
  17. RegEx in JS "test vs match" performance test/comparison: https://jsperf.com/regexp-test-vs-match-m5
  18. Read user input from command-line with Java Scanner class: https://javabeat.net/java-scanner-class-examples/

See Also

Validation | JavaScript