Parsing phone numbers is a nightmare
My expectations for parsing phone numbers: just put +[COUNTRY] [AREA] [PREFIX][SUFFIX] into a regex.
- The “country code” is not a country code.
  - For example, the US, Canada and Jamaica all share the +1 code, along with 24 other nations and territories (and that’s just one of many examples).
  - There are dedicated calling codes for satellite telephones, international toll-free numbers and other things that aren’t exactly “countries”.
  - Antarctic research stations share the calling code of their sponsor country (when there’s telecommunication available).
- A country can have more than one calling code. In fact, it can have as many as four.
  - In China, Macau (+853) and Hong Kong (+852) have their own calling codes, distinct from the mainland one (+86).
  - Kosovan phone numbers may start with +381 (Serbia), +386 (Slovenia), +377 (Monaco) or +383 (Kosovo), depending on where and when the number was registered.
- Not exactly parsing, but formatting a number for domestic dialing is very country-dependent.
  - In Argentina, a mobile number must be transformed for domestic dialing by removing the country code, replacing the 9 prefix with a 0 and adding the number 15 in the middle (seriously). Example: +54 9 2982 123456 ➡️ 02982 15 123456.
  - In Brazil, you must add the prefix 0 and a carrier code when calling a number in a different area within the country. This rule, however, does not apply to “special” numbers, such as toll-free (0800), flat-rate (0300) or paid (0900) numbers, which are the same nationwide.
- They can contain more than just digits.
  - In Israel, some advertising numbers start with an *.
  - In New Zealand, some emergency numbers also start with an *.
- If you’re parsing numbers from plates or ads, you may also have to deal with:
  - In Egypt, it’s common to write phone numbers in the local numeral system.
  - Don’t forget the advertising “phoneword” numbers, such as +1-800-Flowers.
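Two of the cases in the list are mechanical enough to sketch in code: cleaning up a number as it was written (non-ASCII digits, phonewords, a leading *), and the Argentine domestic-dial rewrite. The Python below is only an illustration under stated assumptions, not how libphonenumber or any carrier does it. The function names `normalize_number` and `argentina_mobile_to_domestic` are hypothetical, and the area-code length is passed in by the caller because the example above does not say how to find the split between area code and subscriber number.

```python
import unicodedata

# Letter-to-digit mapping of the standard telephone keypad.
_KEYPAD = {
    letter: digit
    for digit, letters in {
        "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
        "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
    }.items()
    for letter in letters
}


def normalize_number(text: str) -> str:
    """Reduce a written number to ASCII digits plus '+' and '*'."""
    out = []
    for ch in text:
        if ch in "+*":
            out.append(ch)
        elif ch.isascii() and ch.upper() in _KEYPAD:
            # Phoneword letter, e.g. the F in 1-800-Flowers.
            out.append(_KEYPAD[ch.upper()])
        else:
            # Any Unicode decimal digit (ASCII, Arabic-Indic, ...).
            value = unicodedata.decimal(ch, None)
            if value is not None:
                out.append(str(value))
            # Everything else (spaces, hyphens, ...) is dropped.
    return "".join(out)


def argentina_mobile_to_domestic(number: str, area_code_len: int) -> str:
    """Apply the rewrite described above: drop +54, turn 9 into 0, insert 15.

    area_code_len is an assumption supplied by the caller.
    """
    digits = normalize_number(number)
    if not digits.startswith("+549"):
        raise ValueError("expected an Argentine mobile number starting with +54 9")
    national = digits[len("+549"):]
    area, subscriber = national[:area_code_len], national[area_code_len:]
    return f"0{area} 15 {subscriber}"
```

With these definitions, `argentina_mobile_to_domestic("+54 9 2982 123456", 4)` gives `02982 15 123456`, matching the example in the list, and `normalize_number("+1-800-Flowers")` gives `+18003569377`. Every other country would need its own rules, which is exactly the problem.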
And those are just some examples of the annoyances you may encounter when dealing with phone numbers. So what does this leave us with?
libphonenumber to the rescue!
In fact, international phone numbers are so chaotic that Google maintains an open-source library, libphonenumber, dedicated to this problem. To illustrate how chaotic it is, I counted more than 200 releases of the library in the Maven repository since version 3.0 (2011); the current major version is 8.x. The latest release was published just about 3 weeks ago, at the time of writing. So yeah, I think it’s not just a one-line regex.
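To make the one-line-regex point concrete, here is a rough Python sketch of the naive +[COUNTRY] [AREA] [PREFIX][SUFFIX] expectation. The pattern, the group names and the helper `naive_parse` are assumptions made up for illustration; the sample numbers use the fictional 555-01xx range.

```python
import re

# The "expectation": +[COUNTRY] [AREA] [PREFIX][SUFFIX], ASCII digits only.
NAIVE = re.compile(
    r"\+(?P<country>[0-9]{1,3}) "
    r"(?P<area>[0-9]{1,4}) "
    r"(?P<prefix>[0-9]{3})(?P<suffix>[0-9]{4})"
)


def naive_parse(text: str):
    """Return the captured groups, or None when the pattern does not fit."""
    match = NAIVE.fullmatch(text)
    return match.groupdict() if match else None


samples = [
    "+1 212 5550123",     # fits the pattern
    "+1 876 5550123",     # also fits, but "1" alone does not say which country
    "+54 9 2982 123456",  # extra mobile prefix, different grouping
    "+1-800-Flowers",     # phoneword
    "*555",               # star-prefixed short number
]

for sample in samples:
    print(sample, "->", naive_parse(sample))
```

Only the first two samples parse, and for both of them the "country" group is just `1`, even though 876 is a Jamaican area code and 212 a US one. The remaining samples return `None`.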
But if, after reading this, you are still willing to parse phone numbers yourself (or are just curious), you will want to read Falsehoods Programmers Believe About Phone Numbers, which I used as the basis for this article, by the same authors as libphonenumber. It’s a comprehensive document on the (lack of) phone number standards, for your delight and despair.