Parsing phone numbers is a nightmare
My expectations for parsing phone numbers:
- Turn +[COUNTRY] [AREA] [PREFIX][SUFFIX] into a regex.
- Done.
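Here is that naive expectation as code, in a minimal Python sketch. The pattern is an assumption chosen for illustration: a 1-3 digit country code followed by a US-style 3-3-4 split. The name `naive_parse` is hypothetical and is not part of any library.

```python
import re

# Hypothetical "one-line regex" for +[COUNTRY] [AREA] [PREFIX][SUFFIX]:
# 1-3 digit country code, 3-digit area code, 3-digit prefix, 4-digit suffix.
NAIVE_PATTERN = re.compile(r"^\+(\d{1,3}) (\d{3}) (\d{3})(\d{4})$")


def naive_parse(number: str):
    """Return (country, area, prefix, suffix), or None if the pattern fails."""
    match = NAIVE_PATTERN.match(number)
    return match.groups() if match else None


# A US-style number fits the assumed pattern...
print(naive_parse("+1 212 5550100"))     # ('1', '212', '555', '0100')
# ...but numbers like the ones discussed below do not.
print(naive_parse("+54 9 2982 123456"))  # None (Argentine mobile)
print(naive_parse("+1-800-Flowers"))     # None (phoneword)
```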
Reality:
- The “country code” is not a country code.
- For example, the US, Canada and Jamaica all share the +1 code, along with 24 other nations and territories (and that’s just one of many examples)
- There are exclusive calling codes for satellite telephones, international toll-free numbers and other things that aren’t exactly “countries”
- Antarctic research stations share the country code of their sponsor country (when telecommunication is available at all)
- A country can have more than one calling code. In fact, it can have as many as four.
- In China, Macau (+853) and Hong Kong (+852) have their own calling codes, distinct from the mainland one (+86)
- Kosovan phone numbers may start with +381 (Serbia), +386 (Slovenia), +377 (Monaco) or +383 (Kosovo), depending on where/when the number was registered
- Not exactly parsing, but formatting a number for domestic dialing is very country-dependent
- In Argentina, a domestic mobile phone number must be transformed before dialing by removing the country code, replacing the 9 prefix with a 0 and adding the number 15 in the middle (seriously). Example: +54 9 2982 123456 ➡️ 02982 15 123456.
- In Brazil, you must add the prefix 0 and a carrier code when calling a number in a different area within the country. This rule, however, does not apply to “special” numbers, such as toll-free (0800), flat-rate (0300) or paid (0500, 0900) numbers, which are the same nationwide.
- They can contain more than just numbers and the + sign
- In Israel, some advertising numbers start with an *
- In New Zealand, some emergency numbers also start with an *
- If you’re parsing numbers from plates or ads, you may also have to deal with:
- In Egypt, it’s common to write phone numbers in the local numeral system
- Don’t forget the advertising “phoneword” numbers, such as +1-800-Flowers
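Two of the quirks above are plain string transformations: numbers written on plates or ads (phonewords, local numerals) and the Argentine domestic format. The Python sketch below is illustrative only. `normalize` and `argentina_domestic_mobile` are hypothetical helpers, not libphonenumber API. The letter mapping follows the standard telephone keypad layout. The Argentine helper assumes the input is already split into area code and subscriber number, which in practice needs per-area data this sketch does not have.

```python
import unicodedata

# Standard telephone keypad letter groups.
KEYPAD = {
    **dict.fromkeys("ABC", "2"), **dict.fromkeys("DEF", "3"),
    **dict.fromkeys("GHI", "4"), **dict.fromkeys("JKL", "5"),
    **dict.fromkeys("MNO", "6"), **dict.fromkeys("PQRS", "7"),
    **dict.fromkeys("TUV", "8"), **dict.fromkeys("WXYZ", "9"),
}


def normalize(raw: str) -> str:
    """Hypothetical helper: map phoneword letters and any Unicode decimal
    digits (e.g. Arabic-Indic digits) to ASCII digits.

    Keeps '+' and '*' (which can be meaningful, as in the Israeli and
    New Zealand examples) and drops spaces, dashes and other separators.
    """
    out = []
    for ch in raw:
        if ch in "+*":
            out.append(ch)
        elif ch.upper() in KEYPAD:
            out.append(KEYPAD[ch.upper()])
        elif unicodedata.decimal(ch, None) is not None:
            # Covers ASCII 0-9 as well as digits such as U+0660..U+0669.
            out.append(str(unicodedata.decimal(ch)))
    return "".join(out)


def argentina_domestic_mobile(international: str) -> str:
    """Hypothetical helper for the Argentine rule: drop +54, replace the
    mobile '9' with '0' and insert '15' before the subscriber number.

    Assumes the input is already written as '+54 9 AREA SUBSCRIBER'.
    """
    country, mobile, area, subscriber = international.split()
    if country != "+54" or mobile != "9":
        raise ValueError("expected '+54 9 AREA SUBSCRIBER'")
    return f"0{area} 15 {subscriber}"


print(normalize("+1-800-Flowers"))                      # +18003569377
print(normalize("٠١٠ ١٢٣٤ ٥٦٧٨"))                       # 01012345678
print(argentina_domestic_mobile("+54 9 2982 123456"))  # 02982 15 123456
```

Even this toy version has to decide which symbols are meaningful (`+` and `*` here), and it still knows nothing about where an area code ends.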
And those are just some examples of annoyances you may encounter when dealing with phone numbers. So what does this leave us with?
libphonenumber to the rescue!
In fact, international phone numbers are so chaotic that Google maintains an open-source library, libphonenumber, dedicated to this problem. To illustrate how chaotic it is, I counted more than 200 releases of the library in the Maven repository since version 3.0 (2011); the current major version is 8.x. The latest release was published about 3 weeks before the time of writing. So yeah, I think it’s not just a one-line regex.
The original library is implemented in C++, Java and JavaScript, but there are community ports for other languages too (e.g. Python, Ruby, Go, C#).
But if, after reading this, you are still willing to parse phone numbers yourself (or are just curious), you will want to read Falsehoods Programmers Believe About Phone Numbers, which I used as the basis for this article, from the same authors as libphonenumber. It’s a comprehensive document on the (lack of) standards for phone numbers, for your delight and despair.