This Domain Name is Greek to Me: An Introduction to Internationalized Domain Names for Investigators

Article Posted: July 06, 2010

An Analogy…
Consider a telephone directory for a city such as New York. People’s names and addresses are linked to their respective telephone numbers. Up to now, the listings have all been in Latin characters.1 The publishers decide to allow people to have entries in their preferred languages. Now, Ms. Ivana Ivanova has a listing in Russian Cyrillic, Mr. Achmed Husseyn has a listing in Arabic script, and so on. While the underlying telephone system hasn’t changed much, the way people look up names and telephone numbers has changed greatly. Some people will find it easier to look up people and businesses that share their language. Others will be confused because they cannot decipher the new entries.

Internationalized Domain Names (IDNs)
Something like the above analogy has been happening to the Internet. In recent years, the Internet Corporation for Assigned Names and Numbers (ICANN) has been establishing a system for “Internationalized Domain Names” (IDNs).2 Unicode allows the IDNs to use a wide variety of character sets from the world’s languages.

For several years, non-Latin domain names could be registered under Latin character top level domain names (TLDs) such as .com or .net. In May 2010, four countries (Egypt, Saudi Arabia, the United Arab Emirates, and the Russian Federation) started registering domains under country code TLDs (ccTLDs) in their native scripts.3 So, now it is possible to have domain names that, other than the dot (“.”), are totally in Arabic or Cyrillic characters. This is only the beginning. More countries will follow suit.

This will make the Internet accessible to billions of people whose native languages do not use Latin characters and who are not readily able to switch between their native script and Latin characters on their computer systems. They’ll be able to access local Internet resources and services in their local language.

Many other people, including some investigators, might never run into an IDN. But for those encountering one for the first time, IDNs can present several challenges. The good news is that not much has changed at the “bits and bytes” and IP address level. At other levels, though, the changes are significant and bring challenges of their own. We’ll look at these challenges and ways of dealing with them.
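To illustrate how little changes at the lower levels: in the DNS itself, each non-Latin label of an IDN is carried as a plain ASCII string beginning with “xn--” (the Punycode-based encoding defined by the IDNA standards). The following minimal Python sketch is not part of the original article. It assumes Python 3 and its built-in "idna" codec, which implements the older IDNA 2003 rules, and the helper names to_ascii and to_unicode are purely illustrative.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII ("xn--") form.

    Uses Python's built-in "idna" codec (IDNA 2003 rules); newer
    IDNA 2008 rules may treat some names differently.
    """
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII ("xn--") domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    # Example names only; the first is taken from the article text.
    for name in ("москва.com", "рф"):
        ascii_form = to_ascii(name)
        print(name, "->", ascii_form, "->", to_unicode(ascii_form))
```

An investigator who finds an “xn--” string in a log file or packet capture can use the reverse conversion to see the name as the user would have typed it.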

First Challenge: Recognizing non-Latin IDNs as Domain Names
We’ve tended to associate domain names with recognizable Latin character TLDs like .com or ccTLDs like .uk or .ru. Some IDNs will have a “traditional” TLD or ccTLD. So, москва.com, despite its Cyrillic text, can be recognized as a possible domain name (or, perhaps, a reference to a DOS/Win executable file). But the new non-Latin ccTLD IDNs, such as سجل.مصر or правительство.рф, aren’t so recognizable as domain names.
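As a rough aid to recognition, the sketch below flags strings that could be IDNs and reports which script each label uses. It is a heuristic under stated assumptions, not a definitive test: it assumes Python 3.7 or later, treats the first word of each character’s Unicode name (for example “CYRILLIC” or “ARABIC”) as a stand-in for its script, and relies on the built-in "idna" codec (IDNA 2003 rules) to reject strings that cannot be encoded. The function names are hypothetical.

```python
import unicodedata


def label_scripts(label: str) -> set:
    """Return approximate script names for the letters in one label.

    Assumption: the first word of a character's Unicode name
    (e.g. "CYRILLIC SMALL LETTER EM") approximates its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def scripts_by_label(text: str) -> list:
    """Return the approximate scripts of each dot-separated label."""
    return [label_scripts(label) for label in text.split(".")]


def looks_like_idn(text: str) -> bool:
    """Heuristic: True if text could be a domain name with non-ASCII labels."""
    labels = text.split(".")
    # Require at least two non-empty, dot-separated labels.
    if len(labels) < 2 or any(not label for label in labels):
        return False
    # Purely ASCII names are ordinary domain names, not IDNs in Unicode form.
    if text.isascii():
        return False
    try:
        text.encode("idna")  # fails if a label cannot be encoded
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    candidate = "москва.com"  # example taken from the article text
    print(candidate, looks_like_idn(candidate), scripts_by_label(candidate))
```

The reported script names give an investigator a hint about which language resources to consult. A file name such as a Cyrillic-named .com executable would also pass this check, so the result is only a starting point.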
