August 12th, 2021
The Domain Name System or DNS is a never-ending source of amusement and amazement. If you have been dealing with just about anything related to operations on the internet, you know that it's always the DNS in the end, what with its almost 100 different resource records and, uhm, shall we say, "interesting" security threat model. But today, let's talk about Top-Level Domains, or TLDs. You know,
Ok, so far, so good. With RFC920, we got the initial set of top level domains:
Oh, and:
.arpa | Temporary; The current ARPA-Internet hosts. |
That's right: .arpa was supposed to be temporary:
"After a short period of initial experimentation, all current ARPA-Internet hosts will select some domain other than ARPA for their future use. The use of ARPA as a top level domain will eventually cease." -- RFC920
Yeah, well, we all know how temporary temporary solutions are. And so today, we continue to use .arpa for e.g., reverse mapping of IP addresses to names via the .in-addr.arpa and .ip6.arpa second-level domains. But .arpa is used for a lot more:
as112.arpa (RFC7535, effectively RFC1918 reverse resolution; see also https://www.as112.net/),
e164.arpa (RFC6116 / NAPTR records),
home.arpa (RFC8375, non-unique use in residential home network),
in-addr-servers.arpa and ip6-servers.arpa (RFC5855, name servers for the in-addr.arpa and ip6.arpa domains),
ipv4only.arpa (RFC7050, detecting DNS64 and IPv6 Prefixes),
iris.arpa (RFC4698, for locating Internet Registry Information Services),
as well as uri.arpa and urn.arpa (RFC3405 for resolving Uniform Resource Identifiers / NAPTR).
Note: the arpa zone is served from all root servers except the J Root, which, per RFC2870, should not "provide secondary service for any zones other than the root and root-servers.net zones". Noted on dns-operations@dns-oarc.net.
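To make the reverse mapping a bit more concrete, here is a minimal Python sketch that builds the in-addr.arpa and ip6.arpa names for an address. It only uses the standard library's ipaddress module; the helper name reverse_name and the documentation addresses are my own choices for illustration.

```python
import ipaddress


def reverse_name(address):
    """Return the reverse-mapping DNS name for an IPv4 or IPv6 address."""
    # ip_address() accepts both address families; reverse_pointer reverses
    # the octets (IPv4) or nibbles (IPv6) and appends the .arpa suffix.
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    for addr in ("192.0.2.1", "2001:db8::1"):
        print(addr, "->", reverse_name(addr))
```

A PTR lookup for the resulting name is what a reverse resolution of the address boils down to.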
ccTLDs
In addition to these original TLDs, we also got the country code top-level domains, or ccTLDs:
The English two letter code (alpha-2) identifying a country according the the ISO Standard for "Codes for the Representation of Names of Countries". -- RFC920
And this is where the fun begins, because of course you are always operating on Layer 9, and this list is necessarily somewhat fluid, as countries change, are born, divided, or cease to exist:
.ss, the ccTLD for South Sudan, was allocated in August 2011, but not added to the root zone until February 2019, with general availability of names in that domain only starting in September 2020.
.ge, the ccTLD for Georgia (the country, not the US state), uses the ISO-3166-1 Alpha 2 country code that previously used to represent the Gilbert and Ellice Islands; the Ellice Islands became Tuvalu, which, per country code designation, got the rather valuable .tv ccTLD.
.tv domains; Tuvalu used the money for the marketing rights to allow them to pay the membership dues for United Nations when they joined the UN in 2000!
.eh has been reserved (but not been assigned yet) as the ccTLD for the disputed "Western Sahara" territory; in 2013, on April 1st, CIRA, the Canadian Internet Registration Authority (responsible for .ca), announced that it would offer .eh names, because, you know, Canadians would like that, eh?
.ps is the ccTLD for Palestine (not a sponsored TLD for the (Turing-complete!) PostScript programming language), recognized by 138 of the 193 UN members; .tw is assigned for the Republic of China, aka Taiwan, which a mere 14 countries recognize.
.hk as a special administrative region of China (much like .mo for Macao); .uk represents the entire United Kingdom, while the scarcely used .gb is assigned to Great Britain; England doesn't even get a ccTLD, and neither does Northern Ireland!
.eu counts as a ccTLD, representing, obviously, not a single country. But due to Brexit, British citizens who had registered .eu domains had their domains suspended on January 1st, 2021, requiring proof of European Economic Area (EEA) citizenship to avoid them being deleted in March 2021.
.cs (Czechoslovakia) became .cz (Czech Republic) and .sk (Slovakia); .dd (East Germany) disappeared after the reunification of Germany; .yu (Yugoslavia) became .si (Slovenia), .hr (Croatia), and Serbia and Montenegro, which had had .cs assigned (but never used that, instead continuing to use .yu) before they split into .rs (Serbia) and .me (Montenegro); .zr (Zaire) became .cd (Democratic Republic of the Congo).
.su, the ccTLD for the former Soviet Union, was assigned in 1990, a mere 15 months before that Union was dissolved, and still remains in active use.
With ccTLDs having appealing two-letter names (gTLDs are a minimum of three characters), they lend themselves to so-called "domain hacks" to create words, to shorten URLs, or as a convenient way to jump on a popular trend, and many people began registering names in other countries' ccTLDs:
.ag, the ccTLD for Antigua and Barbuda, is often used in German speaking countries, where "AG" is an abbreviation of "Aktiengesellschaft" (a private limited / joint stock company), and use of .ag names for other entities may even carry legal risks.
.ai, the ccTLD for Anguilla, is used for extra leet effect in artificial intelligence marketing. .ai also is notable in that as a TLD it nevertheless has both an A and MX record, meaning you could have a functional email address like hal@ai. (Email addresses are difficult to validate, it turns out.)
.am (Armenia) is used by e.g., instagr.am.
.at (Austria) is used for things like donteat.at.
.be is used by e.g., Google to shorten youtu.be links.
.by (Belarus) is frequently used for sites relating to the German state of Bavaria (Bayern).
.cm (Cameroon) and .co (Colombia) are frequently used by typo-squatters to catch traffic from people fat-fingering ".com".
.cx was assigned to the Christmas Island, and appears currently to be defunct, but it did have the significant glory of once having been the home of goatse.cx (Wikipedia).
.im (Isle of Man) is used for various instant messaging domain hacks.
.io, assigned to the British Indian Ocean Territory, is almost exclusively used by annoying startups for content completely unrelated to the islands.
.la (Laos) is commonly used for Louisiana or Los Angeles related domains as well as random domain hacks, like Mozilla's link shortener mzl.la or Tesla's ts.la.
.me (Montenegro, which up until 2007 had been using cg.yu) became one of the most popular TLDs and is used for link shorteners like Facebook's fb.me, Google's g.me or GoDaddy's go.me.
me.me for its "Yahoo! Meme microblogging site"; after it shut that service, it returned the domain to the registry, and it's now, what else, a Meme search engine.
.ms (Montserrat) is, of course, used by Microsoft, sites in the US state of Mississippi, and by e.g., the New York Times for its nyti.ms link shortener.
.py), Rust nerds in Serbia's .rs.
.vi exists (U.S. Virgin Islands), but .emacs does not (emacs.vi, however, does).
Now one noteworthy aspect here is that since the ccTLDs are administered by the given country, they may be subject to (and enforce) different requirements. Some domains can only be registered by entities residing within the given country; others, like the .cat domain sponsored by the dotCAT foundation to promote the Catalan language, may stipulate the language or content of the domains. Libya, with the ever so popular .ly ccTLD, did in 2010 shut down Violet Blue's vb.ly domain, objecting to the content. In a similar manner, Colombia could choose to break just about all of Twitter (which uses the t.co domain name to wrap every single link on its platform); Greenland could shut down Google's goo.gl links.
In addition to the original TLDs and the ccTLDs, in the late 1980s InterNIC added .nato, but that was later replaced by .nato.int, with the new .int TLD being added in 1988 for intergovernmental organizations.
In 2000, ICANN, which had by then taken over the administration of domain names, added seven more TLDs: .aero, .biz, .coop, .info, .museum, .name, and .pro. It then began soliciting proposals for "sponsored top-level domains" (sTLDs), but only received a handful of proposals, ultimately adding .asia, .cat, .jobs, .mobi, .post, .tel, .travel, and .xxx.
Sponsored TLDs being somewhat restricted in scope and use, ICANN then went for another round of accepting proposals for new, generic TLDs (gTLDs), this time with a price tag of $185,000 per TLD. In 2012, it processed 1,930 applications: 101 from Google (under the name Charleston Road Registry Inc.), including .lol, .google, .dog, and .foo (.lol was ultimately registered by Uniregistry, now owned by GoDaddy), 76 from Amazon, 11 from Microsoft, and 307 from the "Donuts" domain name registry.
The list of ultimately approved domains included a number of geographic TLDs (geoTLDs), adding domains for certain cities (e.g., .berlin, .london, .nyc, .paris, or .tokyo), countries that previously did not have a ccTLD (e.g., .cymru, .scot, and .wales, although England still doesn't get its own TLD, while e.g., New Zealand (.nz) now got a second: .kiwi), and broader geographic regions (e.g., .africa or .lat).
But of course people went a bit nuts, too: many brands applied for .<brand> and got into various arguments over who should own the given TLD. For example, Amazon applied for (and was given) .amazon over the objection of several nations of, well, the Amazon; and multiple applications for entirely generic terms had to be sorted out.
One of those was the .secure domain, which had been proposed by one Alex Stamos of (then) Artemis Internet as a TLD that would enforce certain minimum security requirements; ultimately, .secure was assigned to Amazon.
Eventually, ICANN added 1239 new TLDs to the DNS, bestowing upon us such important TLDs as e.g., .beer, .cloud, .dot, .duck, .foo, .google, .rocks and .sucks, .travelersinsurance, and .yahoo.
But of course some TLDs then go under again: .wed, for example, was delegated, but the company that had applied for this name apparently didn't pay up, and ICANN terminated the registry agreement. However, the TLD remains in the root; it appears to now be operated by ICANN EBERO and some names remain in use (e.g., get.wed, albeit with an invalid certificate).
Finally, the perhaps most generic TLD, .gdn (Global Domain Name) was added in 2014.
Even before the landrush for the new gTLDs, ICANN approved the introduction of internationalized domain name (IDN) TLDs, and many ccTLDs added TLDs using their respective languages and alphabets (including right-to-left!), represented within the DNS using Punycode.
DNS name | IDN ccTLD | Country/Region | Language | Other ccTLD |
xn--lgbbat1ad8j | .الجزائر | Algeria | Arabic | .dz |
xn--fiqs8s | .中国 | China | Chinese (Simplified) | .cn |
xn--qxa6a | .ευ | European Union | Greek | .eu |
xn--4dbrk0ce | .ישראל | Israel | Hebrew | .il |
xn--o3cw4h | .ไทย | Thailand | Thai | .th |
(See Wikipedia's full table for all IDN ccTLDs.)
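The mapping between the two left-hand columns of that table can be reproduced with a few lines of Python. This is only a sketch: it uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules and may therefore disagree with what registries and browsers do for some edge cases; the function names are made up for illustration.

```python
def to_dns_name(label):
    """Encode a Unicode label into the ASCII form stored in the DNS."""
    # The codec applies normalization, Punycode, and the "xn--" prefix.
    return label.encode("idna").decode("ascii")


def to_display_name(dns_label):
    """Decode an "xn--" label back into its Unicode form."""
    return dns_label.encode("ascii").decode("idna")


if __name__ == "__main__":
    for tld in ("中国", "ευ", "com"):
        print(tld, "->", to_dns_name(tld))
```

Plain ASCII labels such as "com" pass through unchanged, which is why only the IDN TLDs show up with the "xn--" prefix in the root zone.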
But IDNs are not only for ccTLDs: many of the new gTLDs also include various Unicode characters, such as e.g., .сайт ("website"), .大众汽车 ("volkswagen"), .ファッション ("fashion"), ابوظبي. ("Abu Dhabi"), and, of course, .vermögensberatung ("wealth management / advice").
Note that with IDNs, you can mix an IDN second-level with a non-IDN top-level or vice versa. Due to the resulting IDN Homograph Attack vector, browsers became more cautious about rendering IDNs and fall back to displaying them as Punycode when a name looks suspicious, for example when it mixes scripts.
In addition to all that, there is also a small number of so-called "special use domains", of which .arpa (already discussed above) is just one. These are:
.example -- intended for use in documentation, tutorials, and testing; defined, together with example.com, example.net, and example.org in RFC6761.
.invalid and .test -- for testing and documentation, originally defined in RFC2606.
.local -- usually used for zero-configuration networking (RFC6762).
.localhost -- reserved since traditionally .localhost existed in e.g., /etc/hosts for the loopback address (RFC2606). Note: .localdomain is not reserved, and use of localhost.localdomain can lead to unexpected results if your stub resolver expands this.
.onion -- used by Tor (.onion service address) and defined in RFC7686. Note: this "TLD" is not entered into the DNS, but following work by Jim McCoy and Alec Muffett leading to CA/B Forum Ballot 144, you can get a valid x509 certificate from public CAs. (For a while, Tor also used to use the .exit pseudo TLD; this is no longer supported.)
.bitnet, .csnet, .oz (from ACSnet, now moved into .oz.au), .uucp (if you remember that), and .i2p (the aptly named "Invisible Internet Project").
.kp, the ccTLD assigned for North Korea, serves the North Korea internal-only Kwangmyong network.
.chn domain internally for its Internet of Things. This domain relies on the use of an alternate DNS root as well, and is not found in the common root.
The DNS is an inherently public system (modulo alternate root shenanigans or split-horizon games). The root zone itself continues to be available for download via FTP or HTTPS and so we can easily extract the full count of all TLDs:
$ curl https://www.internic.net/domain/root.zone | awk '{if ($4 == "NS") { print $1;}}' | sort -u | wc -l
1499
Processing the simple zone file, we find that most TLDs are two- (248) or three- (222) letter TLDs; that there are 154 IDN TLDs; that there are TLDs starting with every letter of the alphabet ('s' being the most popular one); that the longest TLD is vermögensberatung (24 characters in punycode: xn--vermgensberatung-pwb).
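If you'd rather not do this in awk, the same kind of tally can be sketched in Python. This is a rough equivalent, not the exact processing behind the numbers above: it assumes the root zone's usual "owner TTL class type data" layout, skips the root's own NS records (which the awk one-liner counts), and the function names are hypothetical.

```python
from collections import Counter


def tlds_from_zone(zone_text):
    """Collect the owner names of all NS records, minus the root itself."""
    tlds = set()
    for line in zone_text.splitlines():
        fields = line.split()
        # Assumed layout: owner, TTL, class, type, record data.
        if len(fields) >= 5 and fields[3] == "NS" and fields[0] != ".":
            tlds.add(fields[0].rstrip(".").lower())
    return tlds


def summarize(tlds):
    """Return a few simple statistics about a set of TLDs."""
    return {
        "total": len(tlds),
        "idn": sum(1 for t in tlds if t.startswith("xn--")),
        "by_length": dict(Counter(len(t) for t in tlds)),
        "longest": max(tlds, key=len, default=""),
    }
```

Feeding it the contents of a downloaded root.zone file gives the overall count, the number of "xn--" names, and a histogram of TLD lengths in one pass.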
But what about all the individual TLD zone files? Since that data is also public in nature, we should be able to get and process it as well. And for the ICANN assigned new gTLDs, this is indeed the case: ICANN offers the Centralized Zone Data Service, where you can apply to gain access to all gTLD zone files. For some domains the access is granted almost instantly, for others it takes a few days.
Now for the ccTLDs, however, there unfortunately is no equivalent service, although there's a (rather short) list of ccTLD zone sources here as well as here; some registries let you AXFR the domain (e.g., .ee, .ch and .li, .se and .nu), some provide a list of names (e.g., .sk or .gov), but otherwise it's up to you to contact the registry in question and plead your case. Yes, for each of the over 300 domains -- good luck! (I've collected what I found out about each here.)
Given how difficult it is to get to all the public data, it's then no surprise that several businesses are making good money by selling you that access or by providing TLD reports.
After having requested access to all gTLD zone files and having received most of them (several are still pending), I looked around a bit, seeking entertaining stats. One thing to note is that a large number of zones (230) do not have any names defined (other than, say, a NIC NS record) -- TLDs registered purely as a brand or placeholder, I suspect. Over 360 zones have fewer than 10 records, over 470 fewer than 100.
Zones that are actually used include the expected variety of silly names, including very long domain names:
accountantaccountantaccountantaccountantaccountantaccountant.accountant
artartartartartartartartartartartartartartartartartartartartart.art
yoyoyodogillbestraightwithyouicanttellifthatsatattoooranartisti.art
barbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbar.bar
clickclickclickclickclickclickclickclickclickclickclickclick.click
ahndung-von-verkehrsordnungswidrigkeiten-mit-unfallfolge.cologne.
0-------------------------------------------------------------0.com.
thelongestdomainnameintheworldliterallynobodycangetalongeronexd.community
you-know-you-are-pretty-gosh-darned-cute-do-you-wanna-go-on-a.date.
lololololololololololololololololololololololololololololololol.fun.
gayfriendlyconvenientaffordabletrendyhairsalonsindowntowntoront.mobi
wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww.org
partypartypartypartypartypartypartypartypartypartypartyparty.party
runrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrun.run
thehighestthemostvaluableandthemostexpensivedomainnameofalltime.top
this-crazy-url-is-definitely-one-of-the-longest-adresses-in-the.world.
rindfleischetikettierungsuberwachungsaufgabenubertragungsgesetz.xyz
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xyz.
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.zone
...and so on and so on. Per RFC1035, the maximum size of a DNS label is 63 octets (note: octets, not characters, which is why the maximum length of a domain name is 253 characters), which explains why there are no second-level domains longer than that, although it doesn't explain why people insist on registering over 1700 such names.
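The arithmetic behind those limits is easy to check. RFC1035 caps a label at 63 octets and a whole name at 255 octets on the wire, where every label costs one extra length octet and the name ends with a zero octet for the root; a name of N characters (without the trailing dot) thus needs N + 2 octets, which is where 253 comes from. Here is a small sketch, limited to ASCII names for simplicity, with helper names of my own choosing:

```python
MAX_LABEL_OCTETS = 63   # per-label limit from RFC1035
MAX_WIRE_OCTETS = 255   # whole-name limit from RFC1035


def _strip_root_dot(name):
    return name[:-1] if name.endswith(".") else name


def wire_length(name):
    """Octets needed for an ASCII name in DNS wire format."""
    labels = _strip_root_dot(name).split(".")
    # One length octet per label, plus the terminating root octet.
    return sum(1 + len(label.encode("ascii")) for label in labels) + 1


def length_problems(name):
    """List the length limits that an ASCII domain name violates."""
    problems = []
    for label in _strip_root_dot(name).split("."):
        if len(label.encode("ascii")) > MAX_LABEL_OCTETS:
            problems.append("label too long: " + label)
    if wire_length(name) > MAX_WIRE_OCTETS:
        problems.append("name too long")
    return problems
```

So a 63-character second-level label is fine, while one more character is not, no matter how short the TLD is.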
Of the 987 zones I looked at, the top ten zones based on number of domains were:
Rank | TLD | # of domains |
1 | .com | 155,883,253 |
2 | .net | 13,291,304 |
3 | .org | 10,424,321 |
4 | .info | 3,859,083 |
5 | .xyz | 3,128,897 |
6 | .online | 1,811,807 |
7 | .top | 1,200,953 |
8 | .site | 1,067,408 |
9 | .shop | 907,239 |
10 | .app | 722,140 |
(Note that not all TLDs are treated the same across the internet. Despite being rather popular, the .xyz domain appears to have a poor score in many automated domain reputation systems, which may lead to all sorts of unexpected problems for your business.)
For my own entertainment, I wrote a shabby little perl script to run over a zone file and produce some additional numbers:
$ gzcat net.txt.gz | perl -T zonestats.pl
Total number of records: 34658946
Total number of names: 13291304
Total number of different record types: 7
  ns: 32819414
  rrsig: 759035
  ds: 414744
  nsec3: 379518
  a: 270671
  aaaa: 15543
  soa: 1
Top ten name lengths:
  9: 2839977
  10: 2836099
  8: 2783467
  11: 2648213
  7: 2496883
  12: 2404541
  6: 2205730
  13: 2159827
  14: 1886160
  15: 1585087
Longest name: 000000000000000000000000000000000000000000000000000000000000001.net. (63)
There are 134 names with 63 chars in this domain.
Total number of unique name servers: 689703
The three most popular name servers found in this zone are:
  dns1.registrar-servers.com.: 298617
  dns2.registrar-servers.com.: 298352
  jm2.dns.com.: 239693
The most popular domains in which the nameservers are:
  domaincontrol.com: 6200836
  googledomains.com: 1485364
  dns.com: 908420
This domain contains names including the following dirty words:
  shit: 8732
  fuck: 8057
  tits: 2351
  piss: 844
  cunt: 575
  motherfucker: 86
  cocksucker: 16
$
The "seven dirty words" domains are of course full of mismatches, but it looks like most zones contain more or less the same percentage of dirty domain names: somewhere between 0.006% and 0.008% of the total; .xxx predictably ranks a bit higher here, but not all that much at only 0.1% of all names.
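For a flavor of what such a tally involves, here is a small Python sketch of the first few numbers in that output. To be clear, this is not the perl script used above: it assumes whitespace-separated "owner TTL class type data" lines, counts a "name" as any owner of an NS record (which would include the zone apex if it appears), and measures only the left-most label; all function and variable names are made up.

```python
from collections import Counter


def zone_stats(lines):
    """Tally record types, delegated names, and name lengths."""
    record_types = Counter()
    names = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and anything too short to be a record.
        if len(fields) < 5 or fields[0].startswith(";"):
            continue
        owner, rtype = fields[0].lower(), fields[3].lower()
        record_types[rtype] += 1
        if rtype == "ns":
            names.add(owner)
    # Length of the left-most label, e.g. 7 for "example.net.".
    name_lengths = Counter(len(n.split(".")[0]) for n in names)
    return record_types, names, name_lengths
```

Since the function just iterates over lines, it can be pointed at a file object opened with gzip.open(..., "rt") to process a compressed zone file without unpacking it first.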
Now all of the above is good fun, but why would you want to know whether a given string is a TLD? Wouldn't it be trivially the right-most label of the fully-qualified domain name (FQDN)?
Strictly speaking: yes. However, consider that many TLDs are not generic in nature, meaning people cannot simply register any name under the given TLD. ccTLDs, being managed by individual registries, each may have unique requirements and regulations, and it is a common practice for these registries to enforce a second-level domain hierarchy, replicating or mirroring to some degree the top-level hierarchy.
For example, and perhaps most widely known, the .uk TLD uses .ac.uk (for academic institutions), .co.uk (for commercial entities), .gov.uk, .net.uk, .org.uk, and so on. How many such second-level domains are reserved depends on each TLD; Brazil (.br), for example, has over 100.
Now within the context of, for example, HTTP cookies or x509 TLS certificates, it's rather important that an entity cannot use a wildcard to match an entire TLD, but how does a browser know whether foo.example is a reserved second-level domain, or simply a normal domain registered by some entity? Should a website be able to set a cookie for foo.example? Should it be able to get a certificate for *.foo.example? There is no programmatic way to determine this.
To solve this problem, the good folks over at Mozilla started putting together a list of these TLDs and "effective TLDs", known as the Public Suffix List. That's right, it's another one of those manually compiled and maintained text files we like to build the internet infrastructure on!
This list consists of over 9,000 suffixes, and is used by all of the popular browsers to restrict cookie scope as well as for various UI features.
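To illustrate how such a list gets consulted, here is a simplified Python sketch of the matching logic: find the longest rule that matches the right-hand side of a name, treat "*." rules as wildcards and "!" rules as exceptions, and fall back to the bare TLD if nothing matches. The handful of rules below is a tiny illustrative subset that I picked by hand, the function names are hypothetical, and a real implementation should follow the algorithm published with the list rather than this sketch.

```python
# Illustrative subset only; the real list has thousands of entries.
SAMPLE_RULES = {"com", "uk", "co.uk", "*.ck", "!www.ck"}


def public_suffix(domain, rules=SAMPLE_RULES):
    """Return the public suffix of a domain according to the given rules."""
    labels = domain.lower().rstrip(".").split(".")
    # Walk from the longest candidate to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if "!" + candidate in rules:
            # Exception rule: the suffix is the candidate minus one label.
            return ".".join(labels[i + 1:])
        if candidate in rules:
            return candidate
        if i + 1 < len(labels) and "*." + ".".join(labels[i + 1:]) in rules:
            return candidate
    # No rule matched: treat the TLD itself as the public suffix.
    return labels[-1]


def registrable_domain(domain, rules=SAMPLE_RULES):
    """Return the public suffix plus one label, or None for a bare suffix."""
    labels = domain.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(domain, rules).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])
```

A browser would then refuse a cookie scoped to a name for which registrable_domain() returns None, since that name is itself a public suffix.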
Google uses similar heuristics based on a domain name's TLD to determine whether to offer users different language versions of their content and other geo-targeting. Within that context, Google treats some ccTLDs (such as e.g., .io, .me, .tv etc.) as if they were gTLDs rather than as indicators of geographic location.
Finally, the HSTS Preload list baked into browsers like Chrome and Firefox to enforce HTTP Strict Transport Security includes a number of TLDs and public suffixes:
$ curl -O https://publicsuffix.org/list/public_suffix_list.dat
$ curl -O https://hg.mozilla.org/mozilla-central/raw-file/tip/security/manager/ssl/nsSTSPreloadList.inc
$ grep -v '^/' public_suffix_list.dat | grep . | sed -e 's/$/\./' | sort > psl
$ sed -n -e 's/^\([^, ]*\), .*/\1\./p' nsSTSPreloadList.inc > hsts
$ comm -1 -2 hsts psl | wc -l
73
$
That is, websites registered under any of these 73 suffixes, such as e.g., .app or .dev, will always use HTTPS when using the common, popular browsers that consume this list.
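The same intersection can be sketched in Python with two sets, which also sidesteps any worries about whether both inputs are sorted the way comm(1) expects. The parsing mirrors the grep and sed filters above and assumes the two file formats are as those filters imply; the function names are my own.

```python
import re

# Mirrors the sed expression: first field up to a comma or space, then ", ".
_HSTS_LINE = re.compile(r"^([^, ]*), .*")


def psl_entries(text):
    """Non-empty lines not starting with '/', each with a trailing dot."""
    return {
        line + "."
        for line in text.splitlines()
        if line and not line.startswith("/")
    }


def hsts_entries(text):
    """First field of each 'name, flag' line, each with a trailing dot."""
    entries = set()
    for line in text.splitlines():
        match = _HSTS_LINE.match(line)
        if match:
            entries.add(match.group(1) + ".")
    return entries


def preloaded_suffixes(psl_text, hsts_text):
    """Names that appear in both lists, sorted for stable output."""
    return sorted(psl_entries(psl_text) & hsts_entries(hsts_text))
```

Passing in the contents of the two downloaded files should then yield the list of names behind the count shown above, assuming the files still have the same layout.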
Well, there you go. Top-level domains are, it turns out, a lot more complicated than what we commonly think of. The internet being a truly global network of networks with varied jurisdictions being in control of parts of the whole continues to provide for curious challenges and -- as anybody working in tech knows -- you regularly run into weird scenarios that trace back to the DNS.
Sometimes all the way to the toptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptop.top.