New TLDs and Dictionary Words

Yesterday we announced some details about our upcoming Domain Name Filter software. We used an early version of the software to split the domain names in the Guru, XYZ and Club zone files into words.

We used the default English language dictionary within the software. This dictionary has over 75,000 English words. It also includes the names of countries. We did not use dictionaries for common names, places and other proper nouns.

The software splits domain names into component words. Domain names consisting only of numbers, or of a combination of numbers and valid words, were accepted. For example, 101domain.club was split into “101 domain” and accepted, but a domain name like xfrtyclub.guru was ignored because ‘xfrty’ is not in the dictionary.
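The acceptance rule above can be illustrated with a minimal sketch. This is not the Domain Name Filter's actual code: the function name split_label, the tiny WORDS set, and the dynamic-programming approach are all assumptions made for illustration.

```python
# Hypothetical stand-in for the software's dictionary of over 75,000 words.
WORDS = {"domain", "club", "guru", "great", "ideas"}

def split_label(domain, words=WORDS):
    """Split the label before the first dot into digit runs and dictionary words.

    Returns the list of pieces, or None when no complete split exists.
    """
    label = domain.split(".", 1)[0].lower()
    n = len(label)
    # found[i] is one way to split label[:i], or None if there is none.
    found = [None] * (n + 1)
    found[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = label[start:end]
            if found[start] is not None and (piece in words or piece.isdigit()):
                found[end] = found[start] + [piece]
                break
    return found[n]

print(split_label("101domain.club"))  # ['101', 'domain']
print(split_label("xfrtyclub.guru"))  # None
```

A name is accepted only when every character belongs to a digit run or a dictionary word, which is why the leftover ‘xfrty’ causes the second name to be ignored.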

Domains with hyphens were always accepted. For example, 1-koelner-pfeifen.club was split into “1 koelner pfeifen” and accepted even though neither koelner nor pfeifen is in the default English dictionary.
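As a rough sketch of the hyphen rule, the hyphens themselves can serve as the word boundaries, with no dictionary lookup at all. The function name split_hyphenated is hypothetical, and the real software may handle edge cases differently.

```python
def split_hyphenated(domain):
    """Split a hyphenated label on its hyphens; return None if it has no hyphen."""
    label = domain.split(".", 1)[0].lower()
    if "-" not in label:
        return None
    # Drop empty pieces that would come from doubled hyphens.
    return [part for part in label.split("-") if part]

print(split_hyphenated("1-koelner-pfeifen.club"))  # ['1', 'koelner', 'pfeifen']
```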

A Side Note: The software splits domain names accurately in most cases. However, in rare instances it creates unintended word combinations. For example, GreatIdeasInAging.xyz was split into “Great Idea Sin Aging” instead of “Great Ideas In Aging”.
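The ambiguity comes from the fact that one string can have more than one valid split. The sketch below enumerates every split against a small hypothetical word list; the names all_splits and WORDS are illustrative, and nothing here reflects how the software actually picks among candidates.

```python
# Hypothetical mini-dictionary containing just the words needed for this example.
WORDS = {"great", "idea", "ideas", "in", "sin", "aging"}

def all_splits(label, words=WORDS):
    """Return every way to split label into dictionary words."""
    label = label.lower()
    if not label:
        return [[]]
    results = []
    for i in range(1, len(label) + 1):
        head = label[:i]
        if head in words:
            # Extend each split of the remainder with the current head word.
            for rest in all_splits(label[i:], words):
                results.append([head] + rest)
    return results

for split in all_splits("GreatIdeasInAging"):
    print(" ".join(split))
```

Both “great idea sin aging” and “great ideas in aging” come out as valid, so a splitter needs some tie-breaking rule to choose between them.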

Here are the results:

XYZ

Total Domains Excluding IDN – 739,448

Valid Keyword Phrases after Splitting – 242,909

33% of the domains are valid English word combinations

CLUB

Total Domains Excluding IDN – 152,817

Valid Keyword Phrases after Splitting – 79,580

52% of the domains are valid English word combinations

GURU

Total Domains Excluding IDN – 78,271

Valid Keyword Phrases after Splitting – 49,983

64% of the domains are valid English word combinations
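The percentages follow directly from the counts. As a quick check, the short sketch below recomputes each share as valid phrases divided by total domains, rounded to the nearest whole percent; the variable and function names are illustrative.

```python
# (total domains excluding IDN, valid keyword phrases after splitting)
RESULTS = {
    "XYZ": (739_448, 242_909),
    "CLUB": (152_817, 79_580),
    "GURU": (78_271, 49_983),
}

def valid_share(total, valid):
    """Percentage of domains that split into valid words, rounded to a whole number."""
    return round(100 * valid / total)

for tld, (total, valid) in RESULTS.items():
    print(f"{tld}: {valid_share(total, valid)}%")
```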

XYZ, Club, Guru Domains - Word Split