## Tuesday, January 26, 2010

### The squawking Squaw King was stabbed in a stab bed

Yesterday I tweeted: "Realized that 'assisted' is 'ass is ted'. Are there other non-compound words in English which consist entirely of other words?" People replied with "is land" and "cut lass".

Naturally, I couldn't resist writing a small amount of code to find other word sequences within words. Using a short program and a 57,000-word English dictionary of common words, I had the answer: 12,870 words. That means about 23% of the words in that dictionary have this property.

Of course, many are rather boring because they are just compound words. But others are more fun:

I have secretions of secret ions and a seepage (see page 21), but sematically my sematic ally says I am fatalistic and fatal is tic bite. But the fellow fell, ow! And asked, do we seal ants with sealants? I went to the palace to see my pal (ace) and said "Serge! Ants". He called for "Sergeants!". But with an antelope the ant elope.

I smelt an aroma: the rapist! Yet, it was just the aromatherapist.

You can get the full list here.

Update: Here's the code:

```perl
# ----------------------------------------------------------------------------
# Small program to find words that consist entirely of other words
# concatenated.  An example is 'fatalistic' which is 'fatal is tic'
#
# Written by John Graham-Cumming
# ----------------------------------------------------------------------------

use strict;
use warnings;

# The first argument to the program is the filename of a dictionary of
# words, this dictionary will be searched for words consisting of word
# sequences.  It should be simply one word per line.
#
# It is loaded into the %words hash.

my $dict = $ARGV[0];
my %words;

if ( open my $fh, '<', $dict ) {
    while (<$fh>) {
        chomp;
        $words{$_} = 1;
    }
    close $fh;
} else {
    die "Cannot open dictionary file $dict\n";
}

# Check every word in the dictionary using the recursive function
# check_word.  Note that I don't sort the words here since that might
# take a long time.  Sorting can be done on the output.

foreach my $w (keys %words) {
    my $sub = check_word($w);
    if ( $sub ne '' ) {
        print "$w ($sub)\n";
    }
}

# check_word extracts ever longer subsequences of the word to be
# checked and sees if they are themselves words (by checking in
# %words).  If a word is found then the remainder of the word is sent
# to a recursive call to check_word.
#
# For example, suppose we do check_word( fatalistic ), the code will
# check the following:
#
# check_word: fatalistic; found so far:
#  f?
#  fa?
#  fat?
#   check_word: alistic; found so far:  fat
#    a?
#     check_word: listic; found so far:  fat a
#      l?
#      li?
#      lis?
#      list?
#       check_word: ic; found so far:  fat a list
#        i?
#      listi?
#    al?
#    ali?
#    alis?
#    alist?
#    alisti?
#  fata?
#  fatal?
#   check_word: istic; found so far:  fatal
#    i?
#    is?
#     check_word: tic; found so far:  fatal is
#
# This function returns an empty string if the word does not consist
# of other words, or a string containing the word broken down into
# space separated words
#
# e.g. check_word('fatalistic') returns ' fatal is tic'
#      check_word('potato') returns ''

sub check_word
{
    my ( $w,             # The word to check
         $depth ) = @_;  # Contains the words found so far, or
                         # undefined when first called

    if ( !defined( $depth ) ) {
        $depth = '';
    } else {
        if ( defined( $words{$w} ) ) {
            return "$depth $w";
        }
    }

    for my $i (1..length($w)-1) {
        my $fragment = substr($w,0,$i);
        if ( defined($words{$fragment} ) ) {
            my $sub = check_word(substr($w,$i), "$depth $fragment");
            if ( $sub ne '' ) {
                return $sub;
            }
        }
    }

    return '';
}
```


