Monday, June 04, 2012

Animated solution to the "Never gonna give you up" program problem

I mentioned the fun "smallest program to output the lyrics of 'Never gonna give you up'" the other day.  It being a bank holiday weekend I've come up with my own solution which has 589 bytes in Perl.

It works by building a dictionary from which to compress the lyrics and then there's a simple decompressor.  Like many of the solutions this works by having the dictionary be indexed by single bytes and be built into a single string that expands to the result.

Here's the compressor.
use strict;
use warnings;

my $n =<<EOF;
We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy
I just wanna tell you how I'm feeling
Gotta make you understand

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you

We've known each other for so long
Your heart's been aching but
You're too shy to say it
Inside we both know what's been going on
We know the game and we're gonna play it
And if you ask me how I'm feeling
Don't tell me you're too blind to see

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you

(Ooh, give you up)
(Ooh, give you up)
(Ooh)
Never gonna give, never gonna give
(Give you up)
(Ooh)
Never gonna give, never gonna give
(Give you up)

We've know each other for so long
Your heart's been aching but
You're too shy to say it
Inside we both know what's been going on
We know the game and we're gonna play it

I just wanna tell you how I'm feeling
Gotta make you understand

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
EOF

my @b = map(chr, (reverse 129..254));

foreach my $i (reverse 2..173) {
    my $found = 1;
    while($found) {
        my %saving;
        $found = 0;

        foreach my $j (0..length($n)-$i) {
            my $s = substr($n, $j, $i);
            my $t = $n;
            $t =~ s/\Q$s\E/0/g;
            $saving{$s} = length($n) - length($t) - $i - 1;
        }

        my $top = (sort { $saving{$b} <=> $saving{$a} } keys %saving)[0];
        if ($saving{$top} > 0) {
            my $nb = shift(@b);
            $n =~ s/\Q$top\E/$nb/g;
            $n = $top . "$nb$n";
            $found = 1;
        }
    }
}

print $n;
It works by looking for all strings starting at length 173 (which I determined by looping and corresponds to the entire chorus) and working backwards. It computes the space saved by extracting that string and replacing it with a dictionary entry and removes the strings that provide the most space saving. When there are no more left for a specific length it moves onto the next length (one smaller). The result is 514 byte string containing a bunch of high-ASCII characters like this:
 gÐngÑveÒe ÓthÔouÕo Önd× yÕØÖloÙiÑ Út's Û(OohÜe'reÝ
YÕÞ knowß,ÐivàÚoá I'm â tÖsãtell ä a× åØ æeåçay it
è äértØêß ëÐonna ì oÔer íæupîmakeæïÛbeen ðëÔÓñÕ'rÓtoÖòeî)
óeÒrìô
We'Òßõ
NôöóÜ÷ôgiÒø howâfeeliÑ
ù
Nøú÷)ú, nø
(Givû
I just wannaéyÕùGotta ïu×ersta×
þü eachífor sÙÑÞr hearðachÚbut
YòshyãèInsidÓwÓboÔëwhaðgoán
WeñgamçwÝìplèýúîöletædownörun arÕ×ådeseêöïcryösayÐoodbyeöäa liçhuê
þWÝ nÖstraÑers tÙÒÞñrulesåsÖdÖI
A full commitmenÛwhatâÔinkáfÞ wÕldn'tÐet Ôis from anyíguyüõnýA× 
ifæask meùDon'témÓyòbli×ãee
þþ
Üà÷àûûóõýüþþ
To understand the compression look at the start. The first string to be decompressed will be " g", then "ng", then "ve" and so on. The high-ASCII characters are both the dictionary separators (between entries) and the characters to be replaced with the corresponding entry in the dictionary. Thus each occurrence of "Ð" is replaced with " g". A complete program to decompress to the lyrics is as follows:
$b=<<E;
 gÐngÑveÒe >ÓthÔouÕo Önd× yÕØÖloÙiÑ Út's Û(OohÜe'reÝ
YÕÞ knowß,ÐivàÚoá I'm â tÖsãtell ä a× åØ æeåçay it
è äértØêß ëÐonna ì oÔer íæupîmakeæïÛbeen ðëÔÓñÕ'rÓtoÖòeî)
óeÒrìô
We'Òßõ
NôöóÜ÷ôgiÒø howâfeeliÑ
ù
Nøú÷)ú, nø
(Givû
I just wannaéyÕùGotta ïu×ersta×
þü eachífor sÙÑÞr hearðachÚbut
YòshyãèInsidÓwÓboÔëwhaðgoán
WeñgamçwÝìplèýúîöletædownörun arÕ×ådeseêöïcryösayÐoodbyeöäa liçhuê
þWÝ nÖstraÑers tÙÒÞñrulesåsÖdÖI
A full commitmenÛwhatâÔinkáfÞ wÕldn'tÐet Ôis from anyíguyüõnýA× 
ifæask meùDon'témÓyòbli×ãee
þþ
Üà÷àûûóõýüþþ
E
map{($a,$b)=split($c=chr,$b,2);$b=~s/$c/$a/g}(208..254);print$b
The final line does all the work progressively splitting off the dictionary entry and then using a regular expression substitution to transform the string. There's a bunch of Perl abuse going on. Notice there's no ; at the end of the program or in the { } block. The $t=chr works because chr reads the $_ variable set from the map and it both sets $c and returns the set value as an argument to split. Also notice how its valid Perl to do print$b with no space. Since it's fairly likely that some of those characters got messed up in the web version, here's a Base64 encoded version of the program for people to test with.
JGI9PDxFOwogZ9BuZ9F2ZdJlINN0aNRvddVvINZu
ZNcgedXY1mxv2WnRINp0J3Mg2yhPb2jcZSdyZd0K
WdXeIGtub3ffLNBpduDab+EgSSdtIOIgdNZz43Rl
bGwg5CBh1yDl2CDmZeXnYXkgaXQK6CDk6XJ02Orf
IOvQb25uYSDsIG/UZXIg7eZ1cO5tYWtl5u/bYmVl
biDw69TT8dUnctN0b9byZe4pCvNl0nLs9ApXZSfS
3/UKTvT289z39Gdp0vggaG934mZlZWxp0Qr5Ck74
+vcp+iwgbvgKKEdpdvsKSSBqdXN0IHdhbm5h6XnV
+UdvdHRhIO9112Vyc3Rh1wr+/CBlYWNo7WZvciBz
2dHeciBoZWFy8GFjaNpidXQKWfJzaHnj6Eluc2lk
03fTYm/U63doYfBnb+FuCldl8Wdhbed33exwbOj9
+u72bGV05mRvd272cnVuIGFy1dflZGVzZer272Ny
efZzYXnQb29kYnll9uRhIGxp52h16gr+V90gbtZz
dHJh0WVycyB02dLe8XJ1bGVz5XPWZNZJCkEgZnVs
bCBjb21taXRtZW7bd2hhdOLUaW5r4WbeIHfVbGRu
J3TQZXQg1GlzIGZyb20gYW557Wd1efz1bv1B1yBp
ZuZhc2sgbWX5RG9uJ3TpbdN58mJsadfjZWUK/v4K
3OD34Pv78/X9/P7+CkUKbWFweygkYSwkYik9c3Bs
aXQoJGM9Y2hyLCRiLDIpOyRiPX5zLyRjLyRhL2d9
KDIwOC4uMjU0KTtwcmludCRiCg==
And to make it totally clear what's happening, here's a little animation of the decompression process.

4 comments:

Will said...

Beautiful :)

Byte-Pair Encoding http://en.wikipedia.org/wiki/Byte_pair_encoding

Will said...

Beautiful :)

Byte-Pair Encoding http://en.wikipedia.org/wiki/Byte_pair_encoding

Will said...

Beautiful :)

Byte-Pair Encoding

Francis Turner said...

I had a bunch of problems with the compressed sting. Here was my way to resolve them (t.pl was your compressor script):

$ echo '$b=<tt.pl
$ perl t.pl >>tt.pl
$ echo ''>>tt.pl
$ echo 'E'>>tt.pl
$ echo 'map{($a,$b)=split($c=chr,$b,2);$b=~s/$c/$a/g}(208..254);print$b'>>tt.pl
$ perl tt.pl

I suspect that cunning use of bit ranges would allow you to compress further (you have 49? compressed strings and the characters are a subset of [\n a-zA-Z'()] So some cunning tr would allow you to use 7 bits instead of 8 per character in the compressed string.

The only question is whether you could hack perl's pack/unpack functions (and tr) to function work in less than 60 total characters to get a net gain on the compression