## Monday, December 05, 2011

### Down the GCHQ rabbit hole (or I think there really is a part 4)

Update: Confirmation that this is correct

Early this morning I blogged that I thought there was a hidden part 4 to the GCHQ Code Challenge because of the amount of non-random data in part 2. Through someone who knows someone who knows someone at GCHQ I got the response the data was "a random filler and that they ran out of time to do anything interesting".

Well, I don't believe that's the whole story. There's a distinct pattern worth investigating.

Firstly, here are the three blocks of data that are untouched in part 2:
0150: 7d 1f 15 60 4d 4d 52 7d 0e 27 6d 10 6d 5a 06 56   }..MMR}.'m.mZ.V
0160: 47 14 42 0e b6 b2 b2 e6 eb b4 83 8e d7 e5 d4 d9   G.B.............
0170: c3 f0 80 95 f1 82 82 9a bd 95 a4 8d 9a 2b 30 69   .............+0i
0180: 4a 69 65 55 1c 7b 69 1c 6e 04 74 35 21 26 2f 60   JieU.{i.n.t5!&/
0190: 03 4e 37 1e 33 54 39 e6 ba b4 a2 ad a4 c5 95 c8   .N7.3T9.........
01a0: c1 e4 8a ec e7 92 8b e8 81 f0 ad 98 a4 d0 c0 8d   ................
01b0: ac 22 52 65 7e 27 2b 5a 12 61 0a 01 7a 6b 1d 67   ."Re~'+Z.a..zk.g

0200: 37 7a 07 11 1f 1d 68 25 32 77 1e 62 23 5b 47 55   7z....h%2w.b#[GU
0210: 53 30 11 42 f6 f1 b1 e6 c3 cc f8 c5 e4 cc c0 d3   S0.B............
0220: 85 fd 9a e3 e6 81 b5 bb d7 cd 87 a3 d3 6b 36 6f   .............k6o
0230: 6f 66 55 30 16 45 5e 09 74 5c 3f 29 2b 66 3d 0d   ofU0.E^.t\?)+f=.
0240: 02 30 28 35 15 09 15 dd ec b8 e2 fb d8 cb d8 d1   .0(5............
0250: 8b d5 82 d9 9a f1 92 ab e8 a6 d6 d0 8c aa d2 94   ................
0260: cf 45 46 67 20 7d 44 14 6b 45 6d 54 03 17 60 62   .EFg }D.kEmT..b

0270: 55 5a 4a 66 61 11 57 68 75 05 62 36 7d 02 10 4b   UZJfa.Whu.b6}..K
0280: 08 22 42 32 ba e2 b9 e2 d6 b9 ff c3 e9 8a 8f c1   ."B2............
0290: 8f e1 b8 a4 96 f1 8f 81 b1 8d 89 cc d4 78 76 61   .............xva
02a0: 72 3e 37 23 56 73 71 79 63 7c 08 11 20 69 7a 14   r>7#Vsqyc|.. iz.
02b0: 68 05 21 1e 32 27 59 b7 cf ab dd d5 cc 97 93 f2   h.!.2'Y.........
02c0: e7 c0 eb ff e9 a3 bf a1 ab 8b bb 9e 9e 8c a0 c1   ................
02d0: 9b 5a 2f 2f 4e 4e 00 00 00 00 00 00 00 00 00 00   .Z//NN..........

I say three (rather than two blocks), because as you'll see below there's a real pattern that indicates that we are looking at three blocks of data encrypted using the same method and the same key.

The decryption scheme used for the other parts of part 2 (that actually get decrypted) is based on calculating the following where xn is the nth byte (0 based) of the block to be descrypted: f(xn) = xn xor ( a * n + b ). a and b are constants. The first decryption uses a = 1 and b = 0xAA. The second decryption uses a = 3 and b = 0x32.

Working under the assumption that a similar scheme was used for these three blocks I ran various tests and noticed a peculiar rhythmic pattern where whole runs of random ASCII characters would appear followed by runs of non-printable characters. It occurred to me that this could imply that (a) the underlying data is in ASCII (using printable characters) and that (b) the pattern I was seeing was a pattern in the top bit of key. If the key is following the scheme above it would go through runs of 0s followed by 1s and if the plain text is ASCII the top bit is always 0 and so for that one bit the 'key' comes through.

A quick calculation of just the top bits of the three blocks above showed the following pattern:
0150:
000000000000000000001111111111111111111111111
0000000000000000000000000011111111111111111111111111
000000000000000
0200:
000000000000000000001111111111111111111111111
0000000000000000000000000011111111111111111111111111
000000000000000
0270:
000000000000000000001111111111111111111111111
0000000000000000000000000011111111111111111111111111
00000

Immediately, it's apparent this this data is not random (at least in the top bit), and that it follows a pattern of 0s and 1s alternating with the following counts:
150: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 15 (0s)
200: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 15 (0s)
270: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 15 (0s)

(Note that the last block is short because the data is truncated and ends with 0s). It also might indicate that there are three separate blocks of data encrypted using the same key.

Theorizing that that was generated using the f(xn) function above meant finding a and b that fit the bill. It's not hard to calculate this spotting that the 25 occurs when the 0xFF boundary is passed and that the 20 occurs because we start in the middle of a 26 block. The actual coefficents are a = 5 and b = 0x1F. That generates the following sequence to use as the key:

1f 24 29 2e 33 38 3d 42 47 4c 51 56 5b 60 65 6a 6f 74 79 7e 83 88 8d 92 97 9c a1 a6 ab b0 b5 ba bf c4 c9 ce d3 d8 dd e2 e7 ec f1 f6 fb 00 05 0a 0f 14 19 1e 23 28 2d 32 37 3c 41 46 4b 50 55 5a 5f 64 69 6e 73 78 7d 82 87 8c 91 96 9b a0 a5 aa af b4 b9 be c3 c8 cd d2 d7 dc e1 e6 eb f0 f5 fa ff 04 09 0e 13 18 1d 22 27 2c 31 36 3b 40 45 4a 4f 54 59 5e 63 68 6d 72 77 7c 81 86 8b 90 95 9a 9f a4 a9 ae b3 b8 bd c2 c7 cc d1 d6 db e0 e5 ea ef f4 f9 fe 03 08 0d 12 17 1c 21 26 2b 30 35 3a 3f 44 49 4e 53 58 5d 62 67 6c 71 76 7b 80 85 8a 8f 94 99 9e a3 a8 ad b2 b7 bc c1 c6 cb d0 d5 da df e4 e9 ee f3 f8 fd 02 07 0c 11 16 1b 20 25 2a 2f 34 39 3e 43 48 4d 52 57 5c 61 66 6b 70 75 7a 7f 84 89 8e 93 98 9d a2 a7 ac b1 b6 bb c0 c5 ca cf d4 d9 de e3 e8 ed f2 f7 fc 01 06 0b 10 15 1a 1f

Which has the required sequence of 0s and 1s and begins: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 25 (0s). So, at this stage I've identified a pattern and a way to generate that sequence using the encryption scheme used in part 2.

The surprising thing that happens when you decrypt those three blocks using that key stream is that not only is it 7 bit (which was bound to happen because of the way the key was generated) but the entire thing is printable ASCII. No characters below 0x20 and no 0x7F. Here are the three decrypted blocks in ASCII (I added line breaks):

b;<N~uo?Ik<F6:c<(;p5:?t|("(|Uac|4I["Z_xZyU{a+5cE}|K?SD.Y85sjvz:
\*^p@,Dd=83;?e0bnP3R$ZF:V,L~O 5wS&[km?6x5M;7A+X-(^.?,%Ugu;O4x;"? <Dh<uy<tTPYcO|ui:9S-5YhY0!vU(k3erL.5ms;C~o6hW]TA[fqh_k4smCk}{$a;gY9_y?z76gZ'n0AOi3eY6Li\b8W%(J~cHR)j*2I3&bu!gV;L9j4pA%^eB::{
0%qjE)RcVax:/xsk}*.=u[\KT@IWk9/N7aHpA_\$5H'LCW76XHtRA*krs|WZxu|U;
d^&!]V"',16;@EJ

The fact that this decrypts as printable ASCII reinforces my belief that this is correct. Of course, it looks like gibberish, but it's printable. There are 89 characters used from the 95 possible. The missing characters are # 1 > G Q l.

It seems unlikely that the sequence of steps I have taken would have resulted in a block of perfectly printable ASCII characters had this not been the correct thing to do.

Perhaps this is a base-89 encoding of some other binary data. Whatever it is it has high-entropy and could be encrypted with some robust scheme, or it could be the result of compression. Of course, it could be that this data is random and was encrypted by GCHQ as I specified as a sort of dead end.

PS If anyone from GCHQ is reading... can you email me a simple 'carry on' or 'stop wasting your time'. Need to sleep...

Labels:

If you enjoyed this blog post, you might enjoy my travel book for people interested in science and technology: The Geek Atlas. Signed copies of The Geek Atlas are available.

Mikester said...

This is likely something you've already noticed, but it became apparent in my feed reader: the least-significant byte (big-endian) repeats every 16 bytes:

1f 24 29 2e 33 38 3d 42 47 4c 51 56 5b 60 65 6a
6f 74 79 7e 83 88 8d 92 97 9c a1 a6 ab b0 b5 ba
bf c4 c9 ce d3 d8 dd e2 e7 ec f1 f6 fb 00 05 0a
0f 14 19 1e 23 28 2d 32 37 3c 41 46 4b 50 55 5a
5f 64 69 6e 73 78 7d 82 87 8c 91 96 9b a0 a5 aa
af b4 b9 be c3 c8 cd d2 d7 dc e1 e6 eb f0 f5 fa
ff 04 09 0e 13 18 1d 22 27 2c 31 36 3b 40 45 4a
4f 54 59 5e 63 68 6d 72 77 7c 81 86 8b 90 95 9a
9f a4 a9 ae b3 b8 bd c2 c7 cc d1 d6 db e0 e5 ea
ef f4 f9 fe 03 08 0d 12 17 1c 21 26 2b 30 35 3a
3f 44 49 4e 53 58 5d 62 67 6c 71 76 7b 80 85 8a
8f 94 99 9e a3 a8 ad b2 b7 bc c1 c6 cb d0 d5 da
df e4 e9 ee f3 f8 fd 02 07 0c 11 16 1b 20 25 2a
2f 34 39 3e 43 48 4d 52 57 5c 61 66 6b 70 75 7a
7f 84 89 8e 93 98 9d a2 a7 ac b1 b6 bb c0 c5 ca
cf d4 d9 de e3 e8 ed f2 f7 fc 01 06 0b 10 15 1a
1f

(Apologies for formatting; Blogger does not allow code or pre tags)

9:05 PM
sep332 said...

Well the first byte also is repeated, 3x at a time. 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, ... This could very well be random filler.

9:37 PM
John Graham-Cumming said...

@Mikester and @sep332. You are looking at the key and not the plain text.

10:03 PM
sep332 said...

oh, i did think it was a bit odd >< sorry :)

12:41 AM
Kid said...

I've seen this result here: http://www.r00t.cz/Misc/CanYouCrackIt

An additional observation was: "One firmware value is 0xd2ab1f05 = 1F=31 05=5 -> values for above algorithm. Coincidence?"

The other two bytes could indicate where to start and loop how many times?

8:43 PM
Kid said...

I've seen this result here: http://www.r00t.cz/Misc/CanYouCrackIt

An additional observation was: "One firmware value is 0xd2ab1f05 = 1F=31 05=5 -> values for above algorithm. Coincidence?"

The other two values could be the starting point and how many times to loop?

8:44 PM
Kid said...

I've seen this result here: http://www.r00t.cz/Misc/CanYouCrackIt

An additional observation was: "One firmware value is 0xd2ab1f05 = 1F=31 05=5 -> values for above algorithm. Coincidence?"

The other two values could be the starting point and how many times to loop?

8:45 PM