Update: Confirmation that this is correct
Early this morning I blogged that I thought there was a hidden part 4 to the GCHQ Code Challenge because of the amount of non-random data in part 2. Through someone who knows someone who knows someone at GCHQ I got the response the data was "a random filler and that they ran out of time to do anything interesting".
Well, I don't believe that's the whole story. There's a distinct pattern worth investigating.
Firstly, here are the three blocks of data that are untouched in part 2:
The decryption scheme used for the other parts of part 2 (that actually get decrypted) is based on calculating the following where xn is the nth byte (0 based) of the block to be descrypted: f(xn) = xn xor ( a * n + b ). a and b are constants. The first decryption uses a = 1 and b = 0xAA. The second decryption uses a = 3 and b = 0x32.
Working under the assumption that a similar scheme was used for these three blocks I ran various tests and noticed a peculiar rhythmic pattern where whole runs of random ASCII characters would appear followed by runs of non-printable characters. It occurred to me that this could imply that (a) the underlying data is in ASCII (using printable characters) and that (b) the pattern I was seeing was a pattern in the top bit of key. If the key is following the scheme above it would go through runs of 0s followed by 1s and if the plain text is ASCII the top bit is always 0 and so for that one bit the 'key' comes through.
A quick calculation of just the top bits of the three blocks above showed the following pattern:
Theorizing that that was generated using the f(xn) function above meant finding a and b that fit the bill. It's not hard to calculate this spotting that the 25 occurs when the 0xFF boundary is passed and that the 20 occurs because we start in the middle of a 26 block. The actual coefficents are a = 5 and b = 0x1F. That generates the following sequence to use as the key:
1f 24 29 2e 33 38 3d 42 47 4c 51 56 5b 60 65 6a 6f 74 79 7e 83 88 8d 92 97 9c a1 a6 ab b0 b5 ba bf c4 c9 ce d3 d8 dd e2 e7 ec f1 f6 fb 00 05 0a 0f 14 19 1e 23 28 2d 32 37 3c 41 46 4b 50 55 5a 5f 64 69 6e 73 78 7d 82 87 8c 91 96 9b a0 a5 aa af b4 b9 be c3 c8 cd d2 d7 dc e1 e6 eb f0 f5 fa ff 04 09 0e 13 18 1d 22 27 2c 31 36 3b 40 45 4a 4f 54 59 5e 63 68 6d 72 77 7c 81 86 8b 90 95 9a 9f a4 a9 ae b3 b8 bd c2 c7 cc d1 d6 db e0 e5 ea ef f4 f9 fe 03 08 0d 12 17 1c 21 26 2b 30 35 3a 3f 44 49 4e 53 58 5d 62 67 6c 71 76 7b 80 85 8a 8f 94 99 9e a3 a8 ad b2 b7 bc c1 c6 cb d0 d5 da df e4 e9 ee f3 f8 fd 02 07 0c 11 16 1b 20 25 2a 2f 34 39 3e 43 48 4d 52 57 5c 61 66 6b 70 75 7a 7f 84 89 8e 93 98 9d a2 a7 ac b1 b6 bb c0 c5 ca cf d4 d9 de e3 e8 ed f2 f7 fc 01 06 0b 10 15 1a 1f
Which has the required sequence of 0s and 1s and begins: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 25 (0s). So, at this stage I've identified a pattern and a way to generate that sequence using the encryption scheme used in part 2.
The surprising thing that happens when you decrypt those three blocks using that key stream is that not only is it 7 bit (which was bound to happen because of the way the key was generated) but the entire thing is printable ASCII. No characters below 0x20 and no 0x7F. Here are the three decrypted blocks in ASCII (I added line breaks):
b;<N~uo?Ik<F6:c<(`;p5:?t|("(|Uac|4I["Z_xZyU{a+5cE}|K?SD.Y85sjvz:
\*^[email protected],Dd=83;?e0bnP3R$ZF:V,L~O 5wS&[km?6x5M;7A+X-(^.?,%Ugu;O4x;"?
<Dh<uy<tTPYcO|ui:9S-5YhY0!vU(k3e`rL.5ms;C`~o`6hW]TA[fqh_k4smCk}{
$a;gY9_y?z76gZ'n0AOi3eY6Li\b8W%(J~cHR)j*2I3`&bu!gV;L9j4pA%^eB::{
0%qjE)RcVax:/xsk}*.=u[\[email protected]/N7aHpA_$5H'LCW76XHtRA*krs|WZxu|U;
d^&!]V"',16;@EJ
The fact that this decrypts as printable ASCII reinforces my belief that this is correct. Of course, it looks like gibberish, but it's printable. There are 89 characters used from the 95 possible. The missing characters are # 1 > G Q l.
It seems unlikely that the sequence of steps I have taken would have resulted in a block of perfectly printable ASCII characters had this not been the correct thing to do.
Perhaps this is a base-89 encoding of some other binary data. Whatever it is it has high-entropy and could be encrypted with some robust scheme, or it could be the result of compression. Of course, it could be that this data is random and was encrypted by GCHQ as I specified as a sort of dead end.
PS If anyone from GCHQ is reading... can you email me a simple 'carry on' or 'stop wasting your time'. Need to sleep...
Early this morning I blogged that I thought there was a hidden part 4 to the GCHQ Code Challenge because of the amount of non-random data in part 2. Through someone who knows someone who knows someone at GCHQ I got the response the data was "a random filler and that they ran out of time to do anything interesting".
Well, I don't believe that's the whole story. There's a distinct pattern worth investigating.
Firstly, here are the three blocks of data that are untouched in part 2:
0150: 7d 1f 15 60 4d 4d 52 7d 0e 27 6d 10 6d 5a 06 56 }..`MMR}.'m.mZ.V 0160: 47 14 42 0e b6 b2 b2 e6 eb b4 83 8e d7 e5 d4 d9 G.B............. 0170: c3 f0 80 95 f1 82 82 9a bd 95 a4 8d 9a 2b 30 69 .............+0i 0180: 4a 69 65 55 1c 7b 69 1c 6e 04 74 35 21 26 2f 60 JieU.{i.n.t5!&/` 0190: 03 4e 37 1e 33 54 39 e6 ba b4 a2 ad a4 c5 95 c8 .N7.3T9......... 01a0: c1 e4 8a ec e7 92 8b e8 81 f0 ad 98 a4 d0 c0 8d ................ 01b0: ac 22 52 65 7e 27 2b 5a 12 61 0a 01 7a 6b 1d 67 ."Re~'+Z.a..zk.g 0200: 37 7a 07 11 1f 1d 68 25 32 77 1e 62 23 5b 47 55 7z....h%2w.b#[GU 0210: 53 30 11 42 f6 f1 b1 e6 c3 cc f8 c5 e4 cc c0 d3 S0.B............ 0220: 85 fd 9a e3 e6 81 b5 bb d7 cd 87 a3 d3 6b 36 6f .............k6o 0230: 6f 66 55 30 16 45 5e 09 74 5c 3f 29 2b 66 3d 0d ofU0.E^.t\?)+f=. 0240: 02 30 28 35 15 09 15 dd ec b8 e2 fb d8 cb d8 d1 .0(5............ 0250: 8b d5 82 d9 9a f1 92 ab e8 a6 d6 d0 8c aa d2 94 ................ 0260: cf 45 46 67 20 7d 44 14 6b 45 6d 54 03 17 60 62 .EFg }D.kEmT..`b 0270: 55 5a 4a 66 61 11 57 68 75 05 62 36 7d 02 10 4b UZJfa.Whu.b6}..K 0280: 08 22 42 32 ba e2 b9 e2 d6 b9 ff c3 e9 8a 8f c1 ."B2............ 0290: 8f e1 b8 a4 96 f1 8f 81 b1 8d 89 cc d4 78 76 61 .............xva 02a0: 72 3e 37 23 56 73 71 79 63 7c 08 11 20 69 7a 14 r>7#Vsqyc|.. iz. 02b0: 68 05 21 1e 32 27 59 b7 cf ab dd d5 cc 97 93 f2 h.!.2'Y......... 02c0: e7 c0 eb ff e9 a3 bf a1 ab 8b bb 9e 9e 8c a0 c1 ................ 02d0: 9b 5a 2f 2f 4e 4e 00 00 00 00 00 00 00 00 00 00 .Z//NN..........I say three (rather than two blocks), because as you'll see below there's a real pattern that indicates that we are looking at three blocks of data encrypted using the same method and the same key.
The decryption scheme used for the other parts of part 2 (that actually get decrypted) is based on calculating the following where xn is the nth byte (0 based) of the block to be descrypted: f(xn) = xn xor ( a * n + b ). a and b are constants. The first decryption uses a = 1 and b = 0xAA. The second decryption uses a = 3 and b = 0x32.
Working under the assumption that a similar scheme was used for these three blocks I ran various tests and noticed a peculiar rhythmic pattern where whole runs of random ASCII characters would appear followed by runs of non-printable characters. It occurred to me that this could imply that (a) the underlying data is in ASCII (using printable characters) and that (b) the pattern I was seeing was a pattern in the top bit of key. If the key is following the scheme above it would go through runs of 0s followed by 1s and if the plain text is ASCII the top bit is always 0 and so for that one bit the 'key' comes through.
A quick calculation of just the top bits of the three blocks above showed the following pattern:
0150: 000000000000000000001111111111111111111111111 0000000000000000000000000011111111111111111111111111 000000000000000 0200: 000000000000000000001111111111111111111111111 0000000000000000000000000011111111111111111111111111 000000000000000 0270: 000000000000000000001111111111111111111111111 0000000000000000000000000011111111111111111111111111 00000Immediately, it's apparent this this data is not random (at least in the top bit), and that it follows a pattern of 0s and 1s alternating with the following counts:
150: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 15 (0s) 200: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 15 (0s) 270: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 15 (0s)(Note that the last block is short because the data is truncated and ends with 0s). It also might indicate that there are three separate blocks of data encrypted using the same key.
Theorizing that that was generated using the f(xn) function above meant finding a and b that fit the bill. It's not hard to calculate this spotting that the 25 occurs when the 0xFF boundary is passed and that the 20 occurs because we start in the middle of a 26 block. The actual coefficents are a = 5 and b = 0x1F. That generates the following sequence to use as the key:
1f 24 29 2e 33 38 3d 42 47 4c 51 56 5b 60 65 6a 6f 74 79 7e 83 88 8d 92 97 9c a1 a6 ab b0 b5 ba bf c4 c9 ce d3 d8 dd e2 e7 ec f1 f6 fb 00 05 0a 0f 14 19 1e 23 28 2d 32 37 3c 41 46 4b 50 55 5a 5f 64 69 6e 73 78 7d 82 87 8c 91 96 9b a0 a5 aa af b4 b9 be c3 c8 cd d2 d7 dc e1 e6 eb f0 f5 fa ff 04 09 0e 13 18 1d 22 27 2c 31 36 3b 40 45 4a 4f 54 59 5e 63 68 6d 72 77 7c 81 86 8b 90 95 9a 9f a4 a9 ae b3 b8 bd c2 c7 cc d1 d6 db e0 e5 ea ef f4 f9 fe 03 08 0d 12 17 1c 21 26 2b 30 35 3a 3f 44 49 4e 53 58 5d 62 67 6c 71 76 7b 80 85 8a 8f 94 99 9e a3 a8 ad b2 b7 bc c1 c6 cb d0 d5 da df e4 e9 ee f3 f8 fd 02 07 0c 11 16 1b 20 25 2a 2f 34 39 3e 43 48 4d 52 57 5c 61 66 6b 70 75 7a 7f 84 89 8e 93 98 9d a2 a7 ac b1 b6 bb c0 c5 ca cf d4 d9 de e3 e8 ed f2 f7 fc 01 06 0b 10 15 1a 1f
Which has the required sequence of 0s and 1s and begins: 20 (0s), 25 (1s), 26 (0s), 26 (1s), 25 (0s). So, at this stage I've identified a pattern and a way to generate that sequence using the encryption scheme used in part 2.
The surprising thing that happens when you decrypt those three blocks using that key stream is that not only is it 7 bit (which was bound to happen because of the way the key was generated) but the entire thing is printable ASCII. No characters below 0x20 and no 0x7F. Here are the three decrypted blocks in ASCII (I added line breaks):
b;<N~uo?Ik<F6:c<(`;p5:?t|("(|Uac|4I["Z_xZyU{a+5cE}|K?SD.Y85sjvz:
\*^[email protected],Dd=83;?e0bnP3R$ZF:V,L~O 5wS&[km?6x5M;7A+X-(^.?,%Ugu;O4x;"?
<Dh<uy<tTPYcO|ui:9S-5YhY0!vU(k3e`rL.5ms;C`~o`6hW]TA[fqh_k4smCk}{
$a;gY9_y?z76gZ'n0AOi3eY6Li\b8W%(J~cHR)j*2I3`&bu!gV;L9j4pA%^eB::{
0%qjE)RcVax:/xsk}*.=u[\[email protected]/N7aHpA_$5H'LCW76XHtRA*krs|WZxu|U;
d^&!]V"',16;@EJ
The fact that this decrypts as printable ASCII reinforces my belief that this is correct. Of course, it looks like gibberish, but it's printable. There are 89 characters used from the 95 possible. The missing characters are # 1 > G Q l.
It seems unlikely that the sequence of steps I have taken would have resulted in a block of perfectly printable ASCII characters had this not been the correct thing to do.
Perhaps this is a base-89 encoding of some other binary data. Whatever it is it has high-entropy and could be encrypted with some robust scheme, or it could be the result of compression. Of course, it could be that this data is random and was encrypted by GCHQ as I specified as a sort of dead end.
PS If anyone from GCHQ is reading... can you email me a simple 'carry on' or 'stop wasting your time'. Need to sleep...
Comments
1f 24 29 2e 33 38 3d 42 47 4c 51 56 5b 60 65 6a
6f 74 79 7e 83 88 8d 92 97 9c a1 a6 ab b0 b5 ba
bf c4 c9 ce d3 d8 dd e2 e7 ec f1 f6 fb 00 05 0a
0f 14 19 1e 23 28 2d 32 37 3c 41 46 4b 50 55 5a
5f 64 69 6e 73 78 7d 82 87 8c 91 96 9b a0 a5 aa
af b4 b9 be c3 c8 cd d2 d7 dc e1 e6 eb f0 f5 fa
ff 04 09 0e 13 18 1d 22 27 2c 31 36 3b 40 45 4a
4f 54 59 5e 63 68 6d 72 77 7c 81 86 8b 90 95 9a
9f a4 a9 ae b3 b8 bd c2 c7 cc d1 d6 db e0 e5 ea
ef f4 f9 fe 03 08 0d 12 17 1c 21 26 2b 30 35 3a
3f 44 49 4e 53 58 5d 62 67 6c 71 76 7b 80 85 8a
8f 94 99 9e a3 a8 ad b2 b7 bc c1 c6 cb d0 d5 da
df e4 e9 ee f3 f8 fd 02 07 0c 11 16 1b 20 25 2a
2f 34 39 3e 43 48 4d 52 57 5c 61 66 6b 70 75 7a
7f 84 89 8e 93 98 9d a2 a7 ac b1 b6 bb c0 c5 ca
cf d4 d9 de e3 e8 ed f2 f7 fc 01 06 0b 10 15 1a
1f
(Apologies for formatting; Blogger does not allow code or pre tags)
An additional observation was: "One firmware value is 0xd2ab1f05 = 1F=31 05=5 -> values for above algorithm. Coincidence?"
The other two bytes could indicate where to start and loop how many times?
An additional observation was: "One firmware value is 0xd2ab1f05 = 1F=31 05=5 -> values for above algorithm. Coincidence?"
The other two values could be the starting point and how many times to loop?
An additional observation was: "One firmware value is 0xd2ab1f05 = 1F=31 05=5 -> values for above algorithm. Coincidence?"
The other two values could be the starting point and how many times to loop?