
A back channel confirms that I'm right... sort of

Through a slightly circuitous back channel I received an unofficial follow-up to a blog post about the unused machine code in the GCHQ Code Challenge part 2. The follow-up assured me that the unused code was left over after some clean-up was done, and that the rest of the data in the file was random filler (as I'd already heard).

But as I'd already determined that it was not actually random (at least at one level), I sent a carrier pigeon out with a question about my follow-up blog post.

Listening for secret messages transmitted during The Archers I received a reply indicating that I had indeed 'broken' the encryption on that stream of ASCII data (by guessing the algorithm and reverse engineering the key stream), but that the underlying data was actually random ASCII generated and encrypted using the following Python code:
b = ''.join([chr(randint(0x20, 0x7f)) for i in range(0, 16 * 7)])
c = codec(b, 0x1f, s=5)
The first part generates 112 bytes of random ASCII text and the codec function is apparently the encryption function with the key I had identified (see the 0x1f and 5).

There are a couple of oddities about this code. The randint call includes both endpoints, so it could generate 0x7F, which is non-printable (and doesn't appear in the decrypted text). It also generates precisely 112 bytes of data, whereas part 2 actually contains two blocks of 112 bytes and one of 102.
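
Both oddities are easy to check. What follows is only a sketch: the helper names are mine and are not part of the code I was sent, and the final estimate assumes that all 326 bytes (112 + 112 + 102) came from the same generator.

```python
from random import Random

BLOCK_LEN = 16 * 7  # the length hard-coded in the quoted snippet


def random_ascii(n, lo=0x20, hi=0x7F, seed=None):
    """Mimic the quoted generator: n characters drawn with randint(lo, hi)."""
    rng = Random(seed)
    # randint is inclusive at BOTH ends, so hi itself can be produced.
    return ''.join(chr(rng.randint(lo, hi)) for _ in range(n))


def is_printable_ascii(ch):
    """Printable ASCII runs from 0x20 (space) to 0x7E (~); 0x7F is DEL."""
    return 0x20 <= ord(ch) <= 0x7E


def p_no_del(n, lo=0x20, hi=0x7F):
    """Probability that n independent draws never produce 0x7F."""
    values = hi - lo + 1  # 96 possible values
    return ((values - 1) / values) ** n


if __name__ == "__main__":
    sample = random_ascii(BLOCK_LEN)
    print(len(sample), all(is_printable_ascii(c) for c in sample))
    print(p_no_del(112 + 112 + 102))
```

Under that assumption the chance of never drawing 0x7F in 326 bytes is (95/96)^326, or roughly 3%, so its absence is unusual but not impossible.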

I questioned that by leaving a note in a tree in St. James's Park and received a response by exchanging briefcases at King's Cross Station to the effect that the 102 byte block had simply been truncated to make the code look interesting.

Of course, that could all be completely non-suspicious. But having been told first that it was random filler, and then that it was actually blocks of encrypted random printable ASCII filler, it wouldn't surprise me if the truth was even more complex (and perhaps rather mundane).

But why go to all this trouble on 'random filler' text? And why update the web site to say "The Challenge Continues"?

I am left slightly baffled, which I'd imagine is just how people working in secret places would like it to be.

Comments

ste9an05 said…
Some of us are considering possible steganography in images/codebreaker.jpg as it has symmetry, which is a little unexpected.
Steve said…
I thought I'd drop you a comment on your interesting blog entries.

First up, I was bothered too about the unused data in part 2. I've independently verified that the "h75 h10 h01" sequence at h0132 makes sense as it allows decryption to continue beyond the end of the segment, should the message have been longer than the remaining 4 segments (64 bytes). I was a little more perplexed by "hCC" at h0140 - but, perhaps, that invalid byte is there to mark a segment that should never be executed... but, if so, the zero-bytes that follow it make no sense. Similarly, it makes little sense that the data it decodes should start at 01c0 - as the unencrypted byte-code in the first two segments decodes only segments h10-h14.

I concur with your view that the remaining sections (h0150-h01bf and h0200-h02ff) contain non-random data. In addition to the cyclic top-bit pattern you've documented, I note:

* The 'premature' end to non-zero data at h02d6 is reminiscent of the end-of-message marker that is preserved in memory from h01f3-h01ff.
* I'm suspicious about the fact that there are 26 trailing zeros - for two reasons... firstly because it matches the equal number of top-bit-clear;top-bit-set bits in your analysis of the three sections... and also because (simply) 26 is the number of letters in our alphabet - possibly hinting at some alphabetic code.
* The vast majority of the data has no two successive bytes equal. But, in the last 11 non-zero bytes there are three adjacent equal pairs... h9e at h02cb-h02cc and h2f at h02d2-h02d3 and h4e at h02d4-h02d5. It is also unexpected that the non-zero data terminates with two repeated pairs. This observation makes it very hard for me to conclude that a random source was encoded and truncated arbitrarily as 'filler'.

An avenue I explored today was to note, as you did, that the byte code decodes by exclusive or with a sequence which could be generated from init+step*i, where i indicates the index of the byte and init and step are parameter bytes... the first decode can be parametrised init=hAA step=1 and the second decode init=0 and step=3. I brute-force searched this space looking for strings with significant sequences of printable ASCII... but found nothing interesting. I concluded that, if these sections do contain messages that can be decoded, they don't use the same encryption scheme.
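
A search of that kind can be sketched roughly as follows. This is not the code actually used: the function names are hypothetical, wrapping the key byte modulo 256 is an assumption, and scoring candidates by the fraction of printable bytes is a stand-in for "significant sequences of printable ASCII".

```python
def keystream_xor(data, init, step):
    """XOR byte i with (init + step * i) mod 256.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    return bytes(b ^ ((init + step * i) & 0xFF) for i, b in enumerate(data))


def printable_fraction(data):
    """Fraction of bytes in the printable ASCII range 0x20..0x7E."""
    if not data:
        return 0.0
    return sum(0x20 <= b <= 0x7E for b in data) / len(data)


def brute_force(data, threshold=0.95):
    """Try all 256 * 256 (init, step) pairs and keep mostly-printable results."""
    hits = []
    for init in range(256):
        for step in range(256):
            plain = keystream_xor(data, init, step)
            if printable_fraction(plain) >= threshold:
                hits.append((init, step, plain))
    return hits


if __name__ == "__main__":
    # Round trip on made-up text, using the 0x1f / 5 parameters from the post.
    secret = b"example plaintext, not from the challenge"
    cipher = keystream_xor(secret, 0x1F, 5)
    for init, step, plain in brute_force(cipher):
        print(hex(init), step, plain)
```

Short inputs produce many false positives at any threshold, so a search like this is only meaningful on blocks of a reasonable length.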

Like yourself, I'm a bit baffled. I don't think the data is random; it seems odd to include these sections when all the rest of the information in stages one and two is used by the end of stage three. I can't believe that the data is there to make it harder to identify the position of the message - as the location of the message can be immediately obtained by identifying the first string of zero bytes.
Thanks John. Bit disappointed at the challenge, in a way -- but at least, like you, I can now get some sleep and return to the day job. :).

Was hoping it might offer a number of different routes, rather than a "one track" approach, with differing resultant keywords, to make it more inter-disciplinary and to sort out the high fliers from the 99.9% of candidates who -- like me -- were "also rans".

Got totally the wrong skill set for them. vb.net helped a bit; php, js/ajax, mysql, firebird, apache .... Not much use here.

Was also hoping that the exe itself (which can be overwritten, of course, to get it to work, or fetched from localhost) would manipulate the "supposed" keyword sent in the clear ... Nada.
Steve said…
I previously said: "I'm suspicious about the fact that there are 26 trailing zeros" - but, evidently, I can't count - there are 42. This completely undermines my 'alphabetic cypher' hypothesis - but the other points, I think, were valid.
Scrub that last comment. Got the stage 2 VM working in PHP on my server, and ste9an05 found a great link to a graphical implementation in JS. Don't know why that idea totally passed me by? :)
Junk said…
Hello guys,
I'm amazed by your technical skills. It's a really beautiful example of hacker thinking. Here's what caught my eye...

What can these quests tell us about the author?
I watched the videos from Dr Gareth Owen, and the knowledge needed to crack this goes far beyond that of a typical technical person.
- Use of everything from low-level assembler to high-level Java programming. (Such wide programming language skills are very rare)
- The whole contest is in English. (No Chinese, no Russian. Seeking an English person)
- Use of a VM, which is quite high tech
- No brute force needed to solve the quests
- No cloud quest (Interesting, since there is a trend towards using it)
- Presence of Facebook, Twitter, Google+. (The contest is meant to be spread)
- All quests are connected through the web (Which is a major fault)

Some of you may have seen the movie Mercury Rising. This quest may not really be about getting a job, but about finding out how many people can crack it.



Different approach to get to the end

As I have previously described, there is a major fault in that all quests are linked via the canyoucrackit.co.uk web site. This means that compromising this server will get you to the end, even without solving the quests.
- By entering a dummy code on http://canyoucrackit.co.uk you will get an /index.asp hint. This tells you that the server is running Microsoft ASP scripting. And we all know that it's hackable.
- The next step might be to run an eEye Retina scan on canyoucrackit.co.uk, find vulnerabilities and get into the web site.

The next flaw is that the results on the web are static and all solutions lead to one link.

By putting http://canyoucrackit.co.uk into the W3C validator http://validator.w3.org/ you will find that the code is not valid... To my surprise!

And even better, by doing a Google search for "site:canyoucrackit.co.uk" you can find links to all the quests:
http://canyoucrackit.co.uk/soyoudidit.asp (1st page)
http://www.canyoucrackit.co.uk/15b436de1f9107f3778aad525e5d0b20.js (1st page)
http://canyoucrackit.co.uk/hqDTK7b8K2rvw/a3bfc2af/d2ab1f05/da13f110/key.txt (3rd page)
- No matter how stupid this is... the easiest solution is often the best.

It is also interesting that there is a robots.txt, but the site is indexed by Google. Possibly they added it later.

This leads me to think that there are 2 teams working on this task.
a) Quest team - a very knowledgeable assembler guy, a Java guy, a VM guy, a Wireshark (LAN) guy
b) Web team - which is very sloppy

And finally you don't even need to complete the quest or do the Google search to get to the end.
- On the GCHQ site http://www.gchq.gov.uk click on "Careers" then "Click here to visit our recruitment portal"
- Then "Jobs" - "Cyber Security Specialists" and you are there!

The conclusion is that this is all just PR... so sad :-(
NivagSwerdna said…
Having solved this myself over the weekend I noted with interest your observation that there might be further data. Along the same lines as Steve I note that the encryption is a simple XOR with a linear increase to the key; by scanning the bytes in the VM memory and looking for suitable candidate combinations of 3 letters I find only the previously discovered Part 2 plaintext. The absence of a 42 42 42 42 signature implies that it doesn't warrant a return to the deadbeef either.
I think I conclude that there is no further message. I hope I'm proven wrong in a few days.
Regards
Nivag
