Wednesday, February 13, 2008

Tonight, I'm going to write myself an Aston Martin

This is the story of my attempt to 'cheat' in an on-line spot-the-ball competition to win an Aston Martin. It's also the story of my failure, but you get free source code that implements automatic detection of image alteration using copy/paste or tools like the Clone Tool in Photoshop.

First, take a look at this photo:



Notice anything strange? In fact this image has been tampered with to cover up a truck. The truck is completely hidden by foliage. Here's the original:



Wouldn't it be nice to be able to detect that automatically? It is possible. Here's an image automatically generated by my code showing what was moved. All of the red was moved to the blue (or the other way around).



I was motivated to work on this program by greed (or at least my never-ending love of having a little flutter on things). Best of the Best runs spot-the-ball competitions in airports to win very expensive cars. But they also run the same competition online. That meant I could get my hands on the actual image used... could I process it to discover where the ball had been removed? (In reality, this isn't the right way to win because the actual ball position is not governed by where it actually was, but where a judge thinks it was).

Would it be cheating if I could? Apparently not, the competition rules say I should use my skill and judgment in determining the ball position. Surely, skill covers my programming ability.

So, I went looking for tampering algorithms and eventually came across Detection of Copy-Move Forgery in Digital Images written by Jessica Fridrich at SUNY Binghamton. The paper describes an algorithm for detecting just the sort of changes I thought I was looking for.

Unfortunately, I know nothing about image processing. Fortunately, the paper is written in a very clear style and a bit of Internet research enabled me to track down the knowledge I didn't have. (Also, thanks to Jessica for sending me the original images she used to test my implementation).

In brief the algorithm does the following:
  1. Slide a 16x16 block across the entire image from left hand corner to bottom right hand corner. For each 16x16 block perform a discrete cosine transform (DCT) on it and then quantize the 16x16 block using an expanded version of the standard JPEG quantization matrix.

  2. Each quantized DCT transformed block is stored in a matrix with one row per (x,y) position in the original image (the (x,y) being the upper left hand corner of the 16x16 block being examined).

  3. The resulting matrix is lexicographically sorted and then rows that match in the matrix are identified. For each pair of matching rows (x1,y1) and (x2,y2) the shift vector (x1-x2,y1-y2) (normalized by swapping if necessary so that the first value is +ve) is computed and for each shift vector a count is kept of the number of times it is seen.

  4. Finally the shift vectors with a count > some threshold are examined, the corresponding pair of positions in the image are found and the 16x16 blocks they represent are highlighted.

Here's another picture showing a golfing image that's been touched up to remove something from the grass:






To get access to image data I used the FreeImage library and wrote a small C program that implements Jessica's algorithm. You can download the source here; it's released to you under the GNU GPL.

The program has two key parameters that affect how the image is processed: the quality factor and the threshold.

The quality factor is a number used to 'blur' the image (actually it changes the quantization): the higher the factor the more blurring and hence more 16x16 blocks are likely to seem the same to the algorithm. Increasing the quality factor will tend to increase the false matches.

The threshold is simply the number of blocks that have to appear to have been copied together. This prevents us from seeing a single 16x16 block as evidence of copying. Increasing the threshold means ever larger groups of blocks have to be identified together before they are identified as copying.

Back at Best of the Best I grabbed the image for Supercar Competition (SC-272), cut out a section that I thought the ball had to be in (just to speed up processing) and ran the algorithm. After some parameter tweaking the algorithm came up only with what look like false matches to me (along the bar where it's all one color):



And, of course, that's not where the judge thought the ball was. So, I guess I won't be driving home in the V8 Vantage, but what geek needs that when they've got a cool piece of software that detects copy/move forgery in images?

Which leaves me with one question: how are spot-the-ball images generated? Is this an algorithm problem, a problem because they use JPG (which is already transformed) for their images, or are these images generated in some other way?

75 comments:

Geoff said...

John,

Thanks for an entertaining article. I love image processing code, but I work mostly in c#. Would you mind sharing the steps (or command line) that you use to compile?

John Graham-Cumming said...

Here's the Makefile:

copymove: LDLIBS += -lfreeimage -lstdc++
copymove: copymove.o

Geoff said...

Thanks John. I am having trouble compiling. Any ideas?
gcc gives me:
cm.cpp: In function `int main(int, char**)':
cm.cpp:162: error: invalid conversion from `void*' to `BYTE*'
cm.cpp:250: error: invalid conversion from `void*' to `int*'
cm.cpp:253: error: invalid conversion from `void*' to `position*'
cm.cpp:364: error: invalid conversion from `void*' to `int (*)(const void*, const void*)'
cm.cpp:364: error: initializing argument 4 of `void qsort(void*, size_t, size_t, int (*)(const void*, const void*))'
cm.cpp:371: error: invalid conversion from `void*' to `int*'

Darren said...

When you load the image file with fopen, you should be using "rb" for a binary file instead of "r". It doesn't make much difference under Linux but on OSes which use CR/LF pairs, the load will fail.

jjwiseman said...

What quality and threshold settings did you use for your example images?

Darren said...

With a little bit of work, I've got it compiling on MSVC++ 2003, and it works great.

The inner loop during "Building DCT transformed matrix" is a bit slow. I was able to get a huge speedup by precalculating all the pixel values for the image beforehand (i.e. the values of variable "pixel"), thus eliminating a lot of calls to FreeImage_GetScanLine() and round(), which are quite expensive.

Altogether it's a very effective algorithm! I've been very impressed by the results.

For the record, the main changes required for MSVC are casting the void* values to the appropriate pointer types, and also implementing round(x) as floor(x+0.5).

John Graham-Cumming said...

@geoff: see darren's comments about getting it working with MSVC.

@darren: I wondered about optimizing that but ran out of time to work further on the code. Glad to know you've improved it.

Geoff said...

The changes Darrin suggested provide an error-free compile on MSVC 9.

The algorithm works well. My wife is a wedding photographer and she's always retouching blemishes in the images. I'm going through her portfolio looking for evidence.

Darrin, any chance you'd be willing to post your speedup code?

John Graham-Cumming said...

@geoff. If you find anything cool then let people know!

Gregory said...

I'm trying to implement your code in Python. I'm having trouble understanding how you're doing the DCT. Any advice?

I'm also not sure what the purpose of the standard JPEG chrominance quantization matrix is for?

Gregory said...

Well I tried implementing it in Python. I got to the point of calculating the DCT stuff for each block, but it seems to run forever. (Perhaps that's why you did it in C?)

Here's the half finished code if anyone is interested in picking up the torch.

jjwiseman said...

I've posted the code with Darren's speedup, and tried it out on some of the altered photos by ex-Reuters photographer Adnan Hajj.

John Graham-Cumming said...

I love the posting in the previous comment about the Reuters photos. Very neat!

Hypermechanic said...

Do you have a windows compiled version?

redeye said...

Wow, I love this. Would it be possible to have this running server side with an interface on the web for users to check photos?

We could do something similar to the guy who set up that site that showed you who updated Wikipedia...

This would scare a lot media companies... :)

John Graham-Cumming said...

Sorry Hypermechanic I don't have a Windows compiled version, but others have mentioned building this on Windows.

JL said...

"...changes required for MSVC are casting the void* values to the appropriate pointer types..."

I'm more photographer than C programmer - could someone expand a bit on this (or better yet explicitly list the changes required)?

Lars said...

I'm guessing they create the spot-the-ball photos by taking two photos, offset in time, with a static camera. Then a careful copy - paste with a soft edge from one image to the other to remove the ball. That wouldn't show up as a clone operation.

Recent versions of Photoshop have built-in tools to automatically merge multiple images. You load up a series of images and mask / mark areas to keep or remove in each, and it blends it all into one "seamless" image. It's a handy trick to get a clean image of something when you're in a crowd and people keep getting in your shot, for example.

Here's a free tool from MS to do it: http://research.microsoft.com/projects/GroupShot/
or http://www.snapmania.com/info/en/trm/index.html

FayedYousry said...

Darren You have said that you were able to run it on VS2003
I am using VS2005 over the dotnet framework 2.0/3.5 and this piece of code is driving me nuts.
I do confess that my C is very rusty i have migrated to c# 6 years go. i am facing problems with the "stat" function as it produces a compile time error stating that there is no overload that takes 2 parameters! whats crazy is that intellisence saais it needs to parameters so does MSDN. If it is not too much trouble help me out on this or better yet uplod the code you were able to run of MSVS.
thank you

Jonathan said...

I believe there's a very simple explanation - they didn't duplicate any part of that picture.

Your algorithm assumes that the image was made using the clone tool, rather than another image or any other technique.

Rajesh said...

i too was trying to implement the paper on matlab. could u help me to get those forged images on which i can work on.. that will be a great help to me.

Ghiayas said...

hello i am undergraduate student working on this paper for my final semester project. can you please give me Matlab code for this paper, that will help me to complete my engineering. will be waiting for for kind response.

John Graham-Cumming said...

I wrote the code in C. I do not have a MATLAB version.

Ghiayas said...

can you help me on how to convert c code into matlab code,

Ghiayas said...

can you give me the reference images on which you hace tested youe c code that will help to varify the results,
regards , ghiayas

John Graham-Cumming said...

Sorry, can't help. I know nothing about MATLAB.

Ghiayas said...

ok thank you for your kind response , can you please tell me how to get the images on which you have tested your c code, i mean the images which fridrich has gives to you ,kindly if you can mail them to me at
[email protected]

Ghiayas said...

thank you for your response i got the images, need to know one last thing
is your c code exactly follows the paper of fridrich? because i will try to translate your c code into matlab.

John Graham-Cumming said...

It was intended as a faithful reproduction of the algorithm in her paper.

Ghiayas said...

you have implemented the complete paper or just the robust match approach?

John Graham-Cumming said...

I implemented the Robust Match. The Exact Match didn't seem like it was worth the effort.

Ghiayas said...

thank you for your kind response, you have done a good job. i will be working on to translate your c code into matlab, if a find difficulty in understanding your code i will seek your guidance, thank you
regards , ghiayas

Ghiayas said...

what was the run time for your c code?

Ghiayas said...

hello
i have to run your code in C++ but its giving a lot of errors , can you guide me how to run your code.

Ghiayas said...

these are the errors i am getting in running your code on C++

lcc preprocessor error: D:\Program Files\MATLAB\R2008a\bin\copymove.c:66 Could not find include file "FreeImage.h"
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 145 undeclared identifier `TRUE'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 illegal statement termination
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 skipping `struct'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 syntax error; found `file_stat' expecting `;'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 undeclared identifier `file_stat'
Warning D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 Statement has no effect
Warning D:\Program Files\MATLAB\R2008a\bin\copymove.c: 156 assignment of pointer to struct stat to pointer to int
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 illegal statement termination
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 skipping `int'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 undeclared identifier `file_size'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 left operand of . has incompatible type `int'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 illegal use of type name `FILE'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 undeclared identifier `handle'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 operands of * have illegal types `struct _iobuf' and `int'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 operands of = have illegal types `int' and `pointer to struct _iobuf'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 lvalue required
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 undeclared identifier `BYTE'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 undeclared identifier `bytes'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 operands of = have illegal types `int' and `pointer to void'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 lvalue required
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 162 illegal statement termination
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 162 too many errors

Ghiayas said...

can you please help me on how to run your code?

John Graham-Cumming said...

The first error you are getting is "lcc preprocessor error: copymove.c:66 Could not find include file "FreeImage.h"

This is indicating that it can't find the FreeImage header file. As indicated in the source code FreeImage must be installed:

// Uses the FreeImage library (http://freeimage.sf.net/) to access
// image data

Ghiayas said...

hello , i have implemented this algorithm in matlab till the point where lexicographical sorting is done. can you please explain the next point in more detail , where rows of matrix A(lexicographically sorted matrix) are compared and shift vector counter was used , i can not understand that point.

John Graham-Cumming said...

I wish I had the time to help you with this, but I have too much else to do. Good luck.

The best I can offer is that you examine my code to see how it operates.

Bill Smith said...

What a neat tool. Next step: something that crawls the web (or maybe just news sites) looking for touched-up photos.

Ultimate Privacy said...

Wow, thats way cool dude.

Lou
www.anon-surfing.at.tc

pradeep kumar said...

Hi John,
Thanks for the method on detecting manipulated images(copy-paste).I have a question regarding on lexicographically sorted.can you help me about lexicographically sorted.is it possible to give me an example?.so, i can implement in matlab.
Thanks & Regards,
prd johnny

pradeep kumar said...

Hi John,
Thanks for the method on detecting manipulated images(copy-paste).I have a question regarding on lexicographically sorted.can you help me about lexicographically sorted.is it possible to give me an example?.so, i can implement in matlab.
Thanks & Regards,
prd johnny

pradeep kumar said...

Hi John,
Thanks for the method on detecting manipulated images(copy-paste).I have a question regarding on lexicographically sorted.can you help me about lexicographically sorted.is it possible to give me an example?.so, i can implement in matlab.
Thanks & Regards,
prd johnny

John Graham-Cumming said...

Pradeep:

The sort is done using qsort:

qsort( &index[0], w16 * h16,
sizeof( struct position ), (void*)&compare );

The compare function is as follows:

int compare( struct position * a, struct position * b )
{
int * m_a = &matrix[ a->i * 16 * 16 ];
int * m_b = &matrix[ b->i * 16 * 16 ];

int i;

for ( i = 0; i < 16 * 16; ++i ) {
if ( m_a[i] < m_b[i] ) {
return -1;
}
if ( m_a[i] > m_b[i] ) {
return 1;
}
}

return 0;
}

Ghiayas said...

can you please tell me how much time youe code take to complete the results. thanx

Wedding Reporter said...

Very interesting post - I did nt even know such tool existed!

helium said...

I used your program here:
https://sites.google.com/site/elsamuko/forensics/clone-detection
Thanks for sharing.

John Graham-Cumming said...

Awesome!

tante_gugl said...

Hi!

For what it's worth:

I converted your code to a MS Visual Studio 2008 solution and removed the warnings.

It compiles flawlessly now but still seems to have a few problems with
a) larger images [int matrix_size = w16 * h16 * 16 * 16 * sizeof( int ) might be a real issue for everything larger than 1024x768px] and
b) strange encodings and file formats (i. e. .png or .jpg having YCbCr 2x2 1x1 1x1).

I also wasn't able to detect an obvious clone operation with just the standard parameters.

The archive can be downloaded at

http://rapidshare.com/files/424645143/copymove.zip

- so if you like, play around with it, embed improvements and repost.

Good luck!

Volker

helium said...

FYI: I put your code into a GIMP plugin and added multithreading. Also I replaced some of the C arrays with C++ vectors, which makes it more stable handling bigger images.
Same URL:
https://sites.google.com/site/elsamuko/forensics/clone-detection

John Graham-Cumming said...

Thanks helium.

rain said...

Hello, thanks or your detailed explanation, i have run your code . would you please send me the images fridrich sent for you? I really need those images to test the code.

[email protected]

Thank you

Robert said...

One possibility is that they took a larger picture, cropped it, and covered up the ball with a piece from the cropped-out section. Then the algorithm wouldn't work.

langthangtrenmang said...

Hi, thanks for your great tools. Actually several papers are based on this concept, they just change the features to change the approach (DCT, wavelet coefficient, SVD,...)

However I ran your code, the output are 2 .png images and they're just images with 1 red strip at the top and 1 blue strip at the bottom. The rest are just black. I couldn't get the result like the given image here. What does it mean? I try to read the source code but maybe it's related to some routine of the 3rd party lib.

ganesan said...

Please give me tips for how to convert this code to MATLAB code...
please give details explanation...............

ganesan said...

please explain the following code....


for ( u = 0; u < 16; ++u ) {
for ( v = 0; v < 16; ++v ) {
for ( j = 0; j < 16; ++j ) {
BYTE * bits = FreeImage_GetScanLine( color, j + d );
bits += bpp * a;
for ( i = 0; i < 16; ++i ) {
double pixel = (double)bits[FI_RGBA_RED] * 0.299
+ (double)bits[FI_RGBA_GREEN] * 0.587
+ (double)bits[FI_RGBA_BLUE] * 0.114;
pixel -= 128;
pixel = round(pixel);

dct[u][v] += pixel * pre[u][v][i][j];
bits += bpp;
}
}
}
}


and

if ( compare( &index[i], &index[i+1] ) == 0 ) {
int sx = index[i].x - index[i+1].x;
int sy = index[i].y - index[i+1].y;

if ( sx < 0 ) {
sx = -sx;
sy = -sy;
}

sy += h;

++shift[sy * w + sx];
}
}

ganesan said...

hi,Aston Martin please explain above code....please....

Jay said...

Hi
I want to get java code of Detecting copy-paste forgery of JPEG image.or some basic idea to get solution of this problem.

Jay said...

Hi
I am working on project(Detecting copy-paste forgery of JPEG image).So which method or algorithm is best for this detecting forgery and also i want to get code in java/c/c++.

please help me!!

John Graham-Cumming said...

@Jay The C source code is linked from the blog post. I know very little about Java coding and cannot help you convert it.

Jay said...

Jay

if you have matlab code for detection copy paste forgery of JPEG image then plz send me..

Yasir Javed Kiani said...

please if possible can you send the images to verify the code at [email protected]

I am using the images from website , but your code is not detecting any clone. the input image and cloned image are same, if there is something missing , please also guide.

Yasir Javed Kiani said...

please if possible can you send the images to verify the code at [email protected]

I am using the images from website , but your code is not detecting any clone. the input image and cloned image are same, if there is something missing , please also guide.

shivaprasad k said...

can i get matlab code for copy-move forgery detection by DCT....Please any one help me...
Thanking u

shivaprasad k said...

can i get MATLAB code for copy-move forgery detection by using DCT....
Please any 1 help me....
Thanking u...
[email protected]

aman kaur said...

Can any one help me by providing this code in MATLAB. i have been implementing this since so many days and but now i think i need some help. Please its really urgent.

aman kaur said...

Can any 1 help provide me this code in MATLAB. i have been trying to implement it since so many days and now i think i need some help. Please its really urgent my Thesis is on stake :(

joshna rence said...

hey pls send me the code for digital image forgeries using key point matching

poonam said...

pls...send me a matlab code for cloning image detection, id- [email protected]

Harpreet Kaur said...

Plz help me to detect the image forgery on matlab platform using any block based method....

[email protected]

thnks in advance..

mansi soni said...

Do u implement this code in matlab? I have find some problem to run it in matlab.Can u help me?

mansi soni said...

Do u implement this code in matlab? I have find some problem to run it in matlab.Can u help me?

Stefan Froelich said...

What are you guys studying over in India?

me said...

Pointless exercise since (as John himself admitted) the untampered ball position is irrelevant in this competition.