This is the story of my attempt to 'cheat' in an on-line spot-the-ball competition to win an Aston Martin. It's also the story of my failure, but you get free source code that implements automatic detection of image alteration using copy/paste or tools like the Clone Tool in Photoshop.
First, take a look at this photo:

Notice anything strange? In fact this image has been tampered with to cover up a truck. The truck is completely hidden by foliage. Here's the original:

Wouldn't it be nice to be able to detect that automatically? It is possible. Here's an image automatically generated by my code showing what was moved. All of the red was moved to the blue (or the other way around).

I was motivated to work on this program by greed (or at least my never-ending love of having a little flutter on things). Best of the Best runs spot-the-ball competitions in airports to win very expensive cars. But they also run the same competition online. That meant I could get my hands on the actual image used... could I process it to discover where the ball had been removed? (In reality, this isn't the right way to win because the actual ball position is not governed by where it actually was, but where a judge thinks it was).
Would it be cheating if I could? Apparently not, the competition rules say I should use my skill and judgment in determining the ball position. Surely, skill covers my programming ability.
So, I went looking for tampering algorithms and eventually came across Detection of Copy-Move Forgery in Digital Images written by Jessica Fridrich at SUNY Binghamton. The paper describes an algorithm for detecting just the sort of changes I thought I was looking for.
Unfortunately, I know nothing about image processing. Fortunately, the paper is written in a very clear style and a bit of Internet research enabled me to track down the knowledge I didn't have. (Also, thanks to Jessica for sending me the original images she used to test my implementation).
In brief the algorithm does the following:
Here's another picture showing a golfing image that's been touched up to remove something from the grass:
To get access to image data I used the FreeImage library and wrote a small C program that implements Jessica's algorithm. You can download the source here; it's released to you under the GNU GPL.
The program has two key parameters that affect how the image is processed: the quality factor and the threshold.
The quality factor is a number used to 'blur' the image (actually it changes the quantization): the higher the factor the more blurring and hence more 16x16 blocks are likely to seem the same to the algorithm. Increasing the quality factor will tend to increase the false matches.
The threshold is simply the number of blocks that have to appear to have been copied together. This prevents us from seeing a single 16x16 block as evidence of copying. Increasing the threshold means ever larger groups of blocks have to be identified together before they are identified as copying.
Back at Best of the Best I grabbed the image for Supercar Competition (SC-272), cut out a section that I thought the ball had to be in (just to speed up processing) and ran the algorithm. After some parameter tweaking the algorithm came up only with what look like false matches to me (along the bar where it's all one color):

And, of course, that's not where the judge thought the ball was. So, I guess I won't be driving home in the V8 Vantage, but what geek needs that when they've got a cool piece of software that detects copy/move forgery in images?
Which leaves me with one question: how are spot-the-ball images generated? Is this an algorithm problem, a problem because they use JPG (which is already transformed) for their images, or are these images generated in some other way?
First, take a look at this photo:

Notice anything strange? In fact this image has been tampered with to cover up a truck. The truck is completely hidden by foliage. Here's the original:

Wouldn't it be nice to be able to detect that automatically? It is possible. Here's an image automatically generated by my code showing what was moved. All of the red was moved to the blue (or the other way around).

I was motivated to work on this program by greed (or at least my never-ending love of having a little flutter on things). Best of the Best runs spot-the-ball competitions in airports to win very expensive cars. But they also run the same competition online. That meant I could get my hands on the actual image used... could I process it to discover where the ball had been removed? (In reality, this isn't the right way to win because the actual ball position is not governed by where it actually was, but where a judge thinks it was).
Would it be cheating if I could? Apparently not, the competition rules say I should use my skill and judgment in determining the ball position. Surely, skill covers my programming ability.
So, I went looking for tampering algorithms and eventually came across Detection of Copy-Move Forgery in Digital Images written by Jessica Fridrich at SUNY Binghamton. The paper describes an algorithm for detecting just the sort of changes I thought I was looking for.
Unfortunately, I know nothing about image processing. Fortunately, the paper is written in a very clear style and a bit of Internet research enabled me to track down the knowledge I didn't have. (Also, thanks to Jessica for sending me the original images she used to test my implementation).
In brief the algorithm does the following:
- Slide a 16x16 block across the entire image from left hand corner to bottom right hand corner. For each 16x16 block perform a discrete cosine transform (DCT) on it and then quantize the 16x16 block using an expanded version of the standard JPEG quantization matrix.
- Each quantized DCT transformed block is stored in a matrix with one row per (x,y) position in the original image (the (x,y) being the upper left hand corner of the 16x16 block being examined).
- The resulting matrix is lexicographically sorted and then rows that match in the matrix are identified. For each pair of matching rows (x1,y1) and (x2,y2) the shift vector (x1-x2,y1-y2) (normalized by swapping if necessary so that the first value is +ve) is computed and for each shift vector a count is kept of the number of times it is seen.
- Finally the shift vectors with a count > some threshold are examined, the corresponding pair of positions in the image are found and the 16x16 blocks they represent are highlighted.
Here's another picture showing a golfing image that's been touched up to remove something from the grass:
![]() | ![]() |
To get access to image data I used the FreeImage library and wrote a small C program that implements Jessica's algorithm. You can download the source here; it's released to you under the GNU GPL.
The program has two key parameters that affect how the image is processed: the quality factor and the threshold.
The quality factor is a number used to 'blur' the image (actually it changes the quantization): the higher the factor the more blurring and hence more 16x16 blocks are likely to seem the same to the algorithm. Increasing the quality factor will tend to increase the false matches.
The threshold is simply the number of blocks that have to appear to have been copied together. This prevents us from seeing a single 16x16 block as evidence of copying. Increasing the threshold means ever larger groups of blocks have to be identified together before they are identified as copying.
Back at Best of the Best I grabbed the image for Supercar Competition (SC-272), cut out a section that I thought the ball had to be in (just to speed up processing) and ran the algorithm. After some parameter tweaking the algorithm came up only with what look like false matches to me (along the bar where it's all one color):

And, of course, that's not where the judge thought the ball was. So, I guess I won't be driving home in the V8 Vantage, but what geek needs that when they've got a cool piece of software that detects copy/move forgery in images?
Which leaves me with one question: how are spot-the-ball images generated? Is this an algorithm problem, a problem because they use JPG (which is already transformed) for their images, or are these images generated in some other way?
Comments
Thanks for an entertaining article. I love image processing code, but I work mostly in c#. Would you mind sharing the steps (or command line) that you use to compile?
copymove: LDLIBS += -lfreeimage -lstdc++
copymove: copymove.o
gcc gives me:
cm.cpp: In function `int main(int, char**)':
cm.cpp:162: error: invalid conversion from `void*' to `BYTE*'
cm.cpp:250: error: invalid conversion from `void*' to `int*'
cm.cpp:253: error: invalid conversion from `void*' to `position*'
cm.cpp:364: error: invalid conversion from `void*' to `int (*)(const void*, const void*)'
cm.cpp:364: error: initializing argument 4 of `void qsort(void*, size_t, size_t, int (*)(const void*, const void*))'
cm.cpp:371: error: invalid conversion from `void*' to `int*'
The inner loop during "Building DCT transformed matrix" is a bit slow. I was able to get a huge speedup by precalculating all the pixel values for the image beforehand (i.e. the values of variable "pixel"), thus eliminating a lot of calls to FreeImage_GetScanLine() and round(), which are quite expensive.
Altogether it's a very effective algorithm! I've been very impressed by the results.
For the record, the main changes required for MSVC are casting the void* values to the appropriate pointer types, and also implementing round(x) as floor(x+0.5).
@darren: I wondered about optimizing that but ran out of time to work further on the code. Glad to know you've improved it.
The algorithm works well. My wife is a wedding photographer and she's always retouching blemishes in the images. I'm going through her portfolio looking for evidence.
Darrin, any chance you'd be willing to post your speedup code?
I'm also not sure what the purpose of the standard JPEG chrominance quantization matrix is for?
Here's the half finished code if anyone is interested in picking up the torch.
We could do something similar to the guy who set up that site that showed you who updated Wikipedia...
This would scare a lot media companies... :)
I'm more photographer than C programmer - could someone expand a bit on this (or better yet explicitly list the changes required)?
Recent versions of Photoshop have built-in tools to automatically merge multiple images. You load up a series of images and mask / mark areas to keep or remove in each, and it blends it all into one "seamless" image. It's a handy trick to get a clean image of something when you're in a crowd and people keep getting in your shot, for example.
Here's a free tool from MS to do it: http://research.microsoft.com/projects/GroupShot/
or http://www.snapmania.com/info/en/trm/index.html
I am using VS2005 over the dotnet framework 2.0/3.5 and this piece of code is driving me nuts.
I do confess that my C is very rusty i have migrated to c# 6 years go. i am facing problems with the "stat" function as it produces a compile time error stating that there is no overload that takes 2 parameters! whats crazy is that intellisence saais it needs to parameters so does MSDN. If it is not too much trouble help me out on this or better yet uplod the code you were able to run of MSVS.
thank you
Your algorithm assumes that the image was made using the clone tool, rather than another image or any other technique.
regards , ghiayas
[email protected]
is your c code exactly follows the paper of fridrich? because i will try to translate your c code into matlab.
regards , ghiayas
i have to run your code in C++ but its giving a lot of errors , can you guide me how to run your code.
lcc preprocessor error: D:\Program Files\MATLAB\R2008a\bin\copymove.c:66 Could not find include file "FreeImage.h"
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 145 undeclared identifier `TRUE'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 illegal statement termination
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 skipping `struct'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 syntax error; found `file_stat' expecting `;'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 undeclared identifier `file_stat'
Warning D:\Program Files\MATLAB\R2008a\bin\copymove.c: 155 Statement has no effect
Warning D:\Program Files\MATLAB\R2008a\bin\copymove.c: 156 assignment of pointer to struct stat to pointer to int
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 illegal statement termination
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 skipping `int'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 undeclared identifier `file_size'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 157 left operand of . has incompatible type `int'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 illegal use of type name `FILE'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 undeclared identifier `handle'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 operands of * have illegal types `struct _iobuf' and `int'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 operands of = have illegal types `int' and `pointer to struct _iobuf'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 159 lvalue required
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 undeclared identifier `BYTE'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 undeclared identifier `bytes'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 operands of = have illegal types `int' and `pointer to void'
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 161 lvalue required
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 162 illegal statement termination
Error D:\Program Files\MATLAB\R2008a\bin\copymove.c: 162 too many errors
This is indicating that it can't find the FreeImage header file. As indicated in the source code FreeImage must be installed:
// Uses the FreeImage library (http://freeimage.sf.net/) to access
// image data
The best I can offer is that you examine my code to see how it operates.
Lou
www.anon-surfing.at.tc
Thanks for the method on detecting manipulated images(copy-paste).I have a question regarding on lexicographically sorted.can you help me about lexicographically sorted.is it possible to give me an example?.so, i can implement in matlab.
Thanks & Regards,
prd johnny
Thanks for the method on detecting manipulated images(copy-paste).I have a question regarding on lexicographically sorted.can you help me about lexicographically sorted.is it possible to give me an example?.so, i can implement in matlab.
Thanks & Regards,
prd johnny
Thanks for the method on detecting manipulated images(copy-paste).I have a question regarding on lexicographically sorted.can you help me about lexicographically sorted.is it possible to give me an example?.so, i can implement in matlab.
Thanks & Regards,
prd johnny
The sort is done using qsort:
qsort( &index[0], w16 * h16,
sizeof( struct position ), (void*)&compare );
The compare function is as follows:
int compare( struct position * a, struct position * b )
{
int * m_a = &matrix[ a->i * 16 * 16 ];
int * m_b = &matrix[ b->i * 16 * 16 ];
int i;
for ( i = 0; i < 16 * 16; ++i ) {
if ( m_a[i] < m_b[i] ) {
return -1;
}
if ( m_a[i] > m_b[i] ) {
return 1;
}
}
return 0;
}
https://sites.google.com/site/elsamuko/forensics/clone-detection
Thanks for sharing.
For what it's worth:
I converted your code to a MS Visual Studio 2008 solution and removed the warnings.
It compiles flawlessly now but still seems to have a few problems with
a) larger images [int matrix_size = w16 * h16 * 16 * 16 * sizeof( int ) might be a real issue for everything larger than 1024x768px] and
b) strange encodings and file formats (i. e. .png or .jpg having YCbCr 2x2 1x1 1x1).
I also wasn't able to detect an obvious clone operation with just the standard parameters.
The archive can be downloaded at
http://rapidshare.com/files/424645143/copymove.zip
- so if you like, play around with it, embed improvements and repost.
Good luck!
Volker
Same URL:
https://sites.google.com/site/elsamuko/forensics/clone-detection
[email protected]
Thank you
However I ran your code, the output are 2 .png images and they're just images with 1 red strip at the top and 1 blue strip at the bottom. The rest are just black. I couldn't get the result like the given image here. What does it mean? I try to read the source code but maybe it's related to some routine of the 3rd party lib.
please give details explanation...............
for ( u = 0; u < 16; ++u ) {
for ( v = 0; v < 16; ++v ) {
for ( j = 0; j < 16; ++j ) {
BYTE * bits = FreeImage_GetScanLine( color, j + d );
bits += bpp * a;
for ( i = 0; i < 16; ++i ) {
double pixel = (double)bits[FI_RGBA_RED] * 0.299
+ (double)bits[FI_RGBA_GREEN] * 0.587
+ (double)bits[FI_RGBA_BLUE] * 0.114;
pixel -= 128;
pixel = round(pixel);
dct[u][v] += pixel * pre[u][v][i][j];
bits += bpp;
}
}
}
}
and
if ( compare( &index[i], &index[i+1] ) == 0 ) {
int sx = index[i].x - index[i+1].x;
int sy = index[i].y - index[i+1].y;
if ( sx < 0 ) {
sx = -sx;
sy = -sy;
}
sy += h;
++shift[sy * w + sx];
}
}
I want to get java code of Detecting copy-paste forgery of JPEG image.or some basic idea to get solution of this problem.
I am working on project(Detecting copy-paste forgery of JPEG image).So which method or algorithm is best for this detecting forgery and also i want to get code in java/c/c++.
please help me!!
if you have matlab code for detection copy paste forgery of JPEG image then plz send me..
I am using the images from website , but your code is not detecting any clone. the input image and cloned image are same, if there is something missing , please also guide.
I am using the images from website , but your code is not detecting any clone. the input image and cloned image are same, if there is something missing , please also guide.
Thanking u
Please any 1 help me....
Thanking u...
[email protected]
[email protected]
thnks in advance..
image. my mail id [email protected]
plzz. can you help my my project is in copy move forgery detection using dwt the steps are
first - Algorithm for Detection of Reference and Match Blocks:
1. Read the image selected by the user as input.
2. If the input image is not a gray scale image then
convert it into a gray scale image.
3. Apply wavelet transform up to specified level ‘L’ to
the gray image.
4. For each overlapping b × b block in the LLL image
4.1. Form a matrix A of dimension b2 columns and
(M-b+1) × (N-b+1) rows by extracting the
resulting pixel values by rows into a row of A.
4.2. Form another matrix B same as A with two
additional columns for storing top-left coordinates.
5. End
6. Ignore blocks where contrast is minimum.
7. Sort matrix A lexicographically.
8. For each row of A
8.1.Compute the phase correlation for the block
corresponding to the current row with the blocks
corresponding to ‘p’ rows above and below the
current row.
8.2. If the computed maximum phase correlation value
exceeds a preset threshold value‘t’, then store the
top left coordinates of the corresponding reference
block and the matching block from B matrix in a
new row of a matrix.
9. End
second: Algorithm for Comparison of Reference and Matching
Blocks:
1. For LLL-1 level in the image pyramid
1.1. For each row of the matrix
1.1.1. Form a reference region by padding ‘m’
pixels on all the sides of the b × b
reference block.
1.1.2. Form a matching region by padding ‘m’
pixels on all the sides of the b × b matching
block.
1.1.3. For each b × b overlapping of the reference
region.
1.1.3.1. Find corresponding match in
matching region based on Phase
correlation but search process has to
be opted for selected part of matching
region.
1.1.3.2. If the computed maximum phase
correlation value exceeds a preset
threshold value, then the top left
coordinates of the corresponding
reference block and the matching
block are stored in a new row of a
matrix.
1.2. End
2. End
3. For LLL-2 level to original image in the image pyramid
3.1. For each row of the matrix
3.1.1. Form a reference region by padding ‘m’
pixels on all the sides of the b × b reference
block.
3.1.2. Form a matching region by padding ‘m’
pixels on all the sides of the b × b matching
block.
3.1.3. Compare them using Phase Correlation.
3.1.4. If the computed maximum phase correlation
value exceeds a preset threshold value, then
store the top left coordinates of the
corresponding reference block and the
matching block in a new row of a matrix.
3.2. End
4. End
5. Plot the blocks as copied and pasted regions on the
given input image
i can do this step (Ignore blocks where contrast is minimum.)
We remove all blocks with low
contrast. We tell that a block has
low contrast if the difference between
its maximum intensity pixel and its
minimum intensity pixel is lower than
some predefined threshold T.
This helps to prevent noisy results, such
as 2 identical blue patches of the sky
how can I choose the optimal threshold T to preform this step plzz help me.