DFI News


Find Incriminating Contraband in Images

By Chet Hosmer Article Posted: June 19, 2009


Today criminals covertly communicate, secretly exchange information, and conceal incriminating contraband (e.g., child pornography) using a variety of methods. Forensic investigators must keep abreast of the latest techniques in order to thoroughly investigate both the seen and the unseen.

The purpose of steganography is not simply to keep information private, but to hide the very existence of that information or communication while also keeping it private. If you hope to detect this hidden data, it is important to understand the types of carrier files that exist and how they are used.

True Color Image (example using a 24 Bit BMP File)
True color images are the simplest carrier type both to explain and to hide information in. A true color image is made up of rows and columns of pixels, where each pixel contains three color values that define its color: R = Red, G = Green, and B = Blue. In a 24 bit true color image each RGB triplet occupies 3 bytes, 1 byte each for red, green, and blue, and each byte holds the intensity of its color. Each value can range from 00 (no intensity) to FF (255, high intensity). Figure xx depicts the range of values for a RED only pixel.

Range of values for a red pixel

The combination of R, G, and B values determines the actual color of the pixel, producing:

2^8 x 2^8 x 2^8 = 16,777,216 colors per pixel
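As a quick check of this arithmetic, the following is a minimal Python sketch. The helper name pack_rgb and the constant BITS_PER_CHANNEL are illustrative only and do not come from any particular tool.

```python
BITS_PER_CHANNEL = 8

# Each channel has 2^8 = 256 intensity levels (0x00 through 0xFF).
levels = 2 ** BITS_PER_CHANNEL

# Three independent channels give 256^3 possible colors per pixel.
total_colors = levels ** 3


def pack_rgb(r, g, b):
    """Combine three 8 bit channel values into one 24 bit color value."""
    return (r << 16) | (g << 8) | b


print(levels)                            # 256
print(total_colors)                      # 16777216
print(hex(pack_rgb(0xFF, 0x00, 0x00)))   # 0xff0000, a RED only pixel
```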

In true color images, steganography is applied by altering the least significant bit, or LSB, of each color value (more aggressive methods alter several low-order bits) to encode the hidden information. This method has no effect on the size of the image, because bits are merely altered, not added or removed. It also makes it possible to calculate the maximum payload size, in 8 bit bytes, that a true color image can hold (assuming alteration of every LSB). The formula is:

(Pixels x 3) / 8

For a 1024 x 768 pixel image the result is:

(1024 x 768 x 3) / 8 = 294,912 bytes of hidden space

where 1024 is the number of columns, 768 is the number of rows, and 3 is the number of color values per pixel.

The effective amount of data stored can be increased if the payload is compressed before it is embedded.
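The capacity formula above can be sketched in a few lines of Python. The function name lsb_capacity_bytes and its bits_per_channel parameter are assumptions made for illustration; the second call models the "more aggressive" case in which two low-order bits per color value are altered.

```python
def lsb_capacity_bytes(width, height, channels=3, bits_per_channel=1):
    """Maximum hidden payload, in whole bytes, for LSB embedding.

    Assumes every color value in the image is used as a carrier.
    """
    total_bits = width * height * channels * bits_per_channel
    return total_bits // 8


# The 1024 x 768 example from the text, one LSB per color value.
print(lsb_capacity_bytes(1024, 768))                      # 294912

# Altering two low-order bits per color value doubles the capacity.
print(lsb_capacity_bytes(1024, 768, bits_per_channel=2))  # 589824
```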

There are many examples of steganography programs that perform this type of embedding, and the variations have different features. For example:

  • Automatic payload compression
  • Encryption options – algorithm / key length selection
  • Payload randomization – Most modern programs randomize the storage of the payload throughout the image in order to avoid creating a linear embedding pattern
  • Isolation of unsuitable image areas – certain areas of an image are less suitable for steganographic embedding. These are typically low intensity, solid background areas, where changes would be easier to detect or would create distortion visible to the human eye.
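Payload randomization, as listed above, can be illustrated with a small Python sketch. It is a simplified illustration and not the method of any specific program: the function names are hypothetical, the carrier is a plain byte array standing in for pixel color values, and Python's random module stands in for the key-driven generator, which in a real tool would presumably be cryptographically strong.

```python
import random


def embedding_order(num_slots, key):
    """Return a key-dependent permutation of the carrier slot indices."""
    order = list(range(num_slots))
    random.Random(key).shuffle(order)   # same key, same order
    return order


def embed_scattered(carrier, bits, key):
    """Write payload bits into the LSBs of carrier slots chosen by the key."""
    if len(bits) > len(carrier):
        raise ValueError("payload too large for carrier")
    stego = bytearray(carrier)
    order = embedding_order(len(stego), key)
    for slot, bit in zip(order, bits):
        stego[slot] = (stego[slot] & 0xFE) | bit
    return stego


def extract_scattered(stego, num_bits, key):
    """Read the payload bits back in the same key-dependent order."""
    order = embedding_order(len(stego), key)
    return [stego[slot] & 1 for slot in order[:num_bits]]


carrier = bytearray(range(64))           # stand-in for pixel color bytes
bits = [1, 0, 1, 1, 0, 0, 1, 0]
key = "hypothetical passphrase"

stego = embed_scattered(carrier, bits, key)
print(extract_scattered(stego, len(bits), key) == bits)   # True
```

Without the key, an examiner does not know which carrier values were visited or in what order, which is why a randomized payload does not leave a linear embedding pattern.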

The embedding process relies on a substitution algorithm that takes information from the payload (most commonly a file), breaks each byte down into individual bits, and then replaces the LSB of each carrier value with the next bit of the payload. Since a direct substitution is made, the probability that the original carrier bit already equals the desired payload bit is effectively 50/50. This is aided by the fact that virtually all steganography programs compress and then encrypt the payload prior to embedding, which produces data that appears random. Thus the 50/50 heuristic is quite accurate. This means that LSB substitution actually changes only about 50% of the carrier LSBs it visits, which reduces the impact on the image.

payload in carrier image
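The substitution process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of any particular steganography program: the carrier is a flat byte array standing in for the color values of a 24 bit image, the payload is random bytes standing in for compressed and encrypted data, and embedding is linear with no randomization. All function names are hypothetical.

```python
import os


def payload_bits(payload):
    """Yield the bits of a bytes payload, most significant bit first."""
    for byte in payload:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def embed_lsb(carrier, payload):
    """Return a copy of carrier with payload bits written into the LSBs."""
    bits = list(payload_bits(payload))
    if len(bits) > len(carrier):
        raise ValueError("payload too large for carrier")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # clear the LSB, then set it
    return stego


def extract_lsb(stego, num_bytes):
    """Rebuild num_bytes of payload from the LSBs of the stego data."""
    out = bytearray()
    for i in range(num_bytes):
        byte = 0
        for value in stego[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


carrier = bytearray(os.urandom(4096))   # stand-in for pixel color bytes
payload = os.urandom(256)               # stand-in for encrypted payload

stego = embed_lsb(carrier, payload)
changed = sum(a != b for a, b in zip(carrier, stego))

print(extract_lsb(stego, len(payload)) == payload)   # True
print(len(stego) == len(carrier))                    # True: size unchanged
print(changed / (len(payload) * 8))                  # typically close to 0.5
```

The last line reports the fraction of visited carrier values that actually changed, which should hover around one half for random-looking payloads.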

JPEG Images
JPEG, named for the Joint Photographic Experts Group, is the most common format for compressed photographic images. Its distinguishing feature is the retention of high visual quality combined with dramatic file size reduction. A simplified JPEG process is depicted below. Note that what is important for this article is not JPEG itself, but rather how steganography is applied to JPEG images and why.

Most common insertion point for illicit content

In JPEG environments, steganographic insertion is made after the lossy stage (which compresses data in such a way that decompression produces an approximate representation of the data that is “good enough” for many applications) and prior to the lossless stage (which compresses data in such a way that decompression reproduces the original exactly). If you attempt to hide information before the lossy stage, chances are that critical information will be destroyed or lost during that stage. Since most steganography applications compress and then encrypt the payload prior to insertion, the loss of even a single bit would have a direct impact on the integrity and usability of the hidden information.
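The effect of hiding data before the lossy stage can be seen in a toy Python sketch. It is a deliberate simplification: plain divide-and-round quantization stands in for the whole lossy stage, and the step size of 16 and the sample values are hypothetical.

```python
def quantize(value, step):
    """Lossy step: divide by the step size and round to an integer."""
    return round(value / step)


def dequantize(level, step):
    """Approximate reconstruction of the original value."""
    return level * step


step = 16                                   # hypothetical quantization step
values = [200, 201, 57, 58, 123, 124]       # LSBs carry the hidden bits

hidden_bits = [v & 1 for v in values]
restored = [dequantize(quantize(v, step), step) for v in values]
recovered_bits = [v & 1 for v in restored]

print(hidden_bits)      # [0, 1, 1, 0, 1, 0]
print(recovered_bits)   # [0, 0, 0, 0, 0, 0]
```

Every restored value is a multiple of the step size, so the low-order bits that carried the payload are gone. Embedding after quantization avoids this because the lossless stage that follows reproduces its input exactly.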

The most common method of applying steganography to a JPEG is to modify the values stored in the quantized discrete cosine transform (DCT) matrix. After DCT processing has been completed, the final step in the lossy stage is to “quantize the DCT,” a fancy phrase meaning to divide the DCT values so that the result contains a large number of zero values, which in most cases have little effect on recognizable image quality. These zero values will be compressed out using run length encoding during the lossless, or Huffman (an entropy encoding algorithm), stage. The non-zero values remain, providing an opportunity to make small (+1 or -1) variations to the quantized values which, if performed sparingly, create only slight alterations in the visual image (in other words, not perceptible to humans). By reversing the process during decompression, the steganography program can recover the hidden information without access to the original image.

The histograms below (Figure 1) depict the quantized values both before and after steganography is applied. Notice that the histograms look similar. However, by examining the peak values we see a large variation. In Figure 2 we narrow in on the peak areas and can visually see the modification of the quantized values. Notice that in the original image (histogram on the left) there are 874 occurrences of the value 169, yet in the stego’d image (histogram on the right) there are only 610. More importantly, notice the changes in the adjacent values, 170 (+1) and 168 (-1); these represent some of the modified values produced by the stego process.

Before and after histograms of quantized DCT values

Peak quantized DCT values
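The kind of histogram shift described above can be reproduced with a simplified Python sketch. Several assumptions apply: the coefficients are synthetic rather than taken from a real JPEG, the embedding rule (store one bit in the parity of each non-zero coefficient, stepping away from zero when a change is needed) is a hypothetical scheme chosen for clarity and not the algorithm of any specific tool, and the function names are illustrative.

```python
import random
from collections import Counter


def embed_in_coefficients(coeffs, bits):
    """Encode bits in the parity of non-zero coefficients using +1/-1 steps."""
    nonzero = sum(1 for c in coeffs if c != 0)
    if len(bits) > nonzero:
        raise ValueError("payload too large for carrier")
    out = list(coeffs)
    bit_iter = iter(bits)
    for i, c in enumerate(out):
        if c == 0:
            continue                    # zeros are left for run length coding
        bit = next(bit_iter, None)
        if bit is None:
            break                       # payload exhausted
        if (abs(c) & 1) != bit:
            out[i] = c + 1 if c > 0 else c - 1   # step away from zero
    return out


def extract_from_coefficients(coeffs, num_bits):
    """Read the parity of the first num_bits non-zero coefficients."""
    return [abs(c) & 1 for c in coeffs if c != 0][:num_bits]


rng = random.Random(2009)               # fixed seed, synthetic data only
coeffs = [0 if rng.random() < 0.6 else rng.choice([-3, -2, -1, 1, 2, 3])
          for _ in range(2000)]
bits = [rng.getrandbits(1) for _ in range(sum(1 for c in coeffs if c != 0))]

stego = embed_in_coefficients(coeffs, bits)
before, after = Counter(coeffs), Counter(stego)

print(extract_from_coefficients(stego, len(bits)) == bits)   # True
print(before[0] == after[0])                                 # True
for value in sorted(set(before) | set(after)):
    print(value, before[value], after[value])                # histogram rows
```

Comparing the before and after counts value by value shows occurrences migrating to adjacent values, which is the signature an examiner looks for in the peak regions of the histograms.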
 

Chet Hosmer is the co-founder and Chief Scientist at WetStone. His research into advanced forms of steganography spans over a decade. Chet can be reached via e-mail at chet@wetstonetech.com, or send Chet a tweet @ChetHosmer.
