Friday, August 30, 2013

Encryption, Compression and Randomness - Part 2

In Part 1 we took a look at compression and encryption of text data with the aim of illustrating why compression should be done prior to encryption. A natural follow up is to try similar things on binary data and see what we find.



An uncompressed bitmap image depicting a simple repeated pattern is one of the simplest candidates that come to mind.  Image compression is an elaborate topic and there are lots of details including lossy compression (e.g. jpeg). But for our purpose, which is mostly to try out compression and encryption experiments on binary non-ASCII data, even gzip compression should be fine. The bitmap image would be treated simply as just another data bitstream by the gzip program.

We use the below jpeg image of a chessboard (chessboard.jpg) as our starting point.

chessboard.jpg

The first thing to do would be to convert it into an uncompressed bitmap image and we do it as below.
> convert chessboard.jpg chessboard.pbm

PBM is a black and white Portable Bit Map format where each bit represents a pixel with a 1 for white and a 0 for black.

Let us see if the conversion process did its work.

> identify chessboard.jpg 
chessboard.jpg JPEG 160x160 160x160+0+0 8-bit sRGB 2.94KB 0.000u 0:00.000
> identify chessboard.pbm
chessboard.pbm PBM 160x160 160x160+0+0 1-bit Bilevel Gray 3.21KB 0.000u 0:00.019

The original image was 160x160 and so is the converted pbm. It is shown to be 1-bit BiLevel Gray which means 1 bit per pixel, black(0) or white(1). The size is 3211 bytes.

> ls -log chessboard.pbm
-rw-r--r--  1 3211 Aug 23 04:08 chessboard.pbm

160x160 bits = 3200 bytes and the first 11 bytes are the header which can be seen by opening chessboard.pbm in a text editor or by running the following command.

> head -2 chessboard.pbm
P4
160 160

The 11 bytes in the header are for 'P','4','\n','1','6','0',' ','1','6','0','\n'.

chessboard.pbm can be viewed in an ImageMagick window using the display command. We do not include it inline here since browsers do not render pbm images.

> display chessboard.pbm &



You can even try to dump an ASCII-art type version of the image for fun.

> tail -1 chessboard.pbm | xxd -bits -cols 20 -g 0 | gsed 's/^.*\: //g' | gsed 's/\.*//g' > chessboard.txt

The sed patterns remove the leading line numbers and training character dumps on each line. We dump the bits in the file using xxd -bits and set the number of columns i.e. bytes in each line to 160/8=20 as it is a 160x160 pixel image. If you open chessboard.txt in a text editor like vi and disable wrapping (set nowrap), a section of it may look something like this.

Now we compress chessboard.pbm using gzip and and separately encrypt it using AES as in the last post. We use the same encryption password as we had used there.

> gzip -c chessboard.pbm > chessboard.pbm.gz
> openssl enc -aes-256-cbc -salt -in chessboard.pbm -out chessboard.pbm.enc
enter aes-256-cbc encryption password:
Verifying - enter aes-256-cbc encryption password:
> ls -log chessboard.pbm*
-rw-r--r--  1 3211 Aug 30 11:15 chessboard.pbm
-rw-r--r--  1   93 Aug 30 11:18 chessboard.pbm.gz
-rw-r--r--  1 3232 Aug 30 11:19 chessboard.pbm.enc

The encrypted file is bigger by a few bytes and the compressed file is only 93 bytes. Not surprising since there is a lot of repetition in the image.

Next come the image representations of the encrypted and compressed files. As before we determine n but no cropping is needed as the images are quite small.

> convert -size 160x160 mono:chessboard.pbm.enc chessboard.pbm.enc-mono.png
> convert -size 24x24 mono:chessboard.pbm.gz chessboard.pbm.gz-mono.png

The resulting images are shown below.

chessboard.pbm.enc-mono.png


chessboard.pbm.gz-mono.png

For the encrypted file, we do see a noisy visual and for the compressed file, despite the very small size of the image, we do see clear repeated patterns.

In order to get a better idea, we double the dimensions of the first image, making it 4 times in size. And we magnify the dimensions of the second image 5 times, making it 25 times its size.

> convert -size 160x160 mono:chessboard.pbm.enc -scale 320x320 chessboard.pbm.enc-mono-x2x2.png
> convert -size 24x24 mono:chessboard.pbm.gz -scale 120x120 chessboard.pbm.gz-mono-x5x5.png


chessboard.pbm.enc-mono-x2x2.png




chessboard.pbm.gz-mono-x5x5.png


The scaled up images show more clearly that the encryption produces a random noisy pattern while the gzip compression retains repeated patterns prominently.

So we find that what we saw for textual data in the previous post held for simple binary data too.

But before we stop, it would be good to try another interesting exercise. And that is to use AES ECB mode as opposed to the CBC mode which we have been using so far. ECB mode uses the same key for all input blocks unlike CBC mode which uses chaining so that the key for a block is actually generated using the encryption of the previous block. For the first block, a random IV is used for producing the key. More accurately, the key is an XOR of the input block and the previous block's ciphertext (or IV in case of the first block).

As a result, ECB leaks information about repetitive patterns in the input. Let us try to demonstrate it here.

> openssl enc -aes-256-ecb -nosalt -in chessboard.pbm -out chessboard.pbm.enc.aes-ecb
enter aes-256-ecb encryption password:
Verifying - enter aes-256-ecb encryption password:
> ls -l chessboard.pbm.enc.aes-ecb
-rw-r--r--  1 dev  dev  3216 Aug 30 12:35 chessboard.pbm.enc.aes-ecb
> convert -size 160x160 mono:chessboard.pbm.enc.aes-ecb chessboard.pbm.enc.aes-ecb-mono.png

chessboard.pbm.enc.aes-ecb-mono.png


As we can see above, the ECB mode encryption exposes much of the chessboard pattern of the original image.

No comments:

Post a Comment