Information
The second function is called decode and this is the prototype:void encode(FILE *infile, FILE *outfile);
Looking at the names of the functions, you can probably guess what they do. One function encodes some file's data, and the other function decodes the data. There are no return values from the functions. We will assume that if the files can be opened (by the driver), then you should be able to encode/decode them successfully.void decode(FILE *infile, FILE *outfile);
The encoding/decoding algorithm that you are going to implement is a lossless data compression algorithm known as the RLE algorithm, or Run-length encoding algorithm. It's a simple algorithm that even beginning programmers can implement. In a nutshell, a data compression algorithm (like those used to create .zip or .rar files) simply removes redundancy from a file. This will generally make the file smaller (sometimes, much smaller) than the original file.
As an example, suppose that you had a file (A) that contained 100 characters and they were all the letter 'A' (ASCII 65). That file would be 100 bytes in size and look like this:
Looking at it with dumpit, it looks like this:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
However, because there are so many consecutive letters that are the same, we can encode that information in fewer bytes. So, instead of writing 100 copies of 'A' to the file, we will write one letter 'A', and then follow it with a count. The count is the actual number of letter A's that were in the original file. So, our "encoded" file (A.rle) will have only two bytes in it (shown in hexadecimal):A: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F -------------------------------------------------------------------------- 000000 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 000010 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 000020 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 000030 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 000040 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 000050 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 000060 41 41 41 41 AAAA
Hex 41 is decimal 65 (the letter 'A') and hex 64 is decimal 100, the number of times to repeat the character. Looking at the binary file with dumpit:41 64
That's a HUGE difference! The original file required 100 bytes on the disk, but the compressed file only required 2 bytes! ("Welcome to the wonderful world of data compression!")A.rle: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F -------------------------------------------------------------------------- 000000 41 64 Ad
Of course, at some point we will have to "decode" the file and expand it back to the original size and contents. That's what the decode function will do. It will take as input a compressed file containing just those 2 bytes, and create a new file that has 100 characters in it, all of them are the letter 'A':
Pretty cool! Of course, if the file doesn't contain a lot of redundant data, the RLE algorithm will actually make the file larger, up to twice the size. But, this can happen with other compression algorithms, although to a lesser degree. As an example, suppose our file (ABCDE.txt) contains just the letters ABCDE (5 bytes):AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Encoding this file to ABCDE.rle causes it to be twice as large (10 bytes):ABCDE.txt: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F -------------------------------------------------------------------------- 000000 41 42 43 44 45 ABCDE
So, the more redundancy in the file, the better the compression. Again, this is true for all compression algorithms.ABCDE.rle: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F -------------------------------------------------------------------------- 000000 41 01 42 01 43 01 44 01 45 01 A.B.C.D.E.
Here is a driver file: HTML Text
The name of your implementation file should be encoder.c and the command to compile it will look like this:
gcc -O -Werror -Wall -Wextra -ansi -pedantic main.c encoder.c -o encoder
To run the program and encode a file named A300.txt into a file named A300.rle:
To decode the encoded file:encoder A300.txt A300.rle
encoder -d A300.rle myA300.txt
Simple test files to get you started: files.zip There are two sets of files. Files starting with in are input files, and files starting with out are the encoded output files. You'll see that for each input file, there is a corresponding encoded file. Start with the smaller ones first.
After you've successfully tested your code with the simple files above, try to encode these bigger files:
Input file | Output file (encoded) | Hex output (dumpit) |
---|---|---|
A300.txt | A300.rle | A300-hex.txt |
icon1.bmp | icon1.rle | icon1-hex.txt |
smiley-bw.bmp | smiley-bw.rle | smiley-bw-hex.txt |
black.bmp | black.rle | black-hex.txt |
dude.jpg | dude.rle | dude-hex.txt |
All of the big files in one zip: big-files.zip
Approximate number of lines of code for encode: 30
Approximate number of lines of code for decode: 8
Notes
A300.rle: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F -------------------------------------------------------------------------- 000000 41 FF 41 2D A.A-
which isn't very helpful. See then next note.file1.rle file2.rle differ: byte 492, line 2
and you will be able to see which bytes are different using diff. Hint: use a visual diff tool.dumpit file1.rle > file1-hex.txt dumpit file2.rle > file2-hex.txt diff file1-hex.txt file2-hex.txt