Binary and Text IO in C
"Every program has at least one bug and can be shortened
by at least one instruction -- from which, by induction,
it is evident that every program can be reduced to one
instruction that does not work."
-- Anonymous
Basic Output
To use the I/O facilities of C, you need to include stdio.h.
The simplest facilities for unformatted output in C are putchar and puts:
int putchar( int c );
int puts( const char *string );
Both write to stdout: putchar writes a single character, and puts writes a string followed by a newline.
Default open stream pointers:
- stdin - Input, usually the keyboard
- stdout - Output, usually the display
- stderr - Output, usually the display
- stdaux - from DOS/Win16 (obsolete)
- stdprn - from DOS/Win16 (obsolete)
They are part of the standard I/O library in stdio.h so you don't need to declare them.
The definition of a FILE used by Microsoft's compiler:
struct _iobuf {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
};
typedef struct _iobuf FILE;
_CRTIMP extern FILE _iob[];
#define stdin (&_iob[0])
#define stdout (&_iob[1])
#define stderr (&_iob[2])
You will probably never have to deal with the internal structure of a FILE and can just assume
that the standard I/O devices are declared like this:
FILE *stdin;
FILE *stdout;
FILE *stderr;
Functions/macros for unformatted output to stdout or a specified FILE pointer:
// These two output to stdout
int putchar( int c ); // may be implemented as a macro
int puts( const char *string );
// You need to specify the stream with these
int putc( int c, FILE *stream ); // may be implemented as a macro
int fputc( int c, FILE *stream );
int fputs( const char *string, FILE *stream );
Notes:
- puts writes the characters up to (but not including) the terminating null character ('\0'), then writes a newline in its place.
- fputs writes only the characters up to the null character; it does not add a newline.
- The following lines produce the same results:
puts("This is a line of text");
fputs("This is a line of text\n", stdout);
- Usually, "normal" output is sent to stdout.
- Usually, error messages are sent to stderr.
- You can send some text to stdout, and some to stderr:
puts("1. This line goes to stdout");
fputs("2. This line goes to stdout\n", stdout);
fputs("3. This line goes to stderr\n", stderr);
- By default, both end up on the display, so it's hard to tell what is an error and what is "normal" output.
- The OS can redirect these streams to other destinations (by using the output redirection operators in
a console window).
- This allows you to change the destination of your output after building your program.
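The convention in the notes above can be sketched as follows. This is only an illustration: the function name report_result and its messages are hypothetical, and the streams are passed as parameters so the same code works with stdout/stderr or any other open FILE pointer.

```c
#include <stdio.h>

// Hypothetical helper: writes a normal message to 'out' and, if 'ok' is
// zero, an error message to 'err'. Returns the number of lines written.
int report_result(int ok, FILE *out, FILE *err)
{
    int lines = 0;

    // "Normal" output
    if (fputs("Processing finished\n", out) != EOF)
        lines++;

    // Error output goes to a separate stream so it can be redirected separately
    if (!ok && fputs("error: something went wrong\n", err) != EOF)
        lines++;

    return lines;
}
```

A typical call would be report_result(0, stdout, stderr). Both lines appear on the display unless one of the streams is redirected.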
Redirecting stdout and stderr
Suppose we had a function that did this:
void ShowOutErr(void)
{
fputs("This is going to stdout\n", stdout);
fputs("This is going to stderr\n", stderr);
}
By default, running this program would produce this output on the console (display):
This is going to stdout
This is going to stderr
We can redirect these messages to a file by using the output redirection operator at the console. (Assume the
function above compiles to a program called myprog.exe.)
myprog > out.txt
When we run the program, we see this printed to the screen:
This is going to stderr
What happened to the line "This is going to stdout"? Well, it was redirected to a file named out.txt. If
we look at the contents of this file we see:
This is going to stdout
To redirect stderr, we need to do this:
myprog 2> err.txt
This produces the output:
This is going to stdout
The redirection also created a file named err.txt that contains the other line of text.
To redirect both, we do this:
myprog > out.txt 2> err.txt
which produces no output on the screen.
Both lines of text have been redirected to their respective files (out.txt and err.txt).
If we want both stdout and stderr redirected to the same file (both.txt), we would do this:
myprog > both.txt 2>&1
Notes:
- Using the output redirection operator '>' causes the file to be overwritten with the new text.
- If you want to append to a file, use the append operator '>>'. Appending to a non-existent file will create the file.
- Also, the output redirection operator '>' is the same as '1>'. The '1' is implied. These are
all equivalent:
myprog > out.txt 2> err.txt
myprog 1> out.txt 2> err.txt
myprog 2> err.txt > out.txt
myprog 2> err.txt 1> out.txt
Basic Input
The simplest facilities for unformatted input in C are getchar and gets:
int getchar( void );
char *gets( char *buffer );
Notes:
- getchar returns an int, not a char, so that it can return End Of File (EOF), a value distinct from every valid character.
- The buffer passed to gets must be large enough because there is no overflow checking.
These errors are known as buffer overruns; they have been a major source of security exploits
and probably still exist in a lot of code out there. For this reason gets was removed from
the language in C11; use fgets instead.
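A minimal sketch of why the return value must be stored in an int. The helper name count_chars is hypothetical; it uses getc, the stream version of getchar, so that it can read from any open FILE pointer.

```c
#include <stdio.h>

// Hypothetical helper: counts the characters remaining in 'stream'.
long count_chars(FILE *stream)
{
    long count = 0;
    int c;   // int, not char, so that EOF can be told apart from real data

    while ((c = getc(stream)) != EOF)
        count++;

    return count;
}
```

If c were declared as a char, a valid byte such as 0xFF could compare equal to EOF (or EOF might never match at all), depending on whether char is signed on the platform.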
Functions/macros for unformatted input from stdin or a specified FILE pointer:
// These two input from stdin
int getchar( void ); // may be implemented as a macro
char *gets( char *buffer ); // unsafe; removed in C11
// You need to specify the stream with these
int getc( FILE *stream ); // may be implemented as a macro
int fgetc( FILE *stream );
char *fgets( char *string, int n, FILE *stream );
Notes:
- gets reads all characters up to and including the newline but replaces
the newline with a null byte ('\0') before returning.
- Be sure to account for the newline/null byte when sizing the buffer for gets.
- fgets reads at most n - 1 characters, stopping after a newline or at end of file,
whichever comes first. Unlike gets, the newline is kept in the buffer. A null byte
is stored after the last character read.
- Since fgets takes the size of the buffer as a parameter, it cannot overflow the
buffer, which makes it the safer choice.
- You can also redirect the input to your program so that it comes from somewhere
other than the keyboard (e.g. a file). You use the input redirection operator at the
console '<'.
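Because fgets keeps the newline, a common pattern is to strip it after reading. The sketch below shows one way to do that; the helper name read_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: reads one line from 'stream' into 'buffer' (of
// 'size' bytes) and removes the trailing newline, if there is one.
// Returns 1 if a line was read, 0 on end of file or error.
int read_line(char *buffer, int size, FILE *stream)
{
    if (fgets(buffer, size, stream) == NULL)
        return 0;

    // strcspn returns the index of the first '\n', or the length of the
    // string if there is none (e.g. the line was longer than the buffer)
    buffer[strcspn(buffer, "\n")] = '\0';
    return 1;
}
```

Calling read_line(buffer, sizeof(buffer), stdin) reads from the keyboard, or from a file if the input has been redirected with '<'.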
File Input/Output
- C provides a standard way of reading and writing data from files (streams).
- Typically, the C standard functions (or macros) take a FILE *, indicating the source/destination
of the data.
- Streams are treated as either text or binary. (translated vs. untranslated or cooked vs. raw)
- Text streams vary between systems due to the translations that occur during reading and writing.
- No translation is performed on binary streams.
Text Files
Text streams have certain attributes that may vary among different systems:
- The contents of the file are usually limited to characters in the range of ASCII 32 - 126,
although the extended characters (> 127) can be included in some implementations.
Characters < 32 (and 127, DEL) are considered control characters and are usually non-printable.
- There may be a limit to the length of a line of text. An implementation must support lines of at least 254 characters, including the newline (per the Standard).
- The End Of Line character varies, depending on the operating system:
- On Unix/Linux the EOL is the line feed. (ASCII 0x0A)
- On classic Mac OS (before OS X), the EOL is the carriage return (ASCII 0x0D). OS X and macOS use the line feed, like Unix.
- On MSDOS/Windows the EOL character is actually two characters:
a carriage return, and a line feed (ASCII 0x0D and 0x0A).
- When treating files as text files, the translation between the system's EOL sequence and the
single newline character ('\n') that the program sees is done automatically. This is the primary reason for distinguishing between text and binary.
- Technically, a text file is (from the
IEEE Standard definition):
A file that contains characters organized into one or more lines. The lines do not contain NUL characters
and none can exceed {LINE_MAX} bytes in length, including the <newline>. Although IEEE Std 1003.1-2001
does not distinguish between text files and binary files (see the ISO C standard), many utilities only
produce predictable or meaningful output when operating on text files. The standard utilities that have
such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
Binary files have no restrictions or limitations. It is up to the programmer to decide when to
interpret a file as a text file, and when to interpret it as a binary file.
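To see which EOL convention a file uses, it has to be read without translation. The sketch below opens the file in binary mode ("rb", described in the fopen notes that follow) and counts the raw carriage returns and line feeds. The helper name count_eol_bytes is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: opens 'filename' in binary mode (no translation)
// and counts the carriage returns and line feeds it contains.
// Returns 0 on success, -1 if the file could not be opened.
int count_eol_bytes(const char *filename, long *cr, long *lf)
{
    FILE *fp = fopen(filename, "rb");
    int c;

    if (fp == NULL)
        return -1;

    *cr = *lf = 0;
    while ((c = getc(fp)) != EOF)
    {
        if (c == '\r')        // 0x0D
            (*cr)++;
        else if (c == '\n')   // 0x0A
            (*lf)++;
    }

    fclose(fp);
    return 0;
}
```

As a rough guide, equal counts suggest a Windows file, line feeds only suggest Unix/Linux, and carriage returns only suggest classic Mac OS.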
Like most languages, reading from a file and writing to a file follow these steps:
- Open the file (fopen)
- Read/Write the file (fgetc, fgets, fputc, fputs, etc.)
- Close the file (fclose)
These two functions are required in all cases:
FILE *fopen( const char *filename, const char *mode );
int fclose( FILE *stream );
Notes:
- fopen returns a valid FILE * if the specified file was successfully opened, otherwise it
returns NULL.
- fclose returns 0 if successful and EOF if not.
- The second parameter to fopen is the mode and it can be:
        | Read (input)   Write (output)   Append (output)
--------+------------------------------------------------
Text    | "r"            "w"              "a"
Binary  | "rb"           "wb"             "ab"
- The default type is text and if you add a t (e.g. "wt" for text writing) explicitly, it will
probably work. (Technically, t is not part of the Standard for C89, but I've never seen a compiler
which didn't accept it.) Text mode is also referred to as translate mode, because the EOL
characters are translated.
- Appending a + to the mode opens the file for update, which is reading and writing.
- More details:
- r Opens an existing file for reading. If the file does not exist, fopen fails and returns NULL.
- w Opens a file for writing. If the file exists, its contents are destroyed; creates the file if it doesn't exist.
- a Opens for writing at the end of the file (appending); creates the file if it doesn’t exist.
- r+ Opens for both reading and writing. The file must exist.
- w+ Opens a file for both reading and writing. If the given file exists, its contents
are destroyed; creates a new file if it doesn't exist.
- a+ Opens for reading and appending; creates the file if it doesn’t exist.
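A short sketch of how the "w", "a", and "r" modes behave together. The helper name write_then_append is hypothetical, and the strings written should not contain newlines if the returned length is to be the same on every system.

```c
#include <stdio.h>

// Hypothetical helper: writes 'first' to 'filename' with "w" (destroying
// any old contents), appends 'second' with "a", then reopens the file
// with "r" and returns its length in characters, or -1 on failure.
long write_then_append(const char *filename, const char *first, const char *second)
{
    FILE *fp;
    long length = 0;

    fp = fopen(filename, "w");   // create the file or destroy its contents
    if (fp == NULL)
        return -1;
    fputs(first, fp);
    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(filename, "a");   // all writes go to the end of the file
    if (fp == NULL)
        return -1;
    fputs(second, fp);
    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(filename, "r");   // fails if the file does not exist
    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        length++;
    fclose(fp);

    return length;
}
```

Calling the function twice with the same arguments gives the same length both times, because the "w" open discards whatever the previous call left in the file.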
The text below has been saved in 3 different formats: Windows, Linux, and Macintosh.
The files differ only in their EOL characters, so you have to view them in a hex editor to see the differences.
(Most web browsers can handle all types of EOL characters so they'll display it correctly.)
When in the Course of human events, it becomes necessary for one people to
dissolve the political bands which have connected them with another, and to
assume among the powers of the earth, the separate and equal station to which
the Laws of Nature and of Nature's God entitle them, a decent respect to the
opinions of mankind requires that they should declare the causes which impel
them to the separation.
Example
This example displays the contents of a given text file. (We're assuming we can interpret it as text.)
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024

void DisplayTextFile(void)
{
  char filename[FILENAME_MAX];
  FILE *infile;

  // Prompt the user for a filename
  puts("Enter the filename to display: ");

  // Get the user's input (fgets, unlike gets, cannot overflow the buffer)
  if (!fgets(filename, sizeof(filename), stdin))
    return;

  // Remove the newline that fgets leaves at the end of the input
  filename[strcspn(filename, "\n")] = '\0';

  // Open the file for read in text/translate mode
  infile = fopen(filename, "r");

  // If successful, read each line and display it
  if (infile)
  {
    char buffer[MAX_LINE];

    // Until fgets reports the end of the file (or a read error)
    while (fgets(buffer, MAX_LINE, infile))
      fputs(buffer, stdout);

    // Close the file (very important!)
    fclose(infile);
  }
  else
    perror(filename); // couldn't open the file
}
If the fopen fails, we call perror to display the reason. If we try to open a non-existent
file named foo.txt, we'll see this:
foo.txt: No such file or directory
Notes:
- There are no exceptions thrown (this is C), so you need to check for errors after calling
library functions.
- A return value of NULL indicates a failure; call perror if you want to display a
human-readable message.
- You can check the global variable errno, which will contain an integer representing
the error. (perror uses errno)
- See errno.h for a list of common errors and their descriptions.
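The sketch below shows the same check done by hand with errno and strerror instead of perror. It assumes a platform where fopen sets errno on failure (POSIX systems and Windows do; the C Standard does not require it). The helper name try_open is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: tries to open 'filename' for reading. On failure
// it prints a message to stderr and returns the saved errno value;
// on success it closes the file and returns 0.
int try_open(const char *filename)
{
    FILE *fp;

    errno = 0;
    fp = fopen(filename, "r");
    if (fp == NULL)
    {
        int saved = errno;   // save it: later library calls may change errno

        // Same format as perror(filename)
        fprintf(stderr, "%s: %s\n", filename, strerror(saved));
        return saved;
    }

    fclose(fp);
    return 0;
}
```

Saving errno right after the failing call matters because any later library call, including the one that prints the message, is allowed to overwrite it.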
The DisplayTextFile example above also works in binary mode: open the file with "rb" instead of "r", and no EOL translation is performed.
Binary Input/Output
- Text data is used when humans need to read/write the data.
- Binary data is used when only computers will read/write the data.
- Computers can read/write text as well, but binary is more efficient because there are no
conversions.
For binary I/O, use fread and fwrite:
size_t fread( void *buffer, size_t size, size_t count, FILE *stream );
size_t fwrite( const void *buffer, size_t size, size_t count, FILE *stream );
Info:
- buffer - A pointer to the data (to write out) or empty buffer (to read into)
- size - The size (number of bytes) of each element of data.
- count - The number of elements.
- stream - The opened (via fopen) stream.
- returns the number of full elements read/written.
Note that the return value is not the number of bytes written, but the number of
elements written. The number of actual bytes written will be the number of elements
written multiplied by the size of each element.
Examples using fread and fwrite.
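A minimal sketch of a binary round trip, assuming int is at least 32 bits. It writes five integers starting at 0x12345678 with fwrite, reads them back with fread, and checks the element counts that both functions return. The helper name roundtrip_ints is hypothetical.

```c
#include <stdio.h>

#define COUNT 5

// Hypothetical helper: writes COUNT ints starting at 0x12345678 to
// 'filename', reads them back, and returns how many elements came back
// unchanged (COUNT on success), or -1 on an I/O failure.
int roundtrip_ints(const char *filename)
{
    int out[COUNT], in[COUNT];
    size_t n;
    int i, matches = 0;
    FILE *fp;

    for (i = 0; i < COUNT; i++)
        out[i] = 0x12345678 + i;

    fp = fopen(filename, "wb");
    if (fp == NULL)
        return -1;
    n = fwrite(out, sizeof(int), COUNT, fp);   // returns elements, not bytes
    if (fclose(fp) == EOF || n != COUNT)
        return -1;

    fp = fopen(filename, "rb");
    if (fp == NULL)
        return -1;
    n = fread(in, sizeof(int), COUNT, fp);     // a short count means EOF or error
    fclose(fp);
    if (n != COUNT)
        return -1;

    for (i = 0; i < COUNT; i++)
        if (in[i] == out[i])
            matches++;

    return matches;
}
```

With a 4-byte int the file is COUNT * sizeof(int) = 20 bytes long, even though fwrite reports 5.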
Contents of a file after writing 5 integers to the file (from previous example):
Big endian
000000 12 34 56 78 12 34 56 79 12 34 56 7A 12 34 56 7B 4Vx4Vy4Vz4V{
000010 12 34 56 7C 4V|
Little endian
000000 78 56 34 12 79 56 34 12 7A 56 34 12 7B 56 34 12 xV4yV4zV4{V4
000010 7C 56 34 12 |V4
Quick endian refresher
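The two dumps above differ only in the order of the bytes within each integer. The sketch below, which assumes unsigned int is at least 32 bits, checks which order the current machine uses by looking at the first byte of 0x12345678 in memory. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

// Returns 1 if the machine stores the least significant byte first
// (little endian), 0 otherwise.
int is_little_endian(void)
{
    unsigned int value = 0x12345678;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);   // view the integer as raw bytes
    return bytes[0] == 0x78;
}

// Prints the bytes of 'value' in memory order, which is the order
// fwrite would put them in a file.
void dump_bytes(unsigned int value, FILE *out)
{
    unsigned char bytes[sizeof value];
    size_t i;

    memcpy(bytes, &value, sizeof value);
    for (i = 0; i < sizeof value; i++)
        fprintf(out, "%02X ", (unsigned int)bytes[i]);
    fputc('\n', out);
}
```

Because fwrite copies memory as-is, a binary file written on a machine of one byte order must have its integers byte-swapped when read on a machine of the other.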
Other Input/Output Functions
int ungetc( int c, FILE *stream );
int fflush( FILE *stream );
long ftell( FILE *stream );
int fseek( FILE *stream, long offset, int origin );
int feof( FILE *stream );
int rename( const char *oldname, const char *newname );
int remove( const char *path );
char *tmpnam( char *string );
Description:
- ungetc - Pushes a character back onto the stream.
- fflush - Forces data to be written even if the output buffer isn't full yet.
- ftell - Returns the current position in the stream.
- fseek - Moves to the specified offset in the stream.
- When using fseek, the possible values for origin and their meaning:
- SEEK_SET - offset is from the beginning of the stream and must not be negative.
- SEEK_CUR - offset is from the current position in the stream and may be positive or negative.
- SEEK_END - offset is from the end of the stream and may be positive or negative.
- feof - Returns true (non-zero) if a previous read on the stream hit the end of the file, otherwise false (0).
- rename - Renames a file (Similar to ren under DOS/Windows and mv under Unix/Linux.)
- remove - Deletes a file from the disk. (Similar to del under DOS/Windows and rm under Unix/Linux.)
- tmpnam - Constructs a name that can be used for a temporary file.
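A common use of fseek and ftell together is finding the size of a file. The sketch below assumes a platform where seeking to SEEK_END works on binary streams (true of the common desktop systems, although the C Standard does not guarantee it). The helper name file_size is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: returns the size in bytes of 'filename', or -1
// on failure. The file is opened in binary mode so that the position
// reported by ftell is a plain byte count.
long file_size(const char *filename)
{
    FILE *fp = fopen(filename, "rb");
    long size = -1;

    if (fp == NULL)
        return -1;

    if (fseek(fp, 0L, SEEK_END) == 0)   // move to the end of the stream
        size = ftell(fp);               // the position there is the size

    fclose(fp);
    return size;
}
```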
Notes:
- Don't rely on ungetc to be able to "undo" a lot of reading. It's generally only useful for returning
at most one character to the stream.
- Use fflush when trying to debug a crashing program with printf statements.
- Don't use fflush if speed is critical.
- ftell and fseek are mainly used (with SEEK_SET and SEEK_CUR) on binary
streams due to the EOL translation issues.
- Note that rename and remove take filenames (C-strings), not a FILE pointer and that
their implementation is system-dependent.
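A typical use of ungetc is to look at the next character without consuming it. This is a small sketch of that idea; the helper name peek_char is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: returns the next character in 'stream' without
// consuming it, or EOF if there is none.
int peek_char(FILE *stream)
{
    int c = getc(stream);

    if (c != EOF)
        ungetc(c, stream);   // only one character of pushback is guaranteed

    return c;
}
```

The next getc on the same stream returns the character that was pushed back.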
File I/O Documentation from MSDN.