Data Types

Integral Types

An integral data type is a type that is fundamentally an integer; that is, it has no fractional portion. Integral types come in different sizes. There are 8 different integer types, and their sizes depend on the computer and compiler. Most of the computers and compilers we are using run 64-bit software. This is a diagram of the relative sizes, with the char data type being 1 byte (usually 8 bits):
Relative size of data types (typical 64-bit computer: LP64)

It is an unfortunate historical accident that the smallest integer type is named char, which is short for character. It's unfortunate because, too often, students and beginner programmers equate the term char with "a letter of the English alphabet". There is nothing about the char data type that implies anything about the English alphabet. A better name, used in other programming languages, would have been byte, as that is more accurately what it represents: a single byte. This is even more obvious when you consider that a char can be negative, which certainly doesn't (and can't) represent any letter. Keep this in mind when dealing with the char data type. It's simply a very small integer, no more, no less. In fact, Microsoft has made an alias for it in their core header files:

typedef unsigned char BYTE;   /* Use BYTE instead of char */

Note that the C Standard does not specify the sizes of any of the types except char, which is always 1 byte. For the other types it only specifies the minimum range of values that each type must support. The sizes shown above are the typical sizes on a 64-bit (LP64) computer.

This table shows the range of values for the integral types on a typical, modern 64-bit (LP64) computer:

Type                Size in bytes  Also called                     Range of values (binary)  Range of values (decimal)
signed char         1              char (compiler-dependent)       -2^7 to 2^7 - 1           -128 to 127
unsigned char       1              char (compiler-dependent)       0 to 2^8 - 1              0 to 255
signed short int    2              short, short int, signed short  -2^15 to 2^15 - 1         -32,768 to 32,767
unsigned short int  2              unsigned short                  0 to 2^16 - 1             0 to 65,535
signed int          4              int, signed                     -2^31 to 2^31 - 1         -2,147,483,648 to 2,147,483,647
unsigned int        4              unsigned                        0 to 2^32 - 1             0 to 4,294,967,295
signed long int     8              long, long int, signed long     -2^63 to 2^63 - 1         -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
unsigned long int   8              unsigned long                   0 to 2^64 - 1             0 to 18,446,744,073,709,551,615

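The ranges for the fixed-width and maximum-width integer types are defined in stdint.h (the ranges for the basic types such as int and long are defined in limits.h). Here's a sample from stdint.h: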

  /* Minimum for largest signed integral type. */
#define INTMAX_MIN   (-__INT64_C(9223372036854775807)-1)
  /* Maximum for largest signed integral type. */
#define INTMAX_MAX   (__INT64_C(9223372036854775807))

Technically, there are just char and int, but the int types are extended with modifiers (signed, unsigned, short, long), which give the possible combinations listed above. The order of the type and its modifiers does not matter. These all mean the same thing:

unsigned long int
unsigned int long
long int unsigned
long unsigned int
int long unsigned
int unsigned long
Also, when used with long or short the int is optional (these are the same as above):

unsigned long
long unsigned

However, most code presents the above as unsigned long int or unsigned long. (You should stick to this convention.)

This table shows the sizes of long integers used in Microsoft Windows (LLP64). (Most of the rest of the world uses 8 bytes as above. Keep this in mind if you plan on working with Windows.)

Type               Bytes  Also called                  Range of values (binary)  Range of values (decimal)
signed long int    4      long, long int, signed long  -2^31 to 2^31 - 1         -2,147,483,648 to 2,147,483,647
unsigned long int  4      unsigned long                0 to 2^32 - 1             0 to 4,294,967,295

The only other thing that the Standard states is this:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

C99 introduced the long long type, which is 8 bytes on almost all modern 64-bit systems. Since the long type is also 8 bytes on most systems, this currently doesn't improve things (except when working with Microsoft Windows, where long is only 4 bytes).

This table includes the binary values:

Type            Binary range                                                          Decimal range
signed char     10000000 to 01111111                                                  -128 to 127
unsigned char   00000000 to 11111111                                                  0 to 255
signed short    1000000000000000 to 0111111111111111                                  -32,768 to 32,767
unsigned short  0000000000000000 to 1111111111111111                                  0 to 65,535
signed int      10000000000000000000000000000000 to 01111111111111111111111111111111  -2,147,483,648 to 2,147,483,647
unsigned int    00000000000000000000000000000000 to 11111111111111111111111111111111  0 to 4,294,967,295
signed long     1 followed by 63 zeros to 0 followed by 63 ones                       -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
unsigned long   [ 64 zeros ] to [ 64 ones ]                                           0 to 18,446,744,073,709,551,615

A signed integer that is 32 bits wide can store values in the range -2,147,483,648 to 2,147,483,647.

A signed char that is 8 bits wide can store values in the range -128 to 127.

What happens when you try to increment a value that is too large for the data type? With unsigned values, it just "wraps" back around to 0. Think of the bits being sort of like an odometer on a car. Once the odometer gets to 999999, it will "wrap" back around to 0. So, an unsigned char with a value of 255 will become 0:

 11111111
+       1
---------
000000000
The carry-bit is discarded.

With signed numbers, the result is undefined. It could be anything and do anything, including crashing the program. It's up to the particular system. Sometimes you may get a kind of "wrapping" like with unsigned values, but this is not guaranteed. When wrapping does happen, the difference is that instead of going from the largest positive value back to 0, the bits go from the largest positive value to the smallest negative value (e.g. with an 8-bit signed value, 127 + 1 becomes -128).

Overflow example

64-bit data models

Here's a sample.

Why did Microsoft choose the LLP64 model instead of LP64 like everyone else? Incidentally, that article is written by Raymond Chen. He is the Best. Microsoft. Blogger. Ever. If you want to become an expert Windows programmer, you need to know this!


Note: There isn't really much difference in memory use by choosing a 4-byte integer instead of a 2-byte short integer or 1-byte char to store the number 65. However, when you have very large arrays (thousands, millions, billions), it makes a real difference. If you aren't careful, you can run out of memory or have your program run slower.

Literal Constants

We know that a literal constant like 42 is an int and that a literal constant like 42.0 is a double. Don't forget that when you are reading or writing shorts and longs with scanf and printf, you need to use a modifier on the conversion type: h for short and l (lowercase L) for long.


Usually, we write literal integral values using decimal (base 10) notation. C provides two other forms: octal (base 8), written with a leading 0, and hexadecimal (base 16), written with a leading 0x. (C++14 adds a binary literal. Woohoo!)

Be careful not to include a leading 0 (zero) when writing decimal integers. In mathematics, these numbers both have the same value: 77 and 077. However, in C, the number 077 is actually an octal number and has the decimal value of 63.

Floating Point Types

Unlike the integral types, floating point types are not divided into signed and unsigned. All floating point types are signed only. Floating point numbers follow the IEEE-754 Floating Point Standard. Here are the approximate ranges of the IEEE-754 floating point numbers on Intel x86 computers:

Type         Size  Smallest positive value  Largest positive value  Precision
float        4     1.1754 x 10^-38          3.4028 x 10^38          6 digits
double       8     2.2250 x 10^-308         1.7976 x 10^308         15 digits
long double  10*   3.3621 x 10^-4932        1.1897 x 10^4932        19 digits

Some floating point constants. These are all of type double:
42.0  42.0e0  42.  4.2e1  4.2E+1  .42e2  420.e-1  42e0  42.E0
To indicate that the type is float, you must append the letter f or F:
42.0f  42.0e0f  42.F  4.2e1F  etc...
To indicate that the type is long double, you must append the letter l (lowercase 'L') or L:
42.0L  42.0e0L  42.l  4.2e1l  etc...
In practice, NEVER use the lowercase L (which looks very similar to the number one: 1), as it will certainly cause confusion. (See above.)

A literal float, double, or long double must contain a decimal point or be written in scientific notation (e.g. with e notation), as the examples above show. Without either one, the literal is an integer.

*Here are the sizes of floating point numbers on various C compilers under 32-bit:

Expression     GNU gcc  Borland  Microsoft
sizeof(42.0)   8        8        8
sizeof(42.0F)  4        4        4
sizeof(42.0L)  12       10       8

With 64-bit, all of the sizes shown above are the same except the long double which is 16 bytes for GCC/MinGW/Clang.

Partial float.h listing from Microsoft's compiler.

Another toy

Here's another sample with all types.

See this refresher on IEEE-754 notation for more information.

Comparison of data sizes with various compilers:

Compiler          char  short  int  long  long long  float  double  long double  void*  size_t  intptr_t
32-bit Microsoft  1     2      4    4     8          4      8       8            4      4       4
32-bit GNU gcc    1     2      4    4     8          4      8       12           4      4       4
32-bit Clang      1     2      4    4     8          4      8       12           4      4       4
64-bit Microsoft  1     2      4    4     8          4      8       8            8      8       8
64-bit GNU gcc    1     2      4    8     8          4      8       16           8      8       8
64-bit Clang      1     2      4    8     8          4      8       16           8      8       8
64-bit MinGW      1     2      4    4     8          4      8       16           8      8       8

Promotions and Conversions

When mixing data types, C (and C++) have extensive rules that determine which operands are promoted/converted and how. These rules are from "The Annotated C++ Reference Manual" by Bjarne Stroustrup. However, they also apply to C, since C++ is based on C.

The Integral Promotions

The Usual Arithmetic Conversions
  1. If either operand is of type long double, the other is converted to long double.
  2. Otherwise, if either operand is double, the other is converted to double.
  3. Otherwise, if either operand is float, the other is converted to float.
  4. Otherwise, the integral promotions are performed on both operands.
  5. Then, if either operand is unsigned long, the other is converted to unsigned long.
  6. Otherwise, if one operand is long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int is converted to a long int; otherwise both operands are converted to unsigned long int.
  7. Otherwise, if either operand is long, the other is converted to long.
  8. Otherwise, if either operand is unsigned, the other is converted to unsigned.
  9. Otherwise, both operands are int.

Note: You mainly need to be aware of these promotions/conversions when you are mixing data types.

The typedef Keyword

Suppose we want to add a boolean type to C. (In C89 there isn't one, so we typically use int in place of a boolean.) We've already done it using #define.

To declare a variable, we simply do this:

int a;             /* Create an integer named a               */
unsigned char b;   /* Create an unsigned char named b         */
short int c;       /* Create a short integer named c          */
float d;           /* Create a float named d                  */
unsigned char * e; /* Create an unsigned char pointer named e */
These cause the compiler to allocate space for each variable, based on its type.

If we want to create a new type (instead of a new variable), we add the typedef keyword:

typedef int a;             /* Create a new type named a */
typedef unsigned char b;   /* Create a new type named b */
typedef short int c;       /* Create a new type named c */
typedef float d;           /* Create a new type named d */
typedef unsigned char * e; /* Create a new type named e */
You can think of these type definitions as aliases for other types. To create a new variable of type a:
a i; /* Create an 'a' variable named i */
b j; /* Create a 'b' variable named j  */
Of course, this makes no sense whatsoever. For any real use, you need to give the typedefs meaningful names. Compare to #define:
  /* Create new types using typedef */
typedef int BOOL;
typedef unsigned char BYTE;
typedef short int FAST_INT;
typedef float CURRENCY;
typedef unsigned char * PCHAR;
  /* Create new types using #define */
#define BOOL int
#define BYTE unsigned char
#define FAST_INT short int
#define CURRENCY float
#define PCHAR unsigned char *
Examples:
BOOL playing, paused;   /* Booleans for a DVD player    */
BYTE next, previous;    /* For scanning bytes in memory */
CURRENCY tax, discount; /* To calculate total price     */
PCHAR inbuf, outbuf;    /* To manipulate strings        */
Summary: