Data Types

Integral Types

An integral data type is a type that is fundamentally an integer. That is, it has no fractional portion. Integral types come in different sizes.

The size determines how many bytes are required to store values.
The size also determines how large that value can be.
Integral types can be either signed or unsigned:

signed values can be negative, zero, or positive
unsigned values can only be zero or positive

You usually want to use a data type whose range matches the values it will contain.

If the data type is too large, there is wasted space.

Although, with today's computers this is only a real concern with large arrays.

If the data type is too small, there is a loss of data.

This is the bigger problem.

There are 8 different integer types, and their sizes depend on the computer and compiler. Most of the computers and compilers we are using are running 64-bit software. This is a diagram of the relative sizes, with the char data type being 1 byte (usually 8 bits):

Relative size of data types (typical 64-bit computer: LP64)

It is an unfortunate historical accident that the smallest integer type is named char, which is short for character. It's unfortunate because, too often, students and beginner programmers equate the term char with "a letter of the English alphabet". There is nothing about the char data type that implies anything about the English alphabet. A better name, used in other programming languages, would have been byte, as that is more accurately what it represents: a single byte. This is even more obvious when you consider that a char can be negative, which certainly doesn't (and can't) represent any letter. Keep this in mind when dealing with the char data type. It's simply a very small integer, no more, no less. In fact, Microsoft has made an alias for it in their core header files:

typedef unsigned char BYTE; /* Use BYTE instead of char */

Note that the C Standard does not specify the sizes of any of the types except char, which is always 1 byte. It only guarantees a minimum range for each type; the sizes shown above are the ones typically found on a 64-bit (LP64) computer.

This table shows the range of values for the integral types on a typical, modern 64-bit (LP64) computer:

| Type | Size in bytes | Also called | Range of values (Binary) | Range of values (Decimal) |
|---|---|---|---|---|
| signed char | 1 | char (compiler-dependent) | -2⁷ to 2⁷ - 1 | -128 to 127 |
| unsigned char | 1 | char (compiler-dependent) | 0 to 2⁸ - 1 | 0 to 255 |
| signed short int | 2 | short, short int, signed short | -2¹⁵ to 2¹⁵ - 1 | -32,768 to 32,767 |
| unsigned short int | 2 | unsigned short | 0 to 2¹⁶ - 1 | 0 to 65,535 |
| signed int | 4 | int, signed | -2³¹ to 2³¹ - 1 | -2,147,483,648 to 2,147,483,647 |
| unsigned int | 4 | unsigned | 0 to 2³² - 1 | 0 to 4,294,967,295 |
| signed long int | 8 | long, long int, signed long | -2⁶³ to 2⁶³ - 1 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| unsigned long int | 8 | unsigned long | 0 to 2⁶⁴ - 1 | 0 to 18,446,744,073,709,551,615 |

The ranges for the basic integer types are defined in limits.h, and stdint.h adds the ranges for the fixed-width and greatest-width types. Here's a sample from stdint.h:

  /* Minimum for largest signed integral type. */
# define INTMAX_MIN   (-__INT64_C(9223372036854775807)-1)
  /* Maximum for largest signed integral type. */
# define INTMAX_MAX   (__INT64_C(9223372036854775807))

Technically, there is just char and int, but the int types are extended with modifiers: signed, unsigned, short, long, which gives the possible combinations listed above. The order of the type and modifiers does not matter. These all mean the same thing:

unsigned long int
unsigned int long
long int unsigned
long unsigned int
int long unsigned
int unsigned long

Also, when used with long or short the int is optional (these are the same as above):

unsigned long
long unsigned

However, most code writes this type as unsigned long int or unsigned long. (You should stick to this convention.)

This table shows the sizes of long integers used in Microsoft Windows (LLP64). (Most of the rest of the world uses 8 bytes as above. Keep this in mind if you plan on working with Windows.)

| Type | Bytes | Also called | Range of values (Binary) | Range of values (Decimal) |
|---|---|---|---|---|
| signed long int | 4 | long, long int, signed long | -2³¹ to 2³¹ - 1 | -2,147,483,648 to 2,147,483,647 |
| unsigned long int | 4 | unsigned long | 0 to 2³² - 1 | 0 to 4,294,967,295 |

The only other thing that the Standard states is this:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

C99 introduced the long long type, which is 8 bytes on almost all modern systems. Since the long type is also 8 bytes on most 64-bit systems, this currently doesn't improve things there (except when working with Microsoft Windows, where long is 4 bytes).

This table includes the binary values:

| Type | Binary Range | Decimal Range |
|---|---|---|
| signed char | 10000000 to 01111111 | -128 to 127 |
| unsigned char | 00000000 to 11111111 | 0 to 255 |
| signed short | 1000000000000000 to 0111111111111111 | -32,768 to 32,767 |
| unsigned short | 0000000000000000 to 1111111111111111 | 0 to 65,535 |
| signed int | 10000000000000000000000000000000 to 01111111111111111111111111111111 | -2,147,483,648 to 2,147,483,647 |
| unsigned int | 00000000000000000000000000000000 to 11111111111111111111111111111111 | 0 to 4,294,967,295 |
| signed long | 1 followed by 63 zeros to 0 followed by 63 ones | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| unsigned long | [ 64 zeros ] to [ 64 ones ] | 0 to 18,446,744,073,709,551,615 |

A signed int that is 32 bits wide can store values in the range -2,147,483,648 to 2,147,483,647.

A signed char that is 8 bits wide can store values in the range -128 to 127.

What happens when you try to increment a value that is too large for the data type? With unsigned values, it just "wraps" back around to 0. Think of the bits being sort of like an odometer on a car. Once the odometer gets to 999999, it will "wrap" back around to 0. So, an unsigned char with a value of 255 will become 0:

 11111111
+       1
---------
000000000

The carry-bit is discarded.

With signed numbers, the result is undefined. It could be anything and do anything, including crashing the program. It's up to the particular system. Sometimes you may get kind of a "wrapping" like with unsigned, but this is not guaranteed. The difference is that instead of going from the largest positive value back to 0, the bits go from the largest positive value to the smallest negative value. (e.g. 127 + 1 is -128).

Overflow example

64-bit data models

Why did Microsoft choose the LLP64 model instead of LP64 like everyone else? Incidentally, that article is written by Raymond Chen. He is the Best. Microsoft. Blogger. Ever. If you want to become an expert Windows programmer, you need to know this!

Note: There isn't really much difference in memory use by choosing a 4-byte integer instead of a 2-byte short integer or 1-byte char to store the number 65. However, when you have very large arrays (thousands, millions, billions), it makes a real difference. If you aren't careful, you can run out of memory or have your program run slower.

Literal Constants

We know that a literal constant like 42 is an int and that a literal constant like 42.0 is a double.

If a literal is too large for an int, its type will be long int.
To force a smaller literal number to a particular type, append a suffix:
- Append the letter 'F' to a floating point value to make it a float.
```
42.0F
```
- Append the letter 'L' to an integral value to make it a long int. (Never use lowercase.)
```
42L vs. 42l
```
- Append the letter 'U' to an integral value to make it an unsigned int.
```
42U
```
- You can combine them to make an unsigned long int. (The order doesn't matter)
```
42UL or 42LU
```

Also, be aware that in C, the type of a literal char is an int:

char c = 'A';
printf("sizeof(c)   is %2zu\n", sizeof(c));   /* char variable */
printf("sizeof('A') is %2zu\n", sizeof('A')); /* char literal  */

Output:

sizeof(c)   is  1
sizeof('A') is  4

Don't forget that when you read or write shorts and longs with scanf and printf, you need a length modifier in the conversion: h for short and l (lowercase L) for long.

Usually, we write literal integral values using decimal (base 10) notation. C provides two other forms: octal (base 8) and hexadecimal (base 16). (C++14 adds a binary literal. Woohoo!)

Numbers in octal (base 8) can only use the digits 0..7
Octal numbers can be distinguished from decimal and hexadecimal by using a leading zero:
```
01  014  077  01472  077634L 
```
Numbers in hexadecimal (base 16) use the digits 0..9 and then the letters A..F (uppercase or lowercase)
Hexadecimal (or just hex) numbers have a leading 0x or 0X (a zero followed by the letter X)
```
0x10  0X10  0x14  0x17AF  0xFFFF  0xabF10CD8L
```
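Octal and hexadecimal literals have no sign of their own. (Neither do decimal literals: in -42, the minus sign is an operator applied to the literal 42.) Unlike a decimal literal, an unsuffixed octal or hexadecimal literal that is too large for an int becomes an unsigned int if it fits in one.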

Be careful not to include a leading 0 (zero) when writing decimal integers. In mathematics, these numbers both have the same value: 77 and 077. However, in C, the number 077 is actually an octal number and has the decimal value of 63.

Floating Point Types

Unlike the integral types, floating point types are not divided into signed and unsigned. All floating point types are signed only. Floating point numbers follow the IEEE-754 Floating Point Standard. Here are the approximate ranges of the IEEE-754 floating point numbers on Intel x86 computers:

| Type | Size | Smallest Positive Value | Largest Positive Value | Precision |
|---|---|---|---|---|
| float | 4 | 1.1754 x 10⁻³⁸ | 3.4028 x 10³⁸ | 6 digits |
| double | 8 | 2.2250 x 10⁻³⁰⁸ | 1.7976 x 10³⁰⁸ | 15 digits |
| long double | 10* | 3.3621 x 10⁻⁴⁹³² | 1.1897 x 10⁴⁹³² | 19 digits |

Some floating point constants. These are all of type double:

42.0  42.0e0  42.  4.2e1  4.2E+1  .42e2  420.e-1  42e0  42.E0

To indicate that the type is float, you must append the letter f or F:

42.0f  42.0e0f  42.F  4.2e1F  etc...

To indicate that the type is long double, you must append the letter l (lowercase 'L') or L:

42.0L  42.0e0L  42.l  4.2e1l  etc...

In practice, NEVER use the lowercase L (which looks very similar to the number one: 1), as it will certainly cause confusion. (See above.)

A floating point literal (float, double, or long double) must contain a decimal point, an exponent (e notation), or both, as the examples above show.

*Here are the sizes of floating point numbers on various C compilers under 32-bit:

| Expression | GNU gcc | Borland | Microsoft |
|---|---|---|---|
| sizeof(42.0) | 8 | 8 | 8 |
| sizeof(42.0F) | 4 | 4 | 4 |
| sizeof(42.0L) | 12 | 10 | 8 |

With 64-bit, all of the sizes shown above are the same except the long double which is 16 bytes for GCC/MinGW/Clang.

Comparison of data sizes with various compilers:

| Compiler | char | short | int | long | long long | float | double | long double | void* | size_t | intptr_t |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 32-bit Microsoft | 1 | 2 | 4 | 4 | 8 | 4 | 8 | 8 | 4 | 4 | 4 |
| 32-bit GNU gcc | 1 | 2 | 4 | 4 | 8 | 4 | 8 | 12 | 4 | 4 | 4 |
| 32-bit Clang | 1 | 2 | 4 | 4 | 8 | 4 | 8 | 12 | 4 | 4 | 4 |
| 64-bit Microsoft | 1 | 2 | 4 | 4 | 8 | 4 | 8 | 8 | 8 | 8 | 8 |
| 64-bit GNU gcc | 1 | 2 | 4 | 8 | 8 | 4 | 8 | 16 | 8 | 8 | 8 |
| 64-bit Clang | 1 | 2 | 4 | 8 | 8 | 4 | 8 | 16 | 8 | 8 | 8 |
| 64-bit MinGW | 1 | 2 | 4 | 4 | 8 | 4 | 8 | 16 | 8 | 8 | 8 |

Promotions and Conversions

When mixing data types, C (and C++) have extensive rules that determine which operands are promoted/converted and how. These rules are from "The Annotated C++ Reference Manual" by Margaret Ellis and Bjarne Stroustrup. However, they also apply to C, since C++ is based on C.

The Integral Promotions

A char or short int (both signed and unsigned) may be used wherever an integer may be used. If an int can represent all the values of the original type, the value is converted to an int (signed); otherwise it is converted to unsigned int.

The Usual Arithmetic Conversions

If either operand is of type long double, the other is converted to long double.
Otherwise, if either operand is double, the other is converted to double.
Otherwise, if either operand is float, the other is converted to float.
Otherwise, the integral promotions are performed on both operands.
Then, if either operand is unsigned long, the other is converted to unsigned long.
Otherwise, if one operand is long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int is converted to a long int; otherwise both operands are converted to unsigned long int.
Otherwise, if either operand is long, the other is converted to long.
Otherwise, if either operand is unsigned, the other is converted to unsigned.
Otherwise, both operands are int.

Note: You mainly need to be aware of these promotions/conversions when you are mixing data types.

The typedef Keyword

Suppose we want to add a boolean type to C. (C89 doesn't have one, so we typically use int in place of a boolean; C99 added _Bool and stdbool.h.) We've already done it using #define.

To declare a variable, we simply do this:

int a;             /* Create an integer named a               */
unsigned char b;   /* Create an unsigned char named b         */
short int c;       /* Create a short integer named c          */
float d;           /* Create a float named d                  */
unsigned char * e; /* Create an unsigned char pointer named e */

These cause the compiler to allocate space for each variable, based on its type.

If we want to create a new type (instead of a new variable), we add the typedef keyword:

typedef int a;             /* Create a new type named a */
typedef unsigned char b;   /* Create a new type named b */
typedef short int c;       /* Create a new type named c */
typedef float d;           /* Create a new type named d */
typedef unsigned char * e; /* Create a new type named e */

You can think of these type definitions as aliases for other types. To create a new variable of type a:

a i; /* Create an 'a' variable named i */
b j; /* Create a 'b' variable named j  */

Of course, this makes no sense whatsoever. For any real use, you need to give the typedefs meaningful names. Compare to #define:

  /* Create new types using typedef */
typedef int BOOL;
typedef unsigned char BYTE;
typedef short int FAST_INT;
typedef float CURRENCY;
typedef unsigned char * PCHAR;

  /* Create new types using #define */
#define BOOL int
#define BYTE unsigned char
#define FAST_INT short int
#define CURRENCY float
#define PCHAR unsigned char *

Examples:

BOOL playing, paused;   /* Booleans for a DVD player    */
BYTE next, previous;    /* For scanning bytes in memory */
CURRENCY tax, discount; /* To calculate total price     */
PCHAR inbuf, outbuf;    /* To manipulate strings        */

Summary:

Use typedef when you want to create an alias for another type (easier to change later)

Use typedef to simplify the name of a type

  /* Each is an array of 10 unsigned char pointers */
unsigned char *a[10];
unsigned char *b[10];
unsigned char *c[10];
unsigned char *d[10];

This is the same:

typedef unsigned char *ShortString[10]; /* ShortString is a new type */

  /* An array of 10 unsigned char pointers */
ShortString a, b, c, d;

Unlike #define, the typedef keyword obeys the scope rules.

void foo(int a)
{
  typedef int BOOL1; /* Is visible only in this function            */
  #define BOOL2 int  /* Is visible in every function below this one */

  if (a)
  {
    typedef int INT32; /* Visible only in if                   */
    INT32 i32;         /* From typedef directly above          */
    BOOL1 b1;          /* From typedef above (top of function) */
    BOOL2 b2;          /* From #define above (top of function) */
    /* Other stuff */
  }

  INT32 i32; /* ERROR: Unknown type 'INT32' */
  /* Other stuff */
}

void bar(void)
{
  BOOL1 b1; /* ERROR: Not visible in this function */
  BOOL2 b2; /* From #define in function above      */
  /* Other stuff */
}

void baz(void)
{
  typedef long BOOL1; /* OK, Is visible only in this function */
  #define BOOL2 long  /* ERROR: Redefine from function above  */
  /* Other stuff */
}

#define is an unsophisticated "search and replace" by the preprocessor:

#define CPTR1 char *
typedef char * CPTR2;

CPTR1 p1, p2; /* What is the type of p1 and p2? */
CPTR2 p3, p4; /* What is the type of p3 and p4? */

printf("%zu, %zu\n", sizeof(p1), sizeof(p2));
printf("%zu, %zu\n", sizeof(p3), sizeof(p4));
