Practice Assignment

(commatize.c)

Information

In order for this practice to work on Windows, we have to use a slight extension to the C language. Actually, it's not an extension, per se; it's that we can't use the -ansi compiler option. That's because we are going to use the data type long long and we need to use printf to print values of that type. Remember that a long in Windows is 32 bits, but a long almost everywhere else is 64 bits. The code you write does NOT need to worry about this. It's the driver that is going to print out the numbers. Consult the driver to see what's going on. This is the ONLY time that a conditional define (e.g. #ifdef) is used.
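
If you want to check the sizes on your own machine, here is a minimal sketch. The two function names are made up for illustration and are not part of the driver or the assignment. The C standard only promises minimum widths, which is why long can differ between platforms while long long is always at least 64 bits.

```c
#include <limits.h>  /* CHAR_BIT: number of bits in a byte */

/* Hypothetical helper: width of a long on this platform, in bits. */
int bits_in_long(void)
{
  return (int)(sizeof(long) * CHAR_BIT);
}

/* Hypothetical helper: width of a long long on this platform, in bits. */
int bits_in_long_long(void)
{
  return (int)(sizeof(long long) * CHAR_BIT);
}
```

On a typical 64-bit Linux or macOS build you would expect both to return 64. On Windows you would expect 32 for long and 64 for long long.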

  1. For this practice assignment, you will write a function that converts a long long integer to a string. You've seen this before, but there are a couple of differences. First, this time, you're converting a long long integer (64 bits) to a string. The range of a 64-bit integer is -9223372036854775808 to 9223372036854775807. The second difference is that you are going to put commas in the appropriate places. So, these numbers with commas are -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.

    This function will definitely be one that you'll want to put in your toolbox for future use because humans require commas in numbers. Don't believe me? Quick, what is 100000000? Time's up! You had to count the zeros, didn't you? Ok, what is this: 10,000,000,000? Much easier!

    The prototype for the function looks like this:

    char *commatize(long long number);
    
    A driver file is provided (it is the main.c in the compile command below).

    The name of your implementation file should be commatize.c and the command to compile it will look like this:

    gcc -O -Werror -Wall -Wextra -pedantic main.c commatize.c -o commatize
    

    There is also an output-sample.txt for you to diff your output with.

    Approximate number of lines of code: 20.

Notes

  1. You cannot include any header files. (You don't need them.)
  2. You are returning a pointer to a character (i.e. a NUL-terminated string; an array of characters). However, where is this array allocated? You've got a choice to make:
    1. It's a local array in the commatize function. However, bad things are going to happen to the array when the function returns. In fact, the compiler will issue a warning if you return a pointer to an element of a local array, and -Werror turns that warning into an error:
      error: function returns address of local variable [-Werror=return-local-addr]  
      

    2. You can require the user to pass in an array and then put the commatized string into that. Sometimes that's not a bad idea, but it requires the user to make sure that the array is large enough. Not a big deal, but let's think of something better.
    3. You can dynamically allocate the array in the function and return a pointer to that. However, now you have to hope that the user is going to remember to free the array. Bad Idea™. Just take a look at the driver. The user is calling it dozens of times and never frees anything. (The user isn't aware of needing to do that.)
    4. Our solution: Mark the array with the static keyword. Here's a partial commatize.c file that demonstrates:
      /* Longest commatized long long: 26 characters (sign, 19 digits, 6 commas) plus the NUL */
      #define MAXLEN 27
      
      char *commatize(long long number)
      {
        static char buffer[MAXLEN]; 
      
        /* Put the rest of the function here */
      
      }
      
      This array will outlive the function call itself. It is actually not stored on the stack, but in another area of memory that will still exist when this function returns. Not only does it live outside of the function, but it retains its value between function calls! Yes, each time you call the function, the previous values of this array are preserved.

      Static variables inside functions are automatically initialized to 0 when your program is first loaded into memory. Any other values you assign will persist between function calls. This is why it is safe to return a pointer to this memory as it does exist outside of the function.

      One caveat of this technique is that, if you are going to call the function multiple times, you must either use (e.g. print out) the string immediately, or copy it into another buffer. This is because each call of the function will overwrite the previous values. There is only one copy of the array, after all.

      Finally, realize that this use of the static keyword is unrelated to its use outside of functions. Also, it is unrelated to the word static that we use to distinguish between a dynamically (programmer) allocated array and a static (compiler) allocated array. Yeah, the static word has been overloaded too much and is confusing. (C++ has better ways of doing similar things.)
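
To see the "only one copy of the array" caveat in action, here is a small sketch. It deliberately uses a made-up function (letter) rather than commatize, and the two checking functions are also hypothetical names. It includes string.h only because it is a demonstration and not your commatize.c.

```c
#include <string.h>  /* strcpy, used only by this demonstration */

/* Hypothetical helper: stores one character in a static buffer and
   returns a pointer to it. buffer[1] is never written, so it keeps
   the 0 it was given when the program was loaded. */
char *letter(char c)
{
  static char buffer[2];

  buffer[0] = c;
  return buffer;
}

/* Returns 1 if two calls hand back the very same array, and the
   second call has overwritten what the first pointer refers to. */
int same_buffer(void)
{
  char *first = letter('a');
  char *second = letter('b');

  return first == second && first[0] == 'b';
}

/* Returns 1 if a copy made before the next call keeps its value. */
int copy_survives(void)
{
  char saved[2];

  strcpy(saved, letter('a'));  /* copy the string out right away */
  letter('b');                 /* this overwrites the static buffer */
  return saved[0] == 'a';
}
```

Both checks should return 1: the two pointers are the same address, and only the copied string still holds the old value.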

  3. There are several ways to convert the long integer to a string with commas. Here is one way that works very well: Build the string in reverse. For example, if you were to commatize the number 12345, you would build the string like this in 5 steps:
    1. 5
    2. 45
    3. 345
    4. 2,345
    5. 12,345
    You would then return a pointer to the leading 1. Assuming that the size of the array is 27 (as described above), the original array looks like this (using X to mark NUL characters):
    XXXXXXXXXXXXXXXXXXXXXXXXXXX
                              ^
    
    The caret (^) is a "pointer" to the current location in the buffer, and you point at the last character, which will always be a NUL character. (It must be for this to be a NUL-terminated string).

    The array would change 5 times (according to the example above):

    XXXXXXXXXXXXXXXXXXXXXXXXX5X
                             ^
    XXXXXXXXXXXXXXXXXXXXXXXX45X
                            ^
    XXXXXXXXXXXXXXXXXXXXXXX345X
                           ^
    XXXXXXXXXXXXXXXXXXXXX2,345X
                         ^
    XXXXXXXXXXXXXXXXXXXX12,345X
                        ^
    
    Each time you place another character (digit or comma) in the buffer, you decrement the pointer first, and then place the character (digit or comma) where the pointer is pointing. When the loop is done (you do need a loop, preferably a while loop), you simply return the pointer to the user, as it is pointing at the beginning of the string. "Easy, peasy, lemon squeezy."
  4. The only real challenge left is getting the logic correct for where/when to put a comma in the string. I'm sure if you look closely at it, you'll figure out the pattern for when the comma is written.
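
The reverse-building technique from note 3 can be sketched on its own, without any commas and without negative numbers, so the interesting part is still yours to write. The function name digits_only and the BUFLEN constant are made up for this illustration, and the buffer size assumes that unsigned long long is 64 bits (at most 20 digits). It uses a do/while loop instead of a plain while loop so that 0 still produces the string "0".

```c
/* 20 digits for the largest 64-bit unsigned value, plus the NUL.
   (Assumption: unsigned long long is exactly 64 bits.) */
#define BUFLEN 21

/* Hypothetical helper: converts number to decimal text, building the
   string from the last digit to the first. No commas, no sign. */
char *digits_only(unsigned long long number)
{
  static char buffer[BUFLEN];
  char *p = buffer + BUFLEN - 1;  /* point at the last element */

  *p = 0;  /* already 0 for a static array; written out for clarity */

  do {
    --p;                              /* decrement the pointer first */
    *p = (char)('0' + number % 10);   /* then place the lowest digit */
    number /= 10;                     /* and drop that digit */
  } while (number != 0);

  return p;  /* p now points at the leading digit */
}
```

Notice that the returned pointer is usually somewhere in the middle of the array, not at buffer[0]. That is fine, because the caller only needs the address where the string starts.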

For those who would like an additional challenge:

  1. Allow the function to accept a delimiter other than a comma. For example, some other locales use a period or a space. Examples:
    1,234,567,890
    1.234.567.890
    1 234 567 890
    
  2. Allow the caller to specify how many digits are in each group (e.g. 2, 3, 4, etc.). Digit-grouping customs around the world can get crazy! Yes, it's non-trivial to support all of them, so be glad that you don't have to!