Practice Assignment

(wordlen.c)

Information

As always, please be sure to read this entire web page before doing any coding or asking any questions. You'll be glad that you did.

  1. For this practice assignment, you will write a function that will calculate the average word length for a given sentence or paragraph (or longer). For example, given this text:
    This is a simple test.
    
    The average word length is 3.60 characters. This is because there are 5 words and 18 characters and 18 / 5 = 3.60. For this practice, you will distinguish between whitespace and non-whitespace.

    Whitespace includes spaces, tabs, and newlines (and a few others). ANY OTHER CHARACTER IS CONSIDERED PART OF A WORD. This means that all punctuation (non-whitespace) is valid for part of a word. In the example sentence above, the last word is followed by a period. This means that the number of characters in the last word is actually 5. Please make sure that you understand this before attempting this program.

    The prototype for the function looks like this:

    double average_word_length(const char *text);
    
    You can be guaranteed that you will only be given valid, NUL-terminated character strings.

    Here is a driver file: HTML   Text

    You will also need one of these input files: quotes-lf.txt or quotes-crlf.txt as a source of input to the program. Choose the one that matches the end-of-line characters on your system.

    Typically, for Mac and Linux, it's the first one (Line Feeds only). Windows uses the second one (Carriage Return and Line Feeds). You'll have to rename the file that you download to just quotes.txt, as that is what the driver is expecting.

    The name of your implementation file should be wordlen.c and the command to compile it will look like this:

    gcc -O -Werror -Wall -Wextra -ansi -pedantic main.c wordlen.c -o wordlen
    

    Here are some sample outputs:

    Approximate number of lines of code: 12.
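
To make the counting rule above concrete, here is a minimal sketch that counts only the non-whitespace characters in a string. The helper name count_non_whitespace is hypothetical and is not part of the assignment's interface. It only illustrates that punctuation counts as part of a word.

```c
#include <ctype.h>   /* isspace */
#include <stddef.h>  /* size_t */

/* Hypothetical helper (not part of the assignment's interface):
   counts every character that isspace() does not classify as whitespace. */
size_t count_non_whitespace(const char *text)
{
  size_t count = 0;

  while (*text != '\0')
  {
    /* Cast to unsigned char: passing a negative value to isspace is undefined. */
    if (!isspace((unsigned char)*text))
      count++;
    text++;
  }

  return count;
}
```

Called on the example sentence "This is a simple test.", this helper returns 18, because the trailing period is counted as the fifth character of the last word.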

Notes

  1. As stated above, there are only a few lines of code required to implement this, as long as you THINK before beginning. Also, DRAW DIAGRAMS of what you're trying to accomplish.
  2. To calculate the average word length, you need to figure out how many non-whitespace characters are in the text and then divide that by the number of words in the text. A word is simply a contiguous sequence of non-whitespace characters.
  3. The easiest way to do this is to start with a pointer pointing at the first character in the string. If it is pointing at whitespace, you continuously increment the pointer (so it points at the next character) until you find a non-whitespace character. This is called "skipping over whitespace".
  4. When you encounter a non-whitespace character, you know that you've found the beginning of the next word and will increment your word counter.
  5. Now, you will continuously increment the pointer until it points at a whitespace character. While you are "searching" for this whitespace character (and, hence, the end of the word), you will be keeping a count of how many total non-whitespace characters you've found.
  6. When you find the end of the word, you will repeat the process for skipping over whitespace while looking for the next word.
  7. This process stops when you get to the NUL character at the end of the string.
  8. Here's a rudimentary diagram of how a string would be processed.

    Start by pointing at the first character. There is no leading whitespace, so you know you are at the start of a word:

    There   are five    words here.
    ^
    
    Now, increment the pointer until you get to the first space:
    There   are five    words here.
         ^
    
    At this point, you've found the end of the first word and 5 non-whitespace characters (5 in total). Continuing by skipping over whitespace:
    There   are five    words here.
            ^
    
    Now, we've found the start of the second word. Continue, looking for the end of the word:
    There   are five    words here.
               ^
    
    Now we've counted 8 non-whitespace characters. Continuing:
    There   are five    words here.
                ^
    
    We've found the start of the third word. Continuing:
    There   are five    words here.
                    ^
    
    We've found the end of the third word and 4 more non-whitespace characters (12 characters in total). Continuing by skipping over whitespace:
    There   are five    words here.
                        ^
    
    We've found the start of the fourth word. Continuing:
    There   are five    words here.
                             ^
    
    We've found the end of the fourth word and 5 more non-whitespace characters (17 characters in total). Continuing by skipping over whitespace:
    There   are five    words here.
                              ^
    
    We've found the start of the fifth word. Continuing:
    There   are five    words here.
                                   ^
    
    We've reached the end of the string and counted 5 more characters, for 22 non-whitespace characters in total. Dividing 22 by 5 gives us an average of 4.4 characters per word.
  9. If you look closely, you'll see that your program is in one of 2 states: It is either in a sequence of whitespace characters OR it is in a sequence of non-whitespace characters. That's all there is to it.
  10. There is a function in the C standard library called:
    int isspace(int c);
    
    It returns true (non-zero) if the character is considered a whitespace character, and false (zero) if not. You can see what constitutes a whitespace character here. You must include this:
    #include <ctype.h>
    
    which you are allowed (and expected) to do. DO NOT write your own checking function.
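
The two-state scan described in the notes above can be sketched as follows. This is one possible arrangement, not the only correct one. Returning 0.0 for text that contains no words is an assumption, since the assignment does not say what to do in that case. The local names words and chars are illustrative.

```c
#include <ctype.h>  /* isspace */

/* A sketch of the two-state scan: skip whitespace, then count a word. */
double average_word_length(const char *text)
{
  unsigned long words = 0;  /* number of words found so far          */
  unsigned long chars = 0;  /* total non-whitespace characters found */

  while (*text != '\0')
  {
    /* State 1: skip over whitespace. */
    while (*text != '\0' && isspace((unsigned char)*text))
      text++;

    /* State 2: a non-whitespace character marks the start of a word. */
    if (*text != '\0')
    {
      words++;
      while (*text != '\0' && !isspace((unsigned char)*text))
      {
        chars++;
        text++;
      }
    }
  }

  /* Assumption: report 0.0 for empty or all-whitespace text
     rather than dividing by zero. */
  if (words == 0)
    return 0.0;

  return (double)chars / words;
}
```

The cast to unsigned char matters because passing a negative char value to isspace is undefined behavior. For the diagram's string, "There   are five    words here.", this sketch finds 22 characters in 5 words, so it returns 4.4.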