A Simple STL map Example

The initial problem:

Given a text file, count the number of occurrences of each word in the file, then print all of the words alphabetically with their corresponding counts. For example, if a file named line.txt contained this line:

a line of text that has a word that occurs more than once in the line
the solution would be presented like this:
2 a
1 has
1 in
2 line
1 more
1 occurs
1 of
1 once
1 text
1 than
2 that
1 the
1 word
Before implementing this algorithm with an STL container, how would you implement it without using any STL containers? That is, using only arrays, linked lists, or some other data structure that you might invent. (In other words, do it in C.) Keep in mind that the file could be any size, so hard-coding anything is out of the question.

What are the pros and cons of each of these approaches?


Implemented as a sorted array (Examples: std::vector, std::array):


Implemented as a linked list (Examples: std::list, std::forward_list):


Implemented as a binary search tree (Examples: std::set, std::map):


A First Attempt

The algorithm goes like this:
  1. Open a file for input
  2. While there are more words
    1. Read a word.
    2. Find the word in the container. (find)
    3. If the word is in the container, increment the associated count by 1.
    4. If the word is not in the container, add it to the container (insert in the proper position) and set its count to 1.
  3. Print out the counts and the words sorted alphabetically (by word, of course).
This example is going to show how to use a map as the container and why it's so powerful.

"If I was stranded on a Desert Island and could take only one STL container with me, it would be std::map." -- Mead

Sample code (error handling is omitted to keep it simple):

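#include <fstream>  // std::ifstream
#include <iostream> // std::cout
#include <map>      // std::map
#include <string>   // std::string
#include <utility>  // std::pair

void CountWords1(void)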
{
    // For convenience (each word has an associated count)
  typedef std::map<std::string, int> FreqMap;

  std::string word; // the input word
  FreqMap wf;       // the frequencies of each word

    // Open some text file
  std::ifstream infile("line.txt");

    // Read all words from the file
  while (infile >> word)
  {
      // See if the key/value pair is already
      // in the map
    FreqMap::iterator it = wf.find(word);

      // If it is present, increment the count (value)
    if (it != wf.end())
      it->second++;  // Same as: (*it).second++
    else
    {
        // Create a new pair with value set to 1
      std::pair<std::string, int> pr(word, 1);
      wf.insert(pr);
    }
  }

    // Print out all of the key/value pairs
  for (FreqMap::iterator it = wf.begin(); it != wf.end(); ++it)
    std::cout << it->second << " " << it->first << std::endl;
}
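
For comparison, here is a rough sketch of what some minimal error handling might look like. The function name CountWordsChecked, the filename parameter, and the bool return value are all assumptions made for this illustration; they are not part of the original example.

```cpp
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical variant of CountWords1 (the name, parameter, and return
// value are assumptions made for this sketch).
bool CountWordsChecked(const char* filename)
{
  typedef std::map<std::string, int> FreqMap;

  std::ifstream infile(filename);
  if (!infile)  // the stream is in a failed state: the open did not succeed
  {
    std::cerr << "Can't open " << filename << " for reading." << std::endl;
    return false;
  }

  std::string word;
  FreqMap wf;
  while (infile >> word)
  {
    FreqMap::iterator it = wf.find(word);
    if (it != wf.end())
      it->second++;
    else
      wf.insert(std::make_pair(word, 1));
  }

    // The loop normally ends at end-of-file; bad() means a real I/O error
  if (infile.bad())
  {
    std::cerr << "Error while reading " << filename << std::endl;
    return false;
  }

  for (FreqMap::iterator it = wf.begin(); it != wf.end(); ++it)
    std::cout << it->second << " " << it->first << std::endl;

  return true;
}
```

A caller can then decide what to do when the function returns false, instead of silently getting no output.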


A Slight Modification

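We can take advantage of the subscript operator (operator[]) that is implemented by std::map. Given a key, it returns a reference to the associated value. If the key is not in the map, it first inserts the key with a value-initialized value (0 for an int) and then returns a reference to that new value.

Note: Because the subscript operator inserts a missing key, you can NEVER use the subscript operator to check for the existence of an item. Use find for that.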
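
Here is a small sketch of why that is. The two helper functions are hypothetical and exist only for this illustration; the first one "checks" with the subscript operator and changes the map as a side effect, the second uses find and leaves the map alone:

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> FreqMap;

// Wrong: wf[word] inserts the key (with value 0) when it is missing,
// so the map grows every time an absent word is "checked".
// Note that the map can't even be passed as const here.
bool contains_with_subscript(FreqMap& wf, const std::string& word)
{
  return wf[word] != 0;
}

// Right: find never modifies the map, so it also works on a const map.
bool contains_with_find(const FreqMap& wf, const std::string& word)
{
  return wf.find(word) != wf.end();
}
```

After calling contains_with_subscript on an empty map, the map's size is 1 even though the function returned false.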


Only the loop is modified:

  // Read all words from the file
while (infile >> word)
{
    // See if the key/value pair is already
    // in the map
  FreqMap::iterator it = wf.find(word);

    // If it is present
  if (it != wf.end())
    it->second++;  // increment existing value
  else
    wf[word] = 1;  // else "add" key with value set to 1
}

Knowing how the subscript operator behaves, we can do better at this point. We don't have to call find to locate the item. The subscript operator will find the item for us, or add it with a count of 0 if it isn't there yet:

  // Read all words from the file
while (infile >> word)
{
  int count = wf[word]; // Get current value
  wf[word] = count + 1; //   and update it
}

Of course, we can go further and write it the way a C++ programmer would:

  // Read all words from the file and update the count in the map
while (infile >> word)
  ++wf[word];
Ironically, the code to print the contents of the map is longer than the code needed to build the map!

The final (so far) version:

void CountWords4(void)
{
    // For convenience
  typedef std::map<std::string, int> FreqMap;

  std::string word; // the input word
  FreqMap wf;       // the frequencies of each word

    // Open some text file
  std::ifstream infile("preamble.txt");

    // Read all words from the file and update the count in the map
  while (infile >> word)
    ++wf[word];

    // Print out all of the key/value pairs
  for (FreqMap::iterator it = wf.begin(); it != wf.end(); ++it)
    std::cout << it->second << " " << it->first << std::endl;
}
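You could use the for_each algorithm (declared in <algorithm>) to remove the last loop. Here's the function to print the pair. Note that the key in a map's pair is const, so matching that type exactly avoids copying each pair:
void print_pair(const std::pair<const std::string, int>& p)
{
  std::cout << p.second << " " << p.first << "\n";
}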
and this is the call:
  // Print out all of the key/value pairs
std::for_each(wf.begin(), wf.end(), print_pair);

or using a lambda expression in C++11 (removing std:: for readability, which assumes a using namespace std; directive):
for_each(wf.begin(), wf.end(), [] (const pair<const string, int>& p){ cout << p.second << " " << p.first << "\n";});
With the C++11 features and no typedef or std:: qualifiers (just to make it even shorter to read):

void CountWords5(void)
{
  map<string, int> wf; // the map of word frequencies
  string word;         // the input word

    // Open some text file
  ifstream infile("preamble.txt");

    // Read all words from the file and update the count in the map
  while (infile >> word)
    ++wf[word];

    // Print out all of the key/value pairs
  for_each(wf.begin(), wf.end(), [] (const pair<const string, int>& p){ cout << p.second << " " << p.first << "\n";});
}
Given a file containing this text:

When, in the course of human events, it becomes necessary for a people to advance from that 
subordination in which they have hitherto remained, and to assume among the powers of the 
earth, the equal and independent station to which the laws of nature and of nature's god 
entitle them, a decent respect to the opinions of mankind requires that they should declare the 
causes which impel them to the change.
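We would get this (note that punctuation stays attached to the words, so them and them, are counted separately, and that the uppercase W in When, sorts before all of the lowercase letters):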

1 When,
2 a
1 advance
1 among
3 and
1 assume
1 becomes
1 causes
1 change.
1 course
1 decent
1 declare
1 earth,
1 entitle
1 equal
1 events,
1 for
1 from
1 god
1 have
1 hitherto
1 human
1 impel
2 in
1 independent
1 it
1 laws
1 mankind
1 nature
1 nature's
1 necessary
5 of
1 opinions
1 people
1 powers
1 remained,
1 requires
1 respect
1 should
1 station
1 subordination
2 that
8 the
1 them
1 them,
2 they
5 to
3 which

Future modifications: