The initial problem:
Given a text file, count the number of occurrences of each word in the file, then print all of the words alphabetically with their corresponding counts. For example, if a file named line.txt contained this line:
  a line of text that has a word that occurs more than once in the line
the solution would be presented like this:
  2 a
  1 has
  1 in
  2 line
  1 more
  1 occurs
  1 of
  1 once
  1 text
  1 than
  2 that
  1 the
  1 word
Before implementing this algorithm with an STL container, how would you implement it without using any STL containers? In other words, using only arrays, linked lists, or some other data structure that you might invent. (In other words, do it in C.) Keep in mind that the file could be any size, so hard-coding anything is out of the question.
What are the pros and cons with:
Implemented as a linked list (Examples: std::list, std::forward_list):
Implemented as a binary search tree (Examples: std::set, std::map):
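To make the binary search tree option concrete, here is a minimal sketch of the word counter built on a hand-written tree rather than an STL container. The names Node, AddWord, Print, Free and CountWordsTree are made up for this example, and it still leans on std::string and the stream classes for convenience. The tree is not balanced, so input that is already sorted degrades it into a linked list; std::map is typically implemented as a balanced tree and avoids that.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical node type for this sketch: one word and its count.
struct Node
{
  std::string word;
  int count;
  Node* left;
  Node* right;

  explicit Node(const std::string& w)
    : word(w), count(1), left(nullptr), right(nullptr) {}
};

// Insert the word, or increment its count if it is already in the tree.
void AddWord(Node*& root, const std::string& w)
{
  Node** link = &root; // the pointer that should end up pointing at w's node
  while (*link)
  {
    if (w == (*link)->word)
    {
      ++(*link)->count;
      return;
    }
    link = (w < (*link)->word) ? &(*link)->left : &(*link)->right;
  }
  *link = new Node(w);
}

// An in-order traversal visits the words in sorted order.
void Print(const Node* n, std::ostream& os)
{
  if (!n)
    return;
  Print(n->left, os);
  os << n->count << " " << n->word << "\n";
  Print(n->right, os);
}

// Release every node in the tree.
void Free(Node* n)
{
  if (!n)
    return;
  Free(n->left);
  Free(n->right);
  delete n;
}

// Count the words in a string (standing in for the file) and return the report.
std::string CountWordsTree(const std::string& text)
{
  std::istringstream in(text);
  std::ostringstream out;
  Node* root = nullptr;
  std::string word;

  while (in >> word)
    AddWord(root, word);

  Print(root, out);
  Free(root);
  return out.str();
}
```

Swapping the std::istringstream for a std::ifstream would read the words from a file instead. Note how much of this code is bookkeeping that std::map does for you.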
"If I was stranded on a Desert Island and could take only one STL container with me, it would be std::map." -- Mead
Sample code (without all of the necessary error handling to keep it simple):
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <utility>

void CountWords1(void)
{
  // For convenience (each word has an associated count)
  typedef std::map<std::string, int> FreqMap;

  std::string word; // the input word
  FreqMap wf;       // the frequencies of each word

  // Open some text file
  std::ifstream infile("line.txt");

  // Read all words from the file
  while (infile >> word)
  {
    // See if the key/value pair is already
    // in the map
    FreqMap::iterator it = wf.find(word);

    // If it is present, increment the count (value)
    if (it != wf.end())
      it->second++; // Same as: (*it).second++
    else
    {
      // Create a new pair with value set to 1
      std::pair<std::string, int> pr(word, 1);
      wf.insert(pr);
    }
  }

  // Print out all of the key/value pairs
  for (FreqMap::iterator it = wf.begin(); it != wf.end(); ++it)
    std::cout << it->second << " " << it->first << std::endl;
}
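The find-then-insert pattern above searches the map twice when the word is new. As a sketch of one alternative (the helper name AddWord is made up for this example), std::map::insert returns a pair holding an iterator to the element and a bool that says whether the insertion took place, so a single call can cover both cases:

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, int> FreqMap;

// Hypothetical helper: record one occurrence of word in wf.
// Returns true if the word was new to the map.
bool AddWord(FreqMap& wf, const std::string& word)
{
  // Try to insert (word, 0). If the key already exists, insert() leaves the
  // map unchanged and returns an iterator to the existing element.
  std::pair<FreqMap::iterator, bool> result = wf.insert(std::make_pair(word, 0));

  ++result.first->second; // new words go 0 -> 1, existing words n -> n + 1
  return result.second;
}
```

The read loop would then call AddWord(wf, word) for each word.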
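We can take advantage of the subscript operator that is implemented by std::map. It has a couple of very handy features: if the key is not in the map, it inserts the key with a value-initialized value (0 for an int), and it returns a reference to the value, so the result can be read or assigned.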
// Create a map with std::strings for keys (indexes) and int values.
std::map<std::string, int> MyMap;

MyMap["foo"] = 10;         // Add the value 10 at index "foo"
MyMap["foo"] = 15;         // Change the value at index "foo" to 15
std::cout << MyMap["foo"]; // Reads the value at index "foo" (15)
Note: Because of the way that the subscript operator behaves with maps (it inserts the key if it is not already there), you can NEVER use the subscript operator to check for the existence of an item. Use find instead.
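A small sketch of why that is: "testing" for a key with the subscript operator quietly adds the key to the map. The function names here are made up for the example; the second function shows a lookup that leaves the map alone.

```cpp
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, int> FreqMap;

// Looks like a harmless test, but operator[] inserts "bar" with value 0.
std::size_t SizeAfterSubscriptTest()
{
  FreqMap m;
  m["foo"] = 10;

  bool missing = (m["bar"] == 0); // side effect: "bar" is now a key
  (void)missing;                  // silence the unused-variable warning

  return m.size();                // 2, not 1
}

// A lookup that does not modify the map (and so works on a const map).
bool Contains(const FreqMap& m, const std::string& key)
{
  return m.find(key) != m.end();
}
```

Because operator[] may insert, it is not declared const, so it cannot even be called on a const map; find can.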
// Read all words from the file
while (infile >> word)
{
  // See if the key/value pair is already
  // in the map
  FreqMap::iterator it = wf.find(word);

  // If it is present
  if (it != wf.end())
    it->second++;  // increment existing value
  else
    wf[word] = 1;  // else "add" key with value set to 1
}

// Read all words from the file
while (infile >> word)
{
  int count = wf[word];  // Get current value
  wf[word] = count + 1;  // and update it
}

// Read all words from the file and update the count in the map
while (infile >> word)
  ++wf[word];

Ironically, the code to print the contents of the map is more than the code needed to build the map!
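The one-line loop works because the subscript operator gives a missing key the value 0 before the increment is applied. Here is a sketch of that loop pulled out into a function that reads from any input stream, which makes it easy to try on a string instead of a file. The function names are made up for this example.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, int> FreqMap;

// Count the whitespace-separated words read from any input stream.
FreqMap CountWordsFrom(std::istream& in)
{
  FreqMap wf;
  std::string word;

  while (in >> word)
    ++wf[word]; // a new key starts at 0, so the first increment makes it 1

  return wf;
}

// Convenience wrapper for this sketch: count the words in a string.
FreqMap CountWordsIn(const std::string& text)
{
  std::istringstream in(text);
  return CountWordsFrom(in);
}
```

Passing a std::ifstream to CountWordsFrom gives the same behavior as the file-based loop.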
The final (so far) version:
void CountWords4(void)
{
  // For convenience
  typedef std::map<std::string, int> FreqMap;

  std::string word; // the input word
  FreqMap wf;       // the frequencies of each word

  // Open some text file
  std::ifstream infile("preamble.txt");

  // Read all words from the file and update the count in the map
  while (infile >> word)
    ++wf[word];

  // Print out all of the key/value pairs
  for (FreqMap::iterator it = wf.begin(); it != wf.end(); ++it)
    std::cout << it->second << " " << it->first << std::endl;
}
You could use the for_each algorithm to remove the last loop. Here's the function to print the pair:
void print_pair(const std::pair<std::string, int>& p)
{
  std::cout << p.second << ", " << p.first << "\n";
}
and this is the call:
// Print out all of the key/value pairs
std::for_each(wf.begin(), wf.end(), print_pair);
Or, with a lambda in place of print_pair:
std::for_each(wf.begin(), wf.end(),
              [] (pair<string, int> p) { cout << p.second << ", " << p.first << "\n"; });
With the C++11 features and no typedef or namespace qualifiers (just to make it even shorter to read):
// Assumes the necessary #includes and: using namespace std;
void CountWords5(void)
{
  map<string, int> wf; // the map of word frequencies
  string word;         // the input word

  // Open some text file
  ifstream infile("preamble.txt");

  // Read all words from the file and update the count in the map
  while (infile >> word)
    ++wf[word];

  // Print out all of the key/value pairs
  std::for_each(wf.begin(), wf.end(),
                [] (pair<string, int> p) { cout << p.second << ", " << p.first << "\n"; });
}
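Another C++11 option, not used in the versions above, is a range-based for loop. One detail worth knowing: the elements of a std::map<std::string, int> are std::pair<const std::string, int>, so a function or lambda that takes a pair<string, int> gets a converted copy of each element. Binding with const auto& sidesteps that. A sketch, with a made-up function name that returns the report as a string so it is easy to check:

```cpp
#include <map>
#include <sstream>
#include <string>

// Format every (word, count) pair as "count word", one pair per line.
std::string FormatCounts(const std::map<std::string, int>& wf)
{
  std::ostringstream out;

  // const auto& binds to the map's own pair<const std::string, int>
  // elements, so nothing is copied on each iteration.
  for (const auto& entry : wf)
    out << entry.second << " " << entry.first << "\n";

  return out.str();
}
```

Writing std::cout << FormatCounts(wf); would then replace the printing loop.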
Given a file containing this text:
  When, in the course of human events, it becomes necessary for a people to advance from that subordination in which they have hitherto remained, and to assume among the powers of the earth, the equal and independent station to which the laws of nature and of nature's god entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the change.
We would get this (formatted in columns; read down each column):
  1 When,      1 earth,     1 independent   1 requires
  2 a          1 entitle    1 it            1 respect
  1 advance    1 equal      1 laws          1 should
  1 among      1 events,    1 mankind       1 station
  3 and        1 for        1 nature        1 subordination
  1 assume     1 from       1 nature's      2 that
  1 becomes    1 god        1 necessary     8 the
  1 causes     1 have       5 of            1 them
  1 change.    1 hitherto   1 opinions      1 them,
  1 course     1 human      1 people        2 they
  1 decent     1 impel      1 powers        5 to
  1 declare    2 in         1 remained,     3 which
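Notice that the output counts "them" and "them," as different words, and that "When," sorts ahead of everything else because of its capital letter. One possible refinement, offered only as a sketch and as an assumption about what a cleaner count would look like, is to normalize each word before it goes into the map. The Normalize function below is hypothetical: it trims leading and trailing punctuation and lowercases the rest, leaving interior characters such as the apostrophe in "nature's" alone.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: strip punctuation from both ends and lowercase the word.
std::string Normalize(const std::string& raw)
{
  std::size_t first = 0;
  std::size_t last = raw.size();

  // The casts to unsigned char keep the <cctype> functions well-defined
  // for characters with negative char values.
  while (first < last && std::ispunct(static_cast<unsigned char>(raw[first])))
    ++first;
  while (last > first && std::ispunct(static_cast<unsigned char>(raw[last - 1])))
    --last;

  std::string word = raw.substr(first, last - first);
  for (char& c : word)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

  return word;
}
```

The read loop would then update wf[Normalize(word)] and skip any word that normalizes to an empty string.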
Future modifications: