C program to count occurrences of all words in a file

Write a C program to count the occurrences of every distinct word in a file. This post explains the logic step by step and then gives a complete program.

In the previous post I explained how to count the occurrences of a single word in a file. In this post we go a step further and count the occurrences of every word in a given file. So let's get started.

Required knowledge

Basic Input Output, Strings, Pointers, File Handling

Logic to count occurrences of all words in a file

Step-by-step logic to count occurrences of all words in a file:

  1. Open the source file in r (read) mode and store its reference in fptr.
  2. Declare an array of strings words[] to store the list of distinct words.
  3. Declare an integer array count[] to store the occurrence count of each distinct word.
  4. Read a word from the source file and store it in word.
  5. Convert word to lowercase. The strlwr() function does this, but it is not part of the C standard library and is missing on many compilers; a loop that calls tolower() on each character is the portable alternative. Also remove the last character of word if it is a punctuation character.
  6. Check whether word already exists in the distinct words[] list.

    If it exists, increment its count. In the program below this is count[i - 1], because the search loop advances i one position past the match before it stops.
    If it does not exist, add word to the distinct words list and set its count to 1.

  7. Repeat steps 4-6 until the end of the file.
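
Steps 5 and 6 can be sketched as two small helpers. This is only a minimal sketch under stated assumptions, not the exact code of the program: the names normalize_word, add_word and WORD_LEN are hypothetical, lowercasing uses the standard tolower() instead of strlwr(), and only one trailing punctuation character is stripped, as described in step 5.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORDS 1000
#define WORD_LEN  50   /* assumed maximum word length, including '\0' */

/* Hypothetical helper: lowercase word in place and drop one trailing
 * punctuation character, if there is one. */
static void normalize_word(char *word)
{
    size_t len, i;

    for (i = 0; word[i] != '\0'; i++)
        word[i] = (char) tolower((unsigned char) word[i]);

    len = strlen(word);
    if (len > 0 && ispunct((unsigned char) word[len - 1]))
        word[len - 1] = '\0';
}

/* Hypothetical helper: record one occurrence of word.
 * *total holds the number of distinct words stored so far.
 * Returns the index of word in words[], or -1 if word is new
 * and the list is already full. */
static int add_word(char words[][WORD_LEN], int count[], int *total,
                    const char *word)
{
    int i;

    /* Known word: increment its count at the matching index */
    for (i = 0; i < *total; i++)
    {
        if (strcmp(words[i], word) == 0)
        {
            count[i]++;
            return i;
        }
    }

    if (*total >= MAX_WORDS)
        return -1;

    /* New word: append it and start its count at 1 */
    strncpy(words[i], word, WORD_LEN - 1);
    words[i][WORD_LEN - 1] = '\0';
    count[i] = 1;
    (*total)++;

    return i;
}
```

Returning the index directly from the search avoids the i - 1 adjustment, at the cost of a slightly longer function.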

Program to count occurrences of all words in a file

/**
 * C program to count occurrences of all words in a file.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAX_WORDS   1000



int main()
{
    FILE *fptr;
    char path[100];
    int i, len, index, isUnique;

    // List of distinct words
    char words[MAX_WORDS][50];
    char word[50];

    // Occurrence count of each distinct word
    int  count[MAX_WORDS];


    /* Input file path */
    printf("Enter file path: ");
    scanf("%99s", path);   // Width limit keeps input inside path[100]


    /* Try to open file */
    fptr = fopen(path, "r");

    /* Exit if file not opened successfully */
    if (fptr == NULL)
    {
        printf("Unable to open file.\n");
        printf("Please check you have read privileges.\n");

        exit(EXIT_FAILURE);
    }

    // Initialize words count to 0
    for (i=0; i<MAX_WORDS; i++)
        count[i] = 0;




    index = 0;
    
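    // %49s limits each read so that it fits in word[50]
    while (fscanf(fptr, "%49s", word) == 1)
    {
        // Convert word to lowercase (portable replacement for strlwr())
        for (i = 0; word[i] != '\0'; i++)
            word[i] = tolower((unsigned char) word[i]);

        // Remove last punctuation character
        len = strlen(word);
        if (len > 0 && ispunct((unsigned char) word[len - 1]))
            word[len - 1] = '\0';

        // Skip tokens that consisted of a single punctuation character
        if (word[0] == '\0')
            continue;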


        // Check if word exists in list of all distinct words
        isUnique = 1;
        for (i=0; i<index && isUnique; i++)
        {
            if (strcmp(words[i], word) == 0)
                isUnique = 0;
        }

        // If word is unique then add it to distinct words list
        // and increment index. Otherwise increment occurrence 
        // count of current word.
        if (isUnique)
        {
            // Add the new word only while the list still has room
            if (index < MAX_WORDS)
            {
                strcpy(words[index], word);
                count[index]++;

                index++;
            }
        }
        else
        {
            count[i - 1]++;
        }
    }

    // Close file
    fclose(fptr);


    /*
     * Print occurrences of all words in file. 
     */
    printf("\nOccurrences of all distinct words in file: \n");
    for (i=0; i<index; i++)
    {
        /*
         * %-15s prints string in 15 character width.
         * - is used to print string left align inside
         * 15 character width space.
         */
        printf("%-15s => %d\n", words[i], count[i]);
    }    
    

    return 0;
}

Suppose data/file3.txt contains:

I love programming.
I am learning C programming at Codeforwin.
Programming with files is fun.
Learning C programming at Codeforwin is simple and easy.

Output

Enter file path: data/file3.txt
Occurrences of all distinct words in file:
i               => 2
love            => 1
programming     => 4
am              => 1
learning        => 2
c               => 2
at              => 2
codeforwin      => 2
with            => 1
files           => 1
is              => 2
fun             => 1
simple          => 1
and             => 1
easy            => 1
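
The aligned columns in the output come from the %-15s conversion in the program's final printf(). As a small illustration, the sketch below formats one row into a buffer; the helper name format_row is hypothetical and is not part of the program.

```c
#include <stdio.h>

/* Hypothetical helper: format one output row into buf the same way
 * the program's final printf() does, without the trailing newline.
 * Returns the number of characters the full row needs. */
static int format_row(char *buf, size_t size, const char *word, int n)
{
    /* %-15s pads the word with spaces on the right up to 15 characters;
     * %15s (no minus sign) would pad on the left instead. */
    return snprintf(buf, size, "%-15s => %d", word, n);
}
```

The width is a minimum, not a limit: a word longer than 15 characters is printed in full and simply pushes the rest of its row to the right.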

Happy coding 😉