Word count for multiple .txt files in linux
I need to find the words in multiple .txt files using a linux cli. Currently I am using the following command:
cat *.txt | wc -w

I have made a test directory to practice the command. It seems to work for each individual .txt file, but it fails to give the correct total for all the .txt files together.
I have a directory with 5 files: 4 of them contain 5 words each and 1 is empty.
For an individual file, cat textfile.txt | wc -w gives the right answer.
But for the combined count it gives 17, when it should be 4 × 5 + 0 = 20.
Can someone tell me why the count given is 17 while the real count is 20?
2 Answers
You can run
wc -w *.txt

This will give you the word count for each file and a total sum in the last row.
As it turned out, the OP's issue was a missing newline at the end of one of the files. This caused cat *.txt to combine two words into one, resulting in a wrong count.
The command above is more robust in this situation as it processes each file individually.
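If you still want the concatenated pipeline to give the correct total, one approach is to make sure every file ends with a newline first. A minimal sketch (assuming POSIX tail and printf; the loop and its logic are mine, not part of the original answer):

```shell
# Append a final newline to any non-empty *.txt file that lacks one,
# so that `cat *.txt | wc -w` matches the total from `wc -w *.txt`.
for f in *.txt; do
    # tail -c 1 prints the file's last byte; command substitution
    # strips a trailing newline, so a non-empty result means the
    # file does not already end with one.
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
        printf '\n' >> "$f"
    fi
done
```

After this, the last word of each file can no longer fuse with the first word of the next file during concatenation.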
The most likely explanation is that the final lines of your files are not newline-terminated, so that when you cat them, the first word of the next file gets appended to the last word of the previous file:
Ex. given
steeldriver@pc:~$ printf 'foo\nbar\nbaz\nbam\nboo' | tee {1..4}.txt
foo
bar
baz
bam
boosteeldriver@pc:~$ printf '' > 5.txt

then

steeldriver@pc:~$ wc -w {1..5}.txt
 5 1.txt
 5 2.txt
 5 3.txt
 5 4.txt
 0 5.txt
20 total

but
steeldriver@pc:~$ cat {1..5}.txt | wc -w
17
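You can also check which files are affected before concatenating. A quick sketch (assuming POSIX tail; this snippet is an illustration, not from the original answer):

```shell
# List non-empty *.txt files whose last byte is not a newline.
# The command substitution around `tail -c 1` strips a trailing
# newline, so a non-empty result means the file lacks one.
for f in *.txt; do
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
        echo "$f: no trailing newline"
    fi
done
```

In the example above this would flag 1.txt through 4.txt, which is exactly why three word boundaries disappear during cat and the total drops from 20 to 17.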