Word count for multiple .txt files in linux
I need to find the words in multiple .txt files using a linux cli. Currently I am using the following command:
cat *.txt | wc -w

I have made a test directory to practice the command. It seems to work for each individual .txt file, but it fails to give the correct total for all the .txt files together.
I have a directory with 5 files: 4 of them contain 5 words each and 1 is empty.
For an individual file, cat textfile.txt | wc -w gives the right answer.
But for the combined count it gives 17, when it should be 4 × 5 + 0 = 20.
Can someone tell me why the count given is 17 while the real count is 20?
2 Answers
You can run
wc -w *.txt

This will give you the word count for each file and a total sum in the last row.
As it turned out, the OP's issue was a missing newline at the end of one of the files. This caused cat *.txt to combine two words into one, resulting in a wrong count.
The command above is more robust in this situation as it processes each file individually.
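If you still want the concatenated pipeline to give the correct total, one approach is to make sure every file ends with a newline first. A minimal sketch (assuming POSIX tail and printf; the loop and its logic are mine, not part of the original answer):

```shell
# Append a final newline to any non-empty *.txt file that lacks one,
# so that `cat *.txt | wc -w` matches the total from `wc -w *.txt`.
for f in *.txt; do
    # tail -c 1 prints the file's last byte; command substitution
    # strips a trailing newline, so a non-empty result means the
    # file does not already end with one.
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
        printf '\n' >> "$f"
    fi
done
```

After this, the last word of each file can no longer fuse with the first word of the next file during concatenation.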
The most likely explanation is that the final lines of your files are not newline-terminated, so that when you cat them, the first word of the next file gets appended to the last word of the previous file:
Ex. given
steeldriver@pc:~$ printf 'foo\nbar\nbaz\nbam\nboo' | tee {1..4}.txt
foo
bar
baz
bam
boosteeldriver@pc:~$ printf '' > 5.txt

then

steeldriver@pc:~$ wc -w {1..5}.txt
 5 1.txt
 5 2.txt
 5 3.txt
 5 4.txt
 0 5.txt
20 total

but
steeldriver@pc:~$ cat {1..5}.txt | wc -w
17
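You can also check which files are affected before concatenating. A quick sketch (assuming POSIX tail; this snippet is an illustration, not from the original answer):

```shell
# List non-empty *.txt files whose last byte is not a newline.
# The command substitution around `tail -c 1` strips a trailing
# newline, so a non-empty result means the file lacks one.
for f in *.txt; do
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
        echo "$f: no trailing newline"
    fi
done
```

In the example above this would flag 1.txt through 4.txt, which is exactly why three word boundaries disappear during cat and the total drops from 20 to 17.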