Help with a Bash script or some program to sort files and move to different folders
Is there anyone out there who can help me sort some files with a bash script or terminal commands (or even a little Java program) that can sort and move files to new folders? I have been trying for days, but haven't figured it out.
I have a folder called "webpages" with hundreds of html files. Roughly speaking, they are divided into three categories, so I will need either one script that I run three times with different variables, or a script that can do all of the sorting and moving at once.
I want to search for certain strings in the files and then send the files that match to new folders. To simplify, some of the web pages are about politics, some about business and some about computers. So, let's say I want to search all the files in the folder "webpages" for the terms "election", "stock market" and "open source", and move the files containing "election" to a folder called "politics", the files containing "stock market" to a folder called "business" and the files containing "open source" to a folder called "computers".
Like I said, I have tried to figure it out, but just got laughed at for my efforts. I am no expert. Thank you!
3 Answers
Suppose you have the following in the current directory:
- a/sm1, with the content "a stock market b"
- b/sm2, with the content "x stock market y"
- sm3, which does not contain "stock market"
- destination, a directory where you want to move the files containing "stock market".
Let's find all the files (of type f = file) in the current directory ( . ):
$ find . -type f
./a/sm1
./sm3
./b/sm2
But sm3 does not contain "stock market", so we don't want it. In the list of files that we have now, let's search for "stock market" and only display the files that match:
$ find . -type f | xargs grep --files-with-matches "stock market"
./a/sm1
./b/sm2
Now let's take each of the files we got and move it to the destination directory:
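$ for f in $(find . -type f | xargs grep --files-with-matches "stock market"); do mv "$f" destination/; done
Note that this loop splits the file list on whitespace, so it assumes file names without spaces. Make sure you have a backup before running this, in case it doesn't move the files the way you want.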
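Since the advice above is to be careful before moving anything, here is a minimal sketch of a dry run. It is not part of the answer itself: the scratch directory (`demo`) and the file contents are assumptions for illustration. It rebuilds the example layout and prints the `mv` commands instead of executing them, so you can check the list first.

```shell
#!/usr/bin/env bash
# Build a throwaway copy of the example layout (hypothetical scratch directory).
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b" "$demo/destination"
echo "a stock market b" > "$demo/a/sm1"
echo "x stock market y" > "$demo/b/sm2"
echo "nothing relevant" > "$demo/sm3"

cd "$demo" || exit 1

# Dry run: collect the mv commands as text instead of running them.
# -print0 / xargs -0 pass the file names to grep without splitting on spaces.
preview=$(find . -type f -print0 |
    xargs -0 grep --files-with-matches "stock market" |
    while IFS= read -r f; do
        echo mv "$f" destination/
    done)

# Show what would be moved; nothing has been touched yet.
printf '%s\n' "$preview"
```

Once the printed commands look right, dropping the `echo` in front of `mv` performs the actual move.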
I think some plain bash magic might do the trick:
#!/bin/bash
dir1=""
dir2=""
dir3=""
shopt -s nullglob
for i in *.html; do
    # grep -q prints nothing and only sets the exit status
    if grep -q 'keyword1' "$i"; then
        mv -vf "$i" "$dir1"
    elif grep -q 'keyword2' "$i"; then
        mv -vf "$i" "$dir2"
    elif grep -q 'keyword3' "$i"; then
        mv -vf "$i" "$dir3"
    else
        # keep a list of files that matched none of the keywords
        echo "$i" >> nomatch
    fi
done
cat nomatch
Here are a few ways of doing what you want:
1. find
find . -iname '*html' -type f -exec grep -q election "{}" \; -and -exec mv {} politics/ \;
Explanation
Here, we are using find's -exec option:
-exec command ;
    Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of `;' is encountered. The string `{}' is replaced by the current file name being processed
So, the first -exec searches the file (represented here by {}) for election, and the second one performs the move. The -and ensures that the second -exec is only run if the first was successful, that is, if the file matched the pattern.
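To see the "true if 0 status is returned" part in action, here is a small sketch under stated assumptions: the scratch directory (`work`) and the two sample files are made up for illustration, and `-maxdepth 1` is my addition so that find does not descend into politics/ and possibly see a file again after it has been moved. Replacing the second -exec with -print gives a preview of what would be moved.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with one matching and one non-matching file.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir politics
echo "the election results" > one.html
echo "nothing here" > two.html

# -exec grep -q acts as a test: -print only runs for files where grep exits 0.
matches=$(find . -maxdepth 1 -iname '*html' -type f \
    -exec grep -q election {} \; -print)
echo "$matches"

# Same expression with the real action in place of -print.
find . -maxdepth 1 -iname '*html' -type f \
    -exec grep -q election {} \; -exec mv {} politics/ \;
ls politics
```

The explicit -and is left out here because find joins adjacent expressions with a logical AND by default.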
2. find & shell.
This is the same basic approach as the one in Cos64's answer but with a few improvements.
find . -iname '*html' -type f -print0 |
    while IFS= read -r -d '' file; do
        grep -q election "$file" && mv "$file" politics/
    done
Explanation
- The find command will find all files (-type f) whose name ends in .html (or .HTML; -iname is case insensitive) and print them separated by the NULL character. This is needed because file names in *nix systems can contain any character except / and \0 (NULL). So, you can have files with spaces, newlines and any other strange character. These need to be treated specially.
- while IFS= read -r -d '' file; do ... done: this iterates over the output of find, saving each file as $file. The IFS= sets the input field separator to nothing, which means we can deal with spaces in file names correctly. The -d '' makes it read \0-separated records and the -r lets it deal with file names containing backslashes (\).
- grep -q election "$file": search the file for the pattern. The -q suppresses normal output and makes grep silent.
- && mv "$file" politics/: the && ensures that this command is only run if the previous one (the grep) was successful.
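The points above can be checked with a deliberately awkward file name. This is only a sketch: the scratch directory (`scratch`), the file names and the `moved` counter are assumptions for illustration. It also feeds the loop through process substitution instead of a pipe, so the loop runs in the current shell and the counter is still set afterwards; `-maxdepth 1` is added to keep find out of politics/.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory for the demonstration.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir politics

# A file name containing a space and a newline.
odd=$'my election\nnotes.html'
echo "election day" > "$odd"
echo "unrelated" > plain.html

moved=0
# read -d '' consumes one NUL-terminated name at a time, so the
# newline inside the name does not split it into two entries.
while IFS= read -r -d '' file; do
    if grep -q election "$file"; then
        mv "$file" politics/ && moved=$((moved + 1))
    fi
done < <(find . -maxdepth 1 -iname '*html' -type f -print0)

echo "moved $moved file(s)"
```

With a plain newline-separated loop, the same file would be read as two names, neither of which exists.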
3. Bash.
This script is very similar to the one in @WilhelmErasmus's very good answer with the difference that i) it can take the set of patterns and replacements from the command line and ii) it also finds files in subdirectories.
#!/usr/bin/env bash
## Exit if no arguments were given
[ -z "$1" ] && echo "At least two arguments are needed." >&2 && exit 1
## Collect the arguments
args=("$@")
## Declare the $dirs associative array
declare -A dirs
## Save the arguments given in the $dirs array.
## $# is the number of arguments given, so this
## will iterate over all of them, reading two by two.
for ((i=0;i<$#;i+=2));
do
    ## The arguments are pairs of patterns and target directories.
    ## Set the value of this pattern to the value of the next argument,
    ## its target directory.
    dirs["${args[i]}"]="${args[i+1]}"
done
## Ignore globs that match no files
shopt -s nullglob
## This enables ** to match subdirectories
shopt -s globstar
## Find all .html files
for file in **/*{html,htm,HTM,HTML}
do
    matched=0;
    for pat in "${!dirs[@]}"
    do
        ## Does this file match the pattern?
        ## The `-q` suppresses grep's output.
        grep -q "$pat" "$file" &&
        ## Set matched to 1 if the file matches.
        matched=1 &&
        ## If the grep succeeded, move the file
        ## to the corresponding directory
        mv "$file" "${dirs[$pat]}" &&
        ## If the move succeeded, break the loop
        ## and move to the next file.
        break
    done
    ## Report files that didn't match
    [[ "$matched" -eq 0 ]] && printf "No matches for '%s'\n" "$file" >&2
done
Run the script, giving it the patterns and their target directories. For example, with the ones in your question:
bash move_files.sh "election" "politics" "stock market" "business" "open source" "computers"
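One thing worth doing before running it: the script calls mv with the target name as given, and if that directory does not exist yet, mv renames the file to that name instead of moving it into a folder. A minimal sketch of creating the targets first, assuming the same pattern/target pairs as in the command above (the `pairs` array and the scratch directory `sandbox` are hypothetical names used only here):

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory so the sketch does not touch real files.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

# Pattern/target pairs in the order the script expects them.
pairs=("election" "politics" "stock market" "business" "open source" "computers")

# The targets are the odd-indexed elements; create each one up front.
for ((i = 1; i < ${#pairs[@]}; i += 2)); do
    mkdir -p -- "${pairs[i]}"
done

# List the directories that now exist.
ls -d politics business computers

# The script could then be called with the same array:
#   bash move_files.sh "${pairs[@]}"
```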