Notepad++ Compare two files and remove
Say I have two files, file1.txt and file2.txt.
Both files contain a list of shoe brand names (1000+ names), like this:
brand1 brand2 brand3 brand...
Now I want to compare file1 to file2, delete all the entries that occur in both, and see only what's in file1 that's not in file2, and vice versa.
In other words, the goal is to see what's missing from the opposite file, since these entries are going to be typed manually into a product back office for two different categories so that they match in the end.
35 Answers
Would the "Compare" plugin for Notepad++ do the trick?
You can install it from the Notepad++ menu: Plugins => Plugin Manager => Compare 1.5.6
Here's the official description: A very useful diff plugin to show the difference between 2 files (side by side). Author: Ty Landercasper, now maintained and updated by Jean-Sebastien Leroy Source:
An old question, but...
- Compare the files in WinMerge
- Tools -> Generate Patch (save this)
The patch has changes from both, but also extra markup. In notepad++, do the following replaces:
Search Mode: Regular Expression Find What: ^[0-9-].*$ Replace With: <blank> Replace All.
Search Mode: Regular Expression Find What: (<|>) Replace With: <blank> Replace All
- Use the TextFX plugin in Notepad++ to either do a Tools->case-insensitive sort (output UNIQUE option selected), or Edit->Delete blank lines
Bit mungy, but I've yet to find a tool that will do this in one click.
To subtract two files in Notepad++ (file1 - file2), you can follow this procedure:
- Add ---------------------------- as a footer on file1 (add at least 10 dashes). This is the marker line that separates file1 content from file2.
- Then copy the contents of file2 to the end of file1 (after the marker)
- Control + H
- Search: (?m)^\b(.*)\R(?=[\s\S]+-{10,}$[\s\S]+^\1\R)
- Replace by: (leave empty)
- Select the Regular expression radio button
- Replace All
- Finally remove footer and file2 content
You can modify the marker if file1 or file2 might contain lines equal to the marker. In that case you will have to adapt the regular expression.
By the way, you could even record a macro to do all the steps (add the marker, switch to file2, copy its content to file1, apply the regex, and even clean up the data after the subtraction) with a single button press.
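Outside Notepad++, roughly the same file1 - file2 subtraction can be sketched with grep. This is a different technique from the regex above, not a translation of it: it assumes one entry per line and a POSIX-style shell, and the function name subtract_lines is made up for illustration. It compares whole lines exactly and needs no marker line.

```shell
# Hypothetical helper: print the lines of file $1 that do not occur in file $2.
#   -F  treat each pattern as a fixed string, not a regex
#   -x  only count a match when the whole line is equal
#   -v  invert the match, i.e. keep the lines that did NOT match
#   -f  read the patterns (one per line) from file $2
subtract_lines() {
    grep -F -x -v -f "$2" "$1"
    # grep exits with 1 when it prints nothing; treat that as success.
    [ $? -le 1 ]
}
```

Calling subtract_lines file1.txt file2.txt prints what is only in file1, in its original order; swapping the arguments gives the other direction.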
If Unix is available to you, you could try this combination of simple commands: tr, sort, and comm.
First, convert the file from horizontally separated to vertically separated:
tr '[:blank:]' '\n' < file1.txt > /tmp/file1.vertical
tr '[:blank:]' '\n' < file2.txt > /tmp/file2.vertical
Then sort the files:
sort /tmp/file1.vertical > /tmp/file1.sorted
sort /tmp/file2.vertical > /tmp/file2.sorted
Now you can see what's in file1 that's not in file2:
comm -23 /tmp/file1.sorted /tmp/file2.sorted
Or see what's in file2 that's not in file1:
comm -13 /tmp/file1.sorted /tmp/file2.sorted
If you want the output in the same horizontal format you started with, you can do this:
comm -23 /tmp/file1.sorted /tmp/file2.sorted | tr '\n' ' '
comm -13 /tmp/file1.sorted /tmp/file2.sorted | tr '\n' ' '
When you are done, you could delete the temporary files you created:
rm /tmp/file1.vertical /tmp/file2.vertical /tmp/file1.sorted /tmp/file2.sorted
For others looking for something similar and not tied to Notepad++, this excellent answer by iphonedroid shows diff should be able to do the job:
To show additions and deletions without context, line numbers, +, -, <, >, !, etc., you can use diff like this:
diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt
For example, given two files:
a.txt
Common
Common
A-ONLY
Common
b.txt
Common
B-ONLY
Common
Common
The following command will show lines either removed from a or added to b:
diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt
# Output
B-ONLY
A-ONLY
This slightly different command will show lines removed from a.txt:
diff --changed-group-format='%<' --unchanged-group-format='' a.txt b.txt
# Output
A-ONLY
Finally, this command will show lines that are only in b.txt (that is, added relative to a.txt):
diff --changed-group-format='%>' --unchanged-group-format='' a.txt b.txt
# Output
B-ONLY
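diff works line by line and is sensitive to order, while the lists in the question are space-separated and not necessarily sorted. A minimal sketch of one way to combine the two ideas is to put one entry per line, sort, and then run the same diff command. It assumes GNU diff (--changed-group-format is a GNU extension) and a system that provides mktemp; the function names normalize and only_in_one are hypothetical.

```shell
# Hypothetical helper: one sorted, de-duplicated entry per line.
normalize() {
    # Turn every run of whitespace into a single newline, drop empty lines.
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' | sort -u
}

# Hypothetical helper: print entries that appear in only one of the two files.
only_in_one() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    normalize "$1" > "$tmp_a"
    normalize "$2" > "$tmp_b"
    # Same GNU diff invocation as above, applied to the normalized copies.
    diff --changed-group-format='%<%>' --unchanged-group-format='' "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    # diff exits with 1 when the files differ, which is the expected case here.
    [ "$status" -le 1 ]
}
```

Calling only_in_one file1.txt file2.txt lists entries from either side without saying which file each came from; changing the format to '%<' or '%>' restricts the output to one side, as in the commands above.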