Convert hardlinks to reflinks
I have nested folders with a bunch of files inside that are hardlinked to each other. I would like to break the hardlinks (convert them into separate files), but then immediately convert each pair into a reflink (so they have different inodes but use the same section of disk).
find -type f -links +1 will find all the hardlinks, while a command like
cp --reflink=always my_file.bin my_file_copy.bin will copy a file without using any more disk space, creating it as a reflink.
How do I combine these to go through a whole set of nested folders and convert each hardlink into a reflink, replacing them with the same filename?
12 Answers
You tagged ubuntu, so I assume you are not limited to strictly POSIX tools and their POSIX options.
find . -type f -links +1 -execdir sh -c '
   tmp="$(TMPDIR=. mktemp)" &&
   cp -p --reflink=always -- "$1" "$tmp" &&
   mv -f -- "$tmp" "$1"
' find-sh {} \; -print

Notes:
- This converts hardlinks to reflinks, i.e. my_file.bin hardlink becomes my_file.bin reflink. There will be no my_file_copy.bin. (This note is in case you want to create my_file_copy.bin reflink while leaving my_file.bin hardlink intact. The question is not crystal clear in this matter, it introduces my_file_copy.bin for some reason.)
- If mktemp or cp fails then mv will not be performed. In any case you shouldn't lose the original content, unless some other process modifies the temporary file.
- Because find tests files one by one, it will never overwrite (convert) all the hardlinks to any inode. If all the hardlinks are processed by find then -links +1 will fail for the last one. The original inode will survive. This means if the original file is open and going to be modified in place (without changing the inode number) then the modification will survive somewhere (but it's hard to tell in advance which hardlink will be processed last and will keep its inode number). A situation when an open file gets totally unlinked, modified as such and removed from the filesystem as soon as it's closed shouldn't happen.
- If cp or mv fails then the temporary file will survive. You may want to capture stderr to a file (2>some_file) and investigate later.
- -print will act if the shell code succeeds. It's only there so you can see something happens.
- find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
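To check what the command did, it can help to list the hardlinked files together with their inode numbers before and after the run. The following is only a sketch: it assumes GNU find (for -printf), and list_hardlinks is a name made up for this example.

```shell
#!/bin/sh
# list_hardlinks DIR (hypothetical helper, not part of the answer above):
# print "inode link-count path" for every regular file under DIR that has
# more than one hard link. Sorting numerically on the leading inode number
# puts paths that share an inode on adjacent lines.
# Requires GNU find because of -printf.
list_hardlinks() {
    find "${1:-.}" -type f -links +1 -printf '%i %n %p\n' | sort -n
}
```

Run list_hardlinks . before the conversion to see the groups. After a successful conversion the listing for that tree should be empty, since every remaining name then has a link count of 1.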
Edit: As pointed out by Kamil, don't use for x in $(find ...): the shell splits the output of find on whitespace, so file names containing spaces break. The find -execdir sh -c form is the proper way to process find output. I'll leave my answer here, however.
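A small sketch of why this matters, assuming a POSIX shell and find. Both function names are invented for the illustration; each counts how many items the shell sees for the same directory.

```shell
#!/bin/sh
# count_with_for DIR (hypothetical): iterate over $(find ...) the way the
# loop below does and count the iterations. The unquoted command
# substitution is split on whitespace, so a name with a space counts twice.
count_with_for() {
    n=0
    for x in $(find "$1" -type f); do
        n=$((n + 1))
    done
    echo "$n"
}

# count_with_exec DIR (hypothetical): let find pass each path to the inner
# shell as one intact argument, then count the arguments received.
count_with_exec() {
    find "$1" -type f -exec sh -c 'for f do echo x; done' find-sh {} + |
        wc -l | tr -d ' '
}
```

For a directory whose only file is named my file.bin (and whose own path has no whitespace), count_with_for reports 2 while count_with_exec reports 1.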
You can write a small Bash script or directly write a for loop in your bash shell:
$ for filename in $(find -type f -links +1); do echo "I found this file: ${filename}"; done
This example takes each word of the find output (one file name per word, as long as the names contain no whitespace) and places it in the filename variable, which you can then use. Here we just print I found this file: followed by the name, but you can replace that with your copy command, which would probably look something like this:
$ for filename in $(find -type f -links +1); do echo "Copying ${filename} to ${filename}_copy.bin"; cp --reflink=always "${filename}" "${filename}_copy.bin"; done
Or, if you want to put this in a Bash script instead so it's easier to read and work with, create a file copy_script.sh with these contents:
#!/bin/bash
for filename in $(find -type f -links +1); do
    echo "Copying ${filename} to ${filename}_copy.bin"
    cp --reflink=always "${filename}" "${filename}_copy.bin"
done

Then save and run with $ bash ./copy_script.sh
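Following the edit note at the top of this answer, here is a sketch of the same copy loop written so that it does not rely on word splitting. It assumes a find that supports -exec ... {} + and a cp that supports --reflink (GNU coreutils). The function name reflink_copies is made up for the example, and find-sh is just a label for the inner shell.

```shell
#!/bin/sh
# reflink_copies DIR (hypothetical helper): for every regular file under DIR
# with a link count above 1, create FILE_copy.bin next to it as a reflink.
# find hands the paths to the inner shell as separate arguments, so names
# containing spaces reach cp intact.
reflink_copies() {
    find "${1:-.}" -type f -links +1 -exec sh -c '
        for f do
            echo "Copying $f to ${f}_copy.bin"
            # --reflink=always makes cp fail rather than fall back to a
            # full copy when the filesystem cannot share extents.
            cp --reflink=always -- "$f" "${f}_copy.bin"
        done
    ' find-sh {} +
}
```

Call it as reflink_copies . from the top of the tree. Like the loop above, this creates extra _copy.bin files and leaves the hardlinks themselves in place.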