How to remove special 'M-BM-' character with sed
I have a file that was created by copying content from a DOCX document with LibreOffice into a text file. I modified the file with sed to remove extra spaces and other stuff, but then I noticed a space that was immune to the regular command:
sed -r 's:some-text :some-text:g' -i file
After running cat -A file I found out that it looks like this:
<p>M-BM- Lorem ipsum</p>
How can I remove it?
5 Answers
The M-BM- characters are an ASCII representation of the byte sequence 0xc2 0xa0, which is the UTF-8 encoding of Unicode character U+00A0, a non-breaking space. This character can be inserted in both LibreOffice and Microsoft Word documents using the key sequence Ctrl+Shift+SPACE.
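To see this notation without a word processor, a minimal sketch is to emit the two bytes directly. It assumes a printf that understands octal escapes (\302\240 is the octal spelling of 0xc2 0xa0) and a cat that supports -v; the function name make_sample is made up for illustration.

```shell
# Emit "ABC", a UTF-8 non-breaking space (bytes 0xc2 0xa0, octal 302 240), "DEF".
make_sample() {
    printf 'ABC\302\240DEF\n'
}

# cat -v prints a byte >= 0x80 as "M-" followed by the character for (byte - 0x80):
# 0xc2 - 0x80 = 0x42 ('B') and 0xa0 - 0x80 = 0x20 (' '), hence "M-BM- ".
make_sample | cat -v
# prints: ABCM-BM- DEF
```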
For example, if we create a new .odt document in LibreOffice and type ABC, Ctrl+Shift+SPACE, DEF, then Save As... Text (ignoring the warning that the document may contain features that cannot be saved in that format), then view the resulting .txt file with cat:
$ cat nbsp.txt
ABC DEF
and then again with the -v switch to show non-printing characters
$ cat -v nbsp.txt
M-oM-;M-?ABCM-BM- DEF
Note that we also get an initial sequence M-oM-;M-? or hexadecimal 0xef 0xbb 0xbf, which is the UTF-8 byte order mark (BOM), consistent with the file type reported by the file command, i.e.
$ file nbsp.txt
nbsp.txt: UTF-8 Unicode (with BOM) text
Using od to print the hexadecimal values in byte order we see
$ od -tx1 nbsp.txt
0000000 ef bb bf 41 42 43 c2 a0 44 45 46 0a
0000014
It is possible to manipulate these characters using standard tools like sed or tr by specifying the hex codes as escape sequences, e.g. to replace the non-breaking space with a plain ASCII space
$ sed 's/\xc2\xa0/ /g' nbsp.txt
ABC DEF
Checking again with od confirms the replacement by an ordinary ASCII space 0x20 (decimal 32)
$ sed 's/\xc2\xa0/ /g' nbsp.txt | od -tx1
0000000 ef bb bf 41 42 43 20 44 45 46 0a
0000013
In gnome-terminal (and maybe other UTF-8-aware terminal emulators), it's also possible to enter the Unicode code point value directly using the key sequence Ctrl+Shift+u followed by a hexadecimal value, then the Enter key. The sequence shows up initially as u̲.̲.̲.̲ but then the character should compose when you hit Enter, e.g. for the same non-breaking space replacement we can do
$ sed 's/Ctrl+Shift+ua0
which displays as
$ sed 's/̲/̲u̲a̲0̲
and then completes as
$ sed 's/ / /g' nbsp.txt
ABC DEF
Using cat -v we can confirm the M-BM- sequence has become an ordinary space
$ sed 's/ / /g' nbsp.txt | cat -v
M-oM-;M-?ABC DEF
You may want to look at more generic encoding converters such as iconv and uconv as well.
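Pulling the sed step above into something reusable, here is a minimal sketch of a filter. The function name replace_nbsp is made up for illustration, and the \xHH escapes are a GNU sed extension, so this assumes GNU sed. LC_ALL=C is set so that sed treats its input as plain bytes regardless of the current locale.

```shell
# Replace every UTF-8 non-breaking space (0xc2 0xa0) on stdin with an ASCII space.
# Assumes GNU sed, which understands \xHH escapes in regular expressions.
replace_nbsp() {
    LC_ALL=C sed 's/\xc2\xa0/ /g'
}

# Build a sample line with octal escapes and dump the filtered bytes in hex.
printf 'ABC\302\240DEF\n' | replace_nbsp | od -An -tx1
```

The same expression can be passed to sed -i to edit a file in place, as in the question.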
After trying a lot of things, I finally found a solution. To replace that weird character with sed, you need to copy the exact text that contains the weird space, and then paste it directly into the sed command:
sed -r 's:paste-here:<p>:g' -i file
This will look like an ordinary space in the sed command:
sed -r 's:<p> :<p>:g' -i file
but it will work anyway.
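Pasting an invisible character is easy to get wrong, so a variation on the same idea is to build the character in a shell variable and splice it into the pattern. This is only a sketch: the variable name nbsp and the helper strip_p_nbsp are made up here, and it relies on printf octal escapes rather than on any sed extension.

```shell
# Hold the two bytes of a UTF-8 non-breaking space (octal 302 240) in a variable.
nbsp=$(printf '\302\240')

# Remove a non-breaking space that directly follows "<p>" (reads stdin).
# Double quotes let the shell expand $nbsp before sed sees the expression.
strip_p_nbsp() {
    sed "s:<p>${nbsp}:<p>:g"
}

printf '<p>\302\240Lorem ipsum</p>\n' | strip_p_nbsp
# prints: <p>Lorem ipsum</p>
```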
You can remove ^M from the files directly with a sed command, e.g.:
sed -i'.bak' 's/\r//g' *.*
If you're happy with the changes, remove the .bak files:
rm -v *.bak
"cat -v file" will show the non-printing characters in the file. Just redirect the output to a temporary file and use vim to replace the M-BM- characters with nothing.
%s/M-BM- //g
Easiest solution.
A little script to remove this devilish M-BM- character! ;) Just in case it helps anyone.
#!/bin/bash
#############################################################################
# SCRIPT: M-BM-Remover.sh
# DESCRIPTION:
# This script detects the hidden character "M-BM-"
# and/or removes it.
# REVISIONS:
# 2014/06/11 YG
#____________________________________________________________________________
#
# PARAMETERS:
# > $1 :TARGET, (e.g. '"*.sh"' )
# > $2 :ACTION, (e.g. 'remove' )
# > $3 :BACKUP, (e.g. '' )
#
#############################################################################
TARGET=$1
ACTION=$2
BACKUP=$3
if [ "$TARGET" = "" ]
then
    echo 'Need to choose target file'
    echo 'M-BM-Remover [TARGET] [show/remove] [backup]'
    echo 'Example : M-BM-Remover "*.sh" remove backup'
    exit
fi
echo "ACTION = $ACTION";
echo "TARGET = $TARGET";
echo
if [ "$ACTION" = "show" ]
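then
    # $TARGET is left unquoted on purpose so that a pattern such as *.sh expands
    for file in $TARGET
    do
        if [ "$file" != "M-BM-Remover.sh" ]
        then
            echo "Processing $file ..."
            cat -v "$file" | grep M-BM-
            # Counts matching lines, not individual occurrences
            NB=`cat -v "$file" | grep M-BM- | wc -l`
            echo "Occurrence(s) : $NB"
        fi
    done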
fi
if [ "$ACTION" = "remove" ] || [ "$ACTION" = "" ]
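then
    for file in $TARGET
    do
        if [ "$file" != "M-BM-Remover.sh" ]
        then
            echo "Processing $file ..."
            NB=`cat -v "$file" | grep M-BM- | wc -l`
            # Always read from a copy so that "$file" is not truncated before it is read
            cat "$file" > "$file.bak"
            # Note: cat -v also rewrites every other non-ASCII byte in the file
            cat -v "$file.bak" | sed 's/M-BM-//g' > "$file"
            # Keep the copy only when a backup was requested
            if [ "$BACKUP" != "backup" ]
            then
                rm "$file.bak"
            fi
            echo "Occurrence(s) removed : $NB"
        fi
        echo
    done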
fi
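Because cat -v rewrites every byte above 0x7f, passing a file through it also alters any other non-ASCII text. A byte-level variant of the same show/remove idea might look like the sketch below. The function names are hypothetical, GNU sed is assumed for -i and the \xHH escapes, and both tools run under LC_ALL=C so the pattern is matched as raw bytes.

```shell
# The two bytes of a UTF-8 non-breaking space (octal 302 240).
NBSP=$(printf '\302\240')

# "show": print the number of lines in file $1 that contain a non-breaking space.
# grep -c exits with status 1 when the count is 0.
count_nbsp_lines() {
    LC_ALL=C grep -c -- "$NBSP" "$1"
}

# "remove": replace each non-breaking space in file $1 with a plain space,
# keeping the original as "$1.bak" (GNU sed -i with a backup suffix).
remove_nbsp() {
    LC_ALL=C sed -i.bak 's/\xc2\xa0/ /g' "$1"
}

# Small demonstration on a temporary file.
demo=$(mktemp)
printf 'a\302\240b\nplain\n' > "$demo"
count_nbsp_lines "$demo"
remove_nbsp "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

Like the script above, this leaves a plain space where the non-breaking space used to be; changing the replacement to an empty string would delete it instead.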