Extracting QuickTime mov_text subtitles with ffmpeg
I have a .mov file with mov_text subtitle tracks that were created with the "Text Track" feature of the old (now deprecated) QuickTime 7 Pro. This is what ffmpeg -i myfile.mov tells me about them:
  Stream #0:4(eng): Subtitle: mov_text (text / 0x74786574), 640x116, 0 kb/s (default)
    Metadata:
      rotate          : 0
      creation_time   : 2021-11-20T03:08:45.000000Z
      handler_name    : Apple Text Media Handler

I still have access to QuickTime 7 Pro for now, which I can use to extract the subtitles and then convert them to SubRip or another format that other programs can read. However, I need to update my system, and QuickTime 7 Pro doesn't run on macOS Catalina or later, so I am looking for a way to keep accessing those subtitles on newer systems. ffmpeg seems to be the way to go, but I'm stuck.
Here's what I've tried:
ffmpeg -i test.mov -map 0:s:0 -c copy -f data QT-subtitles.txt

This gives me the plain text of the subtitles, but without timestamps, so it's pretty useless for converting the subtitles to a different format.
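The -f data output contains only the packet payloads, which is why the timing is gone. The timestamps can be read separately. The Python sketch below assumes that ffprobe (installed alongside ffmpeg) is on the PATH and lists the start time and duration of every subtitle packet; the stream specifier s:0 mirrors the -map 0:s:0 above, and the helper names packet_times and probe_subtitle_times are hypothetical.

```python
import csv
import io
import subprocess


def packet_times(csv_text):
    """Turn ffprobe's `pts_time,duration_time` CSV rows into (start, duration) pairs."""
    times = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # ignore blank lines
        # Assumes ffprobe prints pts_time before duration_time;
        # it writes "N/A" for values it does not know.
        start = float(row[0]) if row[0] != "N/A" else None
        duration = float(row[1]) if len(row) > 1 and row[1] != "N/A" else None
        times.append((start, duration))
    return times


def probe_subtitle_times(path, stream="s:0"):
    """Run ffprobe and return (start, duration) in seconds for each subtitle packet."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", stream,
        "-show_entries", "packet=pts_time,duration_time",
        "-of", "csv=p=0",  # plain comma-separated values, no section name prefix
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return packet_times(result.stdout)
```

For example, probe_subtitle_times("test.mov") should return one (start, duration) pair per subtitle sample, in seconds, which could then be paired with the text of each sample.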
ffmpeg -i test.mov -map 0:s:0 QT-subtitles.srt

This correctly converts the subtitles to SubRip, including timestamps, but it strips line breaks and other formatting.
It also prints the following "Invalid data found" error messages:
Stream mapping:
  Stream #0:4 -> #0:0 (mov_text (native) -> subrip (srt))
Press [q] to stop, [?] for help
[mov_text @ 0x7fa492823200] invalid UTF-8 byte in subtitle
[mov_text @ 0x7fa492823200] Invalid UTF-8 in decoded subtitles text; maybe missing -sub_charenc option
Error while decoding stream #0:4: Invalid data found when processing input
[mov_text @ 0x7fa492823200] invalid UTF-8 byte in subtitle
[mov_text @ 0x7fa492823200] Invalid UTF-8 in decoded subtitles text; maybe missing -sub_charenc option
Error while decoding stream #0:4: Invalid data found when processing input
size= 6kB time=00:02:04.30 bitrate= 0.4kbits/s speed=3.32e+04x
video:0kB audio:0kB subtitle:3kB other streams:0kB global headers:0kB muxing overhead: 82.697044%

Suggestions from the comments below to specify -c text or -sub_charenc all give me the same results.
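The QTtext export from QuickTime 7 Pro shown below declares {textEncoding:0}, and in Apple's text-encoding constants 0 is Mac Roman, so one possible explanation for the "invalid UTF-8 byte" messages is that the track's text is Mac Roman rather than UTF-8. The Python sketch below decodes a single raw text sample on that assumption. The layout it expects (a 16-bit big-endian text length, the text, then optional atoms such as styl that carry the formatting) is assumed to follow the QuickTime text sample format, and parse_text_sample is a hypothetical name.

```python
import struct


def parse_text_sample(sample, encoding="mac_roman"):
    """Split one raw QuickTime text sample into (text, [atom types]).

    Assumed layout: a 16-bit big-endian text length, the text bytes, then
    zero or more atoms, each starting with a 32-bit big-endian size that
    covers the whole atom and a four-character type such as b"styl".
    The default encoding is an assumption based on {textEncoding:0}.
    """
    if len(sample) < 2:
        raise ValueError("sample too short for a length prefix")
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw_text = sample[2:2 + text_len]
    # Classic Mac text often uses carriage returns; normalise line breaks to "\n"
    text = raw_text.decode(encoding).replace("\r\n", "\n").replace("\r", "\n")

    atoms = []
    pos = 2 + text_len
    while pos + 8 <= len(sample):
        size, kind = struct.unpack_from(">I4s", sample, pos)
        if size < 8:
            break  # malformed size; stop instead of looping forever
        atoms.append(kind.decode("latin-1"))
        pos += size
    return text, atoms
```

Byte 0x8E, for instance, is "é" in Mac Roman but is not a valid UTF-8 start byte, which would produce exactly this kind of decoding error. Note that the file written by -f data is these samples placed back to back.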
So I'm hoping to find a way to extract the subtitles to a simple text file that is either still QTtext-formatted (see below) or otherwise preserves line breaks and other formatting. This is how the original QuickTime Text Track looks as a text file when I extract and export it with QuickTime 7 Pro (see specs here):
{QTtext}{font:Verdana}{plain}{size:36}{textColor: 65535, 65535, 65535}{backColor: 0, 0, 0}{justify:default}{timeScale:30}{width:640}{height:116}{timeStamps:absolute}{language:0}{textEncoding:0}
[00:00:00.00]
Text here {bold} bold word {plain} text here
[00:00:01.12]
Text here text here text here
Next line here (respects spaces)
[00:00:02.14]

I could then simply write a script to convert formatting tags such as {bold} into HTML tags, which I believe SubRip understands.
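A minimal sketch of such a script in Python, assuming the QTtext layout shown above: a descriptor header, then [HH:MM:SS.ff] timestamps with the cue text on the following lines, and a final timestamp that only marks the end of the last cue. Two further assumptions: the digits after the dot count ticks of {timeScale:...} (so [00:00:01.12] at timeScale 30 is 1.4 s), and face tags reset at the start of each cue. All function names here are hypothetical.

```python
import re

TIMESTAMP = re.compile(r"^\[(\d+):(\d+):(\d+)\.(\d+)\]")
TAG = re.compile(r"\{([^}]*)\}")
# QTtext face tags mapped to the HTML tags SubRip players commonly accept
HTML_TAGS = {"bold": "b", "italic": "i", "underline": "u"}


def qt_tags_to_html(text):
    """Replace {bold}/{italic}/{underline}/{plain} with HTML; drop other {...} tags."""
    out, open_tags, pos = [], [], 0
    for match in TAG.finditer(text):
        out.append(text[pos:match.start()])
        pos = match.end()
        name = match.group(1).strip().lower()
        if name in HTML_TAGS and HTML_TAGS[name] not in open_tags:
            open_tags.append(HTML_TAGS[name])
            out.append(f"<{HTML_TAGS[name]}>")
        elif name == "plain":
            # {plain} switches every face style off: close tags innermost first
            while open_tags:
                out.append(f"</{open_tags.pop()}>")
    out.append(text[pos:])
    while open_tags:  # never leave a tag open at the end of a cue
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


def to_seconds(match, time_scale, fraction_in_ticks=True):
    h, m, s, frac = match.groups()
    # Assumption: the fraction counts timeScale ticks; otherwise read it as decimal
    sub = int(frac) / time_scale if fraction_in_ticks else float("0." + frac)
    return int(h) * 3600 + int(m) * 60 + int(s) + sub


def srt_time(seconds):
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def qttext_to_srt(qttext, fraction_in_ticks=True):
    scale = re.search(r"\{timeScale:\s*(\d+)\}", qttext)
    time_scale = int(scale.group(1)) if scale else 600  # assumed default
    stamps, texts = [], []
    for line in qttext.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(to_seconds(match, time_scale, fraction_in_ticks))
            texts.append([])
            rest = line[match.end():]
            if rest.strip():
                texts[-1].append(rest)
        elif stamps:  # lines before the first timestamp belong to the header
            texts[-1].append(line)
    cues = []
    for i in range(len(stamps) - 1):
        # Blank lines would end an SRT cue early, so drop them
        lines = [line for line in texts[i] if line.strip()]
        if not lines:
            continue  # an empty sample just clears the screen
        cues.append(f"{len(cues) + 1}\n{srt_time(stamps[i])} --> "
                    f"{srt_time(stamps[i + 1])}\n{qt_tags_to_html(chr(10).join(lines))}\n")
    return "\n".join(cues)
```

If the fractions turn out to be hundredths of a second rather than timeScale ticks, calling qttext_to_srt(text, fraction_in_ticks=False) switches the interpretation.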
Is there a way to tell ffmpeg which output format to use, rather than having it guess from the file extension (such as .srt or .txt)? (In particular, I'm thinking that if I could specify mov_text as the output, it might give me a QTtext-formatted file like the one above... but how would I do that?)
If ffmpeg can't do the job I'd be happy to try any other tools.