One of my side projects has been to transcode my entire media library from my preferred format (Matroska or MKV) to MP4. As much as I don't like it (MP4 doesn't support multiple soft-subtitles in the same file), the Roku doesn't support anything but MP4 files.

I have a fairly large number of video files, mostly because I break individual episodes out (so 11-22 per season) and I'm fond of anime (Bleach and Naruto are both over a hundred episodes), but I thought I was doing pretty well on the transcoding. However, since "I think" isn't really useful, I wrote a fairly decent Bash script that tells me how many files I have of each type, broken down by directory.

This is mainly to see where I am in the transcode process but also to figure out which directory or sub-directory I need to convert next. When run, I get output like this:

$ ./convert-status A-H/?
       Count  MP4  MKV  AVI  MOV  MPG
       ----- ---- ---- ---- ---- ----
A-H/A    250   56  133   61    0    0
A-H/B    731  443   38  250    0    0
A-H/C    305  188   81   36    0    0
A-H/D    224  157   59    8    0    0
A-H/E    305  258   27    8    0   12
A-H/F    589  325  123  141    0    0
A-H/G    435    5  183  247    0    0
A-H/H    659   12   61  586    0    0
       ----- ---- ---- ---- ---- ----
        3498 1444  705 1337    0   12
$

From the output above, you can see that I've gotten about 1.4k files converted into the proper format but still have quite a few left to convert (705 MKV and 1,337 AVI, plus a dozen MPG). Most of these are episodes (Bleach and Hercules), but at least it gives me a sense of progress.

The Bash script itself looks like this:

#!/bin/bash

# Wipe out our temporary file, if we have one. This isn't likely
# since we are using $$ to get the PID of the process.
rm -f /tmp/convert-status-$$

# Figure out the width of the first column. This is only so the columns
# line up nicely; it has absolutely no impact on the functionality.
for dir in "$@"
do
    # Ignore non-directories.
    if [ ! -d "$dir" ]
    then
        # Create a generic placeholder for all non-directories.
        echo "-FILES-" >> /tmp/convert-status-$$
        continue
    fi

    # Include the directory name.
    echo "$dir" >> /tmp/convert-status-$$
done

# This fancy little bit of AWK (which is from the Internet and I don't
# exactly grok) figures out the maximum length string in the file we
# just created. After this run, $m will contain the longest string
# length (as an integer).
m=$(awk ' { if ( length > L ) { L=length} }END{ print L}' /tmp/convert-status-$$)

# The width is all we needed from the temporary file, so clean it up.
rm -f /tmp/convert-status-$$

# Write out the column headers. We use printf even though we could
# use echo just so all the output calls are identical. The empty
# argument pads the header to the width of the directory column.
printf "%-${m}s  Count  MP4  MKV  AVI  MOV  MPG\n" ""
printf "%-${m}s  ----- ---- ---- ---- ---- ----\n" ""

# These are the counters for the grand totals (max) and the
# non-directory counts (files).
max=0
max_mkv=0
max_mp4=0
max_avi=0
max_mov=0
max_mpg=0

files=0
files_mkv=0
files_mp4=0
files_avi=0
files_mov=0
files_mpg=0

# Go through a list of all the directories in the parameters.
for dir in "$@"
do
    # Ignore non-directories.
    if [ ! -d "$dir" ]
    then
        # If this is a file, we just add to the counters.
        # "${dir##*.}" strips everything up to the last dot, leaving
        # just the extension.
        case "${dir##*.}" in
            "mp4") files_mp4=$(expr $files_mp4 + 1);;
            "mkv") files_mkv=$(expr $files_mkv + 1);;
            "avi") files_avi=$(expr $files_avi + 1);;
            "mov") files_mov=$(expr $files_mov + 1);;
            "mpg") files_mpg=$(expr $files_mpg + 1);;
            *) continue;;
        esac
        esac

        # Increment the general file counter.
        files=$(expr $files + 1)

        # Don't bother doing anything else.
        continue
    fi

    # Count the number of files of a given type inside that
    # directory. Since we are using find, this will recursively get
    # all the files inside subdirectories also. We don't care about
    # the file names, just how many we find. This does have a slight
    # bug if you have a .filename.extension file (which I use for
    # temporary files), but usually that is okay.
    mkv=$(find "$dir" -name "*.mkv" | wc -l)
    mp4=$(find "$dir" -name "*.mp4" | wc -l)
    avi=$(find "$dir" -name "*.avi" | wc -l)
    mov=$(find "$dir" -name "*.mov" | wc -l)
    mpg=$(find "$dir" -name "*.mpg" | wc -l)

    # Add up all the counts above so we have a "total files per
    # directory" variable.
    count=$(expr $mkv + $mp4 + $avi + $mov + $mpg)

    # Increment the grand totals for the bottom line.
    max_mp4=$(expr $max_mp4 + $mp4)
    max_mkv=$(expr $max_mkv + $mkv)
    max_avi=$(expr $max_avi + $avi)
    max_mov=$(expr $max_mov + $mov)
    max_mpg=$(expr $max_mpg + $mpg)
    max=$(expr $max + $count)

    # Write out a single record for everything, but only if we have
    # something.
    if [ $count -gt 0 ]
    then
        printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
            "$dir" \
            $count $mp4 $mkv $avi $mov $mpg
    fi
done

# Write out the file totals, but only if we have at least one file.
if [ $files -gt 0 ]
then
    printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
        "-FILES-" \
        $files $files_mp4 $files_mkv $files_avi $files_mov $files_mpg
fi

# Write out the grand totals.
max=$(expr $max + $files)
max_mp4=$(expr $max_mp4 + $files_mp4)
max_mkv=$(expr $max_mkv + $files_mkv)
max_avi=$(expr $max_avi + $files_avi)
max_mov=$(expr $max_mov + $files_mov)
max_mpg=$(expr $max_mpg + $files_mpg)

printf "%-${m}s  ----- ---- ---- ---- ---- ----\n" ""
printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
    "" \
    $max $max_mp4 $max_mkv $max_avi $max_mov $max_mpg
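
Since the AWK one-liner in the script is the least obvious part, here is the same idea spelled out as a small function. This is only a sketch: the name max_len and the variable longest are made up for illustration, and it reads from standard input instead of the temporary file.

```shell
#!/bin/bash

# max_len (hypothetical helper): read lines on standard input and
# print the length of the longest one.
max_len() {
    awk '
        # This rule runs once per input line. length($0) is the number
        # of characters in the current line; longest starts out unset,
        # which AWK treats as 0 in a numeric comparison.
        length($0) > longest { longest = length($0) }

        # After the last line, print the maximum. Adding 0 forces a
        # number, so empty input prints 0 instead of an empty string.
        END { print longest + 0 }
    '
}
```

For example, piping the directory names through it with printf '%s\n' "$@" | max_len would give the column width without needing a temporary file at all.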

It should be pretty easy to adapt it to other file formats (say text, OpenOffice.org, StarOffice, and Word) or just to get an idea of the file types in a directory tree.
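
As a rough sketch of that kind of adaptation, the per-extension counting can be pulled into a function that takes the extensions as arguments instead of hard-coding them. The function names count_ext and report are hypothetical and not part of the script above, and the one-line output format is just a placeholder for the nicer column layout.

```shell
#!/bin/bash

# count_ext DIR EXT (hypothetical helper): print how many regular
# files under DIR, including subdirectories, end in .EXT.
count_ext() {
    # tr strips the padding some versions of wc put around the number.
    find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

# report DIR EXT... (hypothetical helper): print DIR followed by one
# "ext=count" pair for every extension given.
report() {
    local dir=$1
    shift

    local line=$dir
    local ext
    for ext in "$@"
    do
        line="$line $ext=$(count_ext "$dir" "$ext")"
    done

    echo "$line"
}
```

Calling report Documents txt odt doc would then print the directory name followed by the three counts. Like the original, this counts lines from find, so a file name containing a newline would be counted more than once.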

2012-08-13