Getting number of different files

One of my side projects has been to transcode my entire media library from my preferred format (Matroska or MKV) to MP4. As much as I don't like it (MP4 doesn't support multiple soft-subtitles in the same file), the Roku doesn't support anything but MP4 files.

I have a pretty good number of video files, mostly because I break individual episodes out (so 11-22 per season) and I'm fond of anime (Bleach and Naruto are both over a hundred episodes), but I thought I was doing pretty well on the transcoding. However, since "I think" isn't really useful, I wrote up a fairly decent Bash script that tells me how many files I have of each type and lets me break it up by directory.

This is mainly to see where I am in the transcode process but also to figure out which directory or sub-directory I need to convert next. When run, I get output like this:

$ ./convert-status A-H/?
       Count  MP4  MKV  AVI  MOV  MPG
       ----- ---- ---- ---- ---- ----
A-H/A    250   56  133   61    0    0
A-H/B    731  443   38  250    0    0
A-H/C    305  188   81   36    0    0
A-H/D    224  157   59    8    0    0
A-H/E    305  258   27    8    0   12
A-H/F    589  325  123  141    0    0
A-H/G    435    5  183  247    0    0
A-H/H    659   12   61  586    0    0
       ----- ---- ---- ---- ---- ----
        3498 1444  705 1337    0   12

From the above output, you can see that I've gotten about 1.4k files converted into the proper format but still have quite a few left to convert. Most of these are episodes (Bleach and Hercules), but at least it gives me a sense of progress.
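
As a rough check on that progress, the totals line can be turned into a percentage. This is only a sketch: the two numbers are copied by hand from the sample output above, and the variable names are made up for the example.

```shell
#!/bin/bash
# Numbers copied from the totals line of the sample output.
total=3498
mp4=1444

# Bash only does integer arithmetic, so multiply before dividing;
# otherwise mp4 / total would truncate to 0.
percent=$(( mp4 * 100 / total ))
remaining=$(( total - mp4 ))

echo "$percent% converted, $remaining files to go"
```

With the numbers above this prints "41% converted, 2054 files to go".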

The Bash script itself looks like this:


#!/bin/bash

# Wipe out our temporary file, if we have one. This isn't likely
# since we are using $$ to get the PID of the process.
rm -f /tmp/convert-status-$$

# Figure out the width of the names. We do this so the columns line up
# pretty; it has absolutely no impact on the functionality.
for dir in "$@"
do
    # Ignore non-directories.
    if [ ! -d "$dir" ]
    then
        # Create a generic placeholder for all non-directories.
        echo "-FILES-" >> /tmp/convert-status-$$
        continue
    fi

    # Include the directory name.
    echo "$dir" >> /tmp/convert-status-$$
done


# This fancy little bit of AWK (which is from the Internet and I don't
# exactly grok) figures out the maximum length string in the file we
# just created. After this run, $m will contain the longest string
# length (as an integer).
m=$(awk '{ if (length($0) > L) { L = length($0) } } END { print L }' /tmp/convert-status-$$)

# Write out the column headers. We use printf even though we could
# use echo just so all the output calls are identical.
printf "%-${m}s  Count  MP4  MKV  AVI  MOV  MPG\n" ""
printf "%-${m}s  ----- ---- ---- ---- ---- ----\n" ""

# These are the counters for the grand totals (max) and the
# non-directory counts (files).
max=0
max_mkv=0
max_mp4=0
max_avi=0
max_mov=0
max_mpg=0

files=0
files_mkv=0
files_mp4=0
files_avi=0
files_mov=0
files_mpg=0

# Go through a list of all the directories in the parameters.
for dir in "$@"
do
    # Ignore non-directories.
    if [ ! -d "$dir" ]
    then
        # If this is a file, we just add to the counters. Using ##
        # strips everything through the last dot, so names with more
        # than one dot still yield the right extension.
        case ${dir##*.} in
            "mp4") files_mp4=$(expr $files_mp4 + 1);;
            "mkv") files_mkv=$(expr $files_mkv + 1);;
            "avi") files_avi=$(expr $files_avi + 1);;
            "mov") files_mov=$(expr $files_mov + 1);;
            "mpg") files_mpg=$(expr $files_mpg + 1);;
            *) continue;;
        esac

        # Increment the general file counter.
        files=$(expr $files + 1)

        # Don't bother doing anything else.
        continue
    fi

    # Count the number of files of a given type inside that
    # directory. Since we are using `find`, this will recursively get
    # all the files inside subdirectories also. We don't care about
    # the file names, just how many we find. This does have a slight
    # bug if you have a .filename.extension file (which I use for
    # temporary files), but usually that is okay.
    mkv=$(find "$dir" -name "*.mkv" | wc -l)
    mp4=$(find "$dir" -name "*.mp4" | wc -l)
    avi=$(find "$dir" -name "*.avi" | wc -l)
    mov=$(find "$dir" -name "*.mov" | wc -l)
    mpg=$(find "$dir" -name "*.mpg" | wc -l)

    # Add up all the counts above so we have a "total files per
    # directory" variable.
    count=$(expr $mkv + $mp4 + $avi + $mov + $mpg)

    # Increment the grand totals for the bottom line.
    max_mp4=$(expr $max_mp4 + $mp4)
    max_mkv=$(expr $max_mkv + $mkv)
    max_avi=$(expr $max_avi + $avi)
    max_mov=$(expr $max_mov + $mov)
    max_mpg=$(expr $max_mpg + $mpg)
    max=$(expr $max + $count)

    # Write out a single record for everything, but only if we have
    # something.
    if [ $count -gt 0 ]
    then
        printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
            "$dir" \
            $count $mp4 $mkv $avi $mov $mpg
    fi
done


# Write out the file totals, but only if we have at least one file.
if [ $files -gt 0 ]
then
    printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
        "-FILES-" \
        $files $files_mp4 $files_mkv $files_avi $files_mov $files_mpg
fi

# Write out the grand totals.
max=$(expr $max + $files)
max_mp4=$(expr $max_mp4 + $files_mp4)
max_mkv=$(expr $max_mkv + $files_mkv)
max_avi=$(expr $max_avi + $files_avi)
max_mov=$(expr $max_mov + $files_mov)
max_mpg=$(expr $max_mpg + $files_mpg)

printf "%-${m}s  ----- ---- ---- ---- ---- ----\n" ""
printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
    "" \
    $max $max_mp4 $max_mkv $max_avi $max_mov $max_mpg

# Clean up the temporary file now that we are done with it.
rm -f /tmp/convert-status-$$
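
The AWK line is less mysterious than it looks: it keeps the longest line length seen so far and prints it at the end. As a minimal sketch of that idea, here it is next to a pure-Bash loop that does the same thing. The function names `longest_awk` and `longest_bash` and the sample names are made up for illustration; they are not part of the script.

```shell
#!/bin/bash
# Sketch: find the length of the longest line on standard input, two ways.

# AWK version: length($0) is the length of the current line. L starts
# out unset (treated as 0) and is only replaced by a larger value.
longest_awk() {
    awk '{ if (length($0) > L) { L = length($0) } } END { print L + 0 }'
}

# Pure Bash version: ${#line} is the length of the string in $line.
longest_bash() {
    local line max=0
    while IFS= read -r line
    do
        if [ ${#line} -gt $max ]
        then
            max=${#line}
        fi
    done
    echo "$max"
}

# Made-up sample names; the longest one is 10 characters.
printf '%s\n' "A-H/A" "A-H/Bleach" "-FILES-" | longest_awk
printf '%s\n' "A-H/A" "A-H/Bleach" "-FILES-" | longest_bash
```

Both calls print 10 for this input. The Bash loop avoids the temporary file entirely, but the AWK version is shorter, so which one to use is mostly a matter of taste.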

It should be pretty easy to convert it to fit other file formats (say text, OpenOffice.org, StarOffice, and Word) or just to get an idea of the file types.
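
One way to make that conversion easier is to pass the extensions in as a list instead of hard-coding one variable per type. The following is only a sketch under that assumption: `count_by_ext` is a hypothetical helper, not part of the script above, and the demo file names are invented.

```shell
#!/bin/bash
# Sketch: count files per extension under a directory.
# Usage: count_by_ext DIRECTORY EXT [EXT...]
count_by_ext() {
    local dir=$1 ext n
    shift
    for ext in "$@"
    do
        # -type f skips directories that happen to match the pattern.
        n=$(find "$dir" -type f -name "*.$ext" | wc -l)

        # $(( n )) strips the padding some versions of wc add.
        printf '%s %d\n' "$ext" $(( n ))
    done
}

# Demo on a throwaway directory with invented file names.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/report.odt" "$demo/letter.odt"
count_by_ext "$demo" txt odt doc
rm -rf "$demo"
```

The demo prints one line per extension (txt 1, odt 2, doc 0). Hooking this into the column layout of the main script would still need the header and format strings adjusted to match the list of extensions.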