Getting number of different files

One of my side projects has been to transcode my entire media library from my preferred format (Matroska or MKV) to MP4. As much as I don't like it (MP4 doesn't support multiple soft-subtitles in the same file), the Roku doesn't support anything but MP4 files.

I have a pretty good number of video files, mostly because I break individual episodes out (so 11-22 per season) and I'm fond of anime (Bleach and Naruto are both over a hundred episodes), but I thought I was doing pretty well on the transcoding. However, since "I think" isn't really useful, I wrote up a fairly decent Bash script that tells me how many files I have of each type and lets me break it up by directory.

This is mainly to see where I am in the transcode process but also to figure out which directory or sub-directory I need to convert next. When run, I get output like this:

$ ./convert-status A-H/?
       Count  MP4  MKV  AVI  MOV  MPG
       ----- ---- ---- ---- ---- ----
A-H/A    250   56  133   61    0    0
A-H/B    731  443   38  250    0    0
A-H/C    305  188   81   36    0    0
A-H/D    224  157   59    8    0    0
A-H/E    305  258   27    8    0   12
A-H/F    589  325  123  141    0    0
A-H/G    435    5  183  247    0    0
A-H/H    659   12   61  586    0    0
       ----- ---- ---- ---- ---- ----
        3498 1444  705 1337    0   12

From the above output, you can see that I've gotten about 1.4k files converted into the proper format but I've still got quite a few left to convert. Most of these are episodes (Bleach and Hercules) but at least it gives me a sense of progress.

The Bash script itself looks like this:


#!/bin/bash

# Wipe out our temporary file, if we have one. This isn't likely
# since we are using $$ to get the PID of the process.
rm -f /tmp/convert-status-$$

# Figure out the width of the directory names. We do this so the columns
# line up pretty; it has absolutely no impact on the functionality.
for dir in "$@"
do
    # Ignore non-directories.
    if [ ! -d "$dir" ]
    then
        # Create a generic placeholder for all non-directories.
        echo "-FILES-" >> /tmp/convert-status-$$
        continue
    fi

    # Include the directory name.
    echo "$dir" >> /tmp/convert-status-$$
done


# This fancy little bit of AWK (which is from the Internet and I don't
# exactly grok) figures out the maximum length string in the file we
# just created. After this run, $m will contain the longest string
# length (as an integer).
m=$(awk ' { if ( length > L ) { L=length } }END{ print L }' /tmp/convert-status-$$)

# Write out the column headers. We use printf even though we could
# use echo just so all the output calls are identical.
printf "%-${m}s  Count  MP4  MKV  AVI  MOV  MPG\n" ""
printf "%-${m}s  ----- ---- ---- ---- ---- ----\n" ""

# These are the counters for the grand totals (max) and the
# non-directory counts (files).

max=0 max_mkv=0 max_mp4=0 max_avi=0 max_mov=0 max_mpg=0

files=0 files_mkv=0 files_mp4=0 files_avi=0 files_mov=0 files_mpg=0

# Go through a list of all the directories in the parameters.
for dir in "$@"
do
    # Ignore non-directories.
    if [ ! -d "$dir" ]
    then
        # If this is a file, we just add to the counters. Using ##
        # strips everything up to the last dot, so names that contain
        # more than one dot still yield the real extension.
        case "${dir##*.}" in
            "mp4") files_mp4=$(expr $files_mp4 + 1);;
            "mkv") files_mkv=$(expr $files_mkv + 1);;
            "avi") files_avi=$(expr $files_avi + 1);;
            "mov") files_mov=$(expr $files_mov + 1);;
            "mpg") files_mpg=$(expr $files_mpg + 1);;
            *) continue;;
        esac

        # Increment the general file counter.
        files=$(expr $files + 1)

        # Don't bother doing anything else.
        continue
    fi

    # Count the number of files of a given type inside that
    # directory. Since we are using `find`, this will recursively get
    # all the files inside subdirectories also. We don't care about
    # the file names, just how many we find. This does have a slight
    # bug if you have a .filename.extension file (which I use for
    # temporary files), but usually that is okay.
    mkv=$(find "$dir" -name "*.mkv" | wc -l)
    mp4=$(find "$dir" -name "*.mp4" | wc -l)
    avi=$(find "$dir" -name "*.avi" | wc -l)
    mov=$(find "$dir" -name "*.mov" | wc -l)
    mpg=$(find "$dir" -name "*.mpg" | wc -l)

    # Add up all the counts above so we have a "total files per
    # directory" variable.
    count=$(expr $mkv + $mp4 + $avi + $mov + $mpg)

    # Increment the grand totals for the bottom line.
    max_mp4=$(expr $max_mp4 + $mp4)
    max_mkv=$(expr $max_mkv + $mkv)
    max_avi=$(expr $max_avi + $avi)
    max_mov=$(expr $max_mov + $mov)
    max_mpg=$(expr $max_mpg + $mpg)
    max=$(expr $max + $count)

    # Write out a single record for everything, but only if we have
    # something.
    if [ $count -gt 0 ]
    then
        printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
            "$dir" \
            $count $mp4 $mkv $avi $mov $mpg
    fi
done

# Write out the file totals, but only if we have at least one file.
if [ $files -gt 0 ]
then
    printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
        "-FILES-" \
        $files $files_mp4 $files_mkv $files_avi $files_mov $files_mpg
fi

# Write out the grand totals.
max=$(expr $max + $files)
max_mp4=$(expr $max_mp4 + $files_mp4)
max_mkv=$(expr $max_mkv + $files_mkv)
max_avi=$(expr $max_avi + $files_avi)
max_mov=$(expr $max_mov + $files_mov)
max_mpg=$(expr $max_mpg + $files_mpg)

printf "%-${m}s  ----- ---- ---- ---- ---- ----\n" ""
printf "%-${m}s  %5d %4d %4d %4d %4d %4d\n" \
    "" \
    $max $max_mp4 $max_mkv $max_avi $max_mov $max_mpg
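
For anyone else who doesn't exactly grok the AWK step, here is a minimal sketch of the same idea in isolation: remember the longest line seen so far, print that length at the end, and feed it to printf as a column width. The function name longest_line_length is made up for this example and is not part of the script above, and the labels are just sample values.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print the length of
# the longest line read from standard input.
longest_line_length() {
    # L starts out empty (treated as 0); "L + 0" forces a numeric result
    # even when there is no input at all.
    awk '{ if (length($0) > L) { L = length($0) } } END { print L + 0 }'
}

# Three sample labels; the longest one, "-FILES-", is 7 characters wide.
m=$(printf '%s\n' "A-H/A" "A-H/B" "-FILES-" | longest_line_length)

# "%-${m}s" left-justifies each label in a column of that width, so the
# numbers that follow line up no matter how long the label is.
printf "%-${m}s  %5d\n" "A-H/A" 250
printf "%-${m}s  %5d\n" "-FILES-" 3
```

The width only affects how the columns line up; the counts themselves never depend on it.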

It should be pretty easy to convert it to fit other file formats (say text, StarOffice, and Word) or just to get an idea of the file types.
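
As a rough sketch of what that adaptation might look like, the counting can be pulled into a small function that takes the extension as a parameter. The count_ext name and the txt, odt, and doc extensions are assumptions picked for illustration, not something from the script above. This version also skips hidden files, which sidesteps the .filename.extension issue mentioned in the script's comments.

```shell
#!/bin/bash
# Hypothetical helper: count regular files under directory $1 whose names
# end in .$2, skipping hidden files such as ".episode.mkv".
count_ext() {
    # tr strips the padding that some versions of wc put around the number.
    find "$1" -type f -name "*.$2" ! -name ".*" | wc -l | tr -d ' '
}

# Assumed extension list; replace it with whatever types you want to tally.
extensions="txt odt doc"

# Print one "directory extension count" line per type for each directory
# given on the command line. Non-directories are skipped.
for dir in "$@"
do
    [ -d "$dir" ] || continue
    for ext in $extensions
    do
        printf "%s %s %d\n" "$dir" "$ext" "$(count_ext "$dir" "$ext")"
    done
done
```

Keeping the extensions in one variable means adding a new type is a one-word change rather than a new counter and a new find line.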