MfGames Media: Getting information about a movie

In the process of not writing, I decided to clean up my media server. One of the drawbacks of switching from a MythTV front end to Roku is that the Roku doesn't have the flexibility that the MythTV had. I couldn't use MPlayer, which had everything in a single file. Instead, I had to break things out. One advantage is that the files are grouped together, so the metadata sits next to the cover image, which sits next to the video files.

Now, I want my Roku to be as pretty as my MythTV. This means I want to get an image on the display and maybe some information about the movie. On the MythTV, I just had a screenshot of whatever was in the movie 20% into it. It was... okay, mainly because I didn't have a good poster downloader. I also had poor metadata information (year, plot, etc) because the MythTV's process always hung so I killed it.

I wanted more.

With the Roku, I have the following two files:

  • Core, The.mp4
  • Core, The.srt (subtitles)

The poster would be a JPG file with the same name. But I wanted to automate retrieving the file. So, I wrote a few programs. These are put into mfgames-media because that is my umbrella utility suite for media-related applications.

Getting the TMDB ID

The first step was getting the themoviedb ID for the file. I could do this by hand, but I'm lazy. So, I added mfgames-tmdb id which takes search terms.

$ mfgames-tmdb id "6th Day, The"
8452
$

This is a very simple output. It just grabs the first movie that matches. For all but a few, this was good enough. I made it so it handles "The 6th Day" and "6th Day, The". It can also handle "6th Day, The (2000)" because Hollywood occasionally duplicates names (and remakes everything).
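For the curious, the title juggling could look something like the following. This is only a sketch in bash, not the actual mfgames-tmdb code; the function name normalize_title and its output format (title, a tab, then the year or nothing) are made up for the example.

```bash
# Hypothetical helper, not part of mfgames-tmdb: turn a library-style
# title such as "6th Day, The (2000)" into "The 6th Day<TAB>2000".
normalize_title() {
    local title="$1" year=""

    # Keeping the patterns in variables avoids quoting surprises in [[ =~ ]].
    local year_re='^(.*[^[:space:]])[[:space:]]*\(([0-9]{4})\)$'
    local article_re='^(.*), (The|A|An)$'

    # Peel off a trailing "(YYYY)" if there is one.
    if [[ "$title" =~ $year_re ]]
    then
        title="${BASH_REMATCH[1]}"
        year="${BASH_REMATCH[2]}"
    fi

    # Move a trailing ", The" / ", A" / ", An" back to the front.
    if [[ "$title" =~ $article_re ]]
    then
        title="${BASH_REMATCH[2]} ${BASH_REMATCH[1]}"
    fi

    printf '%s\t%s\n' "$title" "$year"
}
```

Calling normalize_title "6th Day, The (2000)" prints "The 6th Day", a tab, and "2000"; a title that is already in natural order and has no year passes through with an empty year field.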

This does one thing (hopefully) well. The main purpose is that I can use that number in later commands.

Caching Information

I like to cache information. Mainly to be polite to the systems giving free services, but also because it is a lot faster if I don't have to touch the network. So, I decided to write something that downloads the TMDB information into a JSON file and stores it.

$ mfgames-tmdb json 8452 -o "6th Day, The.tmdb"
$ head 6th\ Day\,\ The.tmdb 
{
    "adult": false,
    "backdrop_path": "/qmo6cDZwZED5F7I1KnI2tk0gFda.jpg",
    "belongs_to_collection": null,
    "budget": 82000000,
    "genres": [
        {
            "id": 28,
            "name": "Action"
$

Having a TMDB-specific file isn't quite right, but it is "good enough" for what I need at the moment. Eventually, I think I want to create a .meta file that contains all this information plus data from IMDB and TVDB (thetvdb site).

If the file already exists, it refuses to overwrite it. This way, I can script a "get the TMDB for everything" and not worry about losing stuff I already have or replacing correct data with incorrect data.
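The same "never clobber what I already have" idea is easy to approximate in plain bash. The following is only a sketch of the idea, not how mfgames-tmdb does it internally; the function name save_if_missing is hypothetical.

```bash
# Hypothetical helper: write stdin to "$1" unless that file already exists.
# Returns 0 if the file was written, 1 if it was left untouched.
save_if_missing() {
    local target="$1"

    if [ -e "$target" ]
    then
        # Leave the existing (possibly hand-corrected) file alone.
        echo "Refusing to overwrite: $target" >&2
        return 1
    fi

    cat > "$target"
}
```

Anything that writes to standard output can be piped into it, for example: printf 'data\n' | save_if_missing "6th Day, The.tmdb". A second run with the same target prints the refusal message and leaves the file as it was.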

Side note: searching for "One, The" gets the Jet Li movie, but searching for "One, The (2001)" does not. Cases like this are the ones I had to fix by hand.

Shell Games

Getting through all this was easy, for a single file. But I'm not going to write out the commands for every movie I had. So, I used the power of shell scripts and find commands to tag my entire library.

#!/bin/bash

# USAGE: get-tmdb-json.sh *.mkv
#
# This attempts to download the TMDB information for a given file. If
# the file exists, it will not be downloaded. This assumes there is
# only a single period in the filename.

# Go through the input files.
for file in "$@"
do
    # Figure out the basename of the file and relative directory path.
    dir=$(dirname "$file")
    base=$(basename "$file" | sed 's@\.\w*$@@')

    echo "Processing: $base"

    # First check to see if we need the JSON file.
    json="$dir/$base.tmdb"

    if [ ! -f "$json" ]
    then
        # We need to download the file.
        echo "  Downloading TMDB JSON"

        # Try to get the TMDB ID for the file.
        tmdb_id=$(mfgames-tmdb id "$base")

        if [ "x$tmdb_id" != "x" ]
        then
            # Taking our time...
            echo "  Sleeping..."
            sleep 7

            # Download the file.
            echo "  TMDB ID: $tmdb_id"
            mfgames-tmdb json $tmdb_id "--output=$json"

            # Check to see if this appears to be an HTML file.
            if grep "<h1>" "$json" > /dev/null
            then
                echo "  File could not be downloaded"
                rm -f "$json"
            fi
        else
            echo "  Could not identify ID file"
        fi

        # Throttle slightly so TMDB doesn't hate us.
        echo "  Sleeping..."
        sleep 7
    else
        # Just identify that we have the file already.
        echo "  TMDB JSON already downloaded: $json"
    fi
done

The above shell script goes through and creates the .tmdb file for everything. I used sleeps to slow it down (though it doesn't look like I had to). It took a few hours to run, but then I had a .tmdb for all the movies in my library.

$ find . -name '*.mkv' -print0 | xargs -0 get-tmdb-json.sh
... lots of output
$
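After a run like that, it helps to see which movies still have no cache file, since those are the ones that need a manual lookup. Here is a minimal sketch, assuming the .tmdb file sits next to the .mkv with the same base name; the function name list_missing_tmdb is hypothetical and not part of mfgames-media.

```bash
# Hypothetical helper: print every .mkv under "$1" (default: current
# directory) that has no sibling .tmdb file yet. The list is passed
# null-delimited so "Core, The.mkv" stays a single filename.
list_missing_tmdb() {
    local root="${1:-.}" file

    find "$root" -name '*.mkv' -print0 |
    while IFS= read -r -d '' file
    do
        # Strip the final extension and look for the matching cache file.
        [ -f "${file%.*}.tmdb" ] || printf '%s\n' "$file"
    done
}
```

Running list_missing_tmdb on the top of the library prints one path per line for each movie that still needs attention.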

Moving Forward

With this, the main thing is the cache file. I found out later in the process that I needed more information that TMDB didn't have. Specifically, TMDB didn't have actors or directors, which meant I couldn't populate all the information that Roku/Roksbox uses. To do that, I have to go into IMDB (which I prefer not to). If I followed the current pattern, this would be a .imdb file. And I also needed information from thetvdb for the episodes (which would mean a .tvdb file).

Before I release MfGames Media, I'm going to consolidate them into a .meta or .meta.json file and do processing that way.
