Travers Media Encoder

Outline

Travers Media Encoder is a cloud-based video and audio encoder that lets a user convert media from one format to another over the web. The transcoding is done on a server, and the user then downloads the result. Thanks to filepicker.io, the user can upload files from their own cloud storage service, such as Dropbox, without having the file locally. It is written in Python, uses the Django framework, the libav encoding tools and SQLite, and runs on Linux.

Demo Video

In the video, the application is running locally on port 8000. The integration with filepicker.io is demonstrated by using a Dropbox file that is not on the local machine. When the user hits the "Upload and Transcode" button, the file is uploaded to the application server and transcoded. When transcoding completes, the page refreshes with a download link to the location the file is statically served from. We can observe the differences between the input and output using avprobe, a media information tool included with libav. Lastly, we view the video and confirm that it works.

Input

avprobe version 0.8.6-6:0.8.6-1ubuntu2, Copyright (c) 2007-2013 the Libav developers
  built on Mar 30 2013 22:20:06 with gcc 4.7.2
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input_small.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42isomavc1
    creation_time   : 2010-03-20 21:29:11
    encoder         : HandBrake 0.9.4 2009112300
  Duration: 00:00:05.56, start: 0.000000, bitrate: 551 kb/s
    Stream #0.0(und): Video: h264 (Constrained Baseline), yuv420p, 560x320, 465 kb/s, 30 fps, 30 tbr, 90k tbn, 60 tbc
    Metadata:
      creation_time   : 2010-03-20 21:29:11
    Stream #0.1(eng): Audio: aac, 48000 Hz, mono, s16, 83 kb/s
    Metadata:
      creation_time   : 2010-03-20 21:29:11

Output

avprobe version 0.8.6-6:0.8.6-1ubuntu2, Copyright (c) 2007-2013 the Libav developers
  built on Mar 30 2013 22:20:06 with gcc 4.7.2
[matroska,webm @ 0x1429b20] Estimating duration from bitrate, this may be inaccurate
Input #0, matroska,webm, from 'input_small.mp4_transcoded.mkv':
  Metadata:
    MAJOR_BRAND     : mp42
    MINOR_VERSION   : 0
    COMPATIBLE_BRANDS: mp42isomavc1
    CREATION_TIME   : 2010-03-20 21:29:11
    ENCODER         : Lavf53.21.1
  Duration: 00:00:05.56, start: 0.000000, bitrate: N/A
    Stream #0.0(und): Video: mpeg4 (Simple Profile), yuv420p, 560x320 [PAR 1:1 DAR 7:4], 30 fps, 30 tbr, 1k tbn, 30 tbc (default)
    Metadata:
      CREATION_TIME   : 2010-03-20 21:29:11
      LANGUAGE        : und
    Stream #0.1(eng): Audio: vorbis, 48000 Hz, mono, s16 (default)
    Metadata:
      CREATION_TIME   : 2010-03-20 21:29:11
      LANGUAGE        : eng

The audio, video and container formats are all different in the output file, and as seen in the demo the output plays just as well as the input, but now uses completely free codecs.
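The comparison above can also be made programmatically. The following is a minimal sketch, not part of the application: the helper name stream_codecs and its regular expression are assumptions written to match the "Stream" lines in the avprobe reports shown above, and other avprobe versions may format these lines differently.

```python
import re

# Matches lines such as "Stream #0.0(und): Video: h264 (Constrained Baseline), ..."
STREAM_RE = re.compile(r"Stream #\d+\.\d+(?:\(\w+\))?: (Video|Audio): (\w+)")


def stream_codecs(avprobe_output):
    """Return a dict mapping 'video'/'audio' to the codec name found."""
    return dict((kind.lower(), codec)
                for kind, codec in STREAM_RE.findall(avprobe_output))


# Stream lines copied from the input and output reports above.
input_report = """\
    Stream #0.0(und): Video: h264 (Constrained Baseline), yuv420p, 560x320, 465 kb/s, 30 fps, 30 tbr, 90k tbn, 60 tbc
    Stream #0.1(eng): Audio: aac, 48000 Hz, mono, s16, 83 kb/s
"""
output_report = """\
    Stream #0.0(und): Video: mpeg4 (Simple Profile), yuv420p, 560x320 [PAR 1:1 DAR 7:4], 30 fps, 30 tbr, 1k tbn, 30 tbc (default)
    Stream #0.1(eng): Audio: vorbis, 48000 Hz, mono, s16 (default)
"""

before = stream_codecs(input_report)
after = stream_codecs(output_report)
print("%s -> %s" % (before, after))
```

Comparing the two dictionaries confirms that both the video codec (h264 to mpeg4) and the audio codec (aac to vorbis) changed.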

Challenges

I decided to do this project mainly because I wanted to seriously challenge myself. For that reason, I decided to learn the Python programming language, the Django framework and the libav encoding tools. I have a keen interest in scalability, so from the beginning I planned for the application to scale horizontally with ease. I researched many solutions for this and found Celery combined with a Redis message queue to be the best fit.

However, I quickly found that I didn't have the resources needed to confirm if the software could run on multiple machines, so I didn't actually implement distributed asynchronous messaging but kept it in mind so it would be easy to do if needed (in fact, it would only be about 5 lines of code).

My main challenge was that the encoding was much more difficult than I anticipated. The number of settings profiles needed grows quadratically with the number of formats n (one profile per ordered pair of formats, so on the order of n^2), because the settings to encode from a => b are not the same as b => a. Note that this is an optimistic estimate, since more seasoned encoders will want fine-grained control over each bitrate. Because of this, my demonstration is optimized for converting .mp4 files to .mkv, as writing out endless settings files was not the reason I pursued this project.
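To make the growth concrete, here is a small sketch that counts the directed conversion pairs for a given set of formats. The format list is hypothetical and chosen only for illustration; counting ordered pairs gives n * (n - 1) profiles for n formats, which grows quadratically.

```python
from itertools import permutations

# Hypothetical format list, for illustration only.
formats = ["mp4", "mkv", "webm", "avi"]

# Each ordered pair (source, target) needs its own settings profile,
# because mp4 -> mkv is not configured the same way as mkv -> mp4.
conversion_pairs = list(permutations(formats, 2))

# n * (n - 1) ordered pairs for n formats
print(len(conversion_pairs))
```

Adding a fifth format to this list would raise the count from 12 to 20, which is why supporting every combination quickly becomes impractical for one person.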

Learning a new programming language and then working inside a huge, mature framework written in it was no easy task. I started by reading through the entire official Python tutorial and most of the documentation related to collections, then watched all of Google Code's video lectures. I am always anxious to write idiomatic code, so I read over the PEP 8 style guidelines many times to make sure my Python code looked like Python code.

One of the biggest problems I faced during development was parsing the console output of the avconv subprocess. Reading line by line with Python's built-in subprocess module waits for a newline character before returning a line to the program. However, once encoding starts, avconv continuously rewrites the same status line (each update ends with a carriage return rather than a newline), so the output cannot be read line by line until encoding is finished. One of my goals was to include a progress bar during encoding, and this could not be done without consuming the current output.

After a lot of frustration and research I managed to achieve my goal using a third-party library called pexpect. pexpect is designed for controlling subprocesses rather than just executing them and it allowed me to parse the line of text that subprocess would not.
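The underlying issue can be reproduced without avconv or pexpect. The sketch below is an illustration using only the standard library, not the approach the application takes: a hypothetical child process stands in for avconv and rewrites one status line using carriage returns, while the reader splits on both carriage returns and newlines so each update is available as soon as it is written. The names CHILD_CODE and read_status_updates are made up for this example.

```python
import subprocess
import sys

# Hypothetical child process standing in for avconv: it rewrites one status
# line using carriage returns and only emits a newline at the very end.
CHILD_CODE = (
    "import sys, time\n"
    "for i in range(1, 4):\n"
    "    sys.stdout.write('frame=%d time=%d.00\\r' % (i * 10, i))\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.05)\n"
    "sys.stdout.write('\\n')\n"
)


def read_status_updates(command):
    """Yield each status update, splitting on carriage returns and newlines."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    buffer = b""
    while True:
        # Reading one byte at a time keeps the sketch simple; it is not fast.
        chunk = process.stdout.read(1)
        if not chunk:  # EOF: the child has closed its output
            break
        if chunk in (b"\r", b"\n"):
            if buffer:
                yield buffer.decode("ascii")
            buffer = b""
        else:
            buffer += chunk
    if buffer:
        yield buffer.decode("ascii")
    process.wait()


updates = list(read_status_updates([sys.executable, "-c", CHILD_CODE]))
print(updates)
```

A plain readline() on the same pipe would return nothing until the final newline, which is the behaviour described above. pexpect avoids this by matching patterns against the output as it arrives.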

Development Environment

I developed the application on Ubuntu Linux using Sublime Text 2 as my text editor. I used virtualenv, which creates isolated Python environments so that package version dependencies do not collide at the OS level. This is recommended practice for any project of sufficient complexity. While writing the application, I rigorously tested the code using Python's built-in test runner and used pylint for static analysis, which made up for the lack of an IDE. For package management I used pip, which provides easy downloads of packages from PyPI, the Python Package Index. For version control, I used git on GitHub, since I intended the application to be open source from the beginning.

Installation

This assumes you are running an Ubuntu Linux machine with Python already installed. First we install our dependencies from the command line.

sudo apt-get install git python-pip libav-tools

git will allow us to download the code from GitHub. pip (from the python-pip package) is a Python package manager that lets us easily install our third-party libraries, and libav (from the libav-tools package) provides the encoding tools used for the actual conversion.

Next we prepare a directory to download the code to and clone the git repository.

mkdir git-repos
cd git-repos/
git clone https://github.com/thethomaseffect/travers-media-tools.git

Now we install the required python libraries (the web framework is one of them!)

pip install pexpect django

Lastly, your own filepicker.io API key must be added to /encoder/settings.py. For now, I've left mine there so you shouldn't need to do anything, but I'll very likely remove it in the near future. If in doubt, check the git commit history.

Running

Now we're ready to go! We'll build the database and run the test server. If you're prompted to create a superuser/admin, type 'no' and hit return.

cd git-repos/travers-media-tools/encoder/
python manage.py syncdb
python manage.py runserver

If you're already running something on port 8000, just add a space and a new port number after runserver (for example, python manage.py runserver 8080).

If everything was done right, the server should now be running and available on localhost:8000. Open your web browser, go to that address and upload a media file. I can only guarantee that the included input_small.mp4 will work correctly, though many more should work. After hitting upload and waiting a while, the page will refresh and you should be able to download your new file. Congrats!

Interesting Code Snippets

Calculating the percentage of encoding complete

    def get_time_elapsed(self):
        """
        Returns a int between 0 and 100 representing the percentage
        of encoding currently completed.
        """
        time_elapsed_regex = ".*?time=([+-]?\\d*\\.\\d+)(?![-+0-9\\.])"
        regex_group = re.compile(time_elapsed_regex, re.DOTALL)
        if not self.alive():
            return -1
        subprocess_output = self.encoder_thread.subprocess_output
        percentage_elapsed = lambda x: float(
            (x / self.media_object.media_duration) * 100)
        if subprocess_output:
            regex_match = regex_group.search(subprocess_output[-1])
            if regex_match:
                # INFO: In Python 3.x round returns an int so the cast to float
                # can safely be removed
                current_percentage = percentage_elapsed(
                    round(float(regex_match.group(1))))
                return int(current_percentage)
        return -1

def get_media_duration(self):
    """
    Spawns an avprobe process to get the media duration.

    Spawns an avprobe process and saves the output to a list, then uses
    regex to find the duration of the media and return it as an integer.
    """
    subprocess_command = "/usr/bin/avprobe " + self.input_filename
    info_process = pexpect.spawn(subprocess_command)
    subprocess_output = info_process.readlines()
    info_process.close()

    duration_regex = ".*?Duration: .*?(\\d+):(\\d+):(\\d+)\\.(\\d+)"
    regex_group = re.compile(duration_regex, re.IGNORECASE | re.DOTALL)

    def round_milliseconds(milliseconds):
        return round(float(milliseconds) / 100)

    for line in subprocess_output:
        regex_match = regex_group.search(line)
        if regex_match:
            # Return the total duration in seconds
            return ((int(regex_match.group(1)) * 3600) +
                    (int(regex_match.group(2)) * 60) +
                    int(regex_match.group(3)) +
                    int(round_milliseconds(regex_match.group(4))))
    # Not found so it's possible the process terminated early or an update
    # broke the regex. Unlikely but we must return something just in case.
    return -1

Note that this demonstrates Python's nested functions. round_milliseconds is only visible within the scope of the get_media_duration function. percentage_elapsed uses a lambda (an anonymous function), which provides the same functionality but is better suited to one-line expressions.
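The two styles can be seen side by side in the following self-contained sketch. It is a simplified illustration rather than the application's code: the names parse_duration, round_fraction and percentage are made up for this example, the duration line is taken from the avprobe report above, and the elapsed time of 3.0 seconds is a hypothetical value.

```python
import re

DURATION_RE = re.compile(r"Duration: (\d+):(\d+):(\d+)\.(\d+)")


def parse_duration(line):
    """Return the duration in whole seconds, or -1 if the line has none."""

    def round_fraction(hundredths):
        # Nested function: only visible inside parse_duration.
        return int(round(float(hundredths) / 100))

    match = DURATION_RE.search(line)
    if not match:
        return -1
    hours, minutes, seconds, hundredths = match.groups()
    return (int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            + round_fraction(hundredths))


# Lambda: the same kind of one-expression helper, written inline.
percentage = lambda elapsed, total: int(elapsed / float(total) * 100)

duration = parse_duration(
    "  Duration: 00:00:05.56, start: 0.000000, bitrate: 551 kb/s")
# 3.0 seconds elapsed is a hypothetical value for illustration.
print(percentage(3.0, duration))
```

Calling round_fraction from outside parse_duration raises a NameError, whereas percentage is an ordinary module-level name that happens to be bound to a lambda.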

Encoder Thread

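class EncodingThread(threading.Thread):

    """Runs avconv in a background thread and collects its progress lines."""

    def __init__(self, media_object):
        """Stores the media object and prepares the output buffer."""
        threading.Thread.__init__(self)
        self.media_object = media_object
        self.subprocess_output = []

    def run(self):
        """Spawns avconv and records each progress line until it exits."""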
        # Some sensible defaults, such as overwriting if destination already
        # exists
        subprocess_command = "/usr/bin/avconv -i " + \
            self.media_object.input_filename + " -y " + \
            self.media_object.output_filename
        encoding_process = pexpect.spawn(subprocess_command)
        print ("Started %s" % subprocess_command)
        compiled_regex_list = \
            encoding_process.compile_pattern_list([pexpect.EOF, '(.+)'])
        while True:
            i = encoding_process.expect_list(compiled_regex_list, timeout=None)
            if i == 0:  # EOF
                print ("%s Process Finished" % subprocess_command)
                break
            else:
                output_line = encoding_process.match.group(0).rstrip()
                # Keep only progress lines. The index of the first occurrence
                # of frame= varies, so accept either position 0 or 7.
                if output_line.find("frame=") in (0, 7):
                    self.subprocess_output.append(output_line)
        encoding_process.close()

This demonstrates how object-oriented Python looks, and also the correct way thread classes should be written (i.e. overriding only one method, run()). Anywhere the self parameter appears there's likely to be some OO at work. Interestingly, despite encouraging an imperative style, everything in Python is an object. This can easily be shown with the following code in the Python interpreter:

number = 1
dir(number)

This will list all of number's attributes and methods, showing that even an integer is an object.

The above code also shows how pexpect works. There's plenty of regular expressions too for those who are fans!

Serving static files through Django

MEDIA_DOWNLOADS_ROOT = os.path.join(os.getcwd(),'media/finished/')
# ...
# Serves files from domain/downloads/
(r'^downloads/(?P<path>.*)$', 'django.views.static.serve',
    {'document_root': MEDIA_DOWNLOADS_ROOT}),

For production environments, a dedicated web server such as Apache would be better suited to this task, but for testing it's very convenient!

Running POST requests on the server

def home(request):
    message = None
    if request.method == "POST":
        print "POST parameters: ", request.POST
        print "Files: ", request.FILES

        # Build the form
        form = models.UploadModelForm(request.POST, request.FILES)

        if form.is_valid():
            # Read the data and upload it to the location defined in UploadModel
            form.save()

            # Save the name of the uploaded file
            uploaded_filename = form.cleaned_data['filepicker_file'].name

            # Build the full input path to the file
            MEDIA_UPLOAD_ROOT = os.path.join(os.getcwd(),'media/uploads/')
            input_path = MEDIA_UPLOAD_ROOT + uploaded_filename

            # Build the output filename
            output_filename = uploaded_filename + "_transcoded.mkv"

            # Build the output path
            MEDIA_DOWNLOADS_ROOT = os.path.join(os.getcwd(),'media/finished/')
            output_path = MEDIA_DOWNLOADS_ROOT + output_filename

            # Transcode the file
            video_encoder = Encoder(input_path, output_path)
            video_encoder.start()
            while video_encoder.alive():
                print(video_encoder.get_time_elapsed())
                time.sleep(0.1)

            # Return the path to the file
            message = output_filename
        else:
            message = None
    else:
        form = models.UploadModelForm()

    return render(request, "home.html", {'form': form, 'message': message})

This is the code that processes the request, encodes the video and returns the info needed to generate the download link. To demonstrate that the percentage function works, it prints to the console, but in a production version of the application an AJAX function on the client side would request this value instead to update a progress bar. As JavaScript, jQuery, AJAX and CSS were outside the scope of my project I didn't implement this, but it could be done very quickly.
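As a rough idea of what the server side of such a progress request might return, here is a framework-free sketch. It is not implemented in the application: the function name progress_payload and the JSON field names are assumptions, and a real Django view would wrap the returned string in an HTTP response.

```python
import json


def progress_payload(percentage):
    """Build a JSON string describing encoding progress.

    percentage follows the convention of get_time_elapsed above: an int
    from 0 to 100, or -1 when no progress is available.
    """
    if percentage < 0:
        body = {"state": "unavailable", "percent": None}
    else:
        # Clamp to 100 in case rounding pushes the value slightly over.
        body = {"state": "encoding", "percent": min(int(percentage), 100)}
    return json.dumps(body, sort_keys=True)


print(progress_payload(42))
print(progress_payload(-1))
```

A client-side script could poll an endpoint returning this payload every second or so and set the width of a progress bar from the percent field.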

File Upload Database Model

from django.db import models
from django import forms
import django_filepicker


class UploadModel(models.Model):
    # FPFileField renders as a filepicker dragdrop widget, but when accessed will
    # provide a File-like interface.
    filepicker_file = django_filepicker.models.FPFileField(upload_to='uploads')

class UploadModelForm(forms.ModelForm):
    class Meta:
        model = UploadModel

This is the code that is used to automatically generate what's required in the database to allow the use of filepicker.io for uploads.

Final Thoughts

I really enjoyed doing this project and learned a lot. I'd like to continue working on it as an open-source project, as there is a real gap in the market for a product like this. The only people offering a similar service are charging for it, so it has potential to grow as a free, open-source software project. If possible, I would wrap libav's C API instead of parsing console output, and compile my own version of libav, so that the application is less sensitive to regex-breaking updates.

Learning a modern scripting language like Python is very valuable, and it has become my go-to language whenever I want to quickly demonstrate an idea or algorithm. Thanks to implementations such as IronPython, I've been able to use it together with C# to enhance software such as games, where scripting is highly valuable yet performance is critical.

Travers Media Tools

An avconv-based media transcoder for the cloud that scales horizontally.