I wanted to find the durations of a bunch of MP4 files located out on the net – durations for the introduction videos for the top Kickstarter projects.
But I wanted to do this quickly, and downloading all those MP4 files would take too long. A little bit of research revealed that MP4 files set up for streaming have their metadata (the moov atom) at the beginning of the file.
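To make the "moov atom at the beginning" idea concrete, here is a minimal Python sketch that walks the top-level atoms in a chunk of MP4 bytes and checks whether moov shows up before mdat (the media data). The function names `top_level_atoms` and `moov_before_mdat` are made up for illustration, and the code assumes the bytes start at the very beginning of the file.

```python
import struct


def top_level_atoms(data):
    """Return (type, offset, size) for each top-level atom header found in data."""
    atoms = []
    offset = 0
    while offset + 8 <= len(data):
        # Every atom starts with a 4-byte big-endian size and a 4-byte type.
        size, kind = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file;
            # with partial data we can only go as far as the bytes we have.
            size = len(data) - offset
        if size < header:
            break  # malformed header, stop walking
        atoms.append((kind.decode("latin-1"), offset, size))
        offset += size
    return atoms


def moov_before_mdat(data):
    """True if a moov atom is seen and no mdat atom precedes it."""
    names = [name for name, _, _ in top_level_atoms(data)]
    if "moov" not in names:
        return False
    if "mdat" not in names:
        return True
    return names.index("moov") < names.index("mdat")
```

If `moov_before_mdat` returns True for the first few hundred bytes of a file, the metadata is reachable without fetching the rest.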
Now I need a way to read just the metadata, without getting the entire file.
More research reveals that I can use curl and dd to get just the first bytes of a file. For some reason ‘curl -r’ (a byte-range request) didn’t work for me.
So now we’re ready to go.
I made a file that had one Kickstarter project URL per line. Here are a couple of them:
This script will load the Kickstarter project page, and get the URL-encoded download link for the project’s introductory video, if there is one:
Now we need to URL-decode the URLs:
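One way to do the decoding, sketched in Python with the standard library's `urllib.parse.unquote`. The helper name `decode_urls` and the example URL in the usage line are placeholders, not real Kickstarter links.

```python
from urllib.parse import unquote


def decode_urls(lines):
    """URL-decode each non-empty line, dropping surrounding whitespace."""
    return [unquote(line.strip()) for line in lines if line.strip()]


if __name__ == "__main__":
    # Hypothetical encoded link, for illustration only.
    print(decode_urls(["https%3A%2F%2Fexample.com%2Fvideo.mp4\n"]))
```

Feeding it the lines of a file with one encoded link per line gives back a list of plain URLs.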
Now we get the durations from the video URLs. You’ll need Python, pip, and virtualenvwrapper installed. We make a Python virtual environment and install the hsaudiotag module to decode the MP4 metadata:
This code uses curl and dd to download only the first 512-byte block of the MP4 file.
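For readers who want to see what is going on under the hood, here is a rough pure-Python stand-in for the same idea. It is not the curl/dd/hsaudiotag code, just a sketch: it reads up to 512 bytes from a URL with `urllib.request`, then looks for the mvhd (movie header) atom and divides its duration field by its timescale. The function names are invented for this example, and searching for the literal bytes `mvhd` is a shortcut that assumes the movie header sits inside that first block.

```python
import struct
import urllib.request


def read_first_bytes(url, count=512):
    """Read up to `count` bytes from the start of `url`, then stop."""
    with urllib.request.urlopen(url) as response:
        return response.read(count)


def mvhd_duration_seconds(data):
    """Return the movie duration in seconds, or None if it can't be found."""
    pos = data.find(b"mvhd")  # shortcut: scan for the atom type
    if pos < 4:
        return None
    body = pos + 4  # version byte comes right after the type
    if body >= len(data):
        return None
    version = data[body]
    if version == 0:
        # version(1) flags(3) created(4) modified(4) timescale(4) duration(4)
        fields = data[body + 12:body + 20]
        if len(fields) < 8:
            return None
        timescale, duration = struct.unpack(">II", fields)
    elif version == 1:
        # version(1) flags(3) created(8) modified(8) timescale(4) duration(8)
        fields = data[body + 20:body + 32]
        if len(fields) < 12:
            return None
        timescale, duration = struct.unpack(">IQ", fields)
    else:
        return None
    if timescale == 0:
        return None
    return duration / timescale
```

Calling `mvhd_duration_seconds(read_first_bytes(url))` on a streaming-ready MP4 should give the duration in seconds. If it returns None, the moov atom is probably not at the front of the file, or it starts later than the first 512 bytes.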
Now we analyze the durations using a simple R script. I am on a Mac, so I need to use Homebrew to install R:
Output for the top 100 Kickstarter technology projects (by amount raised) – all numbers are in seconds:
The average duration of the top 100 Kickstarter videos is 203.3 seconds, or about 3.39 minutes.
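The same kind of summary can be sketched in Python with the standard `statistics` module, as an alternative to the R script. The `summarize` helper is a made-up name, and the numbers in the usage line are placeholders rather than real Kickstarter data.

```python
import statistics


def summarize(durations):
    """Basic summary of a list of durations given in seconds."""
    mean = statistics.mean(durations)
    return {
        "count": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "mean": mean,
        "max": max(durations),
        "mean_minutes": mean / 60,  # seconds to minutes
    }


if __name__ == "__main__":
    # Placeholder values, not measured durations.
    print(summarize([60, 120, 180]))
```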
Thanks to: