Why the video tag won't save us

Wed 28 January 2009

I see Mozilla is forking over some cash to improve Ogg Theora. It's a nice symbolic gesture, certainly. But Mozilla is fighting an uphill battle, in the snow, upside down and underwater. I've been sitting on this post for quite a while, so this announcement is a good excuse to dust it off and finish it. If you have a glimmer of hope that the <video&gt tag will somehow undo the Flash dependencies that Google Video and Youtube hath wrought, I have news for you. Flash video is the path of least resistance to what video hosting sites want: revenue.

In 2007 a large consortium of engineers representing companies discussed the adding a <video&gt tag to HTML. A number of papers were published for the event, describing company's positions. Nokia wrote about how they need hardware decoder support for whatever codecs are chosen. Mozilla wrote about the need for an open and free "baseline" codec. Adobe's is surprisingly insightful. Youtube seems to have the distinction of being the only company to send a marketing rep to an engineering forum. An excerpt from their position paper :

It is estimated that the total revenue for online video will expand to $6.3 billion by 2012. Right now, online video content is responsible for an estimated $538 million in the US alone. That may seem like a small slice of the $20+ billion spent on online advertising in 2007, but consider this: while online ad revenues are growing 26% per year, online video ad revenues are growing 55.5% per year. The potential for Video monetization is clearly there – just a matter of when.

I'm not misrepresenting Youtube here; their entire paper reads like this. Reading between the lines of Youtube's otherwise incomprehensible position lies the statement that delivering advertising is all they care about. Not consumer preference, not saving energy or hardware support or bandwidth consumption. Ogg and Javascript bring nothing to the Youtube table.

Youtube isn't alone; several video sites are banking not on banner ads or customized channel pages, but on targetted per viewer ads, similar Google's AdSense. A traditional broadcast video has markers for broadcasters to insert ads, mixing them together in real time. These two ideas are fundamentally incompatible. You can't mix in ads server side and still hand out targeted ads to millions of people. Instead you can use Flash to overlay interactive ads on top of videos and do the processing on client side, where there's a lot of unused CPU. Some places have already started this, and it's quite annoying. This practice is why Flash can't use Xv -- Xv is in CMYK, and the non-video parts of Flash are in RGB. Hence the high loads when watching flash video; it decodes the video into CMYK, converts to RGB, then draws crap on top of it, and sends it to be drawn.

Video hosting sites are increasingly closing their services off behind walled gardens in preparation for "monetization". There are well known scripts that can retrieve the underlying video from a Youtube URL (and others), sans Flash advertising. Nobody should be surprised to find out that it's against the TOS to do so. When Google Tech talks migrated from Google Video to Youtube, they lost their .mp4 video enclosures, never to return. Simple programmatic access to hosted video is approaching impossible, restrained only by the need for Flash player to access the same video somehow. This would normally be a step backwards in making the world's information searchable and universally accessible and usable.

The good news

For sites that do offer direct access to video files, within the <video&gt tag lies some great functionality. You can use the attribute markup to set time markers ("a cue"), leading surfers directly to an excerpt of a larger video. I was going to share an example here, but I commented it out after seeing that pretty much nothing supports time markers yet.

Anyways, a lot is to be gained by this. You can quote exactly what a person said, or select a segment or scene from a larger video corpus to share with your audience. And you don't have to recut a video to do it, just let the browser handle that. Building these controls into the browser lets you entirely skip the cut and re-encoding and just grab what you need.

Some sites offer this function independently via URL parameters, but making it standard brings a benefit beyond your direct audience. You can apply a lot of the same techniques Google uses for regular links. If everyone uses the same published syntax to cite clips within a larger video, engines can index sites as they're discovered and instantly apply knowledge of the citation to the region of time quoted, rather than waiting for a programmer to translate another video hosting site's parameters. Search engines can apply annotations to a larger video and perhaps even generate table of contents automatically. Youtube is experimenting with annotations in Flash, and it's apparent they're in dire need of something more pageranky to curb the abuse.

This all relies on a fast seek to make cues feasible. As far as I can tell, none of the major vendors have even addressed this, even though the spec calls for it. For example, can anyone explain what unit the start and end attributes are measured in? Seconds would make sense, but it could be bytes if you want to seek quickly.

So I applaud Mozilla's efforts and wish them success, but we're kinda ignoring the elephant in the room here.

Comments !