There has been a lot of debate in the blogosphere and trade journals about the value of automated content recognition (ACR) to the user experience and the best way to provide that capability. Let’s start by exploring the value of the feature. Most apps use ACR to provide ease of use for the consumer by identifying the show they are watching and “checking them in” to it (IntoNow is probably the best known for this, but many others, like ConnecTV and Viggle, use it as well). Shazam uses it to provide a launch point to additional information about a product featured in a commercial you are watching. TVplus uses it to provide a synchronized content experience. There have also been discussions about holding the microphone open (or checking occasionally) and, when the consumer is determined to be watching something else, prompting them to change channels (whether for rewards or otherwise).
How can ACR be used to drive better features for the business model? With integration into the 1st screen, it could provide the ability to influence consumer behavior. For example, the app could detect that consumers are not watching the intended show and offer to tune them to the correct one. It can also provide feedback to advertisers that consumers are watching commercials (perhaps justifying the Viggle points they receive). It can provide better knowledge about the consumer (their likes and dislikes) for more targeted advertising, and, when combined with the timecode or scene, provide contextual advertising or commerce opportunities (which are more lucrative to the provider). Finally, with all of this information, better recommendations mean better influence on the consumer’s viewing behavior, an incredibly powerful and lucrative feature (think about Google and the ordering of its links and AdWords; what would American Idol pay to strongly influence the consumer to change channels and watch its show?).
Now, if we believe there is real value in ACR for both the consumer and the business, how do we effectively implement it? Audio synchronization is the most widely used form of ACR in the 2nd screen and Social TV world today. The most common form of audio ACR is “fingerprinted” audio. Essentially, much as Shazam works with music, a database is created of the audio tracks of TV shows and movies, broken down into small segments of audio; the device then “listens” to what is happening and tries to match it to something in its database. That’s why it takes 6-12 seconds to create a match and why it is so susceptible to background noise (the dog barking, the baby crying, other guests talking). This is also why it is so hard to use audio fingerprinting to create a synchronized experience. ConnecTV, IntoNow, Viggle, and many other apps use this approach for checking you in. TVplus does manage a synchronized content experience using this method.
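To make the fingerprinting idea concrete, here is a toy sketch in Python. Everything in it is my own invention for illustration (the function names, the crude energy-based signature, the catalog layout); it is not how Shazam, IntoNow, or any shipping ACR engine actually works. It reduces audio to a coarse up/down energy signature, indexes every short window of every show in a catalog, and then tries to match a microphone clip against that index:

```python
from collections import defaultdict

SEGMENT = 4    # samples per energy bucket (real systems use spectral features)
WINDOW = 16    # signature bits that must line up before we call it a match

def fingerprint(samples):
    """Coarse signature: 1 if a segment's energy rose vs. the previous one."""
    energies = [sum(abs(s) for s in samples[i:i + SEGMENT])
                for i in range(0, len(samples) - SEGMENT + 1, SEGMENT)]
    return tuple(1 if b > a else 0 for a, b in zip(energies, energies[1:]))

def build_index(catalog):
    """Index every WINDOW-bit slice of each show's fingerprint."""
    index = defaultdict(list)
    for title, samples in catalog.items():
        fp = fingerprint(samples)
        for offset in range(len(fp) - WINDOW + 1):
            index[fp[offset:offset + WINDOW]].append((title, offset))
    return index

def identify(index, mic_samples):
    """Slide over the mic clip's signature, looking for any indexed window."""
    fp = fingerprint(mic_samples)
    for offset in range(len(fp) - WINDOW + 1):
        matches = index.get(fp[offset:offset + WINDOW])
        if matches:
            return matches[0]  # (title, offset into the show)
    return None  # nothing recognized; in practice, keep listening
```

Even this toy version shows why the technique needs several seconds of clean audio: the clip only matches when enough consecutive signature bits survive intact, and any background noise corrupts the bits it overlaps.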
The next level of sophistication for audio-based ACR is “audio watermarking.” Instead of creating a database of all known audio tracks, you insert inaudible sounds into the audio track that create synchronization points. Think of it as something akin to the way a dog whistle works. The sounds can be created in a manner that cuts through most background noise and, if managed correctly, can function like the time code or clock of the feature. Of course, changing the audio track in post production is expensive (when done for the thousands of shows that exist) and requires the support of the content owners and distributors (so that no one replaces the audio track, which is often the decision of the cable/telco network operator (Comcast, AT&T, etc.) or of the digital video service provider (iTunes, Vudu, Netflix)). The Sons of Anarchy “SOA Gear” app is an example of this watermarked audio approach.
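As a rough illustration of the principle (and only the principle; this is not how the SOA Gear app or any commercial watermarking system encodes its marks), the sketch below hides sync bits in a faint carrier tone near the edge of human hearing, like the dog whistle above, and recovers them by correlating against that same tone. Every constant here is an assumption chosen for the demo:

```python
import math

RATE = 44100        # sample rate (Hz), assumed for the demo
CARRIER_HZ = 18000  # near-inaudible carrier frequency
BLOCK = 2048        # samples carrying one watermark bit

def embed(samples, bits, amp=0.01):
    """Add a faint carrier tone to each block whose watermark bit is 1."""
    out = list(samples)
    for i, bit in enumerate(bits):
        if bit:
            for j in range(BLOCK):
                n = i * BLOCK + j
                out[n] += amp * math.sin(2 * math.pi * CARRIER_HZ * n / RATE)
    return out

def detect(samples, nbits, threshold=0.005):
    """Correlate each block against the carrier to read the bits back."""
    bits = []
    for i in range(nbits):
        corr = sum(samples[i * BLOCK + j] *
                   math.sin(2 * math.pi * CARRIER_HZ * (i * BLOCK + j) / RATE)
                   for j in range(BLOCK))
        bits.append(1 if 2 * corr / BLOCK > threshold else 0)
    return bits
```

A production watermark spreads the mark across frequencies so it survives lossy compression and the speaker-to-microphone path; the point here is only that the recovered bit stream can act as the time code or clock of the feature.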
If you have used any of the apps designed for movies (King’s Speech, Tron, Bambi, etc.), you will notice that they have an ACR method based on synchronizing with your Blu-ray player. Essentially, they reach out via local wi-fi, get the time code of the movie that is playing, and relay that back to the app. This is not affected by background noise and doesn’t require a change to the audio track, but it does require the cooperation of the content creator to allow the connection from the app (via BD-Live). Some of the more sophisticated apps have both Blu-ray and audio ACR capability, checking for the Blu-ray connection first and then using audio ACR as a backup.
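The player handshake can be thought of as the sketch below. To be clear, this is hypothetical: BD-Live exposes no such Python API, and `fetch_timecode` simply stands in for whatever local wi-fi query the disc’s application permits. The app polls the player occasionally, then extrapolates the playback position from its own clock between polls:

```python
import time

class BluRaySync:
    """Track playback position from an occasional time code query."""

    def __init__(self, fetch_timecode):
        self.fetch = fetch_timecode  # stands in for the local wi-fi query
        self.base_tc = None          # seconds into the feature at last poll
        self.base_clock = None       # our own clock reading at last poll

    def resync(self):
        """Ask the player where it is; call on start, seek, or drift."""
        self.base_tc = self.fetch()
        self.base_clock = time.monotonic()

    def position(self):
        """Current position, extrapolated locally between polls."""
        if self.base_tc is None:
            self.resync()
        return self.base_tc + (time.monotonic() - self.base_clock)
```

For example, `BluRaySync(lambda: 120.0).position()` reports a position of roughly two minutes into the feature, and a 2nd screen app can drive its synchronized content off that number without ever opening the microphone.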
In the next 6-12 months, you will see the movie and TV apps (Netflix, Hulu, Boxee, Vudu, etc.) start providing some level of synchronization capability, allowing either their own apps or 3rd parties (if they are smart) to access the time code and name of the feature that is playing.
Flingo adds another level of sophistication to this concept. They insert themselves into the “Operating System” of the smart TV, allowing the app to know the time code and feature name of anything that is playing. In theory, this enables them to work across multiple apps.
Where is all of this headed? The most effective method of synchronizing is great if you are developing the feature yourself (I have seen an example that checks for Blu-ray connectivity first, then looks for a set top box that it can communicate with, and finally falls back to audio sync for ACR), but it is a difficult approach for 3rd party apps (those that create experiences for many TV shows and movies instead of a single experience for a single event or feature). The most cost-effective way to reach the most consumers is audio fingerprinting, while the best experience for the consumer is direct integration into the OTT movie service, set top box, or Blu-ray player. I would expect, however, that even when the app can speak to the movie service directly, there will be a “fail over” option of audio-based ACR. So if you are building an app, some level of audio ACR capability is probably the entry fee to this fast-growing marketplace, and your ability to supplement that with tighter integration to the set top box, Blu-ray player, or the OTT movie service itself can be a major differentiator against the competition.
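That layered approach (Blu-ray first, then set top box, then audio sync) is a straightforward fallback chain, and it can be sketched in a few lines. Every source name and probe below is invented for illustration; the shape of the chain is the point, not any particular API:

```python
def acr_position(sources):
    """Try each ACR source in order of fidelity; fall back on failure.

    `sources` is a list of (name, probe) pairs. A probe returns a
    (title, timecode) pair on success, returns None when it finds
    nothing, or raises OSError when the device can't be reached.
    """
    for name, probe in sources:
        try:
            result = probe()
        except OSError:
            continue  # device unreachable; drop to the next source
        if result is not None:
            return name, result
    return None  # no source worked; the app degrades to an unsynced mode

# Hypothetical wiring: no Blu-ray session, set top box unreachable,
# audio fingerprinting succeeds.
def stb_probe():
    raise OSError("no set top box answered on the local network")

chain = [
    ("bluray", lambda: None),             # no BD-Live session found
    ("stb", stb_probe),
    ("audio", lambda: ("Show A", 42.0)),  # fingerprint match
]
```

With that wiring, `acr_position(chain)` falls through the first two sources and reports the audio match, which is exactly the “fail over” behavior argued for above.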