There has been a lot of debate in the blogosphere and
trade journals about the value of automated content recognition (ACR) to the user
experience and the best way to provide that capability. Let’s start by exploring the value of the
feature first. Most apps are using the
concept of ACR to provide ease of use for the consumer by identifying the show
they are watching and “checking them in” to the show (IntoNow is probably the
most well-known for this, but many others like ConnecTV and Viggle use it as
well). Shazam uses it to provide a
launch point to additional information about a product you are watching in a
commercial. TVplus uses it to provide a
synchronized content experience. There have also been discussions about holding the microphone open (or sampling occasionally) so that, when the consumer is determined to be watching something else, the app can prompt them to change channels (whether for rewards or otherwise).
Let’s review the consumer value options first. Can we use ACR to affect the first screen? Absolutely. Think Flingo here. If the TV or the 2nd screen knows what content is playing, it can change audio, change content sources, pause automatically (when you are exploring something deep on the 2nd screen), etc. Can we use ACR for social features? The check-in is an obvious case that saves the consumer from having to type in the show name. It could even describe the scene you are watching or the play that just occurred in a game and pre-populate your Tweet (NCAA March Madness does this for cheering). Probably the most valuable feature set to the consumer is providing additional stimulation: you can provide synchronized experiences, deliver engaging commerce or product information, and provide contextual facts, trivia, the latest scores, stats, and so on. Discovery could be the most appealing feature when combined with knowledge of the user (i.e., which member of the household is watching), which would be used to keep track of what they really watch in order to provide better recommendations and discovery.
How can ACR be used to drive better features for the
business model? With integration into the 1st screen it could
provide the ability to influence consumer behavior. For example, the app detects that the viewer is not
watching the intended show and offers to tune them to the correct one. It can also provide feedback to advertisers
that viewers are actually watching commercials (perhaps justifying the Viggle points they
receive). It can provide better
knowledge about the consumer (their likes, dislikes) for more targeted
advertising, and when combined with the timecode or scene, provide contextual
advertising or commerce opportunities (which are more lucrative to the
provider). Finally, with all of the
information, better recommendations mean better influence on the consumer's
viewing behavior, an incredibly powerful and lucrative feature (think about
Google and the ordering of its links and AdWords; what would American Idol pay to
strongly influence the consumer to change channels and watch their show?).
Now if we believe there is real value in ACR for both the
consumer and the business, how do we effectively implement it? Audio synchronization is the most widely used
form of ACR in the 2nd screen and Social TV world today. The most common form of audio ACR is “fingerprinted” audio. Essentially, very
similar to the way Shazam works with music, a database is created of the audio
tracks of the TV shows and movies broken down into small segments of audio and
then the device “listens” to what is happening and tries to match it to
something in its database. That’s why it
takes 6-12 seconds to create a match and it is so susceptible to background noise
(the dog barking, baby crying, other guests talking). This is also why it is so hard to use audio
fingerprinting to create a synchronized experience. ConnecTV, IntoNow, Viggle, and many other
apps use this approach for checking you in.
TVplus does manage a synchronized content experience using this method.
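As a rough illustration (not any vendor's actual implementation), the fingerprint-and-match idea can be sketched as a lookup table of hashed audio segments plus a vote over whichever show matches the most captured segments. Real systems hash spectrogram landmarks rather than raw samples, which is what gives them their partial noise tolerance; this toy version uses raw sample bytes purely to show the database-and-match shape:

```python
import hashlib
from collections import Counter

def fingerprint(samples, seg_len=4):
    """Hash fixed-length segments of an audio sample list (ints 0-255).
    Real systems hash spectrogram peak pairs, not raw samples."""
    return [hashlib.md5(bytes(samples[i:i + seg_len])).hexdigest()[:8]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]

def build_db(catalog):
    """catalog: {show_name: samples}. Map each segment hash to (show, index)."""
    db = {}
    for show, samples in catalog.items():
        for idx, h in enumerate(fingerprint(samples)):
            db.setdefault(h, []).append((show, idx))
    return db

def identify(db, captured):
    """Vote across matching segments; the show with the most hits wins."""
    votes = Counter()
    for h in fingerprint(captured):
        for show, _ in db.get(h, []):
            votes[show] += 1
    return votes.most_common(1)[0][0] if votes else None
```

The need to accumulate enough matching segments to win the vote is, in essence, why a real match takes several seconds of listening.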
The next level of sophistication for audio-based ACR is
“audio watermarking.” Instead of
creating a database of all known audio tracks, you insert inaudible sounds into
the audio track that create synchronization points. Think of it as something akin to the way a
dog whistle works. The sounds can be
created in a manner to cut through most background noises, and if managed
correctly, can function like the time code or clock of the feature. Of course, changing the audio track in post production
is expensive (when done for the thousands of shows that exist) and requires
support of the content owners and distributors (so that no one replaces the audio
track, which is often the decision of the cable/telco network operator (Comcast,
AT&T, etc.) or of the digital video service provider (iTunes, Vudu,
Netflix)). The Sons of Anarchy “SOA Gear”
app is an example of this watermarked audio approach.
If you have used any of the apps designed for movies (King’s
Speech, Tron, Bambi, etc), you will notice that they have an ACR method based
on synchronizing with your Blu-ray player.
Essentially they reach out via local wi-fi and get the time code of the
movie that is playing and relay that back to the app. This is not affected by background noise and
doesn’t require a change to the audio track, but does require the cooperation
of the content creator to allow the connection from the app (via BD-Live). Some of the more sophisticated apps have both
Blu-ray and audio ACR capability, checking for the Blu-ray connection first,
and then using Audio ACR as a backup.
In the next 6-12 months, you will see the movie and TV apps
(Netflix, Hulu, Boxee, Vudu, etc) start providing some level of synchronization
capability, allowing either their own app or 3rd parties (if they
are smart) to access the time code and name of the feature that is playing.
Flingo adds another level of sophistication to this
concept. They insert themselves into the
“Operating System” of the smart TV, allowing the app to know the time code and
feature name of anything that is playing.
In theory, this enables them to work across multiple apps.
Where is all of this headed?
While the most effective method of synchronizing is great if you are
developing the feature (I have seen an example that checks for Blu-ray
connectivity first, then looks for a set top box that it can communicate with,
and then finally uses audio sync for ACR), it is a difficult approach for 3rd
party apps (those that create experiences for many TV shows and movies instead
of a single experience for a single event or feature). The most cost effective way to reach the most
consumers is audio fingerprinting, while the best experience for the consumer
is direct integration into the OTT movie service, set top box or Blu-ray
player. I would expect, however, that even when the app can speak to the movie service directly, there will be a “fail-over” option of audio-based ACR.
So if you are building an app, some level of audio ACR capability is
probably an entry fee to this fast growing market place, and your ability to
supplement that with tighter integration to the set top box, Blu-ray player, or
the OTT movie service itself can be a major differentiator against the
competition.
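The fail-over chain described above (Blu-ray first, then set top box, then audio ACR) is straightforward to sketch. The three probe functions here are hypothetical stand-ins for whatever device-specific lookups an app would actually implement:

```python
def get_timecode(probes):
    """Try each (name, probe) pair in priority order.
    A probe returns (title, seconds) on success, or returns None /
    raises OSError when its device is unreachable."""
    for name, probe in probes:
        try:
            result = probe()
        except OSError:
            continue  # e.g. no Blu-ray player found on the local network
        if result is not None:
            return name, result
    return None  # every source failed; no synchronization available

# Hypothetical probes simulating one possible living-room situation:
def bluray_probe():      # no BD-Live device answers
    raise OSError("no Blu-ray player on network")

def stb_probe():         # the set top box answers with title and timecode
    return ("Tron", 3605)

def audio_acr_probe():   # audio fallback, slightly less precise
    return ("Tron", 3600)

source = get_timecode([("bluray", bluray_probe),
                       ("stb", stb_probe),
                       ("audio", audio_acr_probe)])
```

In this simulated run the Blu-ray probe fails and the set top box wins, so audio ACR is never invoked; that ordering is exactly the differentiator argued for above.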
@ChuckParkerTech