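Engineering video playback for Alexa and her friends might sound like an obscure task, far removed from the core requirements of a successful video publisher, but it is actually a very clever way to solve a big search problem. Viewers now have so many video services to choose from that it is hard to find the service that has the video in question, and developers are finding it challenging to create an elegant, efficient user interface (UI) for navigation.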


Voice control lets users navigate by speech instead of a graphical user interface, so they don’t have to think about how to find their content. “Voice remote is a great way to flatten the UI. It gives an awesome experience and is a way to get access to what otherwise is a dizzying array of content choices,” says Jonathan Palmatier, vice president of product management, voice control, at Comcast Cable. Comcast’s X1 TV set-top box has a voice remote, which just might be the invention that will make people stop hating their cable companies (at least if their company is Comcast).

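Audio control is only part of the story. When paired with AI (artificial intelligence), software should be able to learn viewers’ preferences, tune to the right channel or service, and deliver increasingly appropriate search results and recommendations over time. So in the future, telling a device, “Play my favorite TV show” should do just that, but we’re getting a bit ahead of ourselves here. The road to audio control may be long and winding. The cast of characters includes Amazon’s Alexa, Apple’s Siri, and Microsoft’s Cortana, as well as Google Assistant and Comcast’s X1.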

Near- vs. Far-Field Communication

The X1 and any remote control you can talk to use near-field communication, a short-range connectivity standard for devices within a limited distance. Alexa (via the Amazon Echo) and other always-on devices use far-field communication. “[Far-field devices] are always on and listening for a keyword to wake up and then start recording and transmitting the voice command. Our voice remote [X1] only works when a user presses the microphone,” says Palmatier.
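The two activation models Palmatier describes can be sketched in a few lines. This is a minimal illustration, not how the X1 or the Echo actually work: the event format, the `WAKE_WORD` value, and both function names are assumptions invented for this example.

```python
# Hypothetical sketch: contrasts push-to-talk (voice remote) with
# always-on wake-word activation (far-field device). Not a real device API.

WAKE_WORD = "alexa"  # assumed wake word, for illustration only


def push_to_talk(events):
    """Collect speech only while the microphone button is held down.

    `events` is a list of (kind, value) tuples, where kind is
    "button_down", "button_up", or "speech".
    """
    recording = False
    captured = []
    for kind, value in events:
        if kind == "button_down":
            recording = True
        elif kind == "button_up":
            recording = False
        elif kind == "speech" and recording:
            captured.append(value)
    return captured


def wake_word(utterances):
    """Hear every utterance, but capture only what follows the wake word."""
    captured = []
    for utterance in utterances:
        words = utterance.split()
        # Compare the first word, minus trailing punctuation, to the wake word.
        if words and words[0].lower().strip(",.!?") == WAKE_WORD:
            command = " ".join(words[1:])
            if command:
                captured.append(command)
    return captured
```

In the push-to-talk case the user decides when the device listens, so speech outside the button press never reaches the recognizer. In the wake-word case the device must inspect everything it hears, which is part of why far-field recognition is the harder problem.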

The difference between a voice remote search and playing content via a far-field AI platform can be a thin, moving line. Amazon’s Fire TV remote is Alexa-enabled and can respond the same way Alexa would on an Echo device, but the vast majority of the Alexa controls are for audio and connected home devices. One video playback control that is available now is for Plex, and Alexa can play Plex content if there is a Plex server in a home media set-up.

Many voice platforms work well with their own content (e.g., Alexa works best with Amazon content) or when using a voice remote to play a movie. The problem develops when a viewer wants to seek content from another media source or app, or makes a more complex request. Media apps need to be designed with voice control in mind to benefit from the audio navigation available from the AI platforms.

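For this article, Amazon sent written statements, which I’ve condensed here, saying that display card control can be added via “skills.” It’s these “skills” that enable activity: connecting the NPR skill means Alexa can access content from the NPR app and deliver an audio news briefing (summary). Without a connected skill, Alexa returns a lot of “I don’t understand the question” responses. Siri has a similar problem. (Contacts at Microsoft and Roku could not be reached for interviews.)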

The days of “Alexa, turn on my TV and play tonight’s news” are still a long way off. “When you use an Alexa or other far-field device, you could be many feet away from it. That drastically increases the intelligence that the device has to have in order to mitigate against any ambient noise,” says Palmatier. Anyone who has tried to get a far-field device to recognize commands over a lot of noise can appreciate that, in some ways, near-field seems to be an easier problem to solve.

Subject vs. Query

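Besides being divided into near-field and far-field communication technologies, voice recognition also comes in two different approaches. The first is subject-oriented voice recognition, like Xfinity’s solution, says Mark Vena, former vice president of global marketing at Sling Media and EchoStar. Subject-oriented queries, like “Give me movies with George Clooney” or “Find local news on now,” seem to be an easier proposition.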

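Then there are query-based solutions that voice recognition devices use to handle more open-ended questions. If you ask Google Assistant, “OK, Google, what’s on Sling TV tonight?” Google opens the Sling Media website. AI personas can get you closer to the content, but in many cases it may still take several clicks to launch it.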

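Sling Media doesn’t have a voice control option, but it’s a topic Vena personally is very interested in, and so, it seems, are a lot of other people. At CES in January 2017, it seemed everyone was touting Alexa support for their devices, and Amazon itself featured a giant walk-in Echo mockup outside of its meeting space. If nothing else, Amazon has done an outstanding job of launching Alexa. Amazon says thousands of developers are building skills for Alexa.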


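Query-based systems should generate a wider range of data from user-specific searches. Does this matter? It does when the system can tell you that, because you’ve watched A, B, and C, and searched for X, Y, and Z, you might like such-and-such content. Amazon has built an entire business on its recommendation engine, which makes this an attractive way to distribute more content.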

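At a very basic level, before even attempting to identify what content a viewer might be interested in, the initial question needs an accurate answer. Is it acceptable to deliver information that is correct only 50% of the time? In theory, Alexa and her friends come from the smart school and have had the benefit of all the resources these big corporations could throw at them. In reality, these voice control systems are often as smart as young children learning how to talk. Alexa volunteered that she’s 2 in human years; Apple TV’s Siri says, “I feel like I was incepted yesterday.” When we talk to them and they answer correctly, we’re impressed. When they don’t, we think less of them.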

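Our unscientific test of “Show kids’ movies” returned unexpected results with Xbox One, and Roku’s voice interface returned adult and violent movies in its search results. The Cortana interface with Xbox One didn’t tap into its movies app; instead, it displayed online search results for movies and YouTube videos. All of these interfaces seemed to do better when answering more specific questions. It will be a while before system intelligence matures, regardless of whether the system uses query-based or subject-based search.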


“One of the biggest challenges in many interfaces with deep libraries of content is there’s just too much there to find what you really like,” says Tjeerd Hoek, vice president of creative at frog design. The company is known for helping many Fortune 500 companies with their product and user design, including developing interfaces for a variety of video providers. The challenge for Hoek is how to display a library of content in a way the user can easily navigate in as few steps as possible. “There are only so many things you see or hear about, so you won’t even search for [different content]. Finding media is a good example of something that is much better done by voice than by giving people a search box and a number of filters on the left side to get to the one song or movie they have in mind.”

The Modern Wizard of Oz

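Many years ago, when Hoek worked at Microsoft, the giant was testing user reactions to voice navigation. “We would bring people in and have them talk to a computer,” says Hoek. “We had a person [in another room] controlling the computer, like the Wizard of Oz.”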

Today’s systems incorporate the wizard into the operating system and use a natural-language processor (NLP) so users can ask questions in everyday English. “[Comcast] has invested heavily into building our own NLP because we believe that’s a strategic component to the system, because the magic comes from being able to tune it. You’re not just generating generic answers from a query,” says Palmatier. Everyday English is easy for users. For developers at media companies, it is a thorny problem.


“When developing for voice-driven environments, make sure you have traditional in-app exposure and deep-linking strategies,” says Michael Dale, vice president of engineering at Ellation. Next up is targeting the respective voice-controlled software development kits (SDKs). Dale says these map to standard behaviors on their respective platforms. “Adding a video to your watchlist for your iOS app works easily with a SiriKit integration. Casting a continuation of your favorite show on your TV via Chromecast may be easiest to facilitate via Google Actions.”
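One way to read Dale’s advice: if every screen and action in an app is already reachable through a deep link, a voice integration only has to translate a recognized intent into that link. The sketch below assumes a made-up `exampleapp://` URL scheme and made-up intent names; it does not use SiriKit or Google Actions themselves.

```python
from urllib.parse import quote, urlencode

# Hypothetical intent-to-deep-link table. The intent names and the
# "exampleapp://" scheme are invented for this illustration.
DEEP_LINK_ROUTES = {
    "add_to_watchlist": "exampleapp://watchlist/add",
    "continue_watching": "exampleapp://play/resume",
}


def build_deep_link(intent, **params):
    """Turn a recognized voice intent into an in-app deep link.

    Returns None when the app exposes no route for the intent, so the
    caller can fall back to an "I don't understand" style response.
    """
    base = DEEP_LINK_ROUTES.get(intent)
    if base is None:
        return None
    if not params:
        return base
    # Sort parameters so the generated link is deterministic.
    query = urlencode(sorted(params.items()), quote_via=quote)
    return f"{base}?{query}"
```

For example, `build_deep_link("add_to_watchlist", video_id="abc123")` yields a link that the app’s existing routing could handle in the same way as a tap in the UI, which is why the deep-linking work comes before any platform-specific SDK work.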

Soon, consumers will be able to get whatever content they want on any device. To support that, TV needs to become easier to control.

