“Humans can speak 150 vs. type 40 words per minute, on average.”
Kleiner Perkins’ Internet Trends 2016.
The latest Mary Meeker report (i.e. the link above) provides some interesting insights, as usual. One of them is that voice is becoming a much more significant computing interaction mechanism – evolving from keyboards to microphones and keyboards (see slides 115 onwards), with speech being roughly 3.75x faster than typing on the figures quoted above (150 vs. 40 words per minute). It also shows that the technology is approaching an important tipping point, with word recognition nearing 99% accuracy.
To me, voice-based input seems well suited to short, transactional interactions (e.g. hands-free search) rather than dictating large slabs of words (ponder the disjointed workflow of fixing mistakes in both thinking and translation on the fly). This is possibly also reflected in the stats on slide 127, which show the home as the primary setting for voice usage at 43%, with work at only 3%.
I suspect that OSS might be a little different from most work settings. Interactions with OSS tend to be quite transactional in nature. Looking at the recent series on future OSS user interfaces, voice interactions seem as well suited to future OSS as they are to Baidu, Google, etc. today.
Something else for you to think about in relation to OSS voice interactions – Network Engineers don’t trust software. Many Network Engineers love using command lines for a variety of reasons: speed, getting the full truth and, I wonder, perhaps also as a show of technological superiority. Every industry has its own jargon that is efficient for peer-to-peer communications but excludes others who aren’t “in the club” (think lawyers and their legal jargon). I wonder whether OSS operators will actually want OSS user interfaces that have a transaction-based language / notation (think YANG) that makes OSS voice interactions efficient, rather than an all-inclusive natural language model.
Which would you rather use? Can you envisage speaking to your OSS, asking questions to get answers in the near future?
If so, do you think we should begin working on an industry-standard notation or would it have to be specific to suit each vendor’s implementation (just as they currently “interpret” standards / recommendations to suit their product)?
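To make the idea of a transaction-based notation a little more concrete, here is a minimal sketch of how a terse spoken command could be turned into a structured request once the speech has been converted to text. Everything in it is an assumption for illustration only: the verb / object / filter vocabulary, the `Command` class and the `parse` function are hypothetical and don't come from YANG, any standard or any vendor's product.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary, invented for this sketch only.
VERBS = {"show", "count", "clear"}
OBJECTS = {"alarms", "ports", "circuits"}
FILTER_KEYS = {"site", "severity", "device"}


@dataclass
class Command:
    """Structured form of a terse, jargon-style utterance."""
    verb: str
    obj: str
    filters: dict = field(default_factory=dict)


def parse(utterance: str) -> Command:
    """Parse '<verb> <object> [<key> <value>]...' into a Command.

    Anything outside the fixed vocabulary is rejected rather than guessed at,
    which is the trade-off against a natural language model.
    """
    tokens = utterance.lower().split()
    if len(tokens) < 2:
        raise ValueError("expected at least a verb and an object")
    verb, obj, rest = tokens[0], tokens[1], tokens[2:]
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if obj not in OBJECTS:
        raise ValueError(f"unknown object: {obj}")
    if len(rest) % 2 != 0:
        raise ValueError("filters must be key/value pairs")
    filters = {}
    for key, value in zip(rest[0::2], rest[1::2]):
        if key not in FILTER_KEYS:
            raise ValueError(f"unknown filter: {key}")
        filters[key] = value
    return Command(verb, obj, filters)


if __name__ == "__main__":
    print(parse("show alarms site sydney severity critical"))
```

A phrase like “show alarms site sydney severity critical” parses cleanly, while “please show me the alarms” is rejected outright. That is the “in the club” trade-off in miniature: efficient for those who know the notation, unforgiving for those who don’t.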
PS. Don’t forget gesture-based OSS interactions either.