I've been struggling to define NUI clearly for my own benefit and the benefit of everyone else - both NUI practitioners and those new to the term. It seems to me that the term has gone undefined for far too long. After a couple of other attempts this week I've finally settled on a definition with which I'm very comfortable.
In fact, I'm so confident in this definition that I've put it into the Natural User Interface entry on Wikipedia. I encourage everyone interested in NUI to use it. It's a definition that is concise and easy to understand. Here it is:
A Natural User Interface is a human-computer interface that models interactions between people and the natural environment.
There are a couple of aspects to this definition that are important.
NUI is a form of HCI. It's important that we make that explicit.
NUI models natural interactions. That means it leverages and uses as a template the interactions people have with each other (e.g. speech and gestures).
NUI also models interactions between people and the natural environment (e.g. water and rocks) as opposed to their artificial environment (e.g. computers and cars).
UPDATE January 23, 2010
An anonymous contributor has removed my edit to Wikipedia, which means that at least one person doesn't agree with my definition. I could go and put my definition back in but that would probably result in a Wikipedia pissing match where we just go back and forth doing and undoing my changes. That won't be productive. Instead I would like to see this discussion, as well as others like the Ron George OCGM discussion, move to a more appropriate forum. Perhaps a mailing list. Suggestions are welcome.
A Natural User Interface is a human-computer interface that models aspects of direct interactions between people and their natural environment.
It took me months to arrive at a definition of NUI that I'm comfortable with and within 48 hours of proposing a definition I'm already questioning its validity. Why? It's that pesky term "natural" that just keeps messing things up. In the definition I proposed, I was careful to place the context of "natural" as it relates to the user's own environment. What I meant by "their natural environment" was the objects and processes with which the user is already familiar - only I did a really poor job of expressing that idea.
In an interview Bill Buxton (see video below) does a much better job of explaining what I was trying to say. Essentially he tells us that what is "natural" depends on what is already familiar to the user and what is also appropriate at the moment the user is interacting with the system.
To drive home that point of "appropriate" Buxton makes the excellent point that using a keyboard and a mouse is probably the most natural (not to mention efficient) method of interfacing with a computer when you want to write an article or a blog or a book or a letter. This is really an important point.
I'm most comfortable writing with a QWERTY keyboard. In fact, I avoid using a pencil whenever possible because it's not nearly as efficient for me. I also cannot imagine using a speech-to-text system for writing. Not only does speech-to-text pretty much suck, even if it were 100% accurate I would still prefer to use a keyboard. I think better with a keyboard than when I'm talking. So, in my case, the most natural UI for writing is a keyboard and mouse. I don't think that I'm alone.
What Bill Buxton emphasizes is that a NUI must leverage skills that people already have - whether they are innate or learned. For example, driving a car is not an innate skill, it's a learned skill. An interface that can leverage aspects of the controls for an automobile will be more familiar and more appropriate in some contexts than a gestural UI or a speech UI.
Another point that Bill Buxton makes is that NUIs should be multi-channel, which is kind of like being multi-modal. A multi-modal interface is one that offers different mechanisms for interacting with the computer which may be combined. For example, you might type on a laptop, use its touch screen, or use voice controls depending on what you want to accomplish. You may use some modes simultaneously (touch and speech) or individually depending on the context. I'll use speech to make phone calls when I'm in the car by myself via my iPhone's voice dialing feature, but I'll use the touch screen when I'm around people I don't know. Not only is my UI multi-modal, it's multi-channel. I can use different combinations of UI modes to accomplish the same task.
Bill Buxton makes an off the cuff statement that we should perhaps use the term "Appropriate User Interface" rather than "Natural User Interface" and that, I think, is the bedrock of the point he is trying to get across. While "Appropriate User Interface" seems a far more accurate description of the new UI paradigm, it's "Natural User Interface" that's gained traction as a term. The question is, what do we mean by "Natural"?
Given the importance of appropriateness perhaps that should be in the definition of NUI. The following is an alternative definition of NUI. I don't know which definition is best, the first one I proposed or this second one, but perhaps together they can help us find the best definition.
A Natural User Interface is a human-computer interface that leverages skills already familiar to the user that are most appropriate to the context of the interaction.
To be perfectly honest I was a little disappointed at the lack of discussion surrounding my first attempt to define NUI. Hopefully, some more discussion will occur around this post.
The following video is really important for anyone interested in NUI. Bill Buxton is, in my opinion, one of the foremost authorities on the subject. He's been researching multi-touch, speech, and many other aspects of human-computer interfaces for 30 years. The guy knows what he's talking about.
Btw - Josh Blake has taken notes on the interview which I found to be very useful. Buxton says so many important things that you need to listen to the interview a couple times and review Josh's notes to really appreciate it.
UPDATE: January 22, 2010
I've come up with my final definition of NUI which you can find at this post.
"Natural User Interface" (NUI) is a term that is kind of like "pornography" in that it's hard to define but you know it when you see it.
There have been a number of definitions offered for NUI, but none of them seem to capture the full spectrum of NUI technologies. I've decided to propose a definition that I believe captures the spirit of the term Natural User Interface while helping to differentiate NUI from other kinds of human-computer interfaces.
There are three qualities of a NUI that should be in the definition of the term: Directness, Naturalness, and the fact that NUI is a type of Human-Computer Interface.
Directness
NUI is often differentiated from GUI by the directness of interaction. We say that NUI is more direct than GUI. The typical example of this is multi-touch where we can manipulate objects directly on the screen using our fingers rather than with a joystick or a mouse. The manipulation is said to be more direct, and therefore an example of NUI. The same could be said for audio interfaces where the user speaks commands directly to the computer rather than typing them in via a keyboard. In fact, the directness definition tends to apply pretty well to many different NUI technologies from touch screens, to direct voice input, to augmented reality, tangible computing, and even automatic identification. But directness, while an important dimension of NUI, is actually a bit too inclusive to stand alone. For example, tuning in a radio station using a dial is pretty direct but it's not NUI.
Naturalness
Another word often used to describe NUI is naturalness. Not natural as in Nature, but natural as in the learned and innate manner with which people interact with non-technological aspects of their environment. For example, it's more "natural" for people to move things around with their hands than with a joystick. We don't use a joystick to eat food or wash ourselves. Similarly, it's more natural for people to express their needs using speech than a keyboard. Although critical to the definition of NUI, naturalness, like directness, cannot stand alone. For example, Augmented Reality - arguably a NUI technology - has little or no alignment with how we naturally perceive the world around us. It's not like objects naturally have metadata labels floating over them. In addition, many of the manipulations we use on multi-touch screens (e.g. pinch) have no natural counterpart in our interactions with everyday non-technical objects. You don't pinch a piece of paper to make it smaller while preserving its aspect ratio.
It would seem to me that directness and naturalness are both qualities that should be present in a NUI technology. A technology that is neither direct nor natural probably isn't NUI. For example, a joystick or mouse is neither direct nor natural. The same can be said for a keyboard.
Computer Input and Output
It's also important that we explicitly tie the definition of NUI to the physical devices used to input data and to output data to a computer. The way in which a computer processes information is not important to the definition of NUI. We don't care if a multi-touch computer uses artificial intelligence or mechanical gears to process information. What we care about when it comes to NUI is how we give the computer data and how we get information from the computer. These are the mechanisms that make up a computer's Interface. Defining NUI as a type of Human-Computer Interface may seem redundant but I think it's important to emphasize the fact that we are talking about the input and output mechanisms of a computer and not the computational, memory, or storage aspects of a computer.
A Definition for NUI
Based on the preceding paragraphs about directness, naturalness, and computer I/O, I propose the following definition for NUI.
A Natural User Interface is a human-computer interface that models aspects of direct interactions between people and their natural environment.
The term "models aspects" is the best I could think of to express the idea that a NUI is not attempting to exactly duplicate a natural interaction, but is simply using some qualities of natural interaction as a model. In fact, NUIs are not indistinguishable from natural interactions. They simply resemble certain aspects of natural interactions.
UPDATE: January 22, 2010
I've come up with my final definition of NUI which you can find at this post.
UPDATE January 20, 2010
With the ink not even dry on this definition I've already written a new post which proposes an alternative definition of NUI based on commentary from Bill Buxton.
I've been reading and learning about non-speech audio feedback and how it might be used in NUI. A particularly good resource on the subject is a book that was being written by Bill Buxton and others in the early 90's, but was never finished called Auditory Interfaces: The Use of Non-speech Audio at the Interface - the unfinished book is on-line and free to read.
There are a lot of ways to slice and dice the topic of non-speech audio feedback, but one way of looking at it is in terms of signals and data representation.
Non-speech audio feedback as signals would include the various chimes and beeps your computer makes. These are called Audio Icons or Earcons. One of the things I'm interested in is what is called "source" sounds or sounds that provide clues as to the function or operation that is causing the sounds. Source sounds are analogs drawn from real-world sounds. One of the best examples, in my opinion, is the sound that Apple Mail makes on a Mac or iPhone when sending email. When Apple Mail successfully sends an email you hear a small swoosh sound like a bottle rocket taking off. It's not loud or intrusive but it is very distinctive and it immediately communicates to the user that an email has been sent. Another example of a source sound is the sound of a door closing or opening when one of your "buddies" on your favorite instant messenger client logs off or on.
Using sound as data representation is also very interesting. What, for example, would the stock market sound like if its ebbs and flows were converted into musical notes? Using sound to represent the shape and contour of data may sound a bit weird, but there are lots of really useful applications.
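To make the idea concrete, here is a minimal sketch of turning a data series into notes. It assumes the simplest possible mapping: each value is scaled linearly onto a range of MIDI note numbers, so the melody rises and falls with the data. The function name `sonify`, the note range, and the closing prices are all made up for illustration; they don't come from any real sonification library or market feed.

```python
def sonify(values, low_note=48, high_note=84):
    """Map each value linearly onto a MIDI note number in [low_note, high_note]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no contour, so play the middle of the range.
        return [(low_note + high_note) // 2] * len(values)
    span = high_note - low_note
    return [low_note + round((v - lo) / (hi - lo) * span) for v in values]


# Hypothetical closing prices, invented for this example.
closing_prices = [100.0, 102.5, 101.0, 105.0, 110.0, 104.0]
notes = sonify(closing_prices)
print(notes)
```

The lowest price lands on the lowest note and the highest price on the highest note, so the shape of the data is preserved as the shape of the melody. Actually playing the notes would need a MIDI or audio library, which is left out here.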
I personally like the idea of using non-speech audio feedback for what Mark Weiser called "Calm Technology". Calm technology, or calm computing, is some type of feedback about the running state of a system which is manifest on the "periphery" of our consciousness.
The example of Calm Technology presented in Mark Weiser and John Seely Brown's seminal article on the topic was a long string of plastic hanging from the ceiling in the corner of the office. The end of the plastic string at the ceiling was attached to a spinning motor whose speed was determined by the amount of data flowing over the company Ethernet cables. As the motor turned, the string would naturally whirl in circles either very slowly, when the Ethernet was not busy, or quickly like a tornado when the Ethernet had a lot of data streaming through it. Most of the time workers in the office wouldn't pay much attention to the plastic string twirling in the corner, only taking notice when it started spinning quickly. This would be an example of a visual Calm Technology. The use of non-speech audio feedback for Calm Computing is really interesting.
Have you ever had a problem with your car that you first noticed as a strange sound? Although we are not aware of it, we tend to monitor the running state of our cars by the sounds they make. The sound of the engine, the sound of the tires on the road, the sound of the air flowing through the vents. When our car starts to make a new sound, something it's never done before, we take notice. What if an ambient sound, something subtle and even enjoyable, could be assigned to the running of your computer?
Personal computers are running dozens of processes all the time. Most of those processes are for good, but sometimes bad processes (e.g. viruses) are also running. How does the average person know if a virus has invaded their computer? There is generally no way to tell except to run a virus checker and (based on recent experience) virus checkers don't always detect a virus. But what if, like some new sound made by your car, you could tell that something was different about your computer by the sounds it makes while running? That could be very useful.
Imagine that every process running on your computer had a unique croak, chirp or trill - the sounds of frogs, crickets, and cicadas of a small pond at dusk. If every process had a unique croak, chirp, or trill - a sound that is the same every time the process is run - our computers would have a kind of natural ambient pond-like sound when they ran. At first we would take notice but after a short time the sound would settle into the periphery of our awareness so that we would only take notice when a new and unexpected sound was introduced. If we just installed some new software a new sound would register when the software was installed and become a part of the natural and healthy ambient audio rhythm of the computer. If, however, some new process - one we did not intentionally install - was introduced such as a virus, the new pond-sound (i.e. croak, chirp or trill) would be out of place and stand out. We might take notice and wonder, what new process is running?
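Here is a rough sketch of how the "same sound every time" part might work. It assumes each process is identified by its name, and it hashes that name to pick one of the three pond sounds plus a pitch, so the same process always gets the same voice. Everything here is hypothetical: the names `sound_for` and `unfamiliar`, the pitch range, and the process names are invented for illustration. A hash also doesn't strictly guarantee that two processes never share a sound, it only makes collisions unlikely.

```python
import hashlib

SOUND_KINDS = ("croak", "chirp", "trill")


def sound_for(process_name):
    """Derive a stable (kind, pitch in Hz) pair from a process name."""
    digest = hashlib.sha256(process_name.encode("utf-8")).digest()
    # First byte picks the kind of sound, next two bytes pick the pitch.
    kind = SOUND_KINDS[digest[0] % len(SOUND_KINDS)]
    pitch_hz = 200 + int.from_bytes(digest[1:3], "big") % 1800
    return kind, pitch_hz


def unfamiliar(running, known):
    """Return the running process names that are not in the familiar set."""
    return sorted(set(running) - set(known))


# Hypothetical process names, invented for this example.
known = {"mail", "browser", "editor"}
running = ["mail", "browser", "editor", "mystery_daemon"]

for name in unfamiliar(running, known):
    kind, pitch = sound_for(name)
    print(f"new {kind} at {pitch} Hz from {name}")
```

In a real system the ear would do the job of `unfamiliar`: the familiar voices fade into the background and the new one stands out. The function is only there to show which process would be the odd one out in the pond.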
As an experiment, play the YouTube video below but turn down the sound so it's very quiet and minimize your browser so you are not looking at the video, only listening to it. After a few seconds start doing something else on your computer like reading some other article and just let the sounds of the pond sink into the background. It's a nice calming collection of sounds and you can quickly tune it out and focus on other matters. If a new sound was suddenly introduced, however, you would probably take notice. The sounds in this video are constant, nothing really new is introduced, but try to imagine a new kind of croak, chirp or trill suddenly making an appearance.
Thirteen months ago, just two weeks before Christmas, I faced one of the biggest turning points in my life. I had just been laid off as the VP of Developer Relations at Curl, Inc. In that same week I received shipment of my very own Microsoft Surface.
The Microsoft Surface computer cost me over $18k after paying for the device, extended warranty, delivery, and set up. Getting laid off the same week that I got my Surface seemed like bad timing, but in fact it was the best thing to happen to me career-wise in a decade.
With ample free time I was able to dive right in and train myself in C#, .NET, and the Surface SDK. My background was as a Java developer working on server-side open source projects - not the best preparation for UI development not to mention NUI development.
As I looked for a new job I wrote some simple programs for my Surface one of which is pictured above. It was a paint application for my kids. They loved the application and played with it many times over the course of the last year.
Of course, having a Microsoft Surface and finding a job programming it were two different things. I had decided three months earlier that I wanted to develop applications for the Surface but I had planned to have more time before transitioning back into software development - I had been out of development since taking a job as an Industry Analyst at Burton Group in 2004.
As luck would have it a company out of Denver, CSG, was at that time courting Best Buy with a product designed around retail using, in part, Microsoft Surface machines. CSG needed someone on the ground near Best Buy's headquarters in Minneapolis. Someone who could program a Microsoft Surface. It seemed like destiny that I, a re-minted software developer with a Microsoft Surface and no job was located in the same city as CSG's biggest potential client, Best Buy.
Although my official start date wasn't until sometime in February, I started working immediately on a demo that CSG could show Best Buy. CSG couldn't pay me yet, but it's not like I had Microsoft Surface programming jobs lined up out the door. I took the opportunity to learn how to program the Surface on a real project and CSG got my labor for free. I really busted my butt working on that demo and when I finally got on the payroll I started putting in some serious hours.
By late Spring I had been working around the clock on the project for about five months and I was exhausted and, to be perfectly honest, dispirited. While I loved working on the Microsoft Surface I wasn't fond of the long hours I was working, so I put in my resignation and moved on to a new gig developing an iPhone application for a company in the Health Care industry. I haven't been able to talk about my work at CSG until now because the whole project with Best Buy was hush-hush. But last week CSG and Microsoft made some announcements (here and here) about the effort and what was a successful test of the Microsoft Surface application I helped develop. It's my hope that Best Buy and others will adopt CSG's "DigitalFolio" platform with its Microsoft Surface computers. If they do, then my work on that device will be used in hundreds of locations by thousands of people, which would be pretty neat.
I never had time to go back to writing code for my Microsoft Surface. It sat in our spare bedroom basically gathering dust as I toiled away at iPhone development the last six months. Although I would turn the Surface computer on occasionally for my kids, or my brother-in-law, who liked to play with it - I never did go back to writing software for the device. Finally, I put it up for sale and sold it. It now sits waiting to be picked up and shipped to its new owner where it will be put to good use.
Working on the Microsoft Surface will always be one of my fondest memories. The device literally changed my life. Until I first saw the Surface I was on a completely different career path aiming for the executive ranks. After seeing the Surface for the first time in September of 2008 and then owning and working on it for several months, I changed my mind and decided software development was the life for me. Specifically, focusing on the development of Natural User Interfaces.
As of now I've finished my engagement developing the iPhone application for the health care company and I'm enjoying a much anticipated three month sabbatical with my family. We plan to live in Costa Rica for about two months and then return to the Great White North of Minnesota and pick up where we left off. I don't know what I'll be working on when I get back - I haven't been looking for what comes next - but I know that I'll be having a blast developing software for some multi-touch or other NUI platform.
I may never have a chance to write software for the Microsoft Surface again, but I'll never forget it and I'll always be thankful to Microsoft for having created it and productized it. Microsoft Surface changed my life for the better and I'm grateful for that.