Microsoft Research's Anoop Gupta wearing the Kinect SDK launch shirt

Microsoft this morning officially brought its Kinect motion sensor to Windows PCs — releasing a software development kit that will let developers create noncommercial Windows 7 programs that can be controlled by gestures and voice commands.

The release, the prelude to a future SDK for commercial Kinect applications on Windows, makes official a trend that started with grassroots Kinect hacks after the release of the sensor for Xbox 360 last year. GeekWire spoke with Microsoft Research’s Anoop Gupta this morning for more details about the SDK and the company’s plans.

Q: What’s the significance of the Kinect SDK release from your perspective?

Gupta: I’m really excited. A lot of people were waiting for the SDK. The wait is over.

Q: A lot of people weren’t waiting.

Gupta: This is the official Windows SDK for Kinect, and it’s a noncommercial SDK. We have some of the deepest insights into the technology, and there was an amazing amount of work done across Microsoft Research and the product group, deep algorithms, to develop the technology. By sharing these insights, incorporating tools, we hope we are making it easy for academics and developers all over the world to create wonderful, exciting, innovative applications.

Q: What can developers access in the Kinect device through the SDK?

Gupta: We provide access to the raw sensor data. There are three kinds of sensors — the RGB video sensor, the depth sensor and then the four-element microphone array. This capability is important to everybody, but particularly to academics and researchers who want to build on the core capabilities, advanced new algorithms. Then we provide skeletal tracking capability for up to two people simultaneously. This makes possible a lot of gesture-driven applications that people have been excited about.

And we have a lot of advanced audio capabilities that build upon the microphone array. We do noise suppression. Manufacturing or hospital or wherever you are, if you’re in a noisy environment, being able to get rid of the noise. Echo cancellation, which becomes important in conferencing environments. Integration with the speech API from Windows. You can do speech recognition. And I think that allows for the creation of true multi-modal natural user interface applications. It’s not just gestures — it’s how gestures, speech, other modalities come together.

It’s programmable in both unmanaged and managed code, so there’s C++ interfaces, there’s C#, Visual Basic.

Q: Why go with only non-commercial licenses initially?

Gupta: We thought it was really important to get out early, to get something in the hands of developers … learn from that and build upon that to release the commercial SDK.

Q: For years, people have used a keyboard and mouse, and now Microsoft is pushing toward supplementing that with the natural user interface. How do you think people will look back on this day in five or 10 years?

Gupta: I think it is an inflection point in that it is adding new capabilities that were simply not there. We think providing the capability so that you can do human motion tracking, you can get this rich microphone array capabilities and these gesture and multi-modal interfaces at a $150 cost point (for the Kinect sensor) is pretty dramatic and revolutionary. In the gaming and entertainment worlds it’s already tens of millions of people that have been transformed. Now to take it to the hundreds of millions of people who have PCs and embedded PCs, while it will take some time, this will be a pretty memorable moment.

Q: How will this change how people interact with their machines?

Gupta: It is less about how you interact with your machines when you sit at a two-foot distance. You have touch, mouse and keyboard there. It becomes more interesting — imagine the kitchen, making chocolate chip cookies and your hands are gooey, you can interact with speech. The same applies to greasy hands in an automotive repair shop or a doctor whose hands are sterile can scroll to a different portion of the X-ray or zoom. It is about posters and interactive displays in shopping. … It informs how we think about telepresence (and videoconferencing).

Q: Why does this only work with Windows 7?

Gupta: It’s a noncommercial SDK, there are hundreds of millions of Windows 7 machines out there. We didn’t want to go through all the testing and everything else. We wanted to get the capability out as soon as possible.

