Many years ago, I had access to a piece of software called Microsoft Phone. It was distributed with some versions of the Creative Labs Phone Blaster, a rather massive expansion card that combined a modem with a sound card. What made this combination really cool was that the card and the software turned your computer into a telephony device. It wasn't the first or only device on the block that could turn your computer into an answering machine; I had something in my computer in college that did the job as well (although I'm sure my roommate oftentimes wished we just had a simple tape-based machine, especially when I would reboot my computer and knock him off the phone). It also wasn't the only software that could use your computer's microphone and speakers as a speakerphone; some Phone Blasters were bundled with completely different software that did that as well.
What made Microsoft's product so interesting was that it was integrated with the company's fledgling text-to-speech and voice-recognition programs, and it used the common MAPI message storage system. Back in the day, when Windows 95 was still new, mail storage could be set up as more of a common database. At least, the mail storage location was a simple control panel icon, and the built-in Windows Mail client could connect to the MSN service without an issue and without any custom email software. (It seemed simpler then; maybe it could still be done that way today, since I'm back to using Windows Mail in Vista instead of Outlook or Thunderbird or Outlook Express or even Live Mail.)
In any case, installing the Microsoft Phone software also installed Microsoft Voice, which allowed for some voice recognition. Trying to control the computer with it was more of a gimmick than anything really useful. (Looking back on it now, it's hard to expect more out of 1995 technology, although I'll get into that in a minute.) The command set was fairly limited, but the commands it did know, it recognized fairly well. There wasn't a lot you could do with it, but you could at least do those things consistently.
Now, here's where things got interesting, as far as Microsoft Phone was concerned. One of Phone's features was that you could dial in to your phone and enter a code to access the program and start issuing commands. Fairly standard fare for answering machines. And, just like any ordinary answering machine, you could issue those commands by using a touchtone keypad. Where Phone started to set itself apart was that it would prompt you for commands without using prerecorded prompts. Instead, it would read you instructions from a help file using text-to-speech. While this might not sound like such a big deal, especially in an age where sound compression and disk space are cheap, back then, this was huge. Phone was able to provide a rich, vocal interface without having to save megabytes of prerecorded files.
The next interesting thing it could do, because of text-to-speech, was read extended information about your message. It could not only announce the time and date of a message and read the phone number, but it could also read the name of the caller. If that caller was in your address book (remember, this was all coming from your MAPI store), it could read that personalized name. Granted, the pronunciation wasn't always perfect, but it was still a very cool feature to hear your computer read the name of your caller to you over the phone along with the message.
Now, this was back in the days of dialup, and you could set your computer to call into your ISP and check and download email during the day. This brings me to the next very cool feature. Remember, all messages were stored in the single MAPI store. So when you played your new messages, your computer would also read your new email messages to you. The first time I showed this feature off to a friend, her jaw hit the floor as soon as she realized what was happening.
And finally, because this was running with the Microsoft Voice speech recognition engine, not only could you give the commands using your touchtone phone buttons, you could speak your commands and have the machine respond.
So where is this technology today? As I was writing this, I was thinking about how cool this was back in 1995, but I was questioning how it would work a decade later. In 1995, voice was still a preferred method of communication. Being able to call my computer and have it read my email to me was a great feature. Today? Not so much. In 1995, 99% of email in my inbox was interesting and valid, came from human beings who sat down and thought about what they wanted to say, and was written as if it were a letter, with well-thought-out sentences and grammar. Today, most email is hastily dashed off, with lots of abbreviations, written with very little context. And those are only the messages I want to read, which are a minority. The majority of email I get now is mass mailings from corporations, or spam about prescription drugs I don't need or software that's not licensed or legal, or newsletters or chain letters or group mailings or any number of other impersonal communications. Basically, nothing I'd need to phone home about.
However, the concept of having voice messages delivered to email is desirable. So I guess, a decade later, I'd be looking at the reverse of what was cool in 1995: being able to connect a text-type device (i.e., an email client) to my home message storage and get voice messages. It might be cool to have dictation transcribe those messages to text for very low bandwidth applications, but with the way email clients work these days, it'd almost be unnecessary. (Almost. I still use Pine over an SSH connection to read email from work, so text-only email and browsing isn't dead yet. At least not for me.)
I'm still intrigued by the concept of a software-controlled answering machine, one that could take messages and convert them to email (making the "from address" look like a phone number would be a nice touch), and especially one that could be programmed to automatically answer and annoy callers from certain phone numbers (such as identified charities that continue to call soliciting donations despite the request for no solicitations). In the end, we've got to go with something that "just works". I tried to recapture those old glory days by buying the Microsoft Cordless Phone product around the turn of the century, but it failed to live up to expectations, especially when it wasn't supported past Windows 98. So we ended up buying a plain old telephone answering machine that isn't a piece of software installed on a computer, and my current roommates (i.e., my family) are much happier for it.
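For the curious, the voicemail-to-email idea can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not how Microsoft Phone or any real product worked: the function name `voicemail_to_email`, the placeholder domain `voicemail.invalid`, and the assumption that the recording is already available as WAV bytes are all made up for the example.

```python
from email.headerregistry import Address
from email.message import EmailMessage


def voicemail_to_email(caller_number, audio_bytes, recipient,
                       caller_name=None, domain="voicemail.invalid"):
    """Wrap a recorded voice message in an email (hypothetical helper).

    The "from address" is built out of the caller's phone number so the
    message sorts and filters like any other mail.
    """
    # Keep only the digits so the number is a valid mailbox name.
    digits = "".join(ch for ch in caller_number if ch.isdigit()) or "unknown"
    # Use the address-book name when we have one, else the raw number.
    display = caller_name or caller_number

    msg = EmailMessage()
    msg["From"] = Address(display_name=display, username=digits, domain=domain)
    msg["To"] = recipient
    msg["Subject"] = f"Voice message from {display}"
    msg.set_content(f"You have a new voice message from {display}.")
    # Attach the recording; assumes the audio is already WAV-encoded.
    msg.add_attachment(audio_bytes, maintype="audio", subtype="wav",
                       filename="message.wav")
    return msg
```

Handing the returned message to a mail server (for example with `smtplib`) is left out, and so is the hard part: actually answering the phone and recording the call.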
As far as voice control goes, I finally decided to play around with it in Vista the other day. I will say that I am impressed with how far it has come. For general navigation, there's this "say what you see" concept: you say the name of what you see, such as a command button or a link on a web page, and the computer attempts to discern what you mean and clicks it. However, it's still rather clumsy. I feel like I'm talking to my 1-year-old, having to repeat things over and over again, sometimes louder, sometimes trying to say things a different way. Dictation was especially frustrating. You're supposed to be able to say "correct" followed by the words it just got wrong in order to fix them, and yet over and over again, as I was trying to dictate and correct, it started typing "correct this" and "correct that", like an old sitcom routine where the dullard keeps reciting stage directions.
And in that sense, it doesn't feel much different from 1995. Sure, you can actually dictate text now. But giving commands? That's been around for over 10 years, and we're still pointing and clicking with a mouse. At the end of the day, I couldn't see how it could possibly be any more convenient or useful than just using the freaking mouse or keyboard to directly input the location or text desired. Even if it does entertain my wife to listen to me try to use it....