How-to: Voice control Windows Media Center

Difficulty: 2.5

I have been interested in the voice control of computers for a long time. My first attempt was around 10 years ago, and I had some success with it. In the right environment, I was able to say commands to my computer and it would respond based on what I said. The problem was that I didn’t have a practical use for it yet. It was clear in this early testing that using a keyboard and mouse was far more convenient, more reliable and quicker than using voice. It will remain that way for many of the standard interactions (e.g. email, Facebook) we have with computers, at least in the short term.

The day Microsoft Kinect was launched in Australia, I saw the promotional video showing people waving their arms around to navigate through their media centre. It seemed to me that this would be a fairly unreliable and exhausting way to control anything, apart from games specifically designed for the technology. I was way too lazy to consider using this technology into the future.

I concluded that voice is the simplest way to control anything, and that it always will be. This led me to start playing around with voice control again. I ran through the voice tutorials and was able to get the computer to understand my voice some of the time. It did stuff up on me a whole lot, but it was clearly much more reliable than software I had used in the past.

Now around 6 months on, I have written an AutoHotkey script and a WSR macro that interact with Windows Media Center and Windows Speech Recognition software, allowing my media centre to be controlled completely by voice. This is a practical use for voice control. I can navigate faster with my voice than I can with a remote control. Instead of needing to know which button to press on my remote (or remotes), I simply speak my mind. I no longer use a remote at all. This is something I have wanted for a long time and I am excited about this outcome.

This system far exceeds any other voice control setup on the market today in terms of reliability and practicality. Most systems have failed in the past not because the software was inadequate for the task (the software has worked fine for many years). Most of the problems are environmental, and my solution tackles these environmental issues. Rather than trying to make technology that works in our environment, my solution changes the environment to enable the technology to work. I believe it is inevitable that all future voice control systems will need to take this approach for the system to work.

This article will give you all the information you need to control your Windows Media Center home theatre PC with your voice. I will provide the easy to edit scripts and show you how to install them on your PC. I will also explain what works and what doesn’t, as well as explaining why previous attempts have not been successful. The more I explain how it all works, the easier it will be for you to set it up and get it working reliably. This will not be as easy as installing the software and having the results you want right away. You will need to train it to recognise your voice, and you will need to learn the correct commands to make. A solution that can understand the whole English language is a long way off. It is much more difficult to synthesize human understanding than it is for a computer to understand dictation. That is why we need to have set commands.

There is a video of my home theatre PC running this system after the jump.

 




What you need:
The costs

  • A Windows 7 PC running Windows Media Center 7
  • A decent microphone. I have been using this one from Logitech which retails for $50, though it is not hard to find cheaper. This microphone picks me up from anywhere in the room and has noise cancelling features. I plan on putting in boundary microphones in the future.

How it works:

This system uses Windows Speech Recognition (WSR) to understand the commands we say. This is free software that is built into every Windows 7 PC. WSR Macros runs in parallel with WSR to perform actions based on what we say. I used this to create voice tags which link in with keyboard shortcuts. For example, saying “Stop TV” will send the keystroke Ctrl+s. Because keyboard shortcuts are being used, the xml files could easily be altered to work with XBMC or MediaPortal.

The second component is what makes my system work. The AutoHotkey script waits for a predefined button press (0 or [). When that button is pressed, the volume of the media drops to a level low enough that WSR will not pick up enough sound to misinterpret it as a command. It then opens up communication to WSR, welcoming our commands as we talk at a natural volume.

When we have finished saying our commands, we can either say “Play TV” or press the button again. This will turn off WSR, put the volume back to a suitable listening level and then resume playback of our media.
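The trigger cycle described above can be pictured as a small two-state machine. The sketch below is only a Python model of that behaviour, not the AutoHotkey script itself; the class name, the method names and the two volume levels are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical volume levels (percent). The real script uses its own settings.
LISTENING_VOLUME = 60
COMMAND_VOLUME = 5


@dataclass
class VoiceTrigger:
    """Toy model of the trigger: tracks the volume and whether WSR is listening."""
    volume: int = LISTENING_VOLUME
    listening: bool = False
    log: list = field(default_factory=list)

    def press_trigger(self):
        """The trigger key (0 or [) toggles between playback and command mode."""
        if self.listening:
            self.finish()
        else:
            self.volume = COMMAND_VOLUME  # drop the media volume
            self.listening = True         # open communication to WSR
            self.log.append("listening")

    def hear(self, phrase):
        """Phrases are ignored unless listening; "Play TV" ends the session."""
        if not self.listening:
            return False
        self.log.append(f"command: {phrase}")
        if phrase.lower() == "play tv":
            self.finish()
        return True

    def finish(self):
        """Turn WSR off and go back to the listening volume."""
        self.listening = False
        self.volume = LISTENING_VOLUME
        self.log.append("playback resumed")


trigger = VoiceTrigger(volume=80)
trigger.press_trigger()     # volume drops, WSR starts listening
trigger.hear("Skip ads")    # acted on, because the trigger was pressed
trigger.hear("Play TV")     # ends the session and restores the volume
print(trigger.log)
```

The point of the model is that nothing is treated as a command until the trigger has been pressed, which is why the playback volume before the trigger does not matter.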

Using this method, we don’t care how loud the volume is when the media is playing. It makes no difference if the stereo is on full blast, because the trigger button will drop the volume to a level at which we can talk comfortably to the system. I have been unable to find any products or patents that do this, and it is a necessary step for voice control to work. The alternative is to yell over the sound coming out of the speakers while hoping that WSR doesn’t pick up music as commands.

Windows Media Center has voice control built in, but it suffers the same problems as all the other products on the market. The volume needs to be impractically low for it to work. These built in commands do not clash with the commands built into my system. They work in conjunction with each other, and they only work when a trigger has been pressed.

The software:

All of the software is free.

Please don’t be put off by the list of programs below. Each of them has a role in the system (AutoHotkey itself is only needed if you want to edit the script), and they are all fairly straightforward to set up. The format they are in allows you to easily make any changes to the voice tags. The system is very open so you can use it as you want.

  • Windows Speech Recognition: This software is available from the Speech Recognition Applet within your Control Panel. I feel it has never been given the props it deserves. I can’t ask for much more than 100% accuracy when using this system with a headset. It does have a requirement that you use the English (US) language, so if your system is currently running a different language, you will need to change it in the Regional and Language Applet of the Control Panel.
  • WSR Macros: This program works with WSR. It lets us install xml scripts which define the commands the computer will listen for. WSR Macros can be downloaded for free from Microsoft.
  • AutoHotkey: This is not required, but it is what I wrote the script in. If you want to make changes to the script, you will need this program. It is available from Autohotkey.com.
  • The following programs and scripts are included in the Voice Control.zip file.
    • Nircmd.exe: It is used by the script to change the system volume. It is truly a fantastic piece of freeware.
    • Speech Recognition Control.exe & Speech Recognition Control – No Eventghost.exe: These are compiled versions of Speech Recognition Control.ahk. They manage the trigger function and volume changes. Use No Eventghost if you do not yet have EventGhost installed.
    • Media Center.xml: This script contains the majority of the commands I have set up for WSR Macros.
    • Sydney TV Channels.xml: This is a sample script which looks after TV channel control.

Installation:

To launch Windows Speech Recognition, go into your Control Panel and launch the Speech Recognition Applet. There is not much to set up in this software. Within the options, you will probably want to tick the setting to load it at startup. I also turned off audible feedback and the dictation scratchpad.

Download and install WSR Macros. You will need to have this loading at startup. The easiest way to do this is to make a shortcut to it in the Startup folder of your start menu. At the same time, unzip the zip file to the Speech Macros folder within your Documents folder. This folder will automatically be created when WSR Macros is installed.

The options for both WSR and WSR Macros can be found in your task tray near your clock.

Click on New Speech Macro.

Select Advanced

Copy and paste the entire Media Center.xml into the text box. Click Next.

You can call the file anything you want, but I call mine Media Center.WSRMac.

Ensure you digitally sign the file. This will ensure the script is allowed to run on your computer. This has advantages later on also. The alternative to digitally signing your files is to drop the security level down. I would recommend against reducing your security.

Repeat the steps above for the Sydney TV Channels.xml file. You will probably want to change the xml file to suit your needs. The code is as follows.

<command>
<listenFor>ABC</listenFor>
<listenFor>Channel 2</listenFor>
<sendKeys>2</sendKeys>
<sendKeys>\</sendKeys>
</command>

The <listenFor> element determines what phrase you say to go to the particular channel. It is best to make it 3 or more syllables, e.g. instead of saying “GO”, say “channel go”. You can have as many phrases for each channel as you want.

The <sendKeys> element determines the keyboard button to press. This example presses 2 on the keyboard, which will go to channel 2.

The <sendKeys>\</sendKeys> is necessary as it turns off speech recognition and turns the volume back up to listening volume.
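If you have a lot of channels, typing each block by hand gets tedious. Here is a minimal Python sketch that prints <command> blocks in the same shape as the sample above. The channel table is hypothetical, and the script only produces the blocks themselves; you would still paste them into your copy of the xml file in place of the existing channel commands.

```python
import xml.etree.ElementTree as ET

# Hypothetical channel table: key to press -> phrases that should select it.
CHANNELS = {
    "2": ["ABC", "Channel 2"],
    "7": ["Channel 7", "Seven Network"],
}


def channel_command(key, phrases):
    """Build one <command> element shaped like the sample above."""
    command = ET.Element("command")
    for phrase in phrases:
        ET.SubElement(command, "listenFor").text = phrase
    ET.SubElement(command, "sendKeys").text = key   # the channel key
    ET.SubElement(command, "sendKeys").text = "\\"  # turns WSR off, restores volume
    return command


def channel_xml(channels):
    """Return one <command> block per channel, one block per line."""
    return "\n".join(
        ET.tostring(channel_command(key, phrases), encoding="unicode")
        for key, phrases in channels.items()
    )


print(channel_xml(CHANNELS))
```

Each block comes out on a single line, which WSR Macros should not mind since it is still the same XML, but I have only shown the hand-written layout in this article.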

If you are willing, please send me your altered TV Channels.xml files so I can share them among readers to save them from having to go through this step.

Finally, create a shortcut in the Startup folder of your Start menu to Speech Recognition Control.exe.

If you have installed the system correctly, after you reboot, you should see a green H icon, WSR icon and WSR Macro icon in your system tray. You are now ready to test it out. The trigger button is either “[” or “0” as determined by Speech Recognition Control.exe. In effect, you should be able to press the 0 button on your keyboard and the system will be triggered. In most cases the 0 button of your remote control will also work.

Training:

Training is an essential part of getting this to work. Without going through the training process, this is unlikely to work for you. The process is simple, but it will take each member of your family a bit of time. Each training session takes around 10 minutes.

I needed to make the system understand my Australian accent so I had to be very patient with it. I ran through the training session 3 times in the first week. I then spent a lot of time using the system to have it understand me well. It took around 3 weeks before it really got a hold of my voice. Many of the problems were related to me saying commands that didn’t exist yet, but you should not have the same problem if you use the “What can I say” sheet. I am now 6 months into this project and it understands me around 95% of the time (given the right environment). I hope the rest will be solved by using better equipment.

Open up the Speech Recognition applet from the Control Panel. Click on “Train your computer to better understand you”.

Follow the prompts. You will see the screen to your right. Speak the text in the box. Not only will the tutorial learn your voice very effectively, you will learn the many features of Windows Speech Recognition. It is really quite amazing what the system is capable of.

Once the computer understands the line you have spoken, it will progress automatically. I found that it did not progress each time. I would wait 2 seconds and then repeat the line. This felt like the system wanted me to say it again to get another sample of my voice. Some pages I needed to repeat up to 4 times.

It is worth doing this training session a few times over the following days or weeks. Putting in the time early on will save frustration later on.

Once training has taken place, my understanding is that the system will continue to better understand you over time. That is how it has been for me. When the computer does understand you, you are able to transfer your voice profile from computer to computer, so you will only need to go through it once.

Troubleshooting and Tips:

  • Many of the problems are likely to come from inadequate training. If you haven’t trained it, you may have trouble seeing the system work effectively.
  • When editing the xml files, alter the xml file rather than the macro. Ensure you delete unnecessary macros so that any two macros don’t conflict.
  • Ensure the correct microphone is selected. Using the “Setup your microphone” option within WSR is a good place to start.
  • This system uses keyboard shortcuts, so if you press any of [, ], \, 0, * or -, the function will occur rather than the character appearing on the screen. These buttons are rarely used in a media centre setup. A dedicated machine is recommended to avoid this problem. The keys can be changed in Speech Recognition Control.ahk if necessary.
  • A reboot is often a sure-fire way of getting it going again. I do this weekly, but I needed to do it more frequently when I was going through the training.

When it will work:

  • It will work where there is a quiet environment without much background noise.
  • It will work if you train it adequately, and continue to improve the more you use it.
  • If you live alone, this is likely to work fantastically well for you.
  • If you are a quiet and patient person.
  • When everyone in the household uses it.
  • It will work exceptionally well in a sound proofed room.

When it will not work:

  • If there is any background noise i.e. all the noise we have always taken for granted.
    • If the shopping is being put away
    • If a neighbour is mowing their lawn
    • If kitchen appliances are running near your TV room, i.e. dishwasher
    • If the rain is loud on the roof
    • If someone is eating chips in the TV room
  • If you have young kids, the technology is unlikely to work for them or around them.
  • When inadequate training has taken place.
  • If you yell at it – a natural tone works best. If it isn’t working, there is likely another problem.
  • In a corporate environment. Each person would need their own office which would be a good outcome, but also an unrealistic one.
  • If your system is busy with other tasks and resources are drained. i.e. My system struggles a little when recording 4 channels at once.

Resolving the problems:

Sound proofing the room is a solid way of fixing some of the problems, with the added benefit that sound won’t escape from the room. Saying this, it won’t be a practical option for many people.

Some of the problems can be controlled somewhat by isolating the rooms which are going to use this technology. We can put noise gates and effects on the intercepted voice, but this is only going to go so far. Sound can only be altered so much, and the human voice uses a huge frequency range, so to cut out any frequencies will affect the incoming signal detrimentally. No matter how long we wait into the future, technology will have very little impact in resolving the majority of these issues.

As I said earlier, we need to create an environment that allows the system to operate. This is likely to impact on the ways that we want to use it. Some considerations will need to be made by people in your household for voice control to work effectively in your home. The majority of the problems are cultural and can be resolved by changing how and when we do things. It will be up to us individually to adapt to the technology to give it every opportunity to work. I believe that using the technology (and making minor alterations in the way we do things and in our homes to accommodate it) will result in very positive outcomes for everyone who grabs it with two hands and runs with it. Time will tell…

This system is much more complicated than simply installing software and watching it do its thing. If someone talks to a person who is using the system while the microphone is open, the system will fail. Naturally a little anger may go towards that person. The same will happen when you talk to them while they are using the system. The result could be that we give up on using the system altogether, or we all take a bit of extra time to think before we talk so as to be sure that the system isn’t in use. The only technology I can think of that may help is a light that goes on when the microphone is open (On Air). As far as I can tell, that is as close as technology can get to fixing such problems.

Why use voice?:

Keyboards and mice have been necessary tools to help us communicate with computers over the last 50 years. They were created because there was no way to control computers by voice at the time. Now that computers are fast enough for voice control to work, we can start commanding a computer using our natural form of communication, speech. In time, we will be able to control all electrical devices in our home in much simpler ways.

Each voice tag can perform any task or tasks we want. The result is that we can say one word to set off a series of tasks automatically. A good example is changing volume. If we want to change the volume from 40% to 20%, using a traditional remote, we would need to press the volume down button 20 times. With voice, we can go directly to the volume we want.
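To show what going directly to a volume looks like, here is a rough Python sketch built around NirCmd's setsysvolume command, which takes a value from 0 to 65535. It is not taken from my AutoHotkey script; the function names are made up, and it assumes nircmd.exe can be found on the PATH or in the working folder.

```python
import subprocess

NIRCMD_MAX = 65535  # NirCmd's setsysvolume takes a value from 0 to 65535


def volume_args(percent, nircmd="nircmd.exe"):
    """Return the NirCmd command line that sets the system volume to percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [nircmd, "setsysvolume", str(round(NIRCMD_MAX * percent / 100))]


def set_volume(percent):
    """Run NirCmd to set the volume (Windows only, needs nircmd.exe)."""
    subprocess.run(volume_args(percent), check=True)


# One step from any starting volume straight to 20%.
print(volume_args(20))
```

A voice tag such as “volume twenty” could then be tied to a single call rather than a string of button presses.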

Anything a computer can control can be triggered by a voice tag. This has great potential in home automation, and controlling anything electronic around the house. This is better than the current home automation where the switches are relocated to a single panel on a wall. Other tasks can be automated without any human input. This is the start of a fully encompassing system which will make our lives easier.

Speech Recognition vs Voice Control:

Voice control and speech recognition are very different things. Speech recognition is when we expect a computer to recognise anything we say, and then have the computer convert it to text. This is very complicated because our language has many thousands of words and even more combinations of words. Everyone’s voice is different and accents and lisps have a huge impact on what the computer hears. I am yet to see a free program that does this reliably.

Voice control is based upon a set of various predefined commands, where the computer decides which command it should perform based on what it hears us say. This narrows down what the computer expects to hear from us down to a few hundred commands. Because of this, voice control is much simpler to get working, and the results are much more reliable.
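The difference is easy to see in a toy example. The Python sketch below is not how WSR works internally (it compares text, not sound), and the command list and action names are invented, but it shows why choosing from a short fixed list is forgiving: a slightly garbled phrase still lands on the nearest command, and anything unrelated is simply ignored.

```python
import difflib

# Hypothetical command list: phrase -> name of the action to perform.
COMMANDS = {
    "play tv": "resume playback",
    "stop tv": "stop playback",
    "instant replay": "replay",
    "skip ads": "skip forward",
}


def match_command(heard, cutoff=0.6):
    """Return the closest known phrase, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        heard.lower(), list(COMMANDS), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


for heard in ["Stop TV", "pley tv", "open the pod bay doors"]:
    phrase = match_command(heard)
    action = COMMANDS[phrase] if phrase else "ignored"
    print(f"{heard!r} -> {action}")
```

With dictation there is no such list to fall back on, so every mistake ends up on the screen.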

How and why my system is different to others:

The main difference between my system and others is the volume drop that occurs before we give commands. This is what makes it all possible and practical. Without this functionality, it will never work in rooms we listen to our media in. It is possible in quieter rooms to have the computer respond to a keyword to open up listening for commands, but as this system will already be in place, I will probably retain it throughout the house as this system develops.

This same volume drop allows for the use of a microphone that is free standing. We don’t need to use a headset for the system to work. This keeps our hands free, which is pretty much the point of using voice in the first place.

It would be very easy to have the computer tell me what it is doing each time I say a command, as most other commercial packages do. But I don’t want a relationship with my computer. I don’t want it to take 5 seconds to tell me it is going to do something I don’t want. When I give a command, I want the computer to perform that command instantly without whingeing. That is why I have not added this feature.

It would be very easy to add in commands for playing genres or artists. The code already exists here (look for Windows Media Player), but yet again it complicates the list of words by adding in thousands of extra commands the computer needs to choose between. The script would need to be altered slightly so it does not clash with my commands. For these reasons I chose to take it out of this script. It will be easy to add this functionality later, but the way I use the system, I don’t see much benefit to it apart from it being a neat trick. (Update: The included Advanced commands script will add in this functionality. I have removed some of the ambiguity requests. It seems to work quite well for me now.)

I have made the commands a little longer than what some other systems use. This helps the software better understand what it is meant to do. For example, there is not a big difference between the sound of the words play and replay. I have extended them to be Instant Replay and Play TV, which are both very distinctive-sounding commands. This makes the system a whole lot more reliable. Once I have installed adequate sound equipment in my house, it may be possible for the computer to differentiate between the shorter versions, in which case I will change the script to respond to those words. In the meantime our voice profile is learning our voices to give much more reliable responses into the future. (Update: The new script includes both the longer and shorter commands. The longer commands are more reliable, but the shorter commands do work when the computer understands them.)
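As a crude illustration of why the longer phrases help, the Python sketch below scores how alike two phrases are as text. Spelling is only a loose stand-in for how the words sound, so treat the numbers as a rough guide rather than a measurement of what WSR hears.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score from 0 (nothing in common) to 1 (identical), based on spelling."""
    return SequenceMatcher(None, a, b).ratio()


short_pair = similarity("play", "replay")
long_pair = similarity("play tv", "instant replay")

print(f"play / replay:            {short_pair:.2f}")
print(f"play tv / instant replay: {long_pair:.2f}")
```

The short pair scores much higher than the long pair, which matches my experience that the longer commands are harder for the computer to mix up.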

Resources:

Windows Media Center keyboard shortcuts – Link to Microsoft
WSR Macro Library and Samples – Link to MSDN, Microsoft

Conclusion:

This system works for me exactly as I have explained in this article. Hundreds and hundreds of hours have gone into these scripts and macros through testing and tweaking. I am hopeful this will work really nicely for you. The format it is in opens doors for easy expandability for future projects. There are many commands which stretch well outside of Windows Media Center’s normal capabilities. Please keep coming back to Inspect My Gadget to see where this system leads me. There are plenty more projects in the pipeline, and voice control allows what was once my imagination to become a reality.

Remember, most of the problems will be able to be resolved by changing your environment or training the system more. There is a good chance I will delete comments complaining that the software doesn’t work for the first few weeks as it is not helpful to anyone. It took months to learn how to type, and this requires the same effort. If you believe it doesn’t work, I think it is likely you will never have voice control in your home.


45 Comments so far »
 

  1. Daniel said, on May 3, 2011 @ 11:11 am

    RE: Voice control for XBMC

    Hi, I found your site, and I wanted to get XBMC working with my voice.

    I have the same Logitech microphone that you do, Windows Vista, all the software you recommended, and I have the Firefly remote setup through EventGhost.

    What’s the best way to set up the voice with XBMC? I want to basically just say “stop”, “play”, “pause”, “fast-forward”, etc.

    Can you help?

  2. Chris Duckworth said, on May 4, 2011 @ 9:54 am

    Hi Daniel, give me a couple of days and I will write a script that will get most of the way there. XBMC uses keyboard shortcuts the same as Windows Media Center so the transfer of commands should be pretty straightforward. The functions you want will be in there, but you may well need to say play video or play music instead of just the word play. It will be included, but it may not register with the voice recognition as cleanly. I will have a play and let you know in the next 48 hours.

  3. Chris Duckworth said, on May 5, 2011 @ 8:51 pm

    I stuffed around with XBMC a bit tonight. It works a bit differently to Windows Media Center and requires a few extra steps to work through the menus. Things like going to the start of the movie after fast forwarding were a bit of an annoyance, basically making the fast forward useless. I will play around with it a bit more on the weekend and see what I can come up with. One thing is for sure, it looks fantastic, and there are plenty more themes to try out.

  4. chef said, on June 23, 2011 @ 7:18 am

    Thanks very much for all your hard work. Now that the SDK is out for Kinect and Windows 7, I look forward to using your scripts and the four high density microphones with the Kinect!

  5. shacklefurd said, on June 23, 2011 @ 8:47 am

    Hey there! I stumbled upon your website a couple days ago and ever since I haven’t been able to stop thinking about controlling my computer with my voice! :D

    …well I just got my logitech desktop microphone today and have been training the computer for a while, but I hit a bump…When I press “0” the computer listens, but when I press “0” again, or say “Play TV” it stops listening as expected but doesn’t turn the volume back up? Don’t know what would be causing this but it’s pretty annoying cause everytime I say a command it does something and keeps the volume muted so I have to manually turn up the volume again.

    Computer is Windows 7 Ultimate – x86
    It’s an HTPC so it doesn’t have a whole lot extra installed either so don’t know what the conflict could be.

    Thanks for the wonderful effort!! I’m so excited to get my computer listening to my every word :D

  6. Chris Duckworth said, on June 23, 2011 @ 8:07 pm

    Thanks for the positive feedback.
    I think the Kinect is a great piece of technology, and it will likely have a better microphone setup than I have been using. The problem with it is that it sits near the TV, so it will pick up noise from the TV. The command volume may need to be reduced in the volumes.ini file for it to work well. I look forward to receiving your results with it.

    I have updated the scripts. Upgrading is a little bit of a hassle as it requires updating the Speech Macros – the media center one mostly. It accepts a few more commands now and has a timer on the skip so if you don’t give a command after saying skip or skip ads, it will automatically play after 3 seconds.

    It has been 3 months since the last update, so there are a bunch of tweaks in it, and a lot more reliability. As with all of it, I have done it in a way that suits the way I use the system. I would be interested in getting feedback. The whole thing is working great for me now, though it is still not perfect. It may need to be coded in C to get perfect results. I can honestly say I have not used a remote control in 3 months.

    Shacklefurd: The volumes are controlled by nircmd.exe. I think the new scripts may help fix your problem. The new scripts don’t specify a full path to nircmd. Try the new scripts and let me know if you are still experiencing the problem. If you don’t get the results, I will see what else I can do to help.

  7. Tricia said, on July 10, 2011 @ 7:52 am

    Just tried out your instructions. When I load your media center.xml windows 7 returns an error loading the file.

    I am not sure what I need to do next.

    Thanks for taking the time to help.

  8. Eddy Carroll said, on July 25, 2011 @ 8:50 pm

    Hi Chris,

    Interesting project! You’ve discovered many of the same issues we ran into with getting voice control operating reliably in a living room.

    Have you had a chance to try it out with our Amulet Media Center voice remote? We integrate a microphone with an accelerometer so that the mic remains muted during normal use but turns on when you raise the remote to talk. This, in conjunction with the close proximity of the mic to the person talking, greatly reduces the effects of outside interference from TV soundtracks, music, etc, improving reliability and reducing false positives. It gives you the benefit of an IR-learning Media Center remote with the addition of voice control whenever you need it.

    We bundle our own voice control software with the Amulet remote, but the hardware will work fine on its own. From Windows’ perspective, it looks just like a standard USB audio input device, so it works fine with other voice recognition solutions like VoxCommando, and I expect yours, with no special configuration needed.

    Eddy

  9. Chris Duckworth said, on August 4, 2011 @ 11:40 am

    I haven’t received an error on the media center.xml file before (though I have on some others). Are you copying the contents of the file into Speech Macros? Open up the xml file in Notepad and select all the text and copy it. Overwrite the default template that comes up in WSR Macros so that you only have the xml file in its entirety in the file.

    What error are you getting?

  10. Chris Duckworth said, on August 4, 2011 @ 12:10 pm

    Hi Eddy,

    Thanks for the message. I have not used an Amulet remote yet, but I did look into them quite in depth when I started out with this voice control stuff. It is certainly a good way to conquer many of the issues that crop up. It is probably the best option on the market currently. It also looks nice. I have little doubt that the Amulet remote would work with my system, and it is probably a whole lot easier to set up than my system.

    My main reason for starting this project was to remove remote controls from my lounge room altogether and I succeeded with this for when I am using the room. I have learnt that visitors have trouble using the voice control, and because of that a remote will be required, at least in the short term. My system does have its advantages though as it works when I have my volume cranked up infinitely loud. It was also built as a solution to enable voice control beyond the lounge room and Windows Media Center. I have been busy working on a bunch of voice projects where the use of a remote is not appropriate – slowly releasing them up here as I feel the timing is right.

    It’s exciting times when voice control is getting close to being mainstream. The advantages it offers over traditional control methods are huge.

    Cheers
    Chris

  11. Scott said, on August 29, 2011 @ 10:45 pm

    Hi..

    Thanks for the tutorial, I have tried setting this up and thought everything was going fine.. however when I press the 0 key on the remote I get an error telling me that EventGhost can not be found.

    I see you have included some EventGhost XML files in the zip

    1. Do I have to install EventGhost to make 0 control function work?
    2. If so what do I need to do with the XML files?
    3. I have tried installing EventGhost but when I press 0 all that happens is EventGhost is launched.

    Would very much appreciate your help with this.
    Thanks

  12. Chris Duckworth said, on August 29, 2011 @ 11:12 pm

    Hi Scott,
    Fair call. I haven’t used it on a machine without eventghost for the last few months so I hadn’t tested this problem out. I have gone through the autohotkey script and removed all the references to turning devices on and off – all of the eventghost stuff. This is saved as the file Speech recognition control – no eventghost.ahk. You can play with that, or download the exe and copy that into your speech macros folder. It needs to go there to call on nircmd. Try this and see how you go. I have tested it briefly and it seems alright. I can’t think of any problems with removing the eventghost stuff. It just simplifies the script a bit.
    Cheers
    Chris

  13. PDKY said, on November 19, 2011 @ 5:29 pm

    Chris,

    First of all let me say your speech recognition for WMC is beyond awesome. I use it all the time. Thanks by the way for publishing it for everyone. I do however have a question. What can be done to make it so I could ask for a certain song or certain album and have it play? Kind of like Rob’s WMP program. Once again thanks for the awesome creation!

    P D

  14. Chris Duckworth said, on November 19, 2011 @ 8:51 pm

    Hi PD, First of all, I am very happy to hear you are using the software and enjoying it. I thought I was the only person on the planet enjoying the benefits of practical voice control until I saw your comment. Good on you for working through it. In the Advanced Speech Macros folder there is a file called advanced commands. This is what you are after. It is Rob’s script, but it has been edited slightly so it does not clash with my software. It can “Play Album\Song\Artist\Genre”. It interacts with WMP, but because WMC uses WMP, it also changes the track in WMC. I have found it to crash on me every so often, and it can be a little unreliable in understanding what I want. It expands the number of commands considerably. I have taken it off my home system now, as I never really found it all that useful for how I use the system.
    Some of the unreliability of the script may be because I am using a UK engine, as there was not one available for Australians. It took me weeks to train it and I lost my voice on numerous occasions. If you are from the US or UK you might have a whole lot more success.
    Eventually I want to write the software in C#. That should respond quicker and decrease the chances of the system stuffing up. It’s a work in progress.

  15. PDKY said, on November 21, 2011 @ 2:32 pm

    Chris,

    You are right. I tried the macro of Rob’s that you edited. It does clash pretty badly so I deleted it. Maybe something will pop up for us that we can use.

  16. mike said, on December 5, 2011 @ 11:51 pm

    Where do you find the media center .xml files? I’d like to get my media center on voice control.

  17. Chris Duckworth said, on December 11, 2011 @ 5:52 pm

    They are in the Advanced Speech Macros folder of voice control.zip. You will most likely just want the media center.xml to start off with.

  18. Paul said, on December 22, 2011 @ 8:49 am

    Chris,

    I noticed when watching your video that the status on the wsr bar goes to off when you give commands. Then it automatically wakes up when you give the next command. My problem is when I give a command it goes to sleep and stays asleep. I actually have to manually click to get it to start listening again. What am I doing wrong?

  19. Chris Duckworth said, on January 11, 2012 @ 12:40 pm

    Press the 0 button on your remote or keyboard and that will toggle it on and off, as well as do a bunch of other stuff which makes the system work.

  20. Neil said, on January 18, 2012 @ 3:57 am

    Great set of programming! I’m trying to use your programs to run a car PC setup for voice control over my music files. But, because I also use the car PC for my navigation, having the “0” key be the toggle is a problem since the “0” key will often be necessary for street address inputs. I’d like to use the oemtilde (“~”) character instead as the toggle. I tried editing the ahk file to use the “~” as the trigger, but it appears that the “0” and “]” still remain as toggles. I’m guessing that they are also programmed into your “Speech Recognition Control.exe” file. How do I recompile the “Speech Recognition Control.exe” file and reprogram the ahk file to use only the “~” symbol as the toggle?

  21. Chris Duckworth said, on January 19, 2012 @ 12:43 am

    Hi Neil,

    You need to download autohotkey. The speech recognition control is a compiled autohotkey script, so once autohotkey is installed, you will be able to compile the script with the ~ being the trigger button. The new autohotkey script will replace the speech recognition control.exe file. I did find in the car that some of the functions needed slight alterations. Where the script turns on my TV when it is first pressed, it should jump that bit and go straight to opening up the voice trigger. It’s all in the first few lines. Let me know if you need further help. I have been playing around a bit with car stuff. I hope it doesn’t tangle up your navigation too much, but if it does clash, changing the shortcuts should rectify it.

    Cheers
    Chris

  22. chef said, on January 19, 2012 @ 8:27 am

    Hello again! Great work! I’ve been following your progress for quite a while now :)
    Just have a quick question for you. In your post above you mentioned that you added more advanced commands for Media Player. Does that mean we can ask for songs/artist by name and they will play?

  23. Chris Duckworth said, on January 19, 2012 @ 5:40 pm

    Hi Chef, thanks for the encouragement. The advanced commands will play song, genre and artist, but it is a bit buggy. I think it just gets in a tangle trying to figure out the command between so many different band names. I had it running on my computers for a while and it worked most of the time, but the occasional crash of just that script was enough for me to remove it. You may find that if your computer is newer, you will have more luck with it. Cheers Chris

  24. gaz said, on January 21, 2012 @ 2:12 pm

    Just found your site. I hope to start using it soon. I just wanted to say thank you. I do not understand what you are talking about but will give it a try. I am surprised that it has not been addressed by big companies, as using a Bluetooth mouse in bed is a pain in the … well done and thanks again. ozy ozy ozy

  25. Chris Duckworth said, on February 8, 2012 @ 7:14 pm

    Yep, it would be great to use from the bed. I have considered long and hard why the major companies haven’t picked it up and I have my theories. It’s hard for people to adjust to for starters, as there are so many commands to learn even if it is plain English… but the list of reasons is much longer. Maybe they wanted to do it without my gadget, though it’s not really possible to have voice control working in any practical way without it. oi oi oi

  26. Neil said, on February 14, 2012 @ 8:06 am

    Hi Chris,

    I finally got back to this…been busy. About my post of 1/18/2012, what I tried doing is taking the “Speech recognition control.ahk” file, opening it up in notepad and editing the references to “0” and “]” to reflect the oemtilde (“~”) character instead as the toggle. I then recompiled the script using autohotkey and then replaced the “Speech recognition control.exe” file with the newly compiled one. For some reason, that did not remove “0” as the trigger, and sometimes “~” did work, but not consistently. I’m guessing my editing is wrong/incomplete. Any suggestions?

  27. Chris Duckworth said, on February 14, 2012 @ 9:34 am

    Hi Neil, if you aren’t using a USB-UIRT, you will want to use the -no eventghost script. The other thing that gets me is that the ~ requires shift + `. It may be worth using ` as the trigger instead of ~ because the script may be detecting multiple shift presses, as many of the shortcuts within the script use the shift key.

    Let me know how you go. So long as you tell me if you are using a USB-UIRT, I am happy to edit the file for you.

  28. Richard said, on April 9, 2012 @ 5:53 am

    Hi Chris, Great piece of work – like other commenters I am looking forward to the day when you can say “play – name of song”

    I just started playing with this today – hoping to get it working for me. I have what is hopefully a simple problem, in that the script cannot find the nircmd.exe file. I have manually put it in /windows/system32 and also done the “copy to windows directory”. Also, when the script is running I lose some of my punctuation keys e.g. ‘[] – is that expected behaviour?

    Thanks a lot,

    Richard

  29. Chris Duckworth said, on April 24, 2012 @ 10:37 am

    Hi Richard, It is probably because my documents folder is in a different spot to yours. If you unzip the zip file into your speech macros/programs folder, I think it would be alright. The way to redirect it though is to go through the autohotkey script and change the path to where you have nircmd. It is expected to lose some keyboard buttons using this method. I am not too sure how to get around this without using a different programming language. I tried to pick keys that weren’t used very often in a media pc setup.
    The play song functionality does work some of the time using the special commands macro, but it does struggle a bit with large music collections. I eventually came to the conclusion that I used the special functions so infrequently, I uninstalled them and went back to the basic media and power scripts.
    Cheers
    Chris
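
    For anyone hitting the same "cannot find nircmd.exe" problem, here is a small Python sketch (not part of the download, and not how the AutoHotkey script itself looks for the file) that reports where a copy of nircmd.exe can actually be found. The function name find_nircmd and the idea of passing in a list of folders to check are assumptions made for illustration only.

```python
import shutil
from pathlib import Path


def find_nircmd(extra_dirs=()):
    """Return the path to nircmd.exe, or None if no copy is found.

    extra_dirs is a hypothetical list of folders to check first,
    for example the folder you unzipped the download into.
    """
    for folder in extra_dirs:
        candidate = Path(folder) / "nircmd.exe"
        if candidate.is_file():
            return candidate
    # Fall back to the folders listed in the PATH environment variable.
    found = shutil.which("nircmd.exe")
    return Path(found) if found else None
```

    Calling find_nircmd with the folder you unzipped into gives you a path you can compare against the one written in the .ahk file. If it returns None, the file is in neither the folders you listed nor on PATH.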

  30. Lincoln Jones said, on May 1, 2012 @ 4:49 am

    Hello Chris,

    I love this project, and I started setting it up already. I’m having an issue with my microphone though. I got a Logitech desktop mic, and it really isn’t cutting it for picking up my voice. I’d really like to keep the mic mounted in my entertainment center, but the mic basically needs to be a foot or two in front of me to work. So I was hoping you (or someone else) had suggestions for a different mic to use. I’ve seen some mics that are made to work with field recorders that I thought might work well, but I’m not sure if those are made for close recording or if they can pick up a room’s worth of sound. The other idea I had is possibly using a parabolic microphone that’s focused on the area I sit in usually.

    Thought I’d pick your brain about this, or maybe you can go into more detail about your microphone setup and how you got it to work well for you? Thanks!

  31. mike said, on May 1, 2012 @ 9:40 am

    I followed the steps, but when I try to install the macros and paste in the media center xml I get an error, and when I try to use the media center the macros don’t work. How do I solve this, or what are the advanced steps?

  32. Chris Duckworth said, on May 1, 2012 @ 3:45 pm

    Are you getting a particular error? The only time they aren’t working for me is when the mic isn’t plugged in. Is Windows Speech Recognition working? I would start off with only one macro, and that is the media center – no eventghost one. You do need to manually copy and paste the text within the macro to a new macro because of Windows security. Copying the file only will not work. I hope this helps.

  33. Chris Duckworth said, on May 1, 2012 @ 3:54 pm

    Hi Lincoln, There are a great many mics on the market that will do the trick. The reason I used the Logitech was because it was a condenser mic that was very cheap. I don’t know if the problem will be with the mic itself. Can you use an audio recorder (Audacity) or a Skype test call to see if you are getting enough volume? The reason I put it behind my couch was to make it close enough to hear me, though it does pick up from anywhere in the room. If you get a good signal through that, then it is probably a case of training the system from where you will be using it.
    The best mics I have seen for the job are probably stage surround mics. The problem is that they require phantom power and will pick up everything. At a cost of at least $300, and then the rest of the gear to power them, it will start getting too expensive for most who haven’t benefited from the system in cheap form.

    It took me around 3 months to get the system to understand me as well as I thought I could get it. During this time, I lost my voice 3 times. It has unexpectedly continued to improve since then. Once a good voice profile is created, it can be copied from computer to computer very quickly, so no secondary training is required. It’s worth putting in the time now, to reduce frustration further down the track.
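
    The "are you getting enough volume" check above can be put into rough numbers. The following is a minimal Python sketch, assuming you have recorded yourself speaking from your usual seat and exported the clip as a 16-bit PCM WAV file (Audacity can do this). The function name wav_levels_dbfs is made up for illustration. It reports the peak and average (RMS) level in dB relative to full scale, where 0 is the loudest possible and more negative numbers are quieter.

```python
import array
import math
import sys
import wave


def wav_levels_dbfs(path):
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM files")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":      # WAV data is stored little-endian
        samples.byteswap()
    if not samples:
        raise ValueError("the file contains no audio")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale

    def to_db(level):
        return 20 * math.log10(level) if level > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

    Comparing the numbers for a clip recorded next to the mic with one recorded from the couch shows how much signal is lost over the distance. All channels are lumped together here, which is good enough for a rough comparison.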

  34. Eddy Carroll said, on May 1, 2012 @ 6:46 pm

    Hi Lincoln,

    On the subject of mics, I mentioned our Amulet Voice Remote earlier in the discussion but it’s perhaps worth mentioning again since that was nine months ago. We integrate a wireless mic with a learning IR remote, aimed at exactly this sort of scenario.

    One advantage is that we use an accelerometer to turn on and off the mic automatically so that the speech recognizer isn’t listening to random background sounds when you’re not speaking commands; in our testing, this dramatically improves reliability.

    While we bundle our own voice control software to add voice control to Windows Media Center, our hardware works well with any speech software that uses a standard audio input device, such as Chris’s neat solution – no special drivers needed.

    More info at http://www.amuletdevices.com/ if you’re interested (price is currently US$149).

    Hope this isn’t too commercial; I figured it’s probably of interest to at least some of the readers. (If anyone is using an Amulet with Chris’s scripts, I’d love to hear how you’re getting on.)

    Eddy Carroll
    Amulet Devices

  35. Lincoln Jones said, on May 3, 2012 @ 1:00 am

    Yeah, the issue was volume for sure. I tried using my soundcard to boost the input volume, but it added too much noise to be intelligible. I ended up ordering this mic:
    http://www.mxlmics.com/microphones/web-conferencing/AC-404/

    You mentioned boundary mics in your post, and all the reviews rave about how well it can pick up voices (and everything else) from a good distance. So I’m hoping it’ll work well for my needs. I’ll report back with my findings.

    (I’m also a musician, so I can justify spending a good amount on the microphone as I can use it for home recording as well.)

  36. Chris Duckworth said, on May 3, 2012 @ 9:38 am

    That mic looks good. I didn’t know they made such things, but it is likely going to do you well.

    The only concern I have about it is that being USB, it will be hard to put a noise gate on it or a parametric eq, but you may well get away without them.

    And please do let me know how it goes. Might get one of these myself.

  37. Jason said, on August 22, 2012 @ 6:16 am

    Thanks for this tutorial and utility! The only issue I have is that SpeechRecognitionControl.exe renders several keys on my keyboard inoperable. In addition to ‘0’, I can’t use: -, =, ‘

    oh, and the inability to control playlists via speech. Other than that, it’s very cool!!

  38. Joe Winograd said, on August 28, 2012 @ 12:27 am

    Hi Chris,
    First, let me say – GREAT STUFF! I’m modifying your [Speech recognition control.ahk] script to work in my environment and I think I have everything nailed down except the WakeonLAN parameter for EventGhost. It appears to be a MAC address, but I don’t know for what device. Please let me know. Thanks much, Joe

  39. Joe Winograd said, on August 28, 2012 @ 5:18 am

    Chris,
    Another question: Is there any advantage in using EventGhost? Since your AHK script supports both EG and non-EG environments (and you have both EG and non-EG versions of the pre-compiled EXE), I’m wondering what the value of EG is in this case. I do have EG on other machines (to emulate some keystrokes on a non-WMC remote for use with WMC), but if I can avoid EG with your solution (and not lose any functionality!), I’d prefer that. Thanks again, Joe

  40. Chris Duckworth said, on August 28, 2012 @ 9:01 am

    It wouldn’t be hard to change the script to use more obscure shortcuts, but when back on the couch, those keys that don’t work are rarely used. When I update the scripts next, I will likely use more obscure ones so that it is not an issue, but this system was designed to be used away from a keyboard.

  41. Chris Duckworth said, on August 28, 2012 @ 9:02 am

    Hi Joe, Thanks for the feedback. The MAC address will need to be changed to match the MAC address of the wired connection for the machine you want to turn on. Currently, it’s pointing to my gaming computer.
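
    To show why the MAC address has to belong to the machine being woken, here is a minimal Python sketch of what a Wake-on-LAN "magic packet" looks like. This is an illustration of the standard packet format, not a copy of what EventGhost does internally. The function names are made up for this example, and the broadcast address and port 9 are common defaults that may need changing on your network.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes, then the target MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

    Calling send_magic_packet with the MAC address of the sleeping machine's wired network adapter should wake it, provided Wake-on-LAN is enabled on that machine. The address 00:11:22:33:44:55 in the docstring is a placeholder only.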

  42. Chris Duckworth said, on August 28, 2012 @ 9:10 am

    The non-EventGhost script is more of a try-before-you-put-effort-in scenario. If you like its functionality then I would highly encourage you to go the EventGhost option. It allows for a much broader and more functional system. You can control multiple computers with the one voice control system by using the network sender features. This is how I turn on other computers in the house and open programs on them, even though I am not talking with them directly. In time, it will allow for much more. It may well be that the EventGhost media player functions are more reliable than the keyboard shortcuts I have used, so it may be worth trying them.

    If you have eventghost on other computers, you can expand their trees to include the extra stuff you need. You can copy and paste my tree into yours so long as there aren’t any clashes. I like that you said you were editing the ahk file. This is exactly the attitude to have with this stuff. By going through those steps you can make the system do anything you want….Anything. Different things will work better or worse in different environments. This is what is working for me now, but I know when I move house, I will be making some substantial additions and alterations.

    2 years without a remote control in my hand. There is no turning back.

  43. Joe Winograd said, on August 28, 2012 @ 11:12 am

    Chris,
    Thanks for the fast reply – very much appreciated! My HTPC is a laptop that runs 24×7 (connected via HDMI to a big screen TV), so turning the PC off and on is not an issue – guess I won’t be needing that chunk of code. :)

    One of the great things about your solution for me is that I’ve already been using all of the software that you selected – AutoHotkey, NirCmd (I use many NirSoft utilities), and EventGhost. The reason why I asked about a non-EG solution is that I plan to install EG only when needed, because I’ve had the occasional glitch with it (unlike AutoHotkey and the NirSoft utils, which are on all of my machines). Having your AHK script is wonderful, as I’m able to change the locations to match my system (even something as simple as instead of ).

    Back to EG, I haven’t used it much (just got that one remote control working with it) and was not aware of its ability to expand trees on other computers to include additional branches. I’ll have to take a harder look at EG’s capabilities – I know it’s a very robust piece of software.

    In terms of more obscure shortcuts, I ran into the same issue as Jason, although I acknowledge that this is because I’m testing in a sandbox where keyboard use is critical. My idea on this is to program the needed functions using obscure shortcuts on my programmable remote (Universal Remote Control’s URC-SR3), thereby freeing up keys like 0, -, *, et al. Someday, I may reach your nirvana on the HTPC with no keyboard and no remote control – just voice!

    I’d like your opinion on one of my objectives with your solution, namely, to have an all-in-one PC mounted on a swing-arm in the kitchen to be used largely as a TV with W7/WMC. My vision is that the user (also known as my better half) will be wearing a Bluetooth-based stereo headphone/microphone, such as the Kinivo BTH220. Voice control is critical for this application, as the user is an awesome cook and baker, and hands are typically in no condition to be touching a remote control – already had to replace the TiVo remote three times! (I was intrigued with Eddy Carroll’s posts about their Amulet Voice Remote, which would be great for the family room machine, but, for the reason just cited, not the kitchen machine.) I know that you recommend a good quality mic, which I realize is crucial to voice recognition accuracy, so I’m wondering how you feel about the Bluetooth headphone/mic idea, or if you have any other ideas for my kitchen project. Thanks again, Joe

  44. Joe Winograd said, on August 28, 2012 @ 11:19 am

    This board apparently doesn’t like the less-than and greater-than symbols, which I tend to use to delineate path names. So the “even something as simple as instead of” comment in the preceding post should have said “even something as simple as [Program Files (x86)] instead of [Program Files]”.

  45. Chris Duckworth said, on August 28, 2012 @ 11:55 am

    Hi Joe,
    EventGhost is very powerful, though I too have had a couple of minor stability issues with it. Nothing bad enough to stop me using it though. The more time you spend with it, the more you will realise is possible with it.
    The shortcuts should all be able to be changed to ctrl+alt+something combinations apart from one of them (currently [), which will be the input. This one key to trigger is required as EventGhost will disable the keyboard with the keyboard shortcut sending plugin. It’s been a long time since I first set this up so the memory is a bit hazy.
    Voice control in the kitchen is a very useful thing, but it comes with a great many challenges. Probably more than anywhere else in the house. The headset will solve some of the problems, though I did design this system so that I could use it when I am theoretically naked, with nothing in my hands or on my person. My gadget was designed to change the environment to make voice control work, instead of previous solutions that expect the user to live in an unrealistic environment. For voice control to work in the kitchen, I believe a trigger is still needed. This could be a hip press button, a foot button, or a wave of the arm using a Kinect. Whatever option is chosen, a trigger is required. The alternative is that the headset will pick up random kitchen noise as false positives for commands, and perform them. The trigger may even briefly turn off the rangehood and other things like that, but nothing will stop the chips from sizzling away. I have come up with a few solutions for this, but I am going to try them out before sharing them here. The wiring is going in as we speak.
    Chris




