myVoice Plugin: Progress report

booyakasha · May 21, 2006

Hey all,

As I've stated on another thread, I've started working on a speech recognition system for MP utilizing MS Speech 5.1.
Currently it does the following:

When you start MP, the only thing it understands is the keyword: "my computer"
If you say the keyword, it will change to understanding the following:
-move up
-move down
-move left
-move right
-previous menu
-home
-exit
-my music
-my videos
-settings
-select item
etc...(many words not tested yet)

Phrases such as "my music" will take you to that window from whatever screen you're on.
If you don't say anything it understands in 5 seconds, it reverts back to just understanding the keyword.
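The two recognition modes and the 5-second fallback described above can be sketched as a small state machine. This is only an illustration in Python, not the plugin's actual C# code: the class name `ModeSwitcher`, the `hear` method and the explicit `now` timestamps are all made up for the example, and the phrases are assumed to arrive already recognized as text.

```python
from dataclasses import dataclass

KEYWORD = "my computer"
COMMANDS = {"move up", "move down", "move left", "move right",
            "previous menu", "home", "exit", "my music", "my videos",
            "settings", "select item"}
TIMEOUT_SECONDS = 5.0


@dataclass
class ModeSwitcher:
    """Tracks whether only the keyword or the full command set is active."""
    full_mode: bool = False
    last_heard: float = 0.0

    def hear(self, phrase, now):
        """Return the command to run, or None if the phrase is ignored."""
        # Revert to keyword-only mode after 5 s without an understood phrase.
        if self.full_mode and now - self.last_heard >= TIMEOUT_SECONDS:
            self.full_mode = False
        phrase = phrase.strip().lower()
        if not self.full_mode:
            if phrase == KEYWORD:
                self.full_mode = True
                self.last_heard = now
            return None
        if phrase in COMMANDS:
            self.last_heard = now  # an understood phrase restarts the timer
            return phrase
        return None
```

Passing the timestamp in by hand keeps the sketch testable without a microphone or a real clock; a real plugin would more likely use a timer that fires on its own.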

My next steps are:
-further explore MP's interface and add the corresponding voice commands
-add modifiers (ex: move down three)
-integrate playlist/video/music selection through artist, genre, etc
-multi-language support (could be SDK-specific, though)
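As a rough idea of how a modifier like "move down three" could be handled once the phrase has been recognized as text, here is a minimal Python sketch. The helper `parse_move` and the number-word table are hypothetical and are not part of the plugin or of the MS Speech SDK.

```python
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
MOVES = {"move up", "move down", "move left", "move right"}


def parse_move(phrase):
    """Split 'move down three' into ('move down', 3); the count defaults to 1."""
    words = phrase.lower().split()
    count = 1
    # A trailing number word is treated as the repeat count.
    if words and words[-1] in NUMBER_WORDS:
        count = NUMBER_WORDS[words[-1]]
        words = words[:-1]
    command = " ".join(words)
    if command not in MOVES:
        return None  # modifiers only make sense on movement commands here
    return command, count
```

The caller would then send the movement action `count` times.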

What I could use:
-General feedback
-beta tester(s)
-collaborators

I'm holding off on a large public beta for now, as I'm looking for an easier way to distribute the MS Speech engine; most people don't need most of the SDK.

That's about it for now. I'll use this thread to update my progress, so please feel free to post or PM me.

smnnekho · May 21, 2006

needless to say: this will be (one hell of an) improvement.

i would love to stand ready for your call as a beta tester for your work and of course with any kind of help and ideas i could possibly provide!!

aasmund Nordal · May 21, 2006

I just love this

Would it be possible for the end user to choose speech engine, or do we have to use microsoft's?

In my opinion, multilingual support should not be a priority.
English and French are the most important in the beginning.

guilhem · May 22, 2006

hello !

I've already worked on a similar plugin, but I had a problem with popups when using the voice remote: MP froze whenever a popup menu (F9) opened.
Have you solved this problem, or doesn't your plugin have it?

I don't think there is a French speech engine (I can't find one). Perhaps there is a French Canadian one (I read about it on a forum, but I can't find that either).

good job

eagle · May 22, 2006

aasmund Nordal said:
In my opinion, multilingual support should not be a priority.
English and French are the most important in the beginning.

In my opinion multilingual support is important, because it increases acceptance within the family. My two sons (10 and 8 years old) don't speak English or French, so it is a must.

If it works, they won't use the IR remote or the touchscreen anymore.

eagle

smnnekho · May 22, 2006

imho, the highest priority should be to make this as configurable as possible. this depends on how the 'keywords' are stored. if they are internal, there would be no (or not really any) chance of changing them, but if they were stored externally with reference IDs it would be customizable. (at least so you can say 'my computer', 'cassiopeia', 'dude', 'KITT' or whatever (-; as the activating keyword etc.)

a question related to the engine: i guess it really recognizes the words, meaning you don't have to record a command and save it as a keyword (which wouldn't be very (multi-)user friendly), but have just written 'My computer' in the source and the engine recognizes it?

zion22 · May 22, 2006

MS Speech is speech-to-text, so it will decode the speech to text; there's no need to "record" a command. The downside is that you need to speak English pretty well (maybe not for these simple commands, tho).

Waiting with great expectations.

booyakasha · May 22, 2006

guilhem: I'm not using sendkeys. In an early test I ran into this freezing problem as well using that method. Now I just send messages (gui and action) to MP directly.

smnnekho: currently none of the recognition is really hardcoded except the switch between recognition modes. An xml file contains many of the mappings from MP's actions/windows to voice, so it's quite easy to change something like "my music" to "my tunes", or "my computer" to "KITT"
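To illustrate the idea of keeping the phrases in an XML file, here is a minimal Python sketch of loading such a mapping. The layout below (`voicemap`, `command`, `target`, `voice`) is invented for the example and is not the plugin's real file format; the plugin itself is written in C#.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file; element and attribute names are assumptions.
SAMPLE = """
<voicemap>
  <keyword>my computer</keyword>
  <command target="action:move_up"><voice>move up</voice><voice>up</voice></command>
  <command target="window:music"><voice>my music</voice></command>
</voicemap>
"""


def load_phrases(xml_text):
    """Return (keyword, {spoken phrase: target}) from the mapping XML."""
    root = ET.fromstring(xml_text)
    keyword = root.findtext("keyword", default="").strip().lower()
    phrases = {}
    for command in root.findall("command"):
        target = command.get("target")
        # Several <voice> entries may point at the same target.
        for voice in command.findall("voice"):
            phrases[voice.text.strip().lower()] = target
    return keyword, phrases
```

With a file like this, renaming "my music" to "my tunes" or "my computer" to "KITT" is a text edit in the XML, with no change to the code.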

aasmund Nordal: It's built specifically for the MS Speech 5.1 engine (free and easy to integrate with C#). If I run into too many problems down the line, especially with multi-language support, I'll look into using another engine (Sphinx?).

As for the multi-language support, that's something I'll probably look at in more detail once I get the full functionality working.

smnnekho · May 22, 2006

i would second that multilanguage support shouldn't have priority. imho english would please most of us (and i'm in fact not a native speaker)

priority should be to get this one working fine in one language before getting to work for other ones. of course all this ain't true if implementing other languages wouldn't be that much of a problem...

2 questions / 1 suggestion :

will the engine recognize a specific word even when it's part of a whole sentence? just because of the effect on visitors (-; much more impressive to build the words (not the activating keyword) into a whole sentence..?

will the XML files containing the 'keywords' and the 'gui.actions' be universal? meaning that you can add your own commands even for your own gui.actions? or will you implement every thinkable action anyway?? (-;

suggestion (only if you haven't thought about it already):

i would think about a tiny sound-prompt. meaning when the computer is ready to take commands after saying "my computer" (kitt was just a joke by the way) a small beepbeep like in star trek would be cool. should be as gentle as possible though, so it doesn't start to bother. and another signal for the end of the 'hearing sequence' so that if something went wrong you don't keep talking for 10 minutes (-;

booyakasha · May 23, 2006

smnnekho said:
will the engine recognize a specific word even when it's part of a whole sentence? just because of the effect on visitors (-; much more impressive to build the words (not the activating keyword) into a whole sentence..?

Not quite sure what you mean here. Currently there's a one-to-one relationship between a phrase/word and an action, although I'll add more, e.g. a cursor-up command could be mapped to both "move up" and "up".

smnnekho said:
will the XML files containing the 'keywords' and the 'gui.actions' be universal? meaning that you can add your own commands even for your own gui.actions? or will you implement every thinkable action anyway?

The gui/window actions currently mimic the structure already contained in a file called keymap.xml, found in your MediaPortal folder. I'm basically adding a <voice> tag to that (through a secondary file, so as not to change the original). For me to send MP a command, MP has to know about it. I'm not sure yet how this applies to other people's plugins.
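The secondary-file approach could look roughly like the Python sketch below: the voice file is joined to the base file by action id, and the base file is only read, never written. Both XML snippets are simplified stand-ins with invented ids, not the real keymap.xml layout, and `voice_commands` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the original key mapping (ids are made up).
BASE = """<keymap>
  <action><id>3</id><description>Move up</description></action>
  <action><id>4</id><description>Move down</description></action>
  <action><id>10</id><description>Previous menu</description></action>
</keymap>"""

# Secondary file that only adds <voice> tags, keyed by the same ids.
VOICE = """<keymap>
  <action><id>3</id><voice>move up</voice></action>
  <action><id>10</id><voice>previous menu</voice></action>
  <action><id>99</id><voice>make coffee</voice></action>
</keymap>"""


def voice_commands(base_xml, voice_xml):
    """Map each spoken phrase to an action id that exists in the base file."""
    known = {a.findtext("id") for a in ET.fromstring(base_xml).iter("action")}
    phrases = {}
    for action in ET.fromstring(voice_xml).iter("action"):
        action_id = action.findtext("id")
        if action_id not in known:
            continue  # the host has to know the action, so skip unknown ids
        for voice in action.findall("voice"):
            phrases[voice.text.strip().lower()] = int(action_id)
    return phrases
```

Dropping phrases whose id is missing from the base file mirrors the constraint that MP has to know about a command before it can be sent.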

As for your last suggestion, the last thing I added was a beep when switching to full recognition mode, and an exclamation sound when switching to "my computer" mode. I'm still not completely happy with that solution, as I think a visual representation would be better, but I'm not sure how to do that at this point.
