myVoice Plugin: Progress report

booyakasha

Portal Pro
December 6, 2005
153
12
Canada
Home Country
Canada Canada
Hey all,

As I've stated on another thread, I've started working on a speech recognition system for MP utilizing MS Speech 5.1.
Currently it does the following:

When you start MP, the only thing it understands is the keyword: "my computer"
If you say the keyword, it will change to understanding the following:
-move up
-move down
-move left
-move right
-previous menu
-home
-exit
-my music
-my videos
-settings
-select item
etc...(many words not tested yet)

Phrases such as "my music" will take you to that window from whatever screen you're on.
If you don't say anything it understands in 5 seconds, it reverts back to just understanding the keyword.
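
The keyword/timeout behaviour described above can be sketched as a small two-state switch. This is only an illustration in Python, not the plugin's actual C# code: the class name `VoiceModeSwitcher`, its parameters, and the injectable `clock` are all made up here, and the phrases are assumed to arrive as already-recognized text.

```python
import time


class VoiceModeSwitcher:
    """Hypothetical sketch: keyword-only mode vs. full command mode."""

    def __init__(self, keyword="my computer", commands=(), timeout=5.0,
                 clock=time.monotonic):
        self.keyword = keyword
        self.commands = set(commands)
        self.timeout = timeout      # seconds without an understood phrase
        self.clock = clock          # injectable so the timeout can be tested
        self.full_mode = False
        self.last_heard = 0.0

    def _expire(self):
        # Fall back to keyword-only mode once the timeout has passed.
        if self.full_mode and self.clock() - self.last_heard >= self.timeout:
            self.full_mode = False

    def hear(self, phrase):
        """Return the command to run, or None if the phrase is ignored."""
        self._expire()
        phrase = phrase.strip().lower()
        if not self.full_mode:
            # Only the keyword is understood in this mode.
            if phrase == self.keyword:
                self.full_mode = True
                self.last_heard = self.clock()
            return None
        if phrase in self.commands:
            # An understood phrase restarts the timeout window.
            self.last_heard = self.clock()
            return phrase
        return None
```

Saying "my computer" flips the switch into full mode; every understood command restarts the 5 second window, and anything heard after the window has lapsed is ignored until the keyword is said again.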

My next steps are:
-further explore MP's interface and add the corresponding voice commands
-add modifiers (ex: move down three)
-integrate playlist/video/music selection through artist, genre, etc
-multi-language support (could be SDK-specific, though)?
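
For the planned modifiers (ex: move down three), one possible approach is to split a trailing number word off the phrase and treat it as a repeat count. This is a rough Python sketch under that assumption; `NUMBER_WORDS` and `parse_command` are hypothetical names, and the real plugin may well let the speech engine's grammar handle this instead.

```python
# Hypothetical lookup of spoken number words to repeat counts.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_command(phrase, commands):
    """Return (command, count), or None if the phrase is not understood."""
    words = phrase.lower().split()
    # Try "command + number word" first, e.g. "move down three".
    if len(words) > 1 and words[-1] in NUMBER_WORDS:
        base = " ".join(words[:-1])
        if base in commands:
            return base, NUMBER_WORDS[words[-1]]
    # Otherwise the whole phrase must be a known command, run once.
    base = " ".join(words)
    if base in commands:
        return base, 1
    return None
```

With `commands = {"move up", "move down"}`, the phrase "move down three" would come back as the command "move down" with a count of 3, which the caller could then send three times.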

What I could use:
-General feedback
-beta tester(s)
-collaborators

I'm trying not to release a large public beta of this yet, as I'm looking for an easier way to distribute the MS Speech engine, since most people don't need most of the SDK.

That's about it for now. I'll use this thread to update my progress, so please feel free to post or PM me.
 

smnnekho

Retired Team Member
  • Premium Supporter
  • February 6, 2006
    507
    7
    40
    Germany
    needless to say: this will be one hell of an improvement.

    i would love to stand ready for your call as a beta tester for your work and of course with any kind of help and ideas i could possibly provide!!
     

    aasmund Nordal

    Portal Pro
    June 20, 2005
    204
    0
    Norway
    Home Country
    Norway Norway
    I just love this :D

    Would it be possible for the end user to choose speech engine, or do we have to use microsoft's?

    In my opinion multilingual should not be a priority.
    English and French are the most important in the beginning.
     

    guilhem

    Portal Member
    April 2, 2006
    23
    0
    44
    France
    Hello!

    I've already worked on a similar plugin, but I had a problem with popups when using the voice remote: MP froze when a popup menu (F9) opened.
    Have you solved this problem, or doesn't your plugin have a similar problem?

    I think there isn't a French speech engine (I couldn't find one); perhaps there is a French Canadian one (I read about it on a forum, but I couldn't find that either).

    good job
     

    eagle

    Portal Pro
    September 25, 2004
    603
    79
    Unterfranken
    Home Country
    Germany Germany
    aasmund Nordal said:
    In my opinion multilingual should not be a priority.
    English and French are the most important in the beginning.

    In my opinion multilingual support is important, because it increases acceptance within the family. My two sons (10 and 8 years old) don't speak English or French, so it is a must.

    If it works, they won't use IR or the touchscreen anymore :D :D

    eagle
     

    smnnekho

    imho, the highest priority should be to make this as configurable as possible. this depends on how the 'keywords' are stored. if they are internal, there would be no (real) chance of changing them, but if they were stored externally with reference IDs it would be customizable. (at least so that you can say 'my computer', 'casseiopeia', 'dude', 'KITT' or whatever (-; as the activating keyword etc.)

    a question related to the engine: i guess it really recognizes the words, meaning you don't have to record a command and save it as a keyword (which wouldn't be very (multi-)user friendly), but you have written 'My computer' in the source and the engine just recognizes it?
     

    zion22

    Portal Pro
    April 6, 2006
    157
    2
    51
    Home Country
    Sweden Sweden
    MS Speech is speech-to-text, so it will decode the speech to text; no need to "record" a command. The downside is that you need to speak English pretty well (maybe not for these simple commands, though).

    Waiting with great expectations. :D
     

    booyakasha

    guilhem: I'm not using sendkeys. In an early test I ran into this freezing problem as well using that method. Now I just send messages (gui and action) to MP directly.

    smnnekho: currently none of the recognition is really hardcoded except the switch between recognition modes. An xml file contains many of the mappings from MP's actions/windows to voice, so it's quite easy to change something like "my music" to "my tunes", or "my computer" to "KITT"
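
To make the idea of an external mapping file concrete, here is a small Python sketch. The XML layout (`<voicemap>`, `<phrase say=... target=...>`), the target strings, and the function `load_voice_map` are invented for illustration; the plugin's real file format is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file: spoken phrase -> MP window or action.
VOICE_XML = """<voicemap keyword="my computer">
  <phrase say="my music" target="window:music"/>
  <phrase say="my videos" target="window:videos"/>
  <phrase say="move up" target="action:move_up"/>
</voicemap>"""


def load_voice_map(xml_text):
    """Return (activation keyword, {phrase: target}) from the XML text."""
    root = ET.fromstring(xml_text)
    keyword = root.get("keyword", "").lower()
    phrases = {p.get("say").lower(): p.get("target")
               for p in root.iter("phrase")}
    return keyword, phrases


# Renaming a phrase is just an edit to the file, not to the code.
edited = VOICE_XML.replace("my music", "my tunes").replace("my computer", "KITT")
keyword, phrases = load_voice_map(edited)
```

Because the recognizer only ever sees what is in the table, changing "my music" to "my tunes" or the keyword to "KITT" needs no recompiling.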

    aasmund Nordal: It's built specifically for the MS Speech 5.1 engine (free and easy to integrate with C#). If I run into too many problems down the line, especially with multi-language support, I'll look into using another engine (Sphinx?).

    As for the multi-language support, that's something I'll probably look at in more detail once I get the full functionality working.
     

    smnnekho

    i would second that multilanguage support shouldn't have priority. imho english would please most of us (and i'm in fact no native speaker)

    priority should be to get this one working fine in one language before getting to work for other ones. of course all this ain't true if implementing other languages wouldn't be that much of a problem...

    2 questions / 1 suggestion :

    will the engine recognize a specific word even if it's inside a whole sentence? just because of the effect on visitors (-; much more impressive to build the words (not the activating keyword) into a whole sentence..?

    will the XML files containing the 'keywords' and the 'gui.actions' be universal? meaning that you can add your own commands even for your own gui.actions? or will you implement every thinkable action anyway?? (-;

    suggestion (only if you haven't thought about it):

    i would think about a tiny sound-prompt. meaning when the computer is ready to take commands after saying "my computer" (kitt was just a joke by the way) a small beepbeep like in star trek would be cool. should be as gentle as possible though, so it doesn't start to bother. and another signal for the end of the 'hearing sequence' so that if something went wrong you don't keep talking for 10 minutes (-;
     

    booyakasha

    smnnekho said:
    will the engine recognize a specific word even if it's a whole sentence? just because of the effect on visitors (-; much more impressive to build the words (not the activating keyword) in a whole sentence..?

    Not quite sure what you mean here. Currently I have a 1-to-1 relationship between a phrase/word and an action, although I'll add more. ex: a cursor up command could be mapped to both "move up" and "up"

    smnnekho said:
    will the XML files containing the 'keywords' and the 'gui.actions' be universal? meaning that you can add your own commands even for your own gui.actions? or will you implement every thinkable action anyway?

    The gui/window actions currently mimic the structure already contained in a file called keymap.xml found in your mediaportal folder. I'm basically adding a <voice> tag to that (through a secondary file, so as not to change the original). For me to send MP a command, MP has to know about it. I'm not sure yet how this applies to other people's plugins.
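
As a rough illustration of the secondary-file idea, the Python sketch below overlays `<voice>` tags onto a keymap-style list of actions and skips anything the base file doesn't know about. The element layout and the ids are simplified placeholders, not the real contents of keymap.xml, and `build_phrase_table` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the original keymap file (left untouched).
KEYMAP = """<keymap>
  <action><id>3</id><description>Move up</description></action>
  <action><id>4</id><description>Move down</description></action>
</keymap>"""

# Hypothetical secondary file that only adds <voice> tags per action id.
VOICE_OVERLAY = """<voicemap>
  <action><id>3</id><voice>move up</voice><voice>up</voice></action>
  <action><id>4</id><voice>move down</voice></action>
  <action><id>99</id><voice>do magic</voice></action>
</voicemap>"""


def build_phrase_table(keymap_xml, overlay_xml):
    """Return ({phrase: action id}, [overlay ids unknown to the keymap])."""
    known = {a.findtext("id") for a in ET.fromstring(keymap_xml).iter("action")}
    table, skipped = {}, []
    for action in ET.fromstring(overlay_xml).iter("action"):
        action_id = action.findtext("id")
        if action_id not in known:
            # A command can only be sent if the base keymap knows the action.
            skipped.append(action_id)
            continue
        # Several phrases may point at the same action.
        for voice in action.findall("voice"):
            table[voice.text.strip().lower()] = int(action_id)
    return table, skipped


table, skipped = build_phrase_table(KEYMAP, VOICE_OVERLAY)
```

Here both "move up" and "up" end up pointing at the same action, while the overlay entry with the unknown id is reported back instead of being mapped.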

    As for your last suggestion, the last thing I added was a beep when switching to full recognition mode, and an exclamation sound when switching to "my computer" mode. I'm still not completely happy with that solution, as I think a visual representation would be better, but I'm not sure how to do that at this point.
     
