Problem with the composition

17 Dec 2004


      Hi Emilian!
...
I have a strange problem with composition in the fsa tool: it
says that " 'some word' is not in second alphabet". What does that
mean? Could you please have a look at the files that I'm trying to
compose (a dictionary transducer and a language model transducer)?
I've also attached the error log.
The command I use is
fsa att:Dict_2003+.extended.fsm.txt att:lm.txt compose write att:LoG.txt
P.S. What is "the second alphabet" anyway? The output or the input
alphabet of the second transducer?
The "warnings" (yes, I should make that more explicit) appear because the
lexicon contains more symbols than the language model, and FSA does NOT
silently ignore that fact by default. Internally, you can disable those
warnings. If you know that your lexicon contains more words (compare the
sizes of the alphabets through "info"), then you can safely ignore them,
as FSA does exactly what you want. In your case I doubt that. I see at
least two mistakes:

    1. FSA still interprets AT&T's format by assuming that 0 is the initial
        state. (This behaviour is wrong. I will go and fix that soon.)

    2. Your lexicon contains 9589 symbols, but your language model has
        only 3689, and I assume that you wish to map unknown words to the
        unknown class. FSA does not do this on its own unless you use the
        failure symbol *FAIL* instead of UNK, or use an intermediate
        transducer that maps lexicon words to lm words (you can use the
        map-fsa script to automate that).
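
To illustrate point 2, here is a rough Python sketch of the idea behind such
an intermediate transducer. It is NOT the map-fsa script, and all function
names are made up for this example. It assumes that both files are
transducers in AT&T text format with symbolic labels (arc lines
"src dst input output [weight]", final-state lines "state [weight]"), that
the output side of the lexicon is matched against the input side of the
language model, and that the language model's unknown class is called UNK.

```python
# Sketch only: helper names are hypothetical, not part of the FSA toolkit.

def read_att_arcs(lines):
    """Return the arc lines of an AT&T text transducer as lists of fields.

    Assumption: arc lines have 4 or 5 whitespace-separated fields
    (src dst input output [weight]); shorter lines are final states.
    """
    arcs = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            arcs.append(fields)
    return arcs


def input_alphabet(arcs):
    """Set of input labels (third field of each arc)."""
    return {arc[2] for arc in arcs}


def output_alphabet(arcs):
    """Set of output labels (fourth field of each arc)."""
    return {arc[3] for arc in arcs}


def mapping_transducer(lexicon_words, lm_words, unk="UNK", ignore=frozenset()):
    """Build a one-state mapping transducer as AT&T text lines.

    Words known to the language model map to themselves, all other
    lexicon words map to `unk`. Labels listed in `ignore` (for example
    whatever name your setup uses for epsilon) are left out.
    """
    lines = []
    for word in sorted(lexicon_words):
        if word in ignore:
            continue
        target = word if word in lm_words else unk
        lines.append(f"0 0 {word} {target}")
    lines.append("0")  # state 0 is both initial and final
    return lines
```

With the arcs of both files loaded through read_att_arcs, the set
difference output_alphabet(lexicon_arcs) - input_alphabet(lm_arcs) lists the
words that trigger the warning, and the lines returned by
mapping_transducer can be written to a text file and composed between the
lexicon and the language model.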

Cheers,
Stephan



Stephan Kanthak
