Hi Emilian!
I have a strange problem with composition in the fsa tool: it reports that "'some word' is not in second alphabet". What does that mean? Could you please have a look at the files I'm trying to compose, a dictionary transducer and a language model transducer? I've also attached the error log.
The command I use is:
fsa att:Dict_2003+.extended.fsm.txt att:lm.txt compose write att:LoG.txt
P.S. What is "the second alphabet" anyway? The output or the input alphabet of the second transducer?
The "warnings" (yes, I should make that more explicit) appear because the lexicon contains more symbols than the language model, and FSA does NOT silently ignore that fact by default. Internally, you can disable those warnings. However, if you know that your lexicon contains more words (look at the sizes of the alphabets through "info"), then you can safely ignore them, as FSA does exactly what you want. In your case I doubt that. I see at least two mistakes:

1. FSA still interprets AT&T's format by assuming that 0 is the initial state. (This behaviour is wrong. I will go and fix that soon.)

2. Your lexicon contains 9589 symbols, but your language model has only 3689, and I suspect that you want to map unknown words to the unknown class. FSA does not do this on its own unless you use the failure symbol *FAIL* instead of UNK, or use an intermediate transducer that maps lexicon words to lm words (you can use the map-fsa script to automate that).

Cheers,
Stephan
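To make the alphabet mismatch concrete, here is a minimal Python sketch, independent of the fsa tool itself. It assumes both files are transducers in AT&T text format, with arc lines of the form "source target input output [weight]" and final-state lines of the form "state [weight]", and it assumes that composition matches the output labels of the first transducer against the input labels of the second. The function names, the toy data, and the "*EPS*" epsilon spelling are illustrative assumptions, not part of FSA or of the map-fsa script.

```python
def read_att_symbols(lines):
    """Collect the input and output labels used on the arcs of an
    AT&T-format transducer given as an iterable of text lines."""
    inputs, outputs = set(), set()
    for line in lines:
        fields = line.split()
        # Assumption: transducer arcs have 4 or 5 fields; shorter lines
        # are final states ("state [weight]") or blank and are skipped.
        if len(fields) < 4:
            continue
        inputs.add(fields[2])
        outputs.add(fields[3])
    return inputs, outputs


def missing_in_second(first_lines, second_lines, ignore=frozenset()):
    """Output labels of the first transducer that never occur as input
    labels of the second one (labels in `ignore` are not reported)."""
    _, first_out = read_att_symbols(first_lines)
    second_in, _ = read_att_symbols(second_lines)
    return sorted(first_out - second_in - set(ignore))


def unk_mapper(lexicon_words, lm_words, unk="UNK"):
    """Lines of a one-state transducer in AT&T text format that keeps
    words known to the language model and maps all others to `unk`."""
    lines = [f"0 0 {w} {w if w in lm_words else unk}"
             for w in sorted(lexicon_words)]
    lines.append("0")  # state 0 is both initial and final
    return lines


# Toy data (hypothetical): a two-word lexicon and a one-word language model.
DICT = ["0 1 k cat", "1 0 t *EPS*", "0 2 d dog", "2 0 g *EPS*", "0"]
LM = ["0 0 cat cat 0.5", "0 0 UNK UNK 2.0", "0"]

missing = missing_in_second(DICT, LM, ignore={"*EPS*"})
mapper = unk_mapper({"cat", "dog"}, {"cat", "UNK"})
print(missing)
print("\n".join(mapper))
```

In this sketch the mapper would sit between the lexicon and the language model, so that a lexicon word absent from the language model (here "dog") is rewritten to UNK before the two are composed.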
participants (1): Stephan Kanthak