Hi again.
After getting familiar with the FSA toolkit, it seems like a very
nice toolkit. However, I have one question: when big transducers are
composed and determinized, is it normally necessary to perform some
additional operations to improve the performance of the FSA toolkit,
or should "compose", "trim" and "determinize" be enough?
I have an acyclic pronunciation lexicon transducer (50 input labels,
24362 output labels, 20784 states, 45145 arcs) with disambiguation
symbols, and a language model acceptor (24363 labels, 729323 states,
5640013 arcs). Composing them with the AT&T toolkit (using "dmake -a
lex -b lm") takes 4m35s with peak memory consumption around 1,4G. The
RWTH FSA tool, on the other hand, had grown to 11G in 15 minutes,
after which I stopped the process before it started to use swap. I used the
following command line:
fsa --progress=yes bin:lex closure bin:lm compose trim determinize \
write bin:composition
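To narrow down which step eats the memory, I may also try splitting
the pipeline in two and writing the intermediate composition to disk
first (this reuses only the operations above; bin:LG is just a
placeholder file name):

fsa --progress=yes bin:lex closure bin:lm compose trim write bin:LG
fsa --progress=yes bin:LG determinize write bin:composition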
The system is SuSE Linux 9.0 (x86-64).
Any ideas on what I might be missing? Perhaps I should check more
carefully that the conversion between formats is OK. At least the
numbers of states, arcs, input-epsilon arcs and output-epsilon arcs
seem to match.
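For reference, counts like these can be taken from the AT&T text
dump (fsmprint output), assuming a transducer printed with numeric
labels, where an arc line is "src dst ilabel olabel [weight]", a
final-state line has one or two fields, and epsilon is label 0. A
minimal awk sketch; lex.txt is just a placeholder name:

awk 'function seen(s) { if (!(s in st)) { st[s] = 1; n++ } }
     NF >= 4 { arcs++; seen($1); seen($2)         # arc line
               if ($3 == 0) ieps++                # input epsilon
               if ($4 == 0) oeps++ }              # output epsilon
     NF >= 1 && NF <= 2 { seen($1) }              # final-state line
     END { printf "states %d arcs %d in-eps %d out-eps %d\n",
                  n, arcs, ieps, oeps }' lex.txt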
--
Teemu Hirsimäki