Composing and determinizing efficiently
Hi again,

After getting familiar with the FSA toolkit, I find it very nice. However, I have one question. When big transducers are composed and determinized, is it normally necessary to perform some additional operations to improve the performance of the FSA toolkit? Or should "compose", "trim" and "determinize" be enough?

I have an acyclic pronunciation lexicon transducer (50 input labels, 24362 output labels, 20784 states, 45145 arcs) with disambiguation symbols, and a language model acceptor (24363 labels, 729323 states, 5640013 arcs). Composing them with the AT&T toolkit (using "dmake -a lex -b lm") takes 4m35s with peak memory consumption around 1.4G. The RWTH FSA tool, on the other hand, got up to 11G in 15 minutes, after which I stopped the process before it started to use swap. I used the following command line:

  fsa --progress=yes bin:lex closure bin:lm compose trim determinize \
      write bin:composition

The system is SuSE Linux 9.0 (x86-64).

Any ideas whether I am missing something? Perhaps I have to check more carefully that the conversion between formats is OK. At least the numbers of states, arcs, input-epsilon arcs and output-epsilon arcs seem to match.

-- 
Teemu Hirsimäki
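
For reference, the sanity check mentioned above (comparing the number of states, arcs, input-epsilon arcs and output-epsilon arcs after conversion) can be sketched roughly as follows. This is only a minimal sketch: it assumes the machine has been dumped in AT&T-style text form (one arc per line: source state, destination state, input label, output label, optional weight; final states on lines with one or two fields). The function name `count_att_text`, the `FsaStats` tuple and the set of epsilon spellings are hypothetical choices for illustration, not part of either toolkit.

```python
from collections import namedtuple

# Hypothetical container for the four numbers being compared.
FsaStats = namedtuple("FsaStats", "states arcs input_eps output_eps")

# Assumption: epsilon is written as one of these; adjust to the symbol table in use.
EPSILONS = frozenset({"0", "eps", "<eps>"})


def count_att_text(lines, acceptor=False, epsilons=EPSILONS):
    """Count states, arcs and epsilon arcs in an AT&T-style text dump.

    With acceptor=True each arc line carries a single label
    (src dst label [weight]); otherwise it carries two
    (src dst ilabel olabel [weight]).
    """
    states = set()
    arcs = input_eps = output_eps = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) <= 2:
            # Final-state line: "state [weight]"
            states.add(fields[0])
            continue
        src, dst, ilabel = fields[0], fields[1], fields[2]
        olabel = ilabel if acceptor else fields[3]
        states.update((src, dst))
        arcs += 1
        if ilabel in epsilons:
            input_eps += 1
        if olabel in epsilons:
            output_eps += 1
    return FsaStats(len(states), arcs, input_eps, output_eps)


# Tiny made-up transducer, only to show the call.
sample = [
    "0 1 a x 0.5",
    "1 2 eps y",
    "1 2 b eps",
    "2 1.0",
]
print(count_att_text(sample))
```

Running the same function on the text dumps from both toolkits and comparing the resulting tuples would confirm that the counts agree, although matching counts alone do not prove that labels or weights were converted correctly.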