Hi, again.
Now that I am familiar with the FSA toolkit, I find it a very
nice toolkit. However, I have one question. When big transducers are
composed and determinized, is it normally necessary to apply some
additional operations to improve the performance of the FSA toolkit?
Or should "compose", "trim" and "determinize" be enough?
I have an acyclic pronunciation lexicon transducer (50 input labels,
24362 output labels, 20784 states, 45145 arcs) with disambiguation
symbols, and a language model acceptor (24363 labels, 729323 states,
5640013 arcs). Composing them with the AT&T toolkit (using "dmake -a
lex -b lm") takes 4m35s with peak memory consumption around 1.4G. The
RWTH FSA tool, on the other hand, grew to 11G of memory within 15
minutes, at which point I stopped the process before it started to
swap. I used the following command line:
fsa --progress=yes bin:lex closure bin:lm compose trim determinize \
write bin:composition
The system is SuSE Linux 9.0 (x86-64).
Any ideas whether I am missing something? Perhaps I should check more
carefully that the conversion between formats is correct. At least the
numbers of states, arcs, input-epsilon arcs and output-epsilon arcs
seem to match.
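
One way to make that comparison repeatable is to count the same four figures directly from a text dump of each transducer. The following is only a minimal sketch, not part of either toolkit. It assumes an AT&T-style text listing for transducers ("src dst ilabel olabel [weight]" per arc, "state [weight]" per final state), and the names FsaStats and countStats, as well as the epsilon spelling passed in as eps, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Summary figures for one transducer given as a text listing.
struct FsaStats {
    std::size_t states = 0;
    std::size_t arcs = 0;
    std::size_t inputEpsilonArcs = 0;
    std::size_t outputEpsilonArcs = 0;

    bool operator==(const FsaStats &o) const {
        return states == o.states && arcs == o.arcs &&
               inputEpsilonArcs == o.inputEpsilonArcs &&
               outputEpsilonArcs == o.outputEpsilonArcs;
    }
};

// Counts states and arcs in a transducer listing.
// Assumed line shapes (hypothetical helper, transducers only):
//   "src dst ilabel olabel [weight]"  -> one arc
//   "state [weight]"                  -> a final state
// `eps` is whatever spelling the listing uses for epsilon.
FsaStats countStats(const std::string &text, const std::string &eps) {
    FsaStats s;
    std::set<std::string> seen;  // distinct state names encountered
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<std::string> f;
        std::string tok;
        while (fields >> tok) f.push_back(tok);
        if (f.empty()) continue;  // skip blank lines
        seen.insert(f[0]);        // source state, or a final state
        if (f.size() >= 4) {      // arc line
            seen.insert(f[1]);    // target state
            ++s.arcs;
            if (f[2] == eps) ++s.inputEpsilonArcs;
            if (f[3] == eps) ++s.outputEpsilonArcs;
        }
    }
    s.states = seen.size();
    return s;
}
```

Comparing countStats() of the listing before and after conversion with operator== only shows that the totals agree. It would not catch swapped labels or changed weights, so matching counts are a necessary check rather than proof that the conversion is correct.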
--
Teemu Hirsimäki
I managed to compile the RWTH FSA Toolkit on a SuSE 9.1 Linux system,
but I did have to make a few changes in the code. I had to add the
line
#include <cc++/tokenizer.h>
in files
src/Core/Configuration.cc
src/Core/Directory.cc
src/Fsa/Storage.cc
Then I also had to edit the file "config/os-linux.make" and change the
line
LDFLAGS += $(shell ccgnu2-config --libs)
to
LDFLAGS += $(shell ccgnu2-config --stdlibs)
as the "--libs" flag did not seem to add enough libraries for
Common C++. In particular, the "-lccext2" flag was missing. The
version of the Common C++ library was 1.3.1.
--
Teemu Hirsimäki
+358 50 3667288
The top level README file has a short C++ example of using the
library. It contains three bugs:
* Must use "f->" instead of "f." everywhere
* f->setInputAlphabet() wants ConstAlphabetRef instead of (StaticAlphabet*)
* It seems that setId() must be called for each state manually.
Here is the correct (hopefully) version of the code:
using namespace Fsa;
StaticAutomaton *f = new StaticAutomaton;
f->setType(TypeAcceptor);
f->setSemiring(TropicalSemiring);
StaticAlphabet *a = new StaticAlphabet();
// setInputAlphabet() wants a ConstAlphabetRef, not a raw StaticAlphabet*
f->setInputAlphabet(ConstAlphabetRef(a));
State *sp1 = new State(), *sp2 = new State(StateTagFinal);
// state ids apparently have to be set by hand
sp1->setId(0);
sp2->setId(1);
sp1->newArc(sp2->id(), f->semiring()->one(), a->addSymbol("a"));
sp2->newArc(sp2->id(), f->semiring()->one(), a->addSymbol("a"));
f->setState(sp1);
f->setState(sp2);
f->setInitialStateId(sp1->id());
ConstAutomatonRef fr = ConstAutomatonRef(f);
--
Teemu Hirsimäki
morning all,
i believe i've discovered a bug in fsa-0.9.1 concerning i/o
of (default) weights (using CountSemiring). I understand
that default weights are not written to output files
in order to minimize i/o, but i'd like to load the same
automata that i've previously saved...
attached are two automata files which illustrate
the problem -- the first (example-in.xml) was created
by hand, the second (example-out.xml) is the erroneous
output of:
bash$ ./fsa.linux-intel-standard example-in.xml write -
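
to make the expectation concrete, here is a toy sketch of the round trip i have in mind. it is not fsa code and not the fsa file format: Arc, writeArcs, readArcs and sameArcs are made-up names, and the integer weights only echo a counting semiring. the point is just that a writer may skip default weights only if the reader fills in that very same default.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy arc: target state, label, integer weight (hypothetical, not the fsa types).
struct Arc { int target; int label; int weight; };

// Writes one arc per line as "target label [weight]".
// The weight is left out whenever it equals omittedWeight.
std::string writeArcs(const std::vector<Arc> &arcs, int omittedWeight) {
    std::ostringstream out;
    for (const Arc &a : arcs) {
        out << a.target << ' ' << a.label;
        if (a.weight != omittedWeight) out << ' ' << a.weight;
        out << '\n';
    }
    return out.str();
}

// Reads the format above; a missing weight becomes assumedWeight.
std::vector<Arc> readArcs(const std::string &text, int assumedWeight) {
    std::vector<Arc> arcs;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Arc a{0, 0, assumedWeight};
        if (!(fields >> a.target >> a.label)) continue;  // skip malformed lines
        int w;
        if (fields >> w) a.weight = w;  // explicit weight overrides the default
        arcs.push_back(a);
    }
    return arcs;
}

// True if both lists hold the same arcs in the same order.
bool sameArcs(const std::vector<Arc> &x, const std::vector<Arc> &y) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i].target != y[i].target || x[i].label != y[i].label ||
            x[i].weight != y[i].weight) return false;
    }
    return true;
}
```

if writeArcs() drops weights equal to 1 but readArcs() fills in 0, the automaton that comes back is not the one that was saved. i don't know whether that is what actually happens inside fsa, it is only the kind of mismatch i would look for.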
marmosets,
Bryan