morning all,
i believe i've discovered a bug in fsa-0.9.1 concerning i/o
of (default) weights (using CountSemiring). I understand
that default weights are not written to output files
in order to minimize i/o, but i'd like to load the same
automata that i've previously saved...
attached are two automata files which illustrate
the problem -- the first (example-in.xml) was created
by hand, the second (example-out.xml) is the erroneous
output of:
bash$ ./fsa.linux-intel-standard example-in.xml write -
marmosets,
Bryan
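A minimal sketch of why omitting default weights on output only round-trips if the reader restores the same default. The `Arc` struct, the one-line text encoding, and both function names are hypothetical illustrations. They are not fsa's data structures or its XML format.

```cpp
#include <sstream>
#include <string>

// Hypothetical arc record, used only for this illustration.
struct Arc {
    int target;
    int weight;
};

// Write "target" alone when the weight equals the default,
// otherwise write "target weight".
std::string writeArc(const Arc& arc, int defaultWeight) {
    std::ostringstream out;
    out << arc.target;
    if (arc.weight != defaultWeight) {
        out << ' ' << arc.weight;
    }
    return out.str();
}

// Read an arc back. A missing weight field is restored to the same
// default the writer used; any other choice changes the automaton.
Arc readArc(const std::string& line, int defaultWeight) {
    std::istringstream in(line);
    Arc arc;
    arc.target = 0;
    arc.weight = defaultWeight;
    in >> arc.target;
    int weight;
    if (in >> weight) {
        arc.weight = weight;
    }
    return arc;
}
```

If the writer and the reader disagree on the default for a given semiring, a saved file loads with different weights than the ones that were written.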
Hello!
I'm trying to compile fsa on Debian 2.2.20 (woody) with gcc 3.4 on a PC,
and I get the following errors:
make[1]: Entering directory `/home/marcin/PTX/fsa/fsa-0.9.1/src'
../Rules.make:99: warning: overriding commands for target `install'
Makefile:30: warning: ignoring old commands for target `install'
make -C Core libSprintCore.linux-intel-standard.a
make[2]: Entering directory `/home/marcin/PTX/fsa/fsa-0.9.1/src/Core'
compiling Configuration.cc
Configuration.cc:111:2: warning: #warning is a GCC extension
Configuration.cc:111:2: warning: #warning "Configuration::Resource::match() fails to match *.A.B with A.A.B !"
Configuration.cc: In member function `s32 Core::Configuration::Resource::match(const std::vector<std::string, std::allocator<std::string> >&) const':
Configuration.cc:115: error: `StringTokenizer' undeclared in namespace `ost'
Configuration.cc:115: error: parse error before `(' token
Configuration.cc:116: error: no class template named `StringTokenizer' in `ost'
Configuration.cc:116: error: parse error before `=' token
Configuration.cc:116: error: `token' undeclared (first use this function)
Configuration.cc:116: error: (Each undeclared identifier is reported only once for each function it appears in.)
Configuration.cc:116: error: `tokenizer' undeclared (first use this function)
Configuration.cc: In member function `const Core::Configuration::Resource* Core::Configuration::ResourceDataBase::find(const std::string&) const':
Configuration.cc:246: error: `StringTokenizer' undeclared in namespace `ost'
Configuration.cc:246: error: parse error before `(' token
Configuration.cc:247: error: no class template named `StringTokenizer' in `ost'
Configuration.cc:247: error: parse error before `=' token
make[2]: *** [.build/linux-intel-standard/Configuration.o] Error 1
make[2]: Leaving directory `/home/marcin/PTX/fsa/fsa-0.9.1/src/Core'
make[1]: *** [Core] Error 2
make[1]: Leaving directory `/home/marcin/PTX/fsa/fsa-0.9.1/src'
make: *** [build] Error 2
I don't feel up to digging into that code. Maybe I'm making some trivial
mistake here. Any suggestions?
Marcin Szkudlarek
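A note for readers who hit the same errors: the `ost` namespace suggests the GNU Common C++ library, and gcc reports `StringTokenizer' undeclared in namespace `ost'` when the installed headers do not declare that class. Which library version is needed is not stated in this thread. As an illustration only, here is a minimal standard-library sketch of splitting a dotted resource name such as A.A.B into tokens. The function `splitOn` is a hypothetical stand-in and not the code in Configuration.cc.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text at every occurrence of delimiter.
// Uses only the standard library, so it does not depend on ost::StringTokenizer.
std::vector<std::string> splitOn(const std::string& text, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string token;
    while (std::getline(stream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Calling `splitOn("A.A.B", '.')` yields the three tokens A, A and B.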
Hi!
The fsa archives will be disabled until Monday because I need to modify
them, which in turn depends on the central system administration service
at RWTH Aachen.
Thanks for your patience,
Stephan Kanthak
--
GMX ProMail mit bestem Virenschutz http://www.gmx.net/de/go/mail
+++ Empfehlung der Redaktion +++ Internet Professionell 10/04 +++
Hi Emilian!
> I have a strange problem with composition with the fsa tool - It
> says that " 'some word' is not in second alphabet". What does it
> mean? Could you please have a look at the files that I'm trying to
> compose - a dictionary transducer and a language model transducer.
> I've also attached the error log.
>
> The command I use is
> fsa att:Dict_2003+.extended.fsm.txt att:lm.txt compose write att:LoG.txt
>
> P.S. What is "the second alphabet" anyway? The output or the input
> alphabet of the second transducer?
The "warnings" (yes, I should make that more explicit) appear because the
lexicon contains more symbols than the language model, and FSA does NOT
silently ignore that fact by default. Internally, you can disable those
warnings. If you know that your lexicon contains more words than the
language model (look at the sizes of the alphabets through "info"), then
you can safely ignore them, as FSA does exactly what you want. In your
case I doubt that. I see at least two mistakes:
1. FSA still interprets AT&T's format by assuming that 0 is the initial
   state. (This behaviour is wrong. I will fix that soon.)
2. Your lexicon contains 9589 symbols, but your language model has
   only 3689, and I suspect that you want to map unknown words to the
   unknown class. FSA does not do this on its own unless you use the
   failure symbol *FAIL* instead of UNK, or use an intermediate
   transducer that maps lexicon words to lm words (you can use the
   map-fsa script to automate that).
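The idea of mapping lexicon words to lm words can be sketched as follows. This is only an illustration of the mapping itself, under the assumption that every lexicon word missing from the language model should go to one unknown label. The function name `mapToLmVocabulary` is hypothetical, and the sketch is neither the map-fsa script nor any fsa API.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: build the word-to-word relation that an
// intermediate transducer between lexicon and language model would encode.
// Words known to the language model map to themselves; all others map
// to unknownLabel (for example "UNK").
std::map<std::string, std::string> mapToLmVocabulary(
    const std::vector<std::string>& lexiconWords,
    const std::set<std::string>& lmWords,
    const std::string& unknownLabel) {
    std::map<std::string, std::string> mapping;
    for (const std::string& word : lexiconWords) {
        if (lmWords.count(word) > 0) {
            mapping[word] = word;
        } else {
            mapping[word] = unknownLabel;
        }
    }
    return mapping;
}
```

Composing the lexicon with a transducer that encodes this relation leaves only output symbols that the language model knows.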
Cheers,
Stephan
Hi Stephan,
I have a strange problem with composition with the fsa tool - It
says that " 'some word' is not in second alphabet". What does it
mean? Could you please have a look at the files that I'm trying to
compose - a dictionary transducer and a language model transducer.
I've also attached the error log.
The command I use is
fsa att:Dict_2003+.extended.fsm.txt att:lm.txt compose write att:LoG.txt
Thanks in advance,
Emilian
P.S. What is "the second alphabet" anyway? The output or the input
alphabet of the second transducer?
Hi Stephan,
I tried to compile and run FSA on our SuSE Linux x86_64 machine
(uname: Linux 2.6.4-52-smp #1 SMP Wed Apr 7 01:58:54 UTC 2004 x86_64
x86_64 x86_64 GNU/Linux) with only limited success. I got some warnings in
Unicode.cc, namely:
Unicode.cc: In member function `OutIterator
Core::UnicodeInputConverter::convert(const InIterator&, const InIterator&,
OutIterator) [with InIterator = const char*, OutIterator =
std::back_insert_iterator<std::vector<char, std::allocator<char> > >]':
Unicode.cc:199: instantiated from here
Unicode.cc:159: warning: invalid conversion from `const char**' to `char**'
Unicode.cc: In member function `OutIterator
Core::UnicodeInputConverter::convert(const InIterator&, const InIterator&,
OutIterator) [with InIterator = const char*, OutIterator =
std::back_insert_iterator<std::string>]':
Unicode.cc:204: instantiated from here
Unicode.cc:159: warning: invalid conversion from `const char**' to `char**'
Unicode.cc: In member function `OutIterator
Core::UnicodeOutputConverter::convert(const InIterator&, const InIterator&,
OutIterator) [with InIterator = const char*, OutIterator =
std::back_insert_iterator<std::vector<char, std::allocator<char> > >]':
Unicode.cc:266: instantiated from here
Unicode.cc:223: warning: invalid conversion from `const char**' to `char**'
Unicode.cc: In member function `OutIterator
Core::UnicodeOutputConverter::convert(const InIterator&, const InIterator&,
OutIterator) [with InIterator = const char*, OutIterator =
std::back_insert_iterator<std::string>]':
Unicode.cc:271: instantiated from here
Unicode.cc:223: warning: invalid conversion from `const char**' to `char**'
Unicode.cc: In member function `OutIterator
Core::UnicodeOutputConverter::convert(const InIterator&, const InIterator&,
OutIterator) [with InIterator = __gnu_cxx::__normal_iterator<const char*,
std::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
OutIterator = std::ostream_iterator<char, char, std::char_traits<char> >]':
Unicode.cc:276: instantiated from here
Unicode.cc:223: warning: invalid conversion from `const char**' to `char**'
Unicode.cc: In member function `OutIterator
Core::UnicodeOutputConverter::convert(const InIterator&, const InIterator&,
OutIterator) [with InIterator = const char*, OutIterator =
std::ostream_iterator<char, char, std::char_traits<char> >]':
Unicode.cc:281: instantiated from here
Unicode.cc:223: warning: invalid conversion from `const char**' to `char**'
Unicode.cc: In member function `OutIterator
Core::UnicodeOutputConverter::convert(const InIterator&, const InIterator&,
OutIterator) [with InIterator = const char*, OutIterator =
std::ostreambuf_iterator<char, std::char_traits<char> >]':
Unicode.cc:286: instantiated from here
Unicode.cc:223: warning: invalid conversion from `const char**' to `char**'
Also, I had to add an explicit type cast in the call to std::min:
// u32 bufferSize = std::min(bufferThreshold_ - putBackBufferSize, formatted_.size());
u32 bufferSize = std::min(size_t(bufferThreshold_ - putBackBufferSize), formatted_.size());
Otherwise the compiler complained that it could not find a matching std::min:
TextStream.cc: In member function `virtual int
Core::TextInputStream::Buffer::underflow()':
TextStream.cc:422: error: no matching function for call to `min(unsigned int,
size_t)'
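A minimal sketch of the type mismatch behind this error. std::min is a template that takes two arguments of one type, so deduction fails when one argument is unsigned int and the other is size_t, as on x86_64 where size_t is wider. The typedef and the function `clampedBufferSize` below are hypothetical stand-ins and not the code in TextStream.cc.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the u32 typedef used in the fsa sources.
typedef std::uint32_t u32;

// Return the smaller of (threshold - putBack) and available.
// Writing std::min(threshold - putBack, available) does not compile when
// u32 and std::size_t are different types. Naming the template argument
// explicitly converts both operands to std::size_t before comparing.
u32 clampedBufferSize(u32 threshold, u32 putBack, std::size_t available) {
    std::size_t smaller = std::min<std::size_t>(threshold - putBack, available);
    // The result never exceeds threshold - putBack, so it fits into u32.
    return static_cast<u32>(smaller);
}
```

The cast in the message above, `size_t(bufferThreshold_ - putBackBufferSize)`, has the same effect as the explicit template argument used here.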
Ignoring the warnings and modifying the code a bit got me the
executable, which unfortunately I couldn't run, because it gave me a
segmentation fault. Running gdb on
fsa.linux-x86_64-standard and backtracing printed this output:
(gdb) backtrace
#0 0x000000000048cb9d in Core::Choice::addChoice ()
#1 0x000000000048cdda in Core::Choice::Choice ()
#2 0x000000000049e184 in global constructors keyed to
_ZN4Core17AbstractParameterC2ERKS0_
()
#3 0x00000000004bccf6 in __do_global_ctors_aux ()
#4 0x00000000004352e3 in _init ()
#5 0x00000000004bcc20 in Core::XmlWriter::Buffer::~Buffer ()
#6 0x00000000004bcc91 in __libc_csu_init () at elf-init.c:60
#7 0x0000002a96264e05 in __libc_start_main () from /lib64/tls/libc.so.6
#8 0x0000000000435f7a in _start () at ../sysdeps/x86_64/elf/start.S:96
#9 0x0000007fbffff0a8 in ?? ()
#10 0x0000000000000000 in ?? ()
#11 0x0000000000000001 in ?? ()
#12 0x0000007fbffff3d0 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000007fbffff40e in ?? ()
#15 0x0000007fbffff427 in ?? ()
#16 0x0000007fbffff437 in ?? ()
#17 0x0000007fbffff45f in ?? ()
#18 0x0000007fbffff4e1 in ?? ()
#19 0x0000007fbffff508 in ?? ()
#20 0x0000007fbffff515 in ?? ()
#21 0x0000007fbffff520 in ?? ()
#22 0x0000007fbffff530 in ?? ()
#23 0x0000007fbffff558 in ?? ()
#24 0x0000007fbffff590 in ?? ()
#25 0x0000007fbffff599 in ?? ()
#26 0x0000007fbffff5ac in ?? ()
#27 0x0000007fbffff5c0 in ?? ()
#28 0x0000007fbffff5db in ?? ()
#29 0x0000007fbffff5e5 in ?? ()
#30 0x0000007fbffff5f2 in ?? ()
#31 0x0000007fbffff787 in ?? ()
#32 0x0000007fbffff79d in ?? ()
#33 0x0000007fbffff7a8 in ?? ()
#34 0x0000007fbffff7b6 in ?? ()
#35 0x0000007fbffff7c3 in ?? ()
#36 0x0000007fbffff7cb in ?? ()
#37 0x0000007fbffff7d9 in ?? ()
#38 0x0000007fbffff7f0 in ?? ()
#39 0x0000007fbffffa13 in ?? ()
#40 0x0000007fbffffa1d in ?? ()
#41 0x0000007fbffffa3f in ?? ()
#42 0x0000007fbffffa54 in ?? ()
#43 0x0000007fbffffa7c in ?? ()
#44 0x0000007fbffffa98 in ?? ()
#45 0x0000007fbffffaa9 in ?? ()
...
Everything compiled and ran OK on a 32bit machine. Is there anything
that I can do to run FSA on the 64bit machine?
Best,
Emilian
Hi!
We are proud to announce the first maintenance release of fsa, namely
version 0.9.1. This release fixes:
* compilation problems on x86_64
* a crash on x86_64
* synchronous pruning
* reading / writing of AT&T's ASCII file format
New in this release:
* simplified interface: removed dump method from Semiring class
* some more documentation in the README
* a NEWS file
Cheers,
Stephan Kanthak
hi Stephan, hi list,
On 15 December 2004 at 21:45:18, Stephan Kanthak appears to have written:
> At least some success. You are the first one reporting successful compilation
> outside our institute....
well, here's another success report: compilation works for me on
debian unstable (gcc 3.3.5), although i've had no success reading
at&t format files (at&t format output works fine though) -- i haven't
looked at the code yet, so if this is a known bug, just ignore me.
that said, the fsa library looks like a great toolkit: thanks for
releasing it and for getting these lists up and running so quickly!
marmosets,
Bryan