2009-17: Learning Communicating and Nondeterministic Automata

22 Sep 2009

      The following technical report is available from
http://aib.informatik.rwth-aachen.de:

Learning Communicating and Nondeterministic Automata
Carsten Kern
AIB 2009-17

The results of this dissertation are two-fold. On the one hand,
inductive learning techniques are extended and two new inference
algorithms for inferring nondeterministic, and universal,
respectively, finite-state automata are presented. On the other hand,
certain learning techniques are employed and enhanced to
semi-automatically infer communicating automata (also called design
models in the software development cycle). For both topics,
theoretical results on the feasibility of the approaches, as well as
an implementation are presented which, in both cases, support our
theory.

Concerning the first objective to derive a so-called active online
learning algorithm for nondeterministic finite-state automata (NFA),
we present, in analogy to Angluin's famous learning algorithm L* for
deterministic finite-state automata (DFA), a version for inferring a
certain subclass of NFA. The automata from this class are called
residual finite-state automata (RFSA). It was shown by Denis et
al. that there is an exponential gap between the size of minimal DFA
and their corresponding minimal RFSA. Even if there are also cases
where the canonical (i.e., minimal) RFSA is exponentially larger than
a corresponding minimal NFA, we show that the new learning
algorithm---called NL*---is a great improvement compared to L* as the
inferred canonical RFSA has always at most the size of the
corresponding minimal DFA but is usually even considerably smaller and
more easy to learn. Unlike a learning procedure developed by Denis et
al.---called DeLeTe2---our algorithm is capable of deriving canonical
RFSA. Like L*, the new algorithm will be applicable in many fields
including pattern recognition, computational linguistics and biology,
speech recognition, and verification. From our point of view, NL*
might especially play a major role in the area of formal verification
where the size of the models that are processed is of enormous
importance and nondeterminism not regarded an unpleasant property.

The second objective of this thesis is to create a method for
inferring distributed design models (CFMs) from a given set of
requirements specified as scenarios (message sequence charts). The
main idea is to extend the L* algorithm to cope with valid and invalid
sets of system runs and, after some iterations, come up with an
intermediate design model (a DFA) which exhibits features that make it
distributable into communicating components (or processes) interacting
via FIFO channels. Theoretical results on which classes of CFMs are
learnable in which time-complexity bounds are presented. We also
developed a tool implementation called Smyle, realizing important
theoretical results evolving from this part of the thesis. Based on
this learning formalism we also derive a software-engineering
lifecycle model called the Smyle Modeling Approach in which we
embedded our learning approach.

Additionally, we launched a project for a new learning library called
libalf which includes most of the learning algorithms (and their
extensions) mentioned in this thesis. We hope that, due to its
continuously increasing functionality, libalf will find broad
acceptance among researchers, and that it will be the starting point
for an extensive project of different research groups which will
employ libalf, or augment the library with new algorithms.

Carsten Fuhs

tags

participants (1)