+**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
+**********************************************************************
Time: Friday, July 12, 2019, 10:00 a.m.
Location: Informatikzentrum, E3, Room 9222
Speaker: Dipl.-Inform. Malte Nuhn
Topic: Unsupervised Training with Applications in Natural Language Processing
Abstract:
The state-of-the-art algorithms for various natural language processing
tasks require large amounts of labeled training data. At the same time,
obtaining labeled data of high quality is often the most costly step in
setting up natural language processing systems. In contrast, unlabeled
data is much cheaper to obtain and available in larger amounts. Currently,
only a few training algorithms make use of unlabeled data, and in
practice, training with only unlabeled data is not performed at all. In
this thesis, we study how unlabeled data can be used to train a variety of
models used in natural language processing. In particular, we study models
applicable to solving substitution ciphers, spelling correction, and
machine translation. This thesis lays the groundwork for unsupervised
training by presenting and analyzing the corresponding models and
unsupervised training problems in a consistent manner.

We show that the unsupervised training problem that occurs when breaking
one-to-one substitution ciphers is equivalent to the quadratic assignment
problem (QAP) if a bigram language model is incorporated, and is therefore
NP-hard. Based on this analysis, we present an effective algorithm for
unsupervised training for deterministic substitutions. In the case of
English one-to-one substitution ciphers, we show that our novel algorithm
achieves results close to human performance, as presented in [Shannon 49].
Also, with this algorithm, we present, to the best of our knowledge, the
first automatic decipherment of the second part of the Beale ciphers.

Further, for the task of spelling correction, we work out the details of
the EM algorithm [Dempster & Laird+ 77] and experimentally show that the
error rates achieved using purely unsupervised training reach those of
supervised training. For handling large vocabularies, we introduce a novel
model initialization as well as multiple training procedures that
significantly speed up training without noticeably hurting the performance
of the resulting models.

By incorporating an alignment model, we further extend this model such
that it can be applied to the task of machine translation. We show that
the true lexical and alignment model parameters can be learned without any
labeled data: we experimentally show that the corresponding likelihood
function attains its maximum for the true model parameters if a sufficient
amount of unlabeled data is available. Further, for the problem of
spelling correction with symbol substitutions and local swaps, we also
show experimentally that the performance achieved with purely unsupervised
EM training reaches that of supervised training. Finally, using the
methods developed in this thesis, we present results on an unsupervised
training task for machine translation with a vocabulary ten times larger
than that of tasks investigated in previous work.
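To make the EM-based part of the abstract concrete, here is a minimal,
self-contained Python sketch (not code from the thesis): a fixed bigram
language model over a toy alphabet is combined with an unknown substitution
channel, and Baum-Welch EM re-estimates only the channel parameters from
ciphertext alone. The alphabet, corpus, and cipher key are made up for
illustration; the thesis treats deterministic one-to-one substitutions with
a dedicated search algorithm that is not shown here.

# Toy EM decipherment sketch: fixed bigram LM, trainable substitution channel.
# Alphabet, corpus, and key are illustrative assumptions, not data from the thesis.
import numpy as np

ALPHABET = list("abcd ")                     # toy plaintext alphabet
IDX = {ch: i for i, ch in enumerate(ALPHABET)}

def bigram_lm(text, smoothing=0.1):
    """Estimate P(next | previous) from a small plaintext sample."""
    counts = np.full((len(ALPHABET), len(ALPHABET)), smoothing)
    for a, b in zip(text, text[1:]):
        counts[IDX[a], IDX[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def em_decipher(cipher, trans, n_symbols, iters=50, seed=0):
    """Baum-Welch: train P(cipher symbol | plaintext letter), transitions fixed."""
    rng = np.random.default_rng(seed)
    n_states = len(ALPHABET)
    emit = rng.random((n_states, n_symbols))
    emit /= emit.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    T = len(cipher)
    for _ in range(iters):
        # forward-backward with per-step scaling
        alpha = np.zeros((T, n_states)); beta = np.zeros((T, n_states))
        scale = np.zeros(T)
        alpha[0] = pi * emit[:, cipher[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ trans) * emit[:, cipher[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (trans @ (emit[:, cipher[t + 1]] * beta[t + 1])) / scale[t + 1]
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate only the emission (channel) parameters
        new_emit = np.full((n_states, n_symbols), 1e-6)
        for t, obs in enumerate(cipher):
            new_emit[:, obs] += gamma[t]
        emit = new_emit / new_emit.sum(axis=1, keepdims=True)
    return emit

if __name__ == "__main__":
    corpus = "abad cabad dab a cad abac dab cab "
    key = {ch: i for i, ch in enumerate("dcab ")}   # the unknown 1:1 substitution
    plain = corpus * 20
    cipher = [key[ch] for ch in plain]
    emit = em_decipher(cipher, bigram_lm(corpus), n_symbols=len(ALPHABET))
    guess = {s: ALPHABET[int(np.argmax(emit[:, s]))] for s in range(len(ALPHABET))}
    print("cipher symbol -> guessed plaintext:", guess)
    print("cipher symbol -> true plaintext:   ", {v: k for k, v in key.items()})

On enough repeated toy text this may recover the substitution, but EM can
get stuck in local optima on short ciphers, which is part of why the thesis
analyzes the hardness of the problem (equivalence to the QAP).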
The lecturers of the Department of Computer Science cordially invite you to attend.
_______________________________________________
--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek(a)i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de
Dear members of the Computer Science Department,
You are cordially invited to the talk by Kuldeep Meel (NUS Singapore)<https://www.unravel.rwth-aachen.de/cms/UnRAVeL/Das-Graduiertenkolleg/Gastwi…> next week, on 2.12. at 10:30.
Title: Sparse Hashing for Scalable Approximate Model Counting: When Theory and Practice Finally Meet
Given a Boolean formula F, the problem of model counting, also referred to as #SAT, is to compute the number of solutions of F. Hashing-based techniques for approximate counting have emerged as a dominant approach, promising to achieve both scalability and rigorous theoretical guarantees. The standard construction of strongly 2-universal hash functions employs dense XORs (i.e., involving half of the variables in expectation), which is widely known to degrade the runtime performance of state-of-the-art SAT solvers.
Consequently, the past few years have witnessed intense activity in the design of sparse XORs as hash functions. In this talk, we will first survey the known results and identify the crucial bottleneck behind their inability to provide speedups. We will then formalize a relaxation of universal hashing, called concentrated hashing, and establish a novel and beautiful connection between concentration measures of these hash functions and isoperimetric inequalities on Boolean hypercubes. This allows us to obtain tight bounds on the variance as well as the dispersion index, and to show that logarithmically sized XORs suffice for the design of sparse hash functions belonging to the concentrated hash family. Finally, we use sparse hash functions from this concentrated hash family to develop new approximate counting algorithms. A comprehensive experimental evaluation of our algorithm on 1896 benchmarks, with a computational effort of over 20,000 hours, demonstrates a speedup compared to existing approaches. To the best of our knowledge, this work is the first study to demonstrate a runtime improvement of approximate model counting algorithms through the use of sparse hash functions while still retaining strong theoretical guarantees.
(Based on joint work with S. Akshay, D. Agarwal, and Bhavishya; the corresponding papers were published at SAT-20 and LICS-20.)
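To give a feel for the hashing idea in the abstract, here is a minimal,
self-contained Python toy (not the speaker's tool, which is built on top of
a SAT solver): random XOR parity constraints partition the solution space
into cells, and the size of one small cell, scaled back up, estimates the
total count. The pivot, density, and number of trials are illustrative
assumptions; density=0.5 corresponds to the dense XORs discussed above, and
lowering it mimics the sparse hash functions the talk is about.

# Toy sketch of hashing-based approximate counting with XOR constraints.
# The "solver" is brute-force enumeration over a small solution set.
import itertools
import random

def random_xor(n, density=0.5):
    """Pick each variable with probability `density`, plus a random parity bit."""
    vars_ = [i for i in range(n) if random.random() < density]
    parity = random.randint(0, 1)
    return vars_, parity

def satisfies_xors(assignment, xors):
    return all(sum(assignment[i] for i in vars_) % 2 == parity
               for vars_, parity in xors)

def approx_count(solutions, n, pivot=8, density=0.5, trials=9):
    """Median-of-trials estimate of len(solutions) via XOR hashing."""
    estimates = []
    for _ in range(trials):
        m, cell, xors = 0, list(solutions), []
        while len(cell) > pivot and m < 2 * n:   # keep halving the cell
            m += 1
            xors.append(random_xor(n, density))
            cell = [a for a in solutions if satisfies_xors(a, xors)]
        estimates.append(len(cell) * 2 ** m)     # scale the small cell back up
    estimates.sort()
    return estimates[len(estimates) // 2]

if __name__ == "__main__":
    n = 12
    # A toy solution set: all assignments with an even number of ones.
    S = [a for a in itertools.product([0, 1], repeat=n) if sum(a) % 2 == 0]
    print("true count:", len(S), "estimate:", approx_count(S, n))

In the real setting the small cell is explored with repeated SAT-solver
calls rather than enumeration, which is why the density of the XORs matters
so much for runtime.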
Bio:
Kuldeep Meel is the Sung Kah Kay Assistant Professor of Computer Science in the School of Computing at the National University of Singapore. He received his Ph.D. (2017) and M.S. (2014) degrees from Rice University, and a B.Tech. (with Honors) degree (2012) in Computer Science and Engineering from the Indian Institute of Technology, Bombay. He is a recipient of the 2019 NRF Fellowship for AI. His research interests lie at the intersection of Artificial Intelligence and Formal Methods. His work received the 2018 Ralph Budd Award for Best PhD Thesis in Engineering, the 2014 Outstanding Masters Thesis Award from the Vienna Center for Logic and Algorithms, and the Best Student Paper Award at CP 201
Wednesday, 2.12.2020, 10:30
https://rwth.zoom.us/j/92047949381?pwd=LzIwUW96WEM0MkRjZ01FUmhwd1I3QT09
Meeting ID: 920 4794 9381
Password: unravel
https://www.unravel.rwth-aachen.de/go/id/kfsgc
Best regards
Helen Bolke-Hermanns
Helen Bolke-Hermanns
RTG UnRAVeL - RWTH Aachen University
Ahornstr. 55, D-52074 Aachen
Building E3, 2nd floor, Room 9218
Phone: +49 (241) 80-21 004
Fax: +49 (241) 80-22 215
E-Mail: Helen.Bolke-Hermanns(a)Informatik.RWTH-Aachen.de
Web: www.unravel.rwth-aachen.de
Dear IT students,
You have presumably all already heard something about agile methods during your studies. Now you have the opportunity to gain practical experience and apply agile methods in practice. I would like to cordially invite you to the free ONLINE Agile Programming Training at andrena objects:
Agile Programming Training - Online (from Cologne) - 04.01.-08.01.2021
What awaits you?
- Getting to know agile methods and andrena objects
- Interactive Scrum sprints in groups
- Networking and exchange about entry opportunities
- A free certificate of participation for your application documents
A short registration is requested at:
https://ws4s.de/veranstaltungen/online-agile-development-training/
Places are limited and usually booked out very quickly. Secure one of the limited places now! :-)
Who is andrena objects?
andrena objects is a medium-sized software company and one of the pioneers of agile software development. In this special five-day training, you will gain deep insights into the methods of agile software development and into everyday work at andrena objects. Over the course of these days, you will run through four exemplary Scrum sprints in teams (and with the support of experts) and get to know working practices from Scrum, Extreme Programming, and the project practice of andrena and SAP.
If you have any questions, do not hesitate to write to me directly.
Best regards,
Marcel from the Workshops4Students team
_____________________________
Co-Founder Workshops4Students
E-Mail: studenten(a)ws4s.de
Web: http://www.ws4s.de
This is a short reminder of today's talk:
Prof. Gabriele Kern-Isberner (TU Dortmund)
Title: Towards Lifted Inference: Counting Strategies for Relational Maximum Entropy Reasoning
Abstract:
The principle of maximum entropy (MaxEnt) constitutes a meaningful methodology for drawing nonmonotonic inferences from probabilistic conditional knowledge, as it satisfies some fundamental principles of commonsense reasoning. As with alternative approaches to probabilistic reasoning in relational settings, straightforward maximum entropy computations suffer from an exponential dependence on the size of the underlying domain, which can lead to intractability in many cases. To overcome this problem, we adopt techniques from weighted first-order model counting (WFOMC), which exploit symmetries and interchangeability among the domain elements in order to count models of sentences more efficiently. To meet the requirements of our formalization of knowledge by conditionals, we assign a type to the counted models which captures the three-valued evaluation of the conditionals, i.e., the conditional structure of the models. We present the resulting variant of model counting, which we call 'typed model counting', and discuss its benefits by means of some illustrating examples.
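For readers less familiar with the MaxEnt principle, the underlying
optimization can be sketched as follows; this is a standard textbook
formulation for the propositional (non-lifted) case, with notation that is
illustrative and not taken from the abstract. Given probabilistic
conditionals (B_1|A_1)[p_1], ..., (B_n|A_n)[p_n] over possible worlds
\omega, the MaxEnt distribution is

P^{ME} = \arg\max_{P} \; -\sum_{\omega} P(\omega) \log P(\omega)
\quad \text{s.t.} \quad P(A_i \wedge B_i) = p_i \cdot P(A_i) \;\; (1 \le i \le n),
\qquad \sum_{\omega} P(\omega) = 1.

The lifted-inference challenge described in the abstract is that, in
relational settings, the number of worlds \omega grows exponentially with
the domain size, which is where the model-counting techniques come in.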
https://www.informatik.rwth-aachen.de/go/id/jgfze
Today, 10:30
https://rwth.zoom.us/j/92047949381?pwd=LzIwUW96WEM0MkRjZ01FUmhwd1I3QT09
Meeting ID: 920 4794 9381
Password: unravel
Helen Bolke-Hermanns
Department of Computer Science
RWTH Aachen University
Ahornstr. 55, D-52074 Aachen
Building E3, 2nd floor, Room 9218
Phone: +49 (241) 80-21-004
Fax: +49 (241) 80-22 215
E-Mail: Helen.Bolke-Hermanns(a)Informatik.RWTH-Aachen.de
+**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
+**********************************************************************
Time: Monday, November 16, 2020, 1:00 p.m.
Zoom:
https://rwth.zoom.us/j/94172280001?pwd=K2Z6Z1ZHbUVVdDBDbjNSWVBVUTZoZz09
Meeting ID: 941 7228 0001
Passcode: N.gdJ5
Speaker: Jan Dreier, M.Sc.
Teaching and Research Area Theoretical Computer Science
Topic: Two New Perspectives on Algorithmic Meta-Theorems
Abstract:
We present two algorithmic meta-theorems. Our first meta-theorem is an
approximation scheme of the parameterized model-checking problem for the
first-order counting logic FO({>0}). This logic extends first-order
logic with the ability to count the number of satisfying assignments of
a variable and compare it to fixed but arbitrarily large numbers. The
model-checking scheme runs in linear fpt time (parameterized by formula
length) on structures with bounded expansion and either gives the
correct answer or says "I do not know." The latter answer may only be
given if small perturbations in the number-symbols of the formula could
make it both satisfied and unsatisfied. As a consequence, every
property expressible in FO({>0}) can be approximated in linear time on
bounded expansion graph classes. A restricted set of counting properties
can also be solved exactly in linear time, leading for example to an fpt
algorithm for partial dominating set on bounded expansion graph classes.
These results are complemented by showing that exactly solving the
model-checking problem for FO({>0}) is already hard on trees of bounded
depth and just slightly increasing the expressiveness of FO({>0}) makes
even approximation hard on trees.
Our second meta-theorem is a first-order model-checking algorithm for
random graphs and complex networks. Roughly speaking, we say a random
graph model is alpha-power-law-bounded if it is unclustered and the
fraction of vertices with degree k is O(k^-alpha). We solve the
parameterized first-order model-checking problem in almost linear fpt
time on random graph models satisfying this property with alpha >= 3.
This means in particular that one can solve every first-order property
in almost linear expected time on these random graph models. This
includes many popular models of complex networks such as preferential
attachment graphs, Chung–Lu graphs, configuration graphs, and sparse
Erdős–Rényi graphs. We present hardness results that show intractability
for alpha < 3, assuming we allow an adversary to color the vertices of
the input graph. We base our algorithms on a new structural
decomposition of alpha-power-law-bounded random graphs and on new
concentration bounds for the degree evolution in preferential attachment
graphs.
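As a small illustration of the random graph models in the second part, the
following Python sketch (illustrative only, not the speaker's code) samples
a Chung–Lu graph whose expected degrees follow a power law with exponent
alpha, so that the fraction of vertices with degree k behaves roughly like
k^-alpha. The weight formula and parameters are standard textbook choices
assumed here for illustration.

# Toy Chung-Lu sampler: P(edge {i,j}) = min(1, w_i * w_j / sum(w)),
# with weights chosen so expected degrees follow a power law.
import random
from collections import Counter

def chung_lu(n, alpha=3.0, avg_weight=5.0, seed=0):
    """Return an edge list of a Chung-Lu random graph with power-law weights."""
    rng = random.Random(seed)
    # w_i proportional to (i+1)^(-1/(alpha-1)), rescaled to average avg_weight.
    raw = [(i + 1) ** (-1.0 / (alpha - 1.0)) for i in range(n)]
    scale = avg_weight * n / sum(raw)
    w = [scale * x for x in raw]
    total = sum(w)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, w[i] * w[j] / total):
                edges.append((i, j))
    return edges

if __name__ == "__main__":
    edges = chung_lu(n=1000, alpha=3.0)
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    hist = Counter(deg.values())
    print("degree -> #vertices (heavy tail expected):",
          sorted(hist.items())[:10], "...")

For alpha >= 3, graphs of this kind fall into the regime where the
meta-theorem above gives almost linear expected fpt time for first-order
model checking.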
The lecturers of the Department of Computer Science cordially invite you to attend.
--
Jan Dreier
RWTH Aachen University
https://tcs.rwth-aachen.de/~dreier/