December 2019 - informatik-vortraege

Einladung Informatik-Oberseminar Malte Nuhn
by Sekretariat I6 30 May '22

30 May '22

+********************************************************************** * * * Einladung * * * * Informatik-Oberseminar * * * +********************************************************************** Zeit: Freitag, 12. Juli 2019, 10.00 Uhr Ort: Informatikzentrum, E3, Raum 9222 Referent: Dipl.-Inform. Malte Nuhn Thema: Unsupervised Training with Applications in Natural Language Processing// Abstract: The state-of-the-art algorithms for various natural language processing tasks require large amounts of labeled training data. At the same time, obtaining labeled data of high quality is often the most costly step in setting up natural language processing systems.Opposed to this, unlabeled data is much cheaper to obtain and available in larger amounts.Currently, only few training algorithms make use of unlabeled data. In practice, training with only unlabeled data is not performed at all. In this thesis, we study how unlabeled data can be used to train a variety of models used in natural language processing. In particular, we study models applicable to solving substitution ciphers, spelling correction, and machine translation. This thesis lays the groundwork for unsupervised training by presenting and analyzing the corresponding models and unsupervised training problems in a consistent manner.We show that the unsupervised training problem that occurs when breaking one-to-one substitution ciphers is equivalent to the quadratic assignment problem (QAP) if a bigram language model is incorporated and therefore NP-hard. Based on this analysis, we present an effective algorithm for unsupervised training for deterministic substitutions. In the case of English one-to-one substitution ciphers, we show that our novel algorithm achieves results close to human performance, as presented in [Shannon 49]. Also, with this algorithm, we present, to the best of our knowledge, the first automatic decipherment of the second part of the Beale ciphers.Further, for the task of spelling correction, we work out the details of the EM algorithm [Dempster & Laird + 77] and experimentally show that the error rates achieved using purely unsupervised training reach those of supervised training.For handling large vocabularies, we introduce a novel model initialization as well as multiple training procedures that significantly speed up training without hurting the performance of the resulting models significantly.By incorporating an alignment model, we further extend this model such that it can be applied to the task of machine translation. We show that the true lexical and alignment model parameters can be learned without any labeled data: We experimentally show that the corresponding likelihood function attains its maximum for the true model parameters if a sufficient amount of unlabeled data is available. Further, for the problem of spelling correction with symbol substitutions and local swaps, we also show experimentally that the performance achieved with purely unsupervised EM training reaches that of supervised training. Finally, using the methods developed in this thesis, we present results on an unsupervised training task for machine translation with a ten times larger vocabulary than that of tasks investigated in previous work. Es laden ein: die Dozentinnen und Dozenten der Informatik _______________________________________________ -- -- Stephanie Jansen Faculty of Mathematics, Computer Science and Natural Sciences HLTPR - Human Language Technology and Pattern Recognition RWTH Aachen University Ahornstraße 55 D-52074 Aachen Tel. Frau Jansen: +49 241 80-216 06 Tel. Frau Andersen: +49 241 80-216 01 Fax: +49 241 80-22219 sek(a)i6.informatik.rwth-aachen.de www.hltpr.rwth-aachen.de Tel: +49 241 80-216 01/06 Fax: +49 241 80-22219 sek(a)i6.informatik.rwth-aachen.de www.hltpr.rwth-aachen.de

3 15

Einladung: Informatik-Kolloquium Michael Schaub (08.01.2020, 14:00)
by Thomas Noll 13 Dec '19

13 Dec '19

*********************************************************************** * * * Einladung * * * Informatik-Kolloquium * * *********************************************************************** Zeit: Mittwoch, 8. Januar 2020, 14:00 Uhr Ort: Raum 9222, Gebäude E3, Informatikzentrum Referent: Michael Schaub University of Oxford, UK and MIT, USA Thema: Data Science for Networks Abstract: Networks have become a widely adopted model for a range of systems, cutting across Science and Engineering. However, our theoretical understanding of many fundamental phenomena that arise in complex networks and networked systems is still limited. My vision is to develop a data science for networks and dynamical systems that will contribute to addressing this challenge, by combining data-driven and model-based approaches, using the language of graphs and networks. In this talk, we will give a brief overview of such a Data Science for Networks. We first discuss how networks appear naturally within models in different research domains and illustrate the underlying scientific questions via examples drawn from applications. We then examine in some detail the problem of feature learning from graphs with unobserved edges, in which we aim to learn certain aspects of a graph solely from dynamical observations on its nodes, without knowledge of the edge-set of the graph. We conclude with a brief outlook on future challenges and open problems. Es laden ein: die Dozentinnen und Dozenten der Informatik

1 0

[Informatik-assistenten] Einladung: Informatik-Oberseminar Christoph Matheja
by Matheja, Christoph 11 Dec '19

11 Dec '19

+********************************************************************** * * * Einladung * * * * Informatik-Oberseminar * * * +********************************************************************** Zeit: Donnerstag, 9. Januar 2019, 14.00 Uhr Ort: Informatikzentrum, E3, Raum 9222 Referent: Christoph Matheja, M.Sc. Lehrstuhl für Informatik 2 (Software Modeling and Verification) Thema: Automated Reasoning and Randomization in Separation Logic Abstract: We study three aspects of program verification with separation logic: 1. Reasoning about quantitative properties, such as the probability of memory-safe termination, of randomized heap-manipulating programs. 2. Automated reasoning about the robustness of and entailments between formulas in the symbolic heap fragment of separation logic itself. 3. Automated reasoning about pointer programs by combining abstractions based on separation logic with the above techniques and model checking. Regarding the first item, we extend separation logic to reason about quantities, which evaluate to real numbers, instead of predicates, which evaluate to Boolean values. Based on the resulting quantitative separation logic, we develop a weakest precondition calculus à la Dijkstra for quantitative reasoning about randomized heap-manipulating programs. We show that this calculus is a sound and conservative extension of both separation logic and McIver and Morgan's weakest preexpectations which preserves virtually all properties of classical separation logic. We demonstrate its applicability by several case studies. Regarding the second item, we develop an algorithmic framework based on heap automata to compositionally check robustness properties, e.g., satisfiability or acyclicity, of symbolic heaps with inductive predicate definitions. We consider two approaches to discharge entailments for fragments of separation logic. In particular, this includes a pragmatic decision procedure with nondeterministic polynomial-time complexity for entailments between graphical symbolic heaps. Regarding the third item, we introduce Attestor - an automated verification tool for analyzing Java programs operating on dynamic data structures. The tool involves the generation of an abstract state space employing inductive predicate definitions in separation logic. Properties of individual states are defined by heap automata. LTL model checking is then applied to this state space, supporting the verification of both structural and functional correctness properties. Attestor is fully automated, procedure modular, and provides informative visual feedback including counterexamples for violated properties. Es laden ein: die Dozentinnen und Dozenten der Informatik

1 0

Einladung: Informatik-Oberseminar Daniel Neuen
by Daniel Neuen 03 Dec '19

03 Dec '19

+********************************************************************** * * * Einladung * * * * Informatik-Oberseminar * * * +********************************************************************** Zeit: Dienstag, 17. Dezember 2019, 12.00 Uhr Ort: Raum 5053.2 (großer Bit-Raum), Ahornstr. 55 Referent: Daniel Neuen M.Sc. Lehrstuhl Informatik 7 Thema: The Power of Algorithmic Approaches to the Graph Isomorphism Problem Abstract: The Graph Isomorphism Problem asks, given two input graphs, whether they are structurally the same, that is, whether there is a renaming of the vertices of the first graph in order to transform it to the second graph. By a recent breakthrough result of Babai, this problem can be solved in quasipolynomial time. However, despite extensive research efforts, it remains one of only few natural problems in NP that are neither known to be solvable in polynomial time nor known to be NP-complete. Over the past five decades several powerful techniques tackling the Graph Isomorphism Problem have been investigated uncovering various surprising links between different approaches. One of the most fundamental methods in the context of graph isomorphism testing is the Weisfeiler-Leman algorithm and the related paradigm of individualization and refinement. The latter is in particular employed in all practical tools tackling the isomorphism problem. We present new upper and lower bounds on the power of such purely combinatorial approaches. For example, this leads to the first exponential lower bounds on the worst-case complexity of all state-of-the-art isomorphism tools used in practice. A second crucial approach to the Graph Isomorphism Problem is based on group-theoretic techniques. In this direction, one of the algorithmic cornerstones is Luks polynomial time algorithm for testing isomorphism of bounded degree graphs. Adapting the novel group-theoretic methods by Babai developed for his quasipolynomial time isomorphism test we give an isomorphism test for graphs of maximum degree d with a running time bounded by a polynomial of degree polylogarithmic in d. As a consequence of this result we also obtain an improved fpt algorithm for testing isomorphism of graphs of bounded tree-width. Es laden ein: die Dozentinnen und Dozenten der Informatik

1 0