**********************************************************************
*
*                            Invitation
*
*                     Informatik-Oberseminar
*
**********************************************************************
Time:     Friday, 12 July 2019, 10:00 a.m.
Location: Informatikzentrum, Building E3, Room 9222
Speaker:  Dipl.-Inform. Malte Nuhn
Title:    Unsupervised Training with Applications in Natural Language Processing
Abstract:
State-of-the-art algorithms for various natural language processing tasks
require large amounts of labeled training data. At the same time, obtaining
high-quality labeled data is often the most costly step in setting up a
natural language processing system. Unlabeled data, in contrast, is much
cheaper to obtain and available in far larger quantities. Currently, only a
few training algorithms make use of unlabeled data, and in practice training
with unlabeled data alone is not performed at all. In this thesis, we study
how unlabeled data can be used to train a variety of models used in natural
language processing. In particular, we study models applicable to solving
substitution ciphers, spelling correction, and machine translation. This
thesis lays the groundwork for unsupervised training by presenting and
analyzing the corresponding models and unsupervised training problems in a
consistent manner.
We show that the unsupervised training problem that arises when breaking
one-to-one substitution ciphers is equivalent to the quadratic assignment
problem (QAP) once a bigram language model is incorporated, and is therefore
NP-hard. Based on this analysis, we present an effective algorithm for
unsupervised training with deterministic substitutions. For English
one-to-one substitution ciphers, we show that our novel algorithm achieves
results close to the human performance reported in [Shannon 49].
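The decipherment setting can be illustrated with a toy brute-force search
(the corpus, alphabet, and cipher key below are illustrative inventions, not
material from the thesis): score every candidate key under a bigram language
model and keep the best-scoring decipherment. Because the problem is
equivalent to the QAP and thus NP-hard, such exhaustive enumeration is
infeasible for realistic alphabets (26! keys), which is what the effective
search algorithm of the thesis addresses.

```python
import math
from collections import Counter
from itertools import permutations

# Toy training text for a character-bigram language model.
corpus = "banana bandana"
alphabet = sorted(set(corpus))            # [' ', 'a', 'b', 'd', 'n']
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])           # context counts
V = len(alphabet)

def logprob(text):
    """Add-one-smoothed bigram log-probability of a candidate decipherment."""
    return sum(math.log((bigrams[(x, y)] + 1) / (unigrams[x] + V))
               for x, y in zip(text, text[1:]))

# Encipher a plaintext with a secret one-to-one substitution key.
key = dict(zip(alphabet, "XYZPQ"))        # ' '->X, 'a'->Y, 'b'->Z, ...
ciphertext = "".join(key[c] for c in "banana bandana")
cipher_syms = sorted(set(ciphertext))

def decode(perm):
    """Apply one candidate key (a permutation of the plaintext alphabet)."""
    mapping = dict(zip(cipher_syms, perm))
    return "".join(mapping[c] for c in ciphertext)

# Unsupervised decipherment: the language model alone selects the key.
best = max((decode(p) for p in permutations(alphabet)), key=logprob)
print(best)   # on this toy data the argmax recovers "banana bandana"
```

No labeled (plaintext, ciphertext) pairs are used anywhere: the bigram
statistics of the language are the only training signal.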
With this algorithm, we also present, to the best of our knowledge, the
first automatic decipherment of the second part of the Beale ciphers.
Further, for the task of spelling correction, we work out the details of the
EM algorithm [Dempster & Laird+ 77] and show experimentally that the error
rates achieved with purely unsupervised training reach those of supervised
training. To handle large vocabularies, we introduce a novel model
initialization as well as multiple training procedures that substantially
speed up training without significantly hurting the performance of the
resulting models. By incorporating an alignment model, we extend this model
further so that it can be applied to the task of machine translation. We
show that the true lexical and alignment model parameters can be learned
without any labeled data: we demonstrate experimentally that the
corresponding likelihood function attains its maximum at the true model
parameters when a sufficient amount of unlabeled data is available. For the
problem of spelling correction with symbol substitutions and local swaps, we
also show experimentally that purely unsupervised EM training matches the
performance of supervised training.
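A minimal sketch of the EM idea in this noisy-channel setting (the tiny
vocabulary, uniform prior, and single-parameter substitution channel are my
own illustration, not the thesis's models): observed words are noisy
renderings of hidden intended words, and EM alternates between computing
posteriors over the intended words and re-estimating the channel's
substitution probability from unlabeled data alone.

```python
import math
import random

# Hypothetical setup: hidden "intended" words come from a tiny vocabulary
# with a known prior; the channel replaces each character independently
# with probability eps by a uniformly chosen other symbol.
vocab = ["cat", "car", "can", "cot"]
prior = {w: 1 / len(vocab) for w in vocab}
sigma = sorted(set("".join(vocab)))       # channel alphabet
K = len(sigma)

def p_obs(obs, w, eps):
    """Channel probability p(obs | intended word w)."""
    p = 1.0
    for o, c in zip(obs, w):
        p *= (1 - eps) if o == c else eps / (K - 1)
    return p

# Generate unlabeled noisy observations with an unknown true rate of 0.2.
rng = random.Random(0)
def corrupt(w, rate=0.2):
    return "".join(c if rng.random() > rate
                   else rng.choice([s for s in sigma if s != c]) for c in w)
data = [corrupt(rng.choice(vocab)) for _ in range(200)]

# EM: estimate eps from the unlabeled data alone.
eps = 0.4                                  # arbitrary initialization
loglik = []
for _ in range(20):
    # E-step: posterior over the intended word for every observation,
    # accumulating the expected number of substituted characters.
    total_mismatch = 0.0
    ll = 0.0
    for obs in data:
        joint = {w: prior[w] * p_obs(obs, w, eps) for w in vocab}
        z = sum(joint.values())
        ll += math.log(z)
        for w, j in joint.items():
            mism = sum(o != c for o, c in zip(obs, w))
            total_mismatch += (j / z) * mism
    loglik.append(ll)
    # M-step: re-estimate the substitution probability.
    eps = total_mismatch / (len(data) * 3)  # all toy words have length 3
```

After training, correcting an observed word amounts to picking the intended
word `w` that maximizes `prior[w] * p_obs(obs, w, eps)`. The trace `loglik`
exhibits the EM guarantee that the data likelihood never decreases.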
Finally, using the methods developed in this thesis, we present results on
an unsupervised machine translation training task whose vocabulary is ten
times larger than those investigated in previous work.
The lecturers of the Department of Computer Science cordially invite you.
_______________________________________________
--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. (Ms. Jansen): +49 241 80-216 06
Tel. (Ms. Andersen): +49 241 80-216 01
Fax: +49 241 80-22219
sek(a)i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de
**********************************************************************
*
*                            Invitation
*
*                     Informatik-Oberseminar
*
**********************************************************************
Time:     Monday, 23 March 2020, 10:00 a.m.
Location: Room 115, Rogowski Building
Speaker:  Markus Hoehnerbach, M.Sc.
          High-Performance and Automatic Computing
Title:    A Framework for the Vectorization of Molecular Dynamics Kernels
Abstract:
We introduce a domain-specific language (DSL) for many-body potentials, which are used in molecular dynamics (MD) simulations in the area of materials science. We also introduce a compiler to translate the DSL into high-performance code suitable for modern supercomputers.
We begin by studying ways to speed up potentials on supercomputers using two case studies: the Tersoff and the AIREBO potentials. In both case studies, we identify a number of optimizations, both domain-specific and general, that achieve speedups of up to 5x; we also introduce a method to keep the resulting code performance-portable.
During the AIREBO case study, we also discover that the existing code contains a number of errors. This experience motivates us to include the derivation step, the most error-prone step in manual optimization, in our automation effort.
After having identified beneficial optimization techniques, we create a "potential compiler", PotC for short, which generates fully usable, performance-portable potential implementations from specifications written in our DSL. DSL code is significantly shorter (20x to 30x) than manually written code, reducing both manual work and the opportunities to introduce bugs.
We present performance results on five different platforms: three CPU platforms (Broadwell, Knights Landing, and Skylake) and two GPU platforms (Pascal and Volta). While the performance in some cases remains far below that of hand-written code, the generated code matches or exceeds manually written implementations in other cases, with speedups of up to 9x compared to non-vectorized code.
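As a toy illustration of the kind of kernel being optimized (a 1D
Lennard-Jones pair force, invented here for brevity; the thesis targets
many-body potentials such as Tersoff and AIREBO, vectorized in compiled
code), the sketch below contrasts a naive all-pairs force loop with one that
exploits Newton's third law, one of the general optimizations such
frameworks apply to halve the number of force evaluations:

```python
import random

def pair_force(dx):
    """Signed 1D Lennard-Jones force on particle i from j, dx = x_i - x_j."""
    s = dx * dx
    inv3 = 1.0 / s**3
    return 24.0 * inv3 * (2.0 * inv3 - 1.0) / s * dx

rng = random.Random(1)
pos = [i + 0.1 * rng.random() for i in range(16)]   # 16 particles on a line
n = len(pos)

# Naive kernel: every ordered pair is visited, each interaction computed twice.
f_naive = [0.0] * n
for i in range(n):
    for j in range(n):
        if i != j:
            f_naive[i] += pair_force(pos[i] - pos[j])

# Optimized kernel: visit each unordered pair once and use Newton's third
# law (f_ij = -f_ji), halving the number of force evaluations.
f_half = [0.0] * n
for i in range(n):
    for j in range(i + 1, n):
        f = pair_force(pos[i] - pos[j])
        f_half[i] += f
        f_half[j] -= f
```

Both kernels must agree up to floating-point rounding; verifying this kind
of equivalence automatically is exactly what makes generating the optimized
variant from a single specification attractive.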
The lecturers of the Department of Computer Science cordially invite you.
**********************************************************************
*
*                            Invitation
*
*                     Informatik-Oberseminar
*
**********************************************************************
Time:     Monday, 16 March 2020, 10:45 a.m.
Location: Room 9222, Building E3, Ahornstr. 55
Speaker:  Sandra Kiefer, M.Sc.
          Chair of Computer Science 7
Title:    Power and Limits of the Weisfeiler-Leman Algorithm
Abstract:
The Weisfeiler-Leman (WL) algorithm is a fundamental combinatorial procedure used to classify graphs and other relational structures. Through its connections to many research areas such as logics and machine learning, surprising characterisations of the algorithm have been discovered. We combine some of these to obtain powerful proof techniques.
For every k, the k-dimensional version of the algorithm (k-WL) iteratively computes a stable colouring of the k-tuples of vertices of the input graph. The larger k is, the more powerful k-WL becomes at distinguishing graphs.
We have studied two central parameters of the algorithm, its number of iterations until stabilisation and its dimension. The results enable a precise understanding of 1-WL: we have determined its iteration number and developed a complete characterisation of the graphs for which 1-WL correctly decides isomorphism.
In higher dimensions, however, the situation is different. For example, it is often not clear at all how to decide if k-WL distinguishes two particular graphs. By our results, 3-WL identifies every planar graph, which drastically improves upon all previously known bounds. Generalising this insight, we obtain the first explicit parametrisation of the WL dimension by the Euler genus of the input graph.
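A compact way to see both the power and a classical limit of 1-WL is to
implement colour refinement directly (a standard textbook construction,
not code from the talk): colours are represented as nested tuples so that
they remain comparable across different graphs.

```python
from collections import Counter

def wl1(adj, rounds=None):
    """Colour refinement (1-WL) on an adjacency-list graph.

    Returns the multiset of vertex colours after `rounds` refinements
    (defaulting to the number of vertices, which always suffices).
    """
    rounds = len(adj) if rounds is None else rounds
    col = {v: () for v in adj}                  # uniform initial colouring
    for _ in range(rounds):
        # New colour = old colour plus the sorted multiset of neighbour colours.
        col = {v: (col[v], tuple(sorted(col[u] for u in adj[v]))) for v in adj}
    return Counter(col.values())

def cycle(n, offset=0):
    """Adjacency list of an n-cycle on vertices offset..offset+n-1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n]
            for i in range(n)}

c6 = cycle(6)
two_c3 = {**cycle(3), **cycle(3, offset=10)}    # two disjoint triangles
path3 = {0: [1], 1: [0, 2], 2: [1]}

# Limit: the 6-cycle and two disjoint triangles are not isomorphic, yet
# 1-WL assigns them identical colour multisets (all vertices are 2-regular).
print(wl1(c6) == wl1(two_c3))       # True
# Power: 1-WL easily separates a triangle from a path on three vertices.
print(wl1(cycle(3)) != wl1(path3))  # True
```

The failure on regular graphs is exactly what motivates the higher
dimensions k-WL, at the cost of the complexity discussed in the talk.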
The lecturers of the Department of Computer Science cordially invite you.