informatik-vortraege April 2022

informatik-vortraege@lists.rwth-aachen.de

3 participants
6 discussions

Invitation: Informatik-Oberseminar Malte Nuhn

by Sekretariat I6

**********************************************************************
*
*                             Invitation
*
*                       Informatik-Oberseminar
*
**********************************************************************

Time: Friday, 12 July 2019, 10:00
Location: Informatikzentrum, E3, Room 9222
Speaker: Dipl.-Inform. Malte Nuhn
Topic: Unsupervised Training with Applications in Natural Language Processing

Abstract:

The state-of-the-art algorithms for various natural language processing tasks require large amounts of labeled training data. At the same time, obtaining labeled data of high quality is often the most costly step in setting up natural language processing systems. In contrast, unlabeled data is much cheaper to obtain and available in larger amounts. Currently, only a few training algorithms make use of unlabeled data. In practice, training with only unlabeled data is not performed at all.

In this thesis, we study how unlabeled data can be used to train a variety of models used in natural language processing. In particular, we study models applicable to solving substitution ciphers, spelling correction, and machine translation. This thesis lays the groundwork for unsupervised training by presenting and analyzing the corresponding models and unsupervised training problems in a consistent manner.

We show that the unsupervised training problem that occurs when breaking one-to-one substitution ciphers is equivalent to the quadratic assignment problem (QAP) if a bigram language model is incorporated, and is therefore NP-hard. Based on this analysis, we present an effective algorithm for unsupervised training for deterministic substitutions. In the case of English one-to-one substitution ciphers, we show that our novel algorithm achieves results close to human performance, as presented in [Shannon 49]. Also, with this algorithm, we present, to the best of our knowledge, the first automatic decipherment of the second part of the Beale ciphers.

Further, for the task of spelling correction, we work out the details of the EM algorithm [Dempster & Laird + 77] and experimentally show that the error rates achieved using purely unsupervised training reach those of supervised training. For handling large vocabularies, we introduce a novel model initialization as well as multiple training procedures that significantly speed up training without significantly hurting the performance of the resulting models.

By incorporating an alignment model, we further extend this model such that it can be applied to the task of machine translation. We show that the true lexical and alignment model parameters can be learned without any labeled data: we experimentally show that the corresponding likelihood function attains its maximum for the true model parameters if a sufficient amount of unlabeled data is available. Further, for the problem of spelling correction with symbol substitutions and local swaps, we also show experimentally that the performance achieved with purely unsupervised EM training reaches that of supervised training.

Finally, using the methods developed in this thesis, we present results on an unsupervised training task for machine translation with a ten times larger vocabulary than that of tasks investigated in previous work.

Hosts: the Computer Science lecturers

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek(a)i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de
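The abstract above relates decipherment of one-to-one substitution ciphers under a bigram language model to the quadratic assignment problem. The following is only a minimal sketch of why the objective has that form, not the algorithm from the thesis: the score of a key is a sum over pairs of cipher symbols of a count term times a language-model term for the assigned plaintext pair. All function names, the add-alpha smoothing, and the brute-force search (feasible only for toy alphabets) are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def bigram_counts(text):
    """Count adjacent symbol pairs in a string."""
    return Counter(zip(text, text[1:]))


def train_bigram_logprobs(corpus, alphabet, alpha=1.0):
    """Add-alpha smoothed bigram log-probabilities log p(b | a) (illustrative choice)."""
    counts = bigram_counts(corpus)
    logp = {}
    for a in alphabet:
        total = sum(counts[(a, b)] for b in alphabet) + alpha * len(alphabet)
        for b in alphabet:
            logp[(a, b)] = math.log((counts[(a, b)] + alpha) / total)
    return logp


def qap_score(cipher_counts, logp, key):
    """QAP-style objective: sum over cipher bigrams of count * log p(key(c1), key(c2)).

    `key` maps each cipher symbol to a plaintext symbol (a bijection).
    """
    return sum(n * logp[(key[c1], key[c2])] for (c1, c2), n in cipher_counts.items())


def best_key_bruteforce(ciphertext, logp, cipher_alphabet, plain_alphabet):
    """Try every bijection; exponential, so usable only for tiny alphabets."""
    counts = bigram_counts(ciphertext)
    best_score, best_key = None, None
    for perm in permutations(plain_alphabet):
        key = dict(zip(cipher_alphabet, perm))
        score = qap_score(counts, logp, key)
        if best_score is None or score > best_score:
            best_score, best_key = score, key
    return best_score, best_key


if __name__ == "__main__":
    logp = train_bigram_logprobs("abcabcaabbcc", "abc")
    score, key = best_key_bruteforce("xyzxyzxxyyzz", logp, "xyz", "abc")
    print(score, key)
```

Because the number of candidate keys grows factorially with the alphabet size, exhaustive search like this does not scale, which is consistent with the NP-hardness remark in the abstract.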

2 years, 1 month

TOMORROW UnRAVeL Survey Lecture "What's New in UnRAVeL?"

by Tim Seppelt

Dear all,

this is a reminder for Nils Nießen's talk on Acceptance of Driverless Trains <https://www.unravel.rwth-aachen.de/go/id/taywq?#aaaaaaaaaatayyk> taking place *tomorrow at 16:30* in room 5053.2 and on Zoom. Please find the details below.

> Digitalisation and automation are also making progress in rail
> transportation. In isolated networks, such as metros, trains can
> already run driverless today. The talk will highlight the
> opportunities and risks of driverless driving on rail.
>
> A novel system can only be successfully implemented if it is also
> accepted by the users. One focus of the talk will therefore be the
> analysis of passenger acceptance of driverless rail transport.

Part of the programme of the research training group UnRAVeL is a series of introductory lectures on the topics of "randomness" and "uncertainty" in UnRAVeL’s research thrusts: algorithms and complexity, verification, logic and languages, and their application scenarios. The main aim is to provide doctoral researchers as well as master students with a broad overview of the subjects of UnRAVeL.

Science undergoes continuous change and thrives on the constant quest for novel and better results, which are presented at conferences and in journals. This year, 10 UnRAVeL professors will present some of their most recent research successes.

Everyone interested, in particular doctoral researchers and master students, is invited to attend the UnRAVeL lecture series 2022 and engage in discussions with the researchers. The talks take place on Tuesdays, 16:30–18:00, in room 5053.2 on the ground floor of building E2.

All events are hybrid. To join remotely, please use https://rwth.zoom.us/j/96003885007?pwd=aUczMVdVU0ZXVGtQUFpwQnJHQUFhUT09 / Meeting ID: 960 0388 5007 / Passcode: 273710

Please find a list of all upcoming talks on the UnRAVeL website <https://www.unravel.rwth-aachen.de/cms/UnRAVeL/Studium/~pzix/Ringvorlesung-…> and below:

* 26/04/2022 Nils Nießen: Acceptance of Driverless Trains
* 03/05/2022 Jürgen Giesl: Improving Automatic Complexity Analysis of Probabilistic and Non-Probabilistic Integer Programs
* 10/05/2022 Martin Grohe: Graph Representations Based on Homomorphisms
* 17/05/2022 Christina Büsing: A Branch & Bound Algorithm for Robust Binary Optimization with Budget Uncertainty
* 24/05/2022 Erika Ábrahám: The Challenge of Compositionality for Stochastic Hybrid Systems
* 21/06/2022 Sebastian Trimpe: Uncertainty Bounds for Gaussian Process Regression with Applications to Safe Control and Learning
* 28/06/2022 Britta Peis: Stackelberg Network Pricing Games
* 05/07/2022 Gerhard Lakemeyer: Tractable Reasoning in First-Order Knowledge Bases

We are looking forward to seeing many of you in the UnRAVeL survey lecture "What's New in UnRAVeL?".

Best regards,
Andreas Klinger, Birgit Willms, and Tim Seppelt

2 years, 3 months

Informatik-Oberseminar Matthias Volk

by Birgit Willms

**********************************************************************
*
*                             Invitation
*
*                       Informatik-Oberseminar
*
**********************************************************************

Time: Thursday, 28 April 2022, 10:30
Location: Room 9222, Building E3, 2nd floor, Informatikzentrum, Ahornstr. 55

The talk can also be followed online via Zoom:
https://rwth.zoom.us/j/99709768339?pwd=MndDQ1MxMVdQWVpYZGpvYSt4bmdKdz09
Meeting ID: 997 0976 8339, Passcode: 975390

Speaker: Matthias Volk, M.Sc. (Lehrstuhl Informatik 2)
Topic: Dynamic Fault Trees: Semantics, Analysis and Applications

Abstract:

Safe and reliable systems are crucial in today’s society. Fault trees are a prominent and widely used model to assess and improve the reliability of systems. Fault trees model how component failures propagate through a system and lead to a failure of the overall system. Dynamic fault trees (DFTs) are an extension of (static) fault trees and allow more modelling flexibility by introducing dynamic gates, spare management, functional dependencies and failure restrictions.

In this presentation, we investigate dynamic fault trees in detail and consider three main aspects: (1) the precise semantics of DFTs, (2) the analysis of DFTs by model checking techniques, and (3) the application of DFTs, for example in the railway domain.

First, we specify the semantics of dynamic fault trees in terms of generalized stochastic Petri nets (GSPNs). We investigate multiple semantic questions resulting from the combination of DFT elements. Our resulting GSPN framework subsumes the major existing DFT semantics and makes it possible to pinpoint their differences.

Second, we present analysis techniques for DFTs based on probabilistic model checking. We introduce several (orthogonal) optimisation techniques which exploit symmetries, irrelevant failures and independent subtrees to improve the state-space generation times. We also show an approximation algorithm based on partial state-space exploration. All presented approaches are implemented in the open-source model checker Storm and evaluated on a DFT benchmark suite. The evaluation shows that our tool Storm-dft is state-of-the-art for DFT analysis.

Third, we present the application of DFTs in the railway domain. The case study considers train routing options in railway station areas in terms of available infrastructure elements. We analyse how switch failures impact the potential train routes in a station and determine the most critical components.

Hosts: the Computer Science lecturers
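To make the statement that fault trees model how component failures propagate to a system failure more concrete, here is a minimal sketch for a static fault tree with only AND and OR gates. It is not a DFT analysis and does not use Storm or Storm-dft; the class names, the example components, and their failure probabilities are hypothetical. The computation assumes that all basic events are statistically independent and that no basic event appears in more than one subtree.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class BasicEvent:
    name: str
    probability: float  # probability that this component has failed


@dataclass(frozen=True)
class Gate:
    kind: str  # "AND": fails if all children fail; "OR": fails if any child fails
    children: Tuple["Node", ...]


Node = Union[BasicEvent, Gate]


def failure_probability(node: Node) -> float:
    """Bottom-up failure probability, assuming independent, non-shared basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [failure_probability(child) for child in node.children]
    if node.kind == "AND":
        result = 1.0
        for p in child_probs:
            result *= p
        return result
    if node.kind == "OR":
        all_survive = 1.0
        for p in child_probs:
            all_survive *= 1.0 - p
        return 1.0 - all_survive
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical example: a route is lost if both redundant switches fail
# or if the interlocking controller fails.
switch_a = BasicEvent("switch_a", 0.1)
switch_b = BasicEvent("switch_b", 0.1)
controller = BasicEvent("controller", 0.05)
route_lost = Gate("OR", (Gate("AND", (switch_a, switch_b)), controller))

if __name__ == "__main__":
    print(failure_probability(route_lost))
```

Dynamic gates, spares and functional dependencies make the order of failures matter, so this simple bottom-up product rule no longer applies to DFTs; that is where state-based analysis such as model checking comes in.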

2 years, 3 months

TODAY UnRAVeL Survey Lecture "What's New in UnRAVeL?"

by Tim Seppelt

Dear all,

this is a reminder for Michael Schaub's talk on Signal processing on graphs and complexes <https://www.unravel.rwth-aachen.de/go/id/tbarp?lidx=1> taking place *today at 16:30* in room 5053.2 and on Zoom. Please find the details below.

> Graph signal processing (GSP) tries to devise appropriate tools to
> process signals supported on graphs by generalizing classical methods
> from signal processing of time-series and images -- such as smoothing,
> filtering and interpolation of signals supported on the nodes of a
> graph. Typically, this involves leveraging the structure of the graph
> as encoded in the spectral properties of the graph Laplacian.
>
> In certain scenarios, such as traffic network analysis, the signals of
> interest are however naturally defined on the edges of a graph, rather
> than on the nodes. After a brief recap of the central ideas of GSP, we
> examine why standard tools from GSP may not be suitable for the
> analysis of such edge signals. More specifically, we discuss how the
> underlying notion of 'signal vs noise' inherited from typically
> considered variants of the graph Laplacian is not suitable when
> dealing with edge signals that encode flows. To overcome this
> limitation, we devise signal processing tools based on the
> Hodge-Laplacian and the associated discrete Hodge Theory for
> simplicial (and cellular) complexes. We discuss applications of these
> ideas for signal smoothing, semi-supervised and active learning for
> edge-flows on discrete or discretized spaces.

Part of the programme of the research training group UnRAVeL is a series of introductory lectures on the topics of "randomness" and "uncertainty" in UnRAVeL’s research thrusts: algorithms and complexity, verification, logic and languages, and their application scenarios. The main aim is to provide doctoral researchers as well as master students with a broad overview of the subjects of UnRAVeL.

Science undergoes continuous change and thrives on the constant quest for novel and better results, which are presented at conferences and in journals. This year, 10 UnRAVeL professors will present some of their most recent research successes.

Everyone interested, in particular doctoral researchers and master students, is invited to attend the UnRAVeL lecture series 2022 and engage in discussions with the researchers. The talks take place on Tuesdays, 16:30–18:00, in room 5053.2 on the ground floor of building E2.

All events are hybrid. To join remotely, please use https://rwth.zoom.us/j/96003885007?pwd=aUczMVdVU0ZXVGtQUFpwQnJHQUFhUT09 / Meeting ID: 960 0388 5007 / Passcode: 273710

Please find a list of all scheduled talks on the UnRAVeL website <https://www.unravel.rwth-aachen.de/cms/UnRAVeL/Studium/~pzix/Ringvorlesung-…> and below:

* 19/04/2022 Michael Schaub: Signal processing on graphs and complexes
* 26/04/2022 Nils Nießen: Acceptance of Driverless Trains
* 03/05/2022 Jürgen Giesl: Improving Automatic Complexity Analysis of Probabilistic and Non-Probabilistic Integer Programs
* 10/05/2022 Martin Grohe: Graph Representations Based on Homomorphisms
* 17/05/2022 Christina Büsing: A Branch & Bound Algorithm for Robust Binary Optimization with Budget Uncertainty
* 24/05/2022 Erika Ábrahám: The Challenge of Compositionality for Stochastic Hybrid Systems
* 21/06/2022 Sebastian Trimpe: Uncertainty Bounds for Gaussian Process Regression with Applications to Safe Control and Learning
* 28/06/2022 Britta Peis: Stackelberg Network Pricing Games
* 05/07/2022 Gerhard Lakemeyer: Tractable Reasoning in First-Order Knowledge Bases

We are looking forward to seeing many of you in the UnRAVeL survey lecture "What's New in UnRAVeL?".

Best regards,
Andreas Klinger, Birgit Willms, and Tim Seppelt
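As a rough illustration of the Hodge-Laplacian mentioned in the talk abstract above, the sketch below builds the standard Hodge 1-Laplacian L1 = B1^T B1 + B2 B2^T for a single filled triangle and checks how divergence and curl separate a gradient flow from a circulating flow. This is a textbook construction written in plain Python, not code from the talk; the helper names, the orientation convention (edges and triangles listed with increasing node indices), and the toy complex are assumptions.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def node_edge_incidence(n_nodes, edges):
    """B1: rows are nodes, columns are edges (i, j) oriented from i to j."""
    b1 = [[0] * len(edges) for _ in range(n_nodes)]
    for e, (i, j) in enumerate(edges):
        b1[i][e] = -1
        b1[j][e] = 1
    return b1


def edge_triangle_incidence(edges, triangles):
    """B2: rows are edges, columns are triangles (a, b, c) with a < b < c."""
    index = {edge: k for k, edge in enumerate(edges)}
    b2 = [[0] * len(triangles) for _ in edges]
    for t, (a, b, c) in enumerate(triangles):
        b2[index[(a, b)]][t] = 1
        b2[index[(b, c)]][t] = 1
        b2[index[(a, c)]][t] = -1
    return b2


def hodge_1_laplacian(b1, b2):
    lower = matmul(transpose(b1), b1)  # couples edges through shared nodes
    upper = matmul(b2, transpose(b2))  # couples edges through shared triangles
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(lower, upper)]


# Toy complex: one filled triangle on nodes 0, 1, 2.
edges = [(0, 1), (0, 2), (1, 2)]
triangles = [(0, 1, 2)]
B1 = node_edge_incidence(3, edges)
B2 = edge_triangle_incidence(edges, triangles)
L1 = hodge_1_laplacian(B1, B2)

cyclic_flow = [1, -1, 1]                         # circulates 0 -> 1 -> 2 -> 0
gradient_flow = matvec(transpose(B1), [0, 1, 2])  # differences of node potentials

if __name__ == "__main__":
    print("divergence of cyclic flow:", matvec(B1, cyclic_flow))
    print("curl of cyclic flow:", matvec(transpose(B2), cyclic_flow))
    print("curl of gradient flow:", matvec(transpose(B2), gradient_flow))
```

The circulating flow has zero divergence at every node but nonzero curl around the triangle, while the gradient flow has zero curl; a node-based graph Laplacian has no operator that distinguishes these two kinds of edge signal.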

2 years, 3 months

UnRAVeL Survey Lecture "What's New in UnRAVeL?"

by Tim Seppelt

Dear all,

part of the programme of the research training group UnRAVeL is a series of introductory lectures on the topics of "randomness" and "uncertainty" in UnRAVeL’s research thrusts: algorithms and complexity, verification, logic and languages, and their application scenarios. The main aim is to provide doctoral researchers as well as master students with a broad overview of the subjects of UnRAVeL.

Science undergoes continuous change and thrives on the constant quest for novel and better results, which are presented at conferences and in journals. This year, 10 UnRAVeL professors will present some of their most recent research successes.

Everyone interested, in particular doctoral researchers and master students, is invited to attend the UnRAVeL lecture series 2022 and engage in discussions with the researchers. The talks take place on Tuesdays, 16:30–18:00, in room 5053.2 on the ground floor of building E2.

All events are hybrid. To join remotely, please use https://rwth.zoom.us/j/96003885007?pwd=aUczMVdVU0ZXVGtQUFpwQnJHQUFhUT09 / Meeting ID: 960 0388 5007 / Passcode: 273710

Please find a list of all scheduled talks on the UnRAVeL website <https://www.unravel.rwth-aachen.de/cms/UnRAVeL/Studium/~pzix/Ringvorlesung-…> and below:

* 19/04/2022 Michael Schaub: Signal processing on graphs and complexes
* 26/04/2022 Nils Nießen: Acceptance of Driverless Trains
* 03/05/2022 Jürgen Giesl: Improving Automatic Complexity Analysis of Probabilistic and Non-Probabilistic Integer Programs
* 10/05/2022 Martin Grohe: Graph Representations Based on Homomorphisms
* 17/05/2022 Christina Büsing: A Branch & Bound Algorithm for Robust Binary Optimization with Budget Uncertainty
* 24/05/2022 Erika Ábrahám: The Challenge of Compositionality for Stochastic Hybrid Systems
* 21/06/2022 Sebastian Trimpe: Uncertainty Bounds for Gaussian Process Regression with Applications to Safe Control and Learning
* 28/06/2022 Britta Peis: Stackelberg Network Pricing Games
* 05/07/2022 Gerhard Lakemeyer: Tractable Reasoning in First-Order Knowledge Bases

We are looking forward to seeing many of you in the UnRAVeL survey lecture "What's New in UnRAVeL?".

Best regards,
Andreas Klinger, Birgit Willms, and Tim Seppelt

2 years, 3 months

Invitation: Informatik-Oberseminar Krishna Subramanian

by Subramanian, Krishna

**********************************************************************
*
*                             Invitation
*
*                       Informatik-Oberseminar
*
**********************************************************************

Time: Friday, 8 April 2022, 12:15
Zoom URL: https://rwth.zoom.us/j/97644054920

Speaker: Krishna Subramanian, M.Sc., Lehrstuhl für Informatik 10
Topic: Lowering the Barriers to Hypothesis-Driven Data Science

Abstract:

Data science is a frequent task in academia and industry. One common use of data science is to validate hypotheses, in which the analyst uses significance-based hypothesis testing to draw insights about a population distribution based on experimental data. Apart from data scientists, who are professionally trained in data science and are highly skilled, many non-professional analysts also carry out data analysis. These non-professionals, whom we refer to as data workers, are domain experts who lack expertise in data science, such as academic researchers, project managers, and sales managers.

Through interviews, observations, online surveys, and content analyses, we aim to understand data workers' workflows across important tasks in hypothesis testing: learning theoretical and practical statistics, selecting statistical procedures, using data science programming IDEs to experiment with ideas in source code, refining and refactoring source code, and disseminating findings from an analysis.

We present our findings grouped into two steps when performing data science tasks:

1. Preparing to perform data science tasks: We discuss our findings about the impact of formal training on real-world statistical practice; trade-offs among information sources used for selecting statistical procedures; perceived complexity and uncertainty about statistical procedure selection; and reluctance among data workers to adopt alternative methods of analysis. Based on the above findings, we present design recommendations and two artifacts to improve data workers' workflows. Our artifacts include Statsplorer, a web-based tool to help data workers kickstart analysis and learn about common issues in statistical practice, such as over-testing, overlooking assumptions, and selecting the appropriate test; and StatPlayground, an interactive simulation tool that can be used to self-learn or teach statistical concepts and statistical procedure selection.

2. Performing data science tasks: Our findings include an overview of data workers' workflows when performing hypothesis testing using programming IDEs, which follows an exploratory programming workflow; and a comparison of existing interfaces for data science programming, namely computational notebooks, scripts, and consoles, and a discussion of how well they support various steps in hypothesis testing. To improve data workers' workflows when performing data science tasks, we contribute design recommendations and two artifacts. Our artifacts include StatWire, an experimental hybrid-programming interface that encourages data workers to write high-quality source code; and Tractus, an interactive visualization that can lower the cost of working with experimental source code.

Based on our work, we present four takeaways that can be used by researchers, software developers, and educators to lower the barriers to hypothesis testing.

---

Hosts: the Computer Science lecturers
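For readers unfamiliar with the significance-based hypothesis testing that the abstract above refers to, the following is a small self-contained sketch of one such procedure, an exact two-sided permutation test on the difference of group means. It is unrelated to the tools named in the abstract (Statsplorer, StatPlayground, StatWire, Tractus); the function name and the toy measurements are assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    Enumerates every way to split the pooled data into groups of the original
    sizes and returns the fraction of splits whose absolute mean difference is
    at least as large as the observed one (the p-value).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so that floating-point ties count as "at least as large".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical task-completion times under two interface conditions.
    print(exact_permutation_test([1, 2, 3], [4, 5, 6]))
```

The number of splits grows combinatorially with the sample size, so in practice a random sample of permutations is usually drawn instead of enumerating all of them.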

2 years, 3 months
