Jonathan Nöther
I am a doctoral student at the Max Planck Institute for Software Systems interested in secure and safe machine learning. I am co-advised by Adish Singla and Goran Radanovic.
Publications
- MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems
With Adish Singla and Goran Radanovic
Preprint, Under Review
TL;DR: Automatic design of safe agentic systems using a two-player game between a system designer and an attacker
- AgenticRed: Optimizing Agentic Systems for Automated Red-teaming
With Jiayi Yuan, Natasha Jaques, and Goran Radanovic
Preprint, Under Review
TL;DR: Automatically design red-teaming workflows without human intervention
- Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harmful Actions
With Adish Singla and Goran Radanovic
TL;DR: Benchmark for testing the robustness of LLM-based agents against adversaries that aim to manipulate them into performing dangerous actions
- Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
With Adish Singla and Goran Radanovic
AAAI (Oral)
TL;DR: Applying text-diffusion models to red-teaming to satisfy proximity constraints with respect to a reference prompt
- Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
With Andi Nika, Adish Singla, Goran Radanovic
TMLR
TL;DR: Training robust agents in an MARL setting where an attacker can arbitrarily corrupt a subset of peer agents of a given cardinality
- Implicit poisoning attacks in two-agent reinforcement learning: Adversarial policies for training-time attacks
With Mohammad Mohammadi, Debmalya Mandal, Adish Singla, Goran Radanovic
AAMAS 2023
TL;DR: Attacking an agent by poisoning the policy of a peer agent during training
Projects
- Inpainting Detection
Combine automatic segmentation with inpainting to automatically create edited images. Additionally experimented with detecting these faked images.
- Safe Streets
Extend pedestrian route recommendation by taking into account the safety of the route (e.g. lights, open shops).
- Interview Performance Prediction and Lie Detection
Implementation of a model that evaluates the performance of participants in mock job interviews and detects lies.
Experience
- 08/2022-07/2024: Research assistant in the Machine Teaching Group at MPI-SWS
Teaching Experience
- Winter 2024/2025: Teaching Assistant for the Course “Generative AI”
- Summer 2024: Teaching Assistant for the Seminar “Trustworthiness of Foundation Models”
- Summer 2022: Teaching Assistant for the Lecture “Statistics Lab”
- Summer 2022: Teaching Assistant for the Lecture “Artificial Intelligence”
- Winter 2019/2020: Teaching Assistant for “Programming 1”
Education
- 10/2024-ongoing: PhD in Computer Science at the Max Planck Institute for Software Systems
- 12/2022-08/2024: M.Sc. in Data Science and Artificial Intelligence at Saarland University
- 10/2019-11/2022: B.Sc. in Data Science and Artificial Intelligence at Saarland University
