Jonathan Nöther

I am a doctoral student at the Max Planck Institute for Software Systems, where my research focuses on secure and safe machine learning. I am co-advised by Adish Singla and Goran Radanovic.

Publications

MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems
With Adish Singla and Goran Radanovic
Preprint, Under Review
TL;DR: Automatic Design of Safe Agentic Systems using a two-player game between a system designer and an attacker
AgenticRed: Optimizing Agentic Systems for Automated Red-teaming
With Jiayi Yuan, Natasha Jaques, and Goran Radanovic
Preprint, Under Review
TL;DR: Automatically design red-teaming workflows without human intervention
Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harmful Actions
With Adish Singla and Goran Radanovic
TL;DR: Benchmark for testing the robustness of LLM-based agents against adversaries that aim to manipulate them into performing dangerous actions
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
With Adish Singla and Goran Radanovic
AAAI (Oral)
TL;DR: Applying text-diffusion models to red-teaming to satisfy proximity constraints with regard to a reference prompt
Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
With Andi Nika, Adish Singla, Goran Radanovic
TMLR
TL;DR: Training robust agents in a MARL setting where an attacker can arbitrarily corrupt a subset of peer agents of a given cardinality
Implicit poisoning attacks in two-agent reinforcement learning: Adversarial policies for training-time attacks
With Mohammad Mohammadi, Debmalya Mandal, Adish Singla, Goran Radanovic
AAMAS 2023
TL;DR: Attacking an agent by poisoning the policy of a peer agent during training

Projects

Inpainting Detection
Combined automatic segmentation with inpainting to create edited images automatically, and experimented with detecting these faked images.
Safe Streets
Extended pedestrian route recommendation to take the safety of the route into account (e.g., lights, open shops).
Interview Performance Prediction and Lie Detection
Implemented a model that evaluated the performance of participants in mock job interviews and detected when they lied.

Experience

08/2022-07/2024: Research assistant in the Machine Teaching Group at MPI-SWS

Teaching Experience

Winter 2024/2025: Teaching Assistant for the Course “Generative AI”
Summer 2024: Teaching Assistant for the Seminar “Trustworthiness of Foundation Models”
Summer 2022: Teaching Assistant for the Lecture “Statistics Lab”
Summer 2022: Teaching Assistant for the Lecture “Artificial Intelligence”
Winter 2019/2020: Teaching Assistant for “Programming 1”

Education

10/2024-ongoing: PhD in Computer Science at the Max Planck Institute for Software Systems
12/2022-08/2024: M.Sc. in Data Science and Artificial Intelligence at Saarland University
10/2019-11/2022: B.Sc. in Data Science and Artificial Intelligence at Saarland University