Speaker Schedule (2025)

4/3: Eddie Zhang, OpenAI
Talk: On Mitigating Hallucinations in Large Language Models

Edwin Zhang is a researcher at OpenAI specializing in AI safety, reinforcement learning, and language modeling. He previously pursued a PhD in Computer Science at Harvard University. His research focuses on applying AI to real-world challenges for social good, in domains such as safety, economics, and policy-making.

Recommended Reading:
Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Extrinsic Hallucinations in LLMs
LLM Hallucination Survey (GitHub)

4/10: Jerry Lopez, Torc Robotics
Talk: TBD

4/17: Eugenia Kim, Microsoft Research
Talk: Inside the AI Red Team: Our Multidisciplinary Approach

Eugenia Kim is a software engineer on Microsoft’s AI Red Team, where she focuses on developing tools for red teaming generative AI systems. Combining her engineering expertise with research into ethical AI, Eugenia is dedicated to creating safer and more responsible AI products. Her prior research in bias mitigation and AI sustainability has shaped her multidisciplinary approach to advancing AI safety.

4/24: Masha Itkina, Toyota Research Institute
Talk: TBD

Masha Itkina is a Research Scientist at the Toyota Research Institute (TRI), where she co-leads the Trustworthy Learning under Uncertainty (TLU) research direction in the context of Large Behavior Models (LBMs). She is currently researching statistical robot evaluation, failure detection, and active learning. She received her PhD from the Stanford Intelligent Systems Laboratory (SISL), where her research focused on perception for self-driving cars; in particular, perception systems that robustly predict the evolution of the dynamic environment and infer the contents of spatially occluded regions while accounting for uncertainty.

5/1: Shannon Novak, Safe Space Alliance
Talk: When Friends Turn Fatal: Conversational AI’s Safety Meltdown

Note: This talk will discuss sensitive topics, including depression, self-harm, and suicide.

There are increasing reports of conversational AI agents causing, or having the potential to cause, significant harm to people through failure or malfunction, including death. As the director of a nonprofit organization dedicated to the safety of queer people around the world, I receive reports almost daily of conversational AI agents gone bad, including agents that threaten to out queer people, encourage suicide, provide misinformation about queer communities, exhibit extreme biases against queer people, and misuse data.

In this talk I’ll expand beyond (but still include) queer communities to all communities and discuss the alarming lack of safety in 14 conversational AI agents (including ChatGPT, Grok, Replika, and children’s AI companions). I’ll introduce the “Maida Test”, a new framework for stress-testing conversational agents; a proposed conversational AI agent safety benchmark system; and a newly developed “Conversational AI Agent Safety Rating” (CAASR) system. I’ll discuss the CAASR rating (from A+ to F) calculated for each agent (the average across all agents being an “F”), and why an immediate, collaborative response among developers, policy makers, researchers, and users is needed to address the safety threats identified.

Shannon is the director of the Safe Space Alliance, a queer-led nonprofit organization that helps people identify, navigate, and create safe spaces for queer communities worldwide. He spent 20 years in the education and technology sectors, across a range of industries, before moving into his current role. His current research focuses on reducing rates of anxiety, suicide, and depression in queer communities, and on emerging technologies and their relationship with queer communities.

5/8: Somil Bansal, Stanford
Talk: TBD

Somil is an assistant professor in the Department of Aeronautics and Astronautics at Stanford, where he leads the Safe and Intelligent Autonomy Lab. His research focuses on how machine learning methods can be combined with classical, model-based planning and control methods to enable intelligent and safe decision making in complex and uncertain environments, especially when the robot relies on high-dimensional sensory inputs and data for decision making and control.

5/15: Dimitra Giannakopoulou, Amazon Web Services
Talk: TBD

5/22: Mantas Mazeika, Center for AI Safety
Talk: TBD

5/29: Nicholas Carlini, DeepMind
Talk: LLMs for Security: Where are we now?

LLMs are amazingly capable at many tasks. But how good are they at being useful in computer security? In this talk I first discuss one attempt at studying to what extent LLMs are capable of performing challenging research-level security tasks. As it turns out, they’re not very good at this yet. So then I consider a second question: what could LLMs do to change security today? I argue there are many domains where even the capabilities of current models would more than suffice to fundamentally alter the economics of how attacks are performed and monetized. Finally, I conclude with some thoughts looking towards the future.

Nicholas is a research scientist at Google DeepMind (formerly at Google Brain) working at the intersection of machine learning and computer security. His most recent line of work studies properties of neural networks from an adversarial perspective. He received his Ph.D. from UC Berkeley in 2018, and his B.A. in computer science and mathematics (also from UC Berkeley) in 2013.