Schedule
April 26, 2026 - Rio de Janeiro, Brazil
Location: Room 211, Riocentro Convention and Event Center
09:00 AM - 09:05 AM
09:05 AM - 09:50 AM
Abstract: In this talk, I will present a vision for human-centered artificial intelligence that emphasizes two (non-exhaustive) properties: AI that understands humans, and AI that helps people understand AI systems. Alignment is often cast as the process of making AI systems helpful, harmless, and honest; we can also think of these principles as AI systems needing to understand what behaviors human users view as helpful and harmful. Explainable AI (XAI) is one path toward AI that helps its users understand its behaviors. We explore how reasoning can act as a framework for unifying perspectives on alignment and explainability. I will present work from my lab on alignment, explainability, and reasoning as partial glimpses at what might be possible.
Invited Talk
09:50 AM - 10:00 AM Short break
10:00 AM - 11:00 AM
11:00 AM - 11:05 AM
Title: Moral Preferences of LLMs Under Directed Contextual Influence
Contributed Talk
11:05 AM - 11:10 AM
Title: When AI Describes Race? Unveiling Racial Bias in Vision-Language Models in Brazilian People
Contributed Talk
11:10 AM - 11:15 AM
Title: Operationalizing Fairness in Text-to-Image Models: A Survey of Bias, Fairness Audits and Mitigation Strategies
Contributed Talk
11:15 AM - 11:20 AM Short break
11:20 AM - 12:05 PM
Abstract: As generative models evolve into autonomous agents, evaluation must shift from static "Pixel Parity" to dynamic "Procedural Parity". While traditional safety frameworks audit a single output, agentic workflows introduce sequential risks like behavioral drift over long horizons. This talk outlines a roadmap toward a formal Evaluation Science, focusing on critical safety aspects like fairness and alignment in multimodal agents. We bridge foundational statistical baselines used to measure multimodal alignment and distributional skew with Equality of Service, assessing whether quality remains uniform across demographic intersections. By investigating how agents resolve query ambiguity, we highlight the importance of evaluating the equity of the decision-making process itself. Finally, we describe a statistical method that unifies human and automated signals, providing a scalable path to help realize the ultimate goal of treating responsible AI as a measurable, predictable property of autonomous systems.
Invited Talk
12:05 PM - 12:15 PM Short break
12:15 PM - 01:00 PM
Discussions
Lead: Francielle Vargas (University of São Paulo)
Lead: Yatong Chen (MPI)
Lead: Andreas Haupt (Stanford University)
01:00 PM - 02:00 PM Lunch break
02:00 PM - 02:45 PM
Abstract: We present a unified perspective on test-time thinking as a lens for improving generative AI agents through finer-grained reward modeling, data-centric reasoning, and robust alignment. Beginning with GenARM, we introduce an inductive bias for denser, token-level reward modeling that guides generation during decoding, enabling token-level alignment without retraining. While GenARM targets reward design, ThinkLite-VL focuses on the data side of reasoning. It proposes a self-improvement framework that selects the most informative samples via MCTS-guided search, yielding stronger visual reasoning with fewer labels. Taking this a step further, MORSE-500 moves beyond selection to creation: it programmatically generates targeted, controllable multimodal data to systematically probe and stress-test models’ reasoning abilities. We then interrogate a central assumption in inference-time alignment: Does Thinking More Always Help? Our findings reveal that increased reasoning steps can degrade performance--not due to better or worse reasoning per se, but due to rising variance in outputs, challenging the naive scaling paradigm. Finally, AegisLLM applies test-time thinking in the service of security, using an agentic, multi-perspective framework to defend against jailbreaks, prompt injections, and unlearning attacks--all at inference time. Together, these works chart a path toward generative agents that are not only more capable, but more data-efficient, introspective, and robust in real-world deployment.
Invited Talk
02:45 PM - 02:55 PM Short break
02:55 PM - 03:55 PM
03:55 PM - 04:00 PM
Title: Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations
Contributed Talk
04:00 PM - 04:05 PM
Title: Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs
Contributed Talk
04:05 PM - 04:10 PM
Title: Fairness Failure Modes of Multimodal LLMs
Contributed Talk
04:10 PM - 04:15 PM Short break
04:15 PM - 04:55 PM
Discussions
Mark Riedl (Georgia Tech)
Furong Huang (University of Maryland)
Isabela Albuquerque (Google DeepMind)
04:55 PM - 05:05 PM