Agents-X

8th July, 2025

PyVision: Agentic Vision with Dynamic Tooling.

We investigate Python code as a visual primitive for image manipulation and reasoning. GPT-4.1 can generate code that modifies images, offering a new approach to Visual Question Answering (VQA).
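As a rough illustration of what "Python code as a visual primitive" could look like, the sketch below shows small image tools of the kind a model might write for a single question: crop a region, stretch its contrast, and zoom in. The function names and the nested-list grayscale image are assumptions made for this example. They are not taken from PyVision itself.

```python
# Hypothetical sketch of model-written image tools. Images are plain nested
# lists of grayscale values (0-255), so no third-party library is needed.

def crop(image, left, top, right, bottom):
    """Return rows top..bottom-1 and columns left..right-1 of the image."""
    return [row[left:right] for row in image[top:bottom]]


def stretch_contrast(image):
    """Linearly rescale pixels so the darkest maps to 0 and the brightest to 255."""
    flat = [px for row in image for px in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in image]
    return [[round((px - lo) * 255 / (hi - lo)) for px in row] for row in image]


def upscale(image, factor):
    """Enlarge the image by an integer factor using nearest-neighbour repetition."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# A toy 4x4 image with a faint 2x2 detail in the centre.
image = [
    [10, 10, 10, 10],
    [10, 40, 60, 10],
    [10, 50, 70, 10],
    [10, 10, 10, 10],
]

region = crop(image, 1, 1, 3, 3)               # isolate the detail
zoomed = upscale(stretch_contrast(region), 2)  # enhance it, then enlarge it
```

In an agentic setting, the transformed image would be handed back to the model as a new observation before it answers the question.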

Nov. 2025

TIR-Bench: A Comprehensive Benchmark for Agentic Vision.

We collect a diverse set of VQA tasks designed for agentic vision, covering color recognition, low-light images, instrument reading, jigsaw, math, maze, rotated OCR, proportion, rotation, spot the difference, symbolic reasoning, visual search, and word search.

Jan. 2026

PyVision-RL: Forging Open Agentic Vision Models via RL.

We build PyVision-Image and PyVision-Video via RL, achieving state-of-the-art results on visual search, multimodal reasoning, agentic reasoning, and spatial reasoning tasks.
