Qi Qi

Hi! I am currently a Master of Science student in Computer Science at the University of California San Diego (UCSD). At UCSD, I work with Prof. Yiying Zhang on Efficient Systems for Machine Learning. I have also worked with Prof. Hao Wang and Prof. Hong Zhang on Serverless Computing for AI. Previously, I obtained my Bachelor of Engineering in Computer Science and Technology from Huazhong University of Science and Technology (HUST). During my undergraduate studies, I also conducted research on Computer Vision and Multimodal Learning, and I participated in an exchange program at the Technical University of Munich (TUM).

Email  /  Github

profile photo

Research

My current research focuses on Efficient Systems for Machine Learning, especially for reasoning and agentic systems. I have also built serverless systems to accelerate machine learning inference and training.

Efficient Reasoning and Agentic Systems

Demystifying Delays in Reasoning: A Pilot Temporal and Token Analysis of Reasoning Systems
Qi Qi, Reyna Abhyankar, Yiying Zhang
NeurIPS 2025 Workshop on Efficient Reasoning (ER '25) [Blog] [Poster]

We present the first systematic temporal study of three representative reasoning models and agents: OpenAI o3-deep-research, GPT-5, and the LangChain Deep Research Agent, evaluated on DeepResearch Bench. By instrumenting each system, we decompose end-to-end request latency and token costs across reasoning, web search, and answer generation. We find that web search often dominates end-to-end request latency and that final answer generation consumes the most tokens due to the lengthy retrieved context, implying that tool latency and retrieval design are the primary levers for speeding up reasoning end-to-end.

OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
Reyna Abhyankar*, Qi Qi*, Yiying Zhang
Full Paper Under Peer Review

To understand the sources of computer-use agents' inefficiency and to guide their future development, we conduct the first study on the temporal performance of computer-use agents on OSWorld, the flagship benchmark in computer-use AI. We find that large model calls for planning, reflection, and judging account for most of the overall latency, and that as an agent uses more steps to complete a task, each successive step can take 3x longer than steps at the beginning of the task. We then construct OSWorld-Human, a manually annotated version of the original OSWorld dataset that contains a human-determined trajectory for each task. We evaluate 16 agents on their efficiency using OSWorld-Human with grouped actions and a Weighted Efficiency Score (WES), and find that even the best agents take 1.5-2.4x more steps than necessary.

OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
Reyna Abhyankar, Qi Qi, Yiying Zhang
ICML 2025 Workshop on Computer Use Agents (WUCA '25) [Blog]

Serverless Systems for Machine Learning

GearUp: Accelerating Serverless DAG-based Inference with Joint Pre-warming and Pre-loading
Supervised by: Hao Wang, Hong Zhang
Collaborators: Yifan Sui

Presentations

SRC 2025 PRISM Annual Review
Nov. 12-13, 2025
[Poster] [Video] [Slide]

Teaching and Mentoring

Teaching Assistant
CSE 291 Systems for LLMs and AI Agents (Graduate level, Fall 2025)

Template stolen from Haoran Zhang and Jon Barron.