Uber AI Solutions is one of Uber's biggest bets, with the ambition to build one of the world's largest data foundries for AI applications and evolve into a platform of choice for a variety of online tasks. The Moonshot AI team focuses on accelerating human-in-the-loop data annotation and collection with automation and developing robust automated evaluation systems. In this role, you will collaborate closely with research scientists, engineers, and cross-functional teams to deliver real-world impact through your research. You'll help grow Uber AI Solutions into a leader in the space.

What the candidate will do:
Drive research in areas such as LLM post-training (RLHF, GRPO, instruction tuning), data efficiency, and the design of benchmarks to evaluate LLM capabilities across safety, reasoning, and domain-specific performance.
Design and run experiments to validate hypotheses and iterate on research ideas.
Collaborate with research scientists and engineers to prototype and evaluate novel approaches.
Produce publication-ready research targeting top-tier AI/ML conferences.

Requirements:
Currently pursuing a Ph.D. in Computer Science, Machine Learning, Natural Language Processing, or a related field.
Published work at top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, COLM).
Deep expertise in at least one of the following: LLM post-training (RLHF, instruction tuning), LLM evaluation, reasoning and agents, data efficiency, or alignment and safety.
Proficiency in Python and deep learning frameworks (e.g., PyTorch, JAX).
Hands-on experience training or fine-tuning large language models.

Preferred Qualifications:
First-author publications at top-tier AI/ML conferences.
Experience with distributed training frameworks (e.g., DeepSpeed, FSDP, Megatron).
Contributions to open-source LLM projects or frameworks.
Demonstrated ability to rapidly prototype and iterate on research ideas.

Current research interests:
Real-world LLM Benchmarking: Moving beyond standard metrics to create benchmarks that map model performance to real-world business impact and responsible usage.
Agentic Quality Evaluation: Developing agentic systems to automatically evaluate dataset quality and adherence to requirements.
Few-Shot Grounding: Utilizing small subsets of annotated data (e.g., 10%) to significantly boost ML assistance for the remaining 90%.
Human-in-the-Loop Optimization: Minimizing human intervention in annotation tasks by integrating robust automated checks and feedback loops.

2026 PhD Research Intern

