Meta/Facebook, AI and Systems
Position ID:
Position Title: Research Scientist
Position Type: Government or industry
Position Location: Menlo Park, California 94025, United States of America
Subject Area:
Appl Deadline: (posted 2024/11/02, listed until 2025/05/02)
Position Description:
The team, led by Chunqiang Tang (a.k.a. CQ Tang), consists of over 100 employees, mostly PhDs, including many world-class research scientists and engineers. As reflected in our team name "co-design", we conduct interdisciplinary research and development across AI, hardware, and software, with a focus on performance, efficiency, and scalability.
- We own the company's overall strategy for exploring innovative hardware technologies for CPUs, GPUs, memory, storage, and Meta's custom AI chips. We directly productionize them in Meta's hyperscale fleet of O(1,000,000) servers and O(100,000) GPUs, powering all Meta products such as Facebook, Instagram, and meta.ai.
- We apply novel software optimizations across the whole stack, from ML models and applications down to the Linux kernel, to achieve optimal performance on the hardware.
- We develop innovative AI technologies for large language models (Llama), ranking systems, and more.
Here are selected publications that showcase our work in diverse areas.
AI chip and server design
Systems for AI
- The Llama 3 Herd of Models.
  - Our contributions include much of the work described in the paper's Section 3.3 "Infrastructure, Scaling, and Efficiency", Section 6 "Inference", and Section 7.3 "Model Scaling".
- Llama 2: Open Foundation and Fine-Tuned Chat Models.
  - Our contributions include re-architecting Llama's training infrastructure and transitioning it from a research environment to Meta's hyperscale production infrastructure, enabling future Llama training to scale to tens of thousands of GPUs and beyond.
- Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
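For context on the fully sharded data parallelism referenced in the PyTorch FSDP entry above, here is a minimal sketch of the standard FSDP wrapping pattern. It is illustrative only: the toy model, dimensions, and training step are assumptions for this sketch, not Meta's Llama training code.

```python
# Minimal FSDP sketch: shard parameters, gradients, and optimizer state across
# ranks, gathering full parameters only around each forward/backward pass.
# Launch with torchrun so RANK/WORLD_SIZE are set; the toy model is hypothetical.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()
    model = FSDP(model)                       # wrap the model so its state is sharded
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")   # stand-in for a real training batch
    loss = model(x).pow(2).mean()
    loss.backward()                           # gradients are reduce-scattered across ranks
    optim.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```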
ML models and kernels
- Deep Learning Recommendation Model for Personalization and Recommendation Systems
- Wukong: Towards a Scaling Law for Large-Scale Recommendation
- FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
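To make the DLRM entry in the list above concrete, the following is a heavily simplified sketch of a DLRM-style model: embedding tables for sparse categorical features, a bottom MLP for dense features, pairwise dot-product feature interaction, and a top MLP. The class name, dimensions, and layer sizes are illustrative assumptions, not the open-source DLRM code.

```python
# Simplified DLRM-style architecture sketch (illustrative dimensions only).
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, num_embeddings=1000, dim=16, num_sparse=3, num_dense=4):
        super().__init__()
        # One embedding table per sparse (categorical) feature.
        self.tables = nn.ModuleList(
            [nn.Embedding(num_embeddings, dim) for _ in range(num_sparse)]
        )
        self.bottom_mlp = nn.Sequential(nn.Linear(num_dense, dim), nn.ReLU())
        n = num_sparse + 1                      # sparse embeddings + dense vector
        self.top_mlp = nn.Sequential(nn.Linear(n * (n - 1) // 2 + dim, 1))

    def forward(self, dense, sparse):
        d = self.bottom_mlp(dense)                       # (B, dim)
        feats = [t(sparse[:, i]) for i, t in enumerate(self.tables)] + [d]
        f = torch.stack(feats, dim=1)                    # (B, n, dim)
        inter = torch.bmm(f, f.transpose(1, 2))          # pairwise dot products
        i, j = torch.triu_indices(f.size(1), f.size(1), offset=1)
        return self.top_mlp(torch.cat([inter[:, i, j], d], dim=1))

model = TinyDLRM()
out = model(torch.randn(2, 4), torch.randint(0, 1000, (2, 3)))  # (B=2, 1) logits
```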
ML numerics, pruning, distillation, and optimizers
- Microscaling Data Formats for Deep Learning
- INT4 Decoding GQA CUDA Optimizations for LLM Inference
- Pruning and Distillation to Enable Llama 3.2 1B and 3B Models Suitable for Mobile Devices
- PyTorch Distributed Shampoo
- Winning the MLCommons Training Algorithms Competition
HPC and collective communications library (MPI, NCCL, RCCL)
- Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
- Training Deep Learning Recommendation Model with Quantized Collective Communications
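As a rough illustration of the idea behind quantized collective communications named in the entry above (not Meta's implementation), the sketch below casts a gradient tensor to fp16 before the all-reduce to roughly halve communication volume, then dequantizes and averages. The function name and the assumption of an already-initialized NCCL process group are hypothetical.

```python
# Illustrative sketch: lossy-compress a gradient to fp16 before all-reduce to
# cut communication volume. Assumes dist.init_process_group("nccl") has run.
import torch
import torch.distributed as dist

def quantized_allreduce_mean(grad: torch.Tensor) -> torch.Tensor:
    buf = grad.to(torch.float16)               # quantize before communication
    dist.all_reduce(buf, op=dist.ReduceOp.SUM) # sum the compressed gradients
    return buf.to(grad.dtype) / dist.get_world_size()
```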
Performance benchmarking and projection
- DCPerf: An open source benchmark suite for hyperscale compute applications
- DLRM: An advanced, open source deep learning recommendation model
Hardware and software co-design
Like research labs, our team consists primarily of PhDs, and we strongly encourage and excel at publishing research. However, we differ from traditional research labs in several key ways:
- Production systems: Our primary goal is to develop forward-looking innovations in AI, hardware, and software, and to implement them directly in production systems that serve billions of people. The billions of users of Meta products and Meta's hyperscale fleet of O(1,000,000) servers and O(100,000) GPUs are, in effect, our lab. In contrast, traditional research labs often rely on technology transfer, which yields a less direct impact.
- Direct ownership: Like traditional research labs, we build strong partnerships with numerous teams across diverse areas for broad influence. However, what sets us apart is our direct ownership of the hardware strategy for Meta's hyperscale fleet. This enables us to lead in many areas while fostering seamless partnerships in others.
- Impact: Our impact is widely recognized across the company. We drive Meta's hardware strategy to save billions of dollars, and directly develop innovative technologies in Meta's flagship products like Llama and Ads ranking models.
See our list of publications for more details.
We are not accepting applications for this job through AcademicJobsOnline.Org right now. Please apply at https://2024resumedropco-design.splashthat.com/.
- Contact: Chunqiang Tang
- Email:
- Postal Mail: 1 Hacker Way, Menlo Park, CA 94025
- Web Page: https://aisystemcodesign.github.io/