EDB-IPP Project: LLM Model Compression and Acceleration

Summary

Location

Singapore

Type

full-time


About this role

Job Description:

Rakuten Asia, in partnership with the Economic Development Board (EDB) through the Industrial Postgraduate Programme (IPP), is seeking new PhD students. We are looking for individuals with a robust understanding of deep learning, machine learning, and natural language processing to contribute to our innovative research projects.

Essential requirements include proven hands-on expertise and strong engineering skillsets, specifically in the development and training of PyTorch models.

IPP Programme Benefits
Candidates successfully selected for this programme will receive full sponsorship for their postgraduate studies and will be hired by Rakuten Asia upon successful completion.

Collaboration Model

The collaboration will include joint PhD student supervision, shared access to computational resources for large-scale model compression experiments, and regular research exchanges. Output will include high-impact publications, open-source tools, and demonstrable prototypes of efficient AI.

Project Outline

Introduction

The rapid advancements in large-scale AI models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and Diffusion Models, have unleashed unprecedented capabilities across diverse domains. However, the immense computational and memory demands of these “big models” pose significant challenges for their widespread deployment, real-time inference, and sustainable operation. To truly democratize and scale the power of modern AI, Big Model Compression and Acceleration is not just an optimization; it is a fundamental requirement.

Objectives

The collaboration aims to:

  • Develop foundational techniques for compressing large AI models, specifically targeting LLMs, MLLMs, and Diffusion Models, to significantly reduce their parameter count and memory footprint without compromising performance.
  • Advance methods for accelerating the inference of these big models, enabling real-time responsiveness and high-throughput processing across various applications, from natural language understanding to high-fidelity image generation.
  • Prototype and validate efficient AI systems for real-world applications, demonstrating significant gains in speed, energy efficiency, and deployability for LLMs, MLLMs, and Diffusion Models.
  • Nurture PhD-level talent through joint supervision and research internships, fostering expertise in the deployment and scaling of efficient AI.

Proposed Research Areas

We propose collaboration across the following topics, with openness to refining based on shared interests:

  • Advanced Quantization Techniques for LLMs, MLLMs, and Diffusion Models:

Explore novel quantization methods (e.g., sub-8-bit, mixed-precision, and adaptive schemes) to drastically reduce model size and accelerate computation while maintaining high accuracy. This includes investigating learned quantization schemes and robust post-training quantization, specifically tailored to the unique architectures and data distributions of LLMs, MLLMs (e.g., multimodal embeddings), and Diffusion Models (e.g., generative quality).
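As a concrete illustration of the starting point for this work, here is a minimal sketch of affine (asymmetric) post-training quantization to 8-bit integers. The weight values are invented, and production methods for LLMs (e.g. GPTQ, AWQ) operate per-channel or per-group with far more care.

```python
# Minimal sketch of affine (asymmetric) post-training quantization to int8.
# Weight values below are illustrative, not from any real model.

def quantize(weights, num_bits=8):
    """Map floats onto the integer grid [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # guard against constant weights
    zero_point = round(qmin - w_min / scale)
    q = [max(qmin, min(qmax, round(w / scale + zero_point))) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, 0.0, 0.5, 2.3]
q, s, z = quantize(weights)
recovered = dequantize(q, s, z)
# Each recovered value lies within one quantization step of the original.
```

The point of the exercise is the error model: halving the bit width halves storage and memory bandwidth, while the reconstruction error stays bounded by the quantization step, which is where mixed-precision and learned schemes come in.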

  • Structured and Unstructured Pruning for Large Generative Models:

Develop sophisticated pruning algorithms to remove redundant parameters from LLMs, MLLMs, and Diffusion Models. Focus will be on achieving high sparsity without significant accuracy or quality loss, through techniques like dynamic, magnitude-based, and Hessian-aware pruning. We will specifically consider their impact on text coherence, image fidelity, cross-modal alignment, and the preservation of emergent capabilities, ensuring high-quality output and avoiding issues like “hallucinations” or mode collapse.
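A minimal sketch of the magnitude-based pruning baseline mentioned above, using invented weights; real pipelines prune tensors layer by layer (e.g. via torch.nn.utils.prune) and typically fine-tune afterwards to recover quality.

```python
# Illustrative sketch of magnitude-based pruning: zero out the
# smallest-magnitude weights until a target sparsity is reached.

def magnitude_prune(weights, sparsity):
    """Return weights with the smallest-|w| fraction set to zero."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, -0.01, 0.7, 0.02], sparsity=0.5)
# The three smallest-magnitude weights (-0.05, -0.01, 0.02) are zeroed.
```

Unstructured pruning like this maximizes flexibility but needs sparse kernels to realize speedups; structured variants remove whole rows, heads, or channels so dense hardware benefits directly.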

  • Efficient Knowledge Distillation for Diverse Model Modalities:

Investigate novel knowledge distillation approaches to transfer knowledge from large “teacher” models (LLMs, MLLMs, Diffusion Models) to smaller, more efficient “student” models. This includes exploring various distillation objectives, multi-teacher setups, and progressive distillation, while accounting for the nuances of language, visual, and multimodal data. Research will cover distilling reasoning from LLMs, multimodal knowledge transfer from MLLMs, and accelerating Diffusion Model sampling without quality degradation.
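The classic distillation objective (temperature-softened KL divergence between teacher and student output distributions) can be sketched as follows; the logit values are made up for illustration, and real training combines this term with a standard task loss.

```python
import math

# Sketch of the temperature-scaled distillation loss: the student is trained
# to match the teacher's softened output distribution via KL divergence.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]
aligned = [3.9, 1.1, 0.3]     # student close to the teacher
mismatched = [0.2, 1.0, 4.0]  # student far from the teacher
```

Raising the temperature exposes the teacher's "dark knowledge" in the relative probabilities of wrong classes, which is precisely what a hard one-hot label discards.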

  • Dynamic Token Pruning and Efficient Sequence Processing:

Explore novel methods for reducing computational costs for long text sequences (tokens) and high-resolution visual data (visual tokens/patches). This involves developing strategies for adaptive text token dropping in LLMs/MLLMs and vision token/patch pruning in MLLMs/Diffusion Models, selectively discarding less informative data. Research also includes advanced techniques like sparse attention mechanisms to reduce quadratic complexity, and token merging/condensation for compact representations. The aim is to significantly reduce FLOPs and memory footprint during inference while maintaining performance, quantifying efficiency gains through reductions in effective sequence length.
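At its simplest, token pruning keeps only the highest-scoring tokens while preserving sequence order. The sketch below uses invented importance scores; real methods derive them from attention maps or small learned predictors.

```python
# Toy sketch of importance-based token pruning: keep the top-k tokens by
# score, restored to their original sequence order. Scores are invented.

def prune_tokens(tokens, scores, keep_ratio=0.5):
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k highest-scoring tokens, back in sequence order.
    top = sorted(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    return [tokens[i] for i in top]

tokens = ["The", "very", "big", "model", "is", "compressed"]
scores = [0.9, 0.1, 0.3, 0.8, 0.2, 0.7]
kept = prune_tokens(tokens, scores, keep_ratio=0.5)
```

Because self-attention cost grows quadratically in sequence length, halving the effective sequence length roughly quarters the attention FLOPs, which is why even simple dropping strategies pay off.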

  • Efficient Generative Sampling and Inference Optimization:

Focus on accelerating the sampling process for generative models (LLMs, MLLMs, Diffusion Models) without compromising output quality. This includes research into faster text decoding strategies (e.g., speculative decoding, tree-based decoding, and parallel decoding) for LLM/MLLM inference. For Diffusion Models, this involves developing advanced sampling techniques (e.g., novel schedules, consistency models, score distillation) to significantly reduce generation steps. We will also optimize inference pipelines for conditional generation tasks, alongside theoretical analysis of generation speed versus quality trade-offs.

Other facts

Tech stack
Deep Learning, Machine Learning, Natural Language Processing, PyTorch, Model Compression, Model Acceleration, Quantization Techniques, Pruning Algorithms, Knowledge Distillation, Dynamic Token Pruning, Generative Sampling, Inference Optimization, AI Systems Prototyping, Research Internships, Computational Resources, High-Impact Publications

About Rakuten

Rakuten Group, Inc. (TSE: 4755) is a global technology leader in services that empower individuals, communities, businesses and society. Founded in Tokyo in 1997 as an online marketplace, Rakuten has expanded to offer services in e-commerce, fintech, digital content and communications to 2 billion members around the world. The Rakuten Group has more than 30,000 employees, and operations in 30 countries and regions. For more information visit https://global.rakuten.com/corp/.

Team size: 10,001+ employees
Industry: Software Development

What you'll do

  • Contribute to innovative research projects focused on large-scale AI model compression and acceleration. Collaborate on developing techniques for efficient AI systems and participate in joint PhD supervision.


Frequently Asked Questions

What does a PhD student on the EDB-IPP Project: LLM Model Compression and Acceleration do at Rakuten?

As a PhD student on the EDB-IPP Project: LLM Model Compression and Acceleration at Rakuten, you will contribute to innovative research projects focused on large-scale AI model compression and acceleration, collaborate on developing techniques for efficient AI systems, and participate in joint PhD supervision.

Why join Rakuten through the EDB-IPP Project: LLM Model Compression and Acceleration?

Rakuten is a global technology leader in software development with more than 30,000 employees across 30 countries and regions. Successful candidates receive full sponsorship for their postgraduate studies and are hired by Rakuten Asia upon completion of the programme.

Is the EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten remote?

The EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten is based in Singapore. Contact the company through Clera for specific work arrangement details.

How do I apply for the EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten?

You can apply for the EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process. You can also learn more about Rakuten on their website.