EDB-IPP Project: LLM Model Compression and Acceleration

Summary

Location

Singapore

Type

full-time


About this role

Job Description:

Rakuten Asia, in partnership with the Economic Development Board (EDB) through the Industrial Postgraduate Programme (IPP), is seeking new PhD students. We are looking for individuals with a robust understanding of deep learning, machine learning, and natural language processing to contribute to our innovative research projects.

Essential requirements include proven hands-on expertise and strong engineering skillsets, specifically in the development and training of PyTorch models.

IPP Programme Benefits
Candidates successfully selected for this programme will receive full sponsorship for their postgraduate studies and will be hired by Rakuten Asia upon successful completion.

Collaboration Model

The collaboration will include joint PhD student supervision, shared access to computational resources for large-scale model compression experiments, and regular research exchanges. Output will include high-impact publications, open-source tools, and demonstrable prototypes of efficient AI.

Project Outline

Introduction

The rapid advancements in large-scale AI models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and Diffusion Models, have unleashed unprecedented capabilities across diverse domains. However, the immense computational and memory demands of these “big models” pose significant challenges for their widespread deployment, real-time inference, and sustainable operation. To truly democratize and scale the power of modern AI, Big Model Compression and Acceleration is not just an optimization; it is a fundamental requirement.

Objectives

The collaboration aims to:

  • Develop foundational techniques for compressing large AI models, specifically targeting LLMs, MLLMs, and Diffusion Models, to significantly reduce their parameter count and memory footprint without compromising performance.
  • Advance methods for accelerating the inference of these big models, enabling real-time responsiveness and high-throughput processing across various applications, from natural language understanding to high-fidelity image generation.
  • Prototype and validate efficient AI systems for real-world applications, demonstrating significant gains in speed, energy efficiency, and deployability for LLMs, MLLMs, and Diffusion Models.
  • Nurture PhD-level talent through joint supervision and research internships, fostering expertise in the deployment and scaling of efficient AI.

Proposed Research Areas

We propose collaboration across the following topics, with openness to refining based on shared interests:

  • Advanced Quantization Techniques for LLMs, MLLMs, and Diffusion Models:

Explore novel quantization methods (e.g., sub-8-bit, mixed-precision, and adaptive schemes) to drastically reduce model size and accelerate computation while maintaining high accuracy. This includes investigating learned quantization schemes and robust post-training quantization, specifically tailored to the unique architectures and data distributions of LLMs, MLLMs (e.g., multimodal embeddings), and Diffusion Models (e.g., generative quality).
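As a concrete illustration of the starting point for this work, here is a minimal sketch of affine (asymmetric) post-training quantization to 8-bit integers. The weight values are invented, and production methods for LLMs (e.g. GPTQ, AWQ) operate per-channel or per-group with far more care.

```python
# Minimal sketch of affine (asymmetric) post-training quantization to int8.
# Weight values below are illustrative, not from any real model.

def quantize(weights, num_bits=8):
    """Map floats onto the integer grid [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # guard against constant weights
    zero_point = round(qmin - w_min / scale)
    q = [max(qmin, min(qmax, round(w / scale + zero_point))) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, 0.0, 0.5, 2.3]
q, s, z = quantize(weights)
recovered = dequantize(q, s, z)
# Each recovered value lies within one quantization step of the original.
```

The point of the exercise is the error model: halving the bit width halves storage and memory bandwidth, while the reconstruction error stays bounded by the quantization step, which is where mixed-precision and learned schemes come in.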

  • Structured and Unstructured Pruning for Large Generative Models:

Develop sophisticated pruning algorithms to remove redundant parameters from LLMs, MLLMs, and Diffusion Models. Focus will be on achieving high sparsity without significant accuracy or quality loss, through techniques like dynamic, magnitude-based, and Hessian-aware pruning. We will specifically consider their impact on text coherence, image fidelity, cross-modal alignment, and the preservation of emergent capabilities, ensuring high-quality output and avoiding issues like “hallucinations” or mode collapse.
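A minimal sketch of the magnitude-based pruning baseline mentioned above, using invented weights; real pipelines prune tensors layer by layer (e.g. via torch.nn.utils.prune) and typically fine-tune afterwards to recover quality.

```python
# Illustrative sketch of magnitude-based pruning: zero out the
# smallest-magnitude weights until a target sparsity is reached.

def magnitude_prune(weights, sparsity):
    """Return weights with the smallest-|w| fraction set to zero."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, -0.01, 0.7, 0.02], sparsity=0.5)
# The three smallest-magnitude weights (-0.05, -0.01, 0.02) are zeroed.
```

Unstructured pruning like this maximizes flexibility but needs sparse kernels to realize speedups; structured variants remove whole rows, heads, or channels so dense hardware benefits directly.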

  • Efficient Knowledge Distillation for Diverse Model Modalities:

Investigate novel knowledge distillation approaches to transfer knowledge from large “teacher” models (LLMs, MLLMs, Diffusion Models) to smaller, more efficient “student” models. This includes exploring various distillation objectives, multi-teacher setups, and progressive distillation, while accounting for the nuances of language, visual, and multimodal data. Research will cover distilling reasoning from LLMs, multimodal knowledge transfer from MLLMs, and accelerating Diffusion Model sampling without quality degradation.
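The classic distillation objective (temperature-softened KL divergence between teacher and student output distributions) can be sketched as follows; the logit values are made up for illustration, and real training combines this term with a standard task loss.

```python
import math

# Sketch of the temperature-scaled distillation loss: the student is trained
# to match the teacher's softened output distribution via KL divergence.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]
aligned = [3.9, 1.1, 0.3]     # student close to the teacher
mismatched = [0.2, 1.0, 4.0]  # student far from the teacher
```

Raising the temperature exposes the teacher's "dark knowledge" in the relative probabilities of wrong classes, which is precisely what a hard one-hot label discards.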

  • Dynamic Token Pruning and Efficient Sequence Processing:

Explore novel methods for reducing computational costs for long text sequences (tokens) and high-resolution visual data (visual tokens/patches). This involves developing strategies for adaptive text token dropping in LLMs/MLLMs and vision token/patch pruning in MLLMs/Diffusion Models, selectively discarding less informative data. Research also includes advanced techniques like sparse attention mechanisms to reduce quadratic complexity, and token merging/condensation for compact representations. The aim is to significantly reduce FLOPs and memory footprint during inference while maintaining performance, quantifying efficiency gains through reductions in effective sequence length.
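At its simplest, token pruning keeps only the highest-scoring tokens while preserving sequence order. The sketch below uses invented importance scores; real methods derive them from attention maps or small learned predictors.

```python
# Toy sketch of importance-based token pruning: keep the top-k tokens by
# score, restored to their original sequence order. Scores are invented.

def prune_tokens(tokens, scores, keep_ratio=0.5):
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k highest-scoring tokens, back in sequence order.
    top = sorted(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    return [tokens[i] for i in top]

tokens = ["The", "very", "big", "model", "is", "compressed"]
scores = [0.9, 0.1, 0.3, 0.8, 0.2, 0.7]
kept = prune_tokens(tokens, scores, keep_ratio=0.5)
```

Because self-attention cost grows quadratically in sequence length, halving the effective sequence length roughly quarters the attention FLOPs, which is why even simple dropping strategies pay off.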

  • Efficient Generative Sampling and Inference Optimization:

Focus on accelerating the sampling process for generative models (LLMs, MLLMs, Diffusion Models) without compromising output quality. This includes research into faster text decoding strategies (e.g., speculative decoding, tree-based decoding, and parallel decoding) for LLM/MLLM inference. For Diffusion Models, this involves developing advanced sampling techniques (e.g., novel schedules, consistency models, score distillation) to significantly reduce generation steps. We will also optimize inference pipelines for conditional generation tasks, alongside theoretical analysis of generation speed versus quality trade-offs.

Other facts

Tech stack
Deep Learning, Machine Learning, Natural Language Processing, PyTorch, Model Compression, Model Acceleration, Quantization Techniques, Pruning Algorithms, Knowledge Distillation, Dynamic Token Pruning, Generative Sampling, Inference Optimization, AI Systems Prototyping, Research Internships, Computational Resources, High-Impact Publications

About Rakuten

Rakuten Group, Inc. (TSE: 4755) is a global technology leader in services that empower individuals, communities, businesses and society. Founded in Tokyo in 1997 as an online marketplace, Rakuten has expanded to offer services in e-commerce, fintech, digital content and communications to 2 billion members around the world. The Rakuten Group has more than 30,000 employees, and operations in 30 countries and regions. For more information visit https://global.rakuten.com/corp/.

Team size: 10,001+ employees
Industry: Software Development

What you'll do

  • Contribute to innovative research projects focused on large-scale AI model compression and acceleration. Collaborate on developing techniques for efficient AI systems and participate in joint PhD supervision.


Frequently Asked Questions

What does a PhD student on the EDB-IPP Project: LLM Model Compression and Acceleration do at Rakuten?

As a PhD student on the EDB-IPP Project: LLM Model Compression and Acceleration at Rakuten, you will contribute to innovative research projects focused on large-scale AI model compression and acceleration, collaborate on developing techniques for efficient AI systems, and participate in joint PhD supervision.

Why join Rakuten through the EDB-IPP Project: LLM Model Compression and Acceleration?

Rakuten is a global technology leader in software development with more than 30,000 employees across 30 countries and regions. Successful candidates receive full sponsorship for their postgraduate studies and are hired by Rakuten Asia upon completion of the programme.

Is the EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten remote?

The EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten is based in Singapore. Contact the company through Clera for specific work arrangement details.

How do I apply for the EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten?

You can apply for the EDB-IPP Project: LLM Model Compression and Acceleration position at Rakuten directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process. You can also learn more about Rakuten on their website.