Job Description:
Rakuten Asia, in partnership with the Economic Development Board (EDB) through the Industrial Postgraduate Programme (IPP), is seeking new PhD students. We are looking for individuals with a robust understanding of deep learning, machine learning, and natural language processing to contribute to our innovative research projects.
Essential requirements include proven hands-on expertise and strong engineering skillsets, specifically in the development and training of PyTorch models.
IPP Programme Benefits
Candidates successfully selected for this programme will receive full sponsorship for their postgraduate studies and will be hired by Rakuten Asia upon successful completion.
Collaboration Model
The collaboration will include joint PhD student supervision, shared access to computational resources for large-scale model compression experiments, and regular research exchanges. Output will include high-impact publications, open-source tools, and demonstrable prototypes of efficient AI.
Project Outline
Introduction
The rapid advancements in large-scale AI models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and Diffusion Models, have unleashed unprecedented capabilities across diverse domains. However, the immense computational and memory demands of these “big models” pose significant challenges for their widespread deployment, real-time inference, and sustainable operation. To truly democratize and scale the power of modern AI, Big Model Compression and Acceleration is not just an optimization; it is a fundamental requirement.
Objectives
The collaboration aims to advance the compression and acceleration of large-scale AI models (LLMs, MLLMs, and Diffusion Models) so they can be deployed efficiently and at scale, producing high-impact publications, open-source tools, and demonstrable prototypes of efficient AI.
Proposed Research Areas
We propose collaboration across the following topics, with openness to refining based on shared interests:
Explore novel quantization methods (e.g., beyond 8-bit, mixed-precision, adaptive) to drastically reduce model size and accelerate computation while maintaining high accuracy. This includes investigating learned quantization schemes and robust post-training quantization, specifically tailored to the unique architectures and data distributions of LLMs, MLLMs (e.g., multimodal embeddings), and Diffusion Models (e.g., generative quality).
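To make the post-training quantization idea concrete, here is a minimal, framework-agnostic NumPy sketch of symmetric per-tensor quantization; the function names are illustrative and not taken from any specific library:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 8):
    """Symmetric per-tensor post-training quantization.

    Maps float weights onto signed integers in [-(2^(b-1)-1), 2^(b-1)-1]
    and returns the integer tensor plus the scale needed to dequantize.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 64)).astype(np.float32)
q, scale = quantize_symmetric(w, bits=8)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()  # worst-case rounding error, bounded by scale/2
```

Mixed-precision and learned schemes replace the single global scale with per-channel or trained scales, but the quantize/dequantize round trip above is the common core.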
Develop sophisticated pruning algorithms to remove redundant parameters from LLMs, MLLMs, and Diffusion Models. Focus will be on achieving high sparsity without significant accuracy or quality loss, through techniques like dynamic, magnitude-based, and Hessian-aware pruning. We will specifically consider their impact on text coherence, image fidelity, cross-modal alignment, and the preservation of emergent capabilities, ensuring high-quality output and avoiding issues like “hallucinations” or mode collapse.
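As a small illustration of the magnitude-based variant mentioned above, the following NumPy sketch zeroes out the lowest-magnitude fraction of a weight tensor (the helper name is illustrative; Hessian-aware and dynamic pruning would replace the scoring rule):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float):
    """Unstructured magnitude pruning: zero the smallest-|w| fraction.

    Returns the pruned weights and a boolean keep-mask (True = kept).
    """
    k = int(w.size * sparsity)  # number of weights to remove
    if k == 0:
        return w.copy(), np.ones(w.shape, dtype=bool)
    flat = np.abs(w).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(w) > threshold                  # strict: drops ties at threshold
    return w * mask, mask

rng = np.random.default_rng(1)
w = rng.normal(size=(32, 32))
w_pruned, mask = magnitude_prune(w, sparsity=0.5)
achieved = 1.0 - mask.mean()  # realized sparsity level
```

In practice the mask would be applied during fine-tuning so the remaining weights can recover the accuracy lost at high sparsity.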
Investigate novel knowledge distillation approaches to transfer knowledge from large “teacher” models (LLMs, MLLMs, Diffusion Models) to smaller, more efficient “student” models. This includes exploring various distillation objectives, multi-teacher setups, and progressive distillation, accounting for the nuances of language, visual, and multimodal data. Research will cover distilling reasoning from LLMs, multimodal knowledge transfer from MLLMs, and accelerating Diffusion Model sampling without quality degradation.
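One of the simplest distillation objectives, Hinton-style soft-target distillation, can be sketched in a few lines of NumPy (the temperature T and mixing weight alpha are illustrative hyperparameters, not values from this project):

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * CE(student, labels) + (1-alpha) * T^2 * KL(teacher || student).

    The T^2 factor keeps soft-target gradients on the same scale as the
    hard-label cross-entropy term.
    """
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    kl = (p_t * (np.log(p_t + 1e-12) - log_p_s)).sum(axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]
                 + 1e-12).mean()
    return alpha * ce + (1 - alpha) * T ** 2 * kl

teacher = np.array([[2.0, 0.5, -1.0]])
labels = np.array([0])
loss_matched = distillation_loss(teacher, teacher, labels)      # KL term is zero
loss_mismatched = distillation_loss(np.array([[-1.0, 2.0, 0.5]]),
                                    teacher, labels)
```

Multi-teacher and progressive variants change what plays the teacher role, but the loss structure stays the same.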
Explore novel methods for reducing computational costs for long text sequences (tokens) and high-resolution visual data (visual tokens/patches). This involves developing strategies for adaptive text token dropping in LLMs/MLLMs and vision token/patch pruning in MLLMs/Diffusion Models, selectively discarding less informative data. Research also includes advanced techniques such as sparse attention mechanisms to reduce quadratic complexity, and token merging/condensation for compact representations. The aim is to significantly reduce FLOPs and memory footprint during inference while maintaining performance, quantifying efficiency gains through the reduction in effective sequence length.
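The token-dropping idea reduces to keeping only the highest-importance tokens while preserving their order; a minimal NumPy sketch, assuming an externally supplied importance score (e.g., attention mass received per token):

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, scores: np.ndarray,
                 keep_ratio: float = 0.5):
    """Drop the least-informative tokens by importance score.

    tokens: (n, d) sequence of token embeddings; scores: (n,) importance.
    Keeps the top keep_ratio fraction, preserving original order.
    """
    n = tokens.shape[0]
    k = max(1, int(round(n * keep_ratio)))
    keep = np.sort(np.argsort(scores)[-k:])  # top-k indices, original order
    return tokens[keep], keep

rng = np.random.default_rng(2)
x = rng.normal(size=(16, 8))   # 16 tokens, hidden dim 8
scores = rng.random(16)        # stand-in for attention-derived importance
x_kept, idx = prune_tokens(x, scores, keep_ratio=0.25)
```

Keeping a quarter of the tokens cuts per-layer attention cost by roughly 16x under quadratic complexity, which is the effective-sequence-length gain the paragraph above refers to; token merging would instead combine dropped tokens into the kept ones rather than discarding them.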
Focus on accelerating the sampling process for generative models (LLMs, MLLMs, Diffusion Models) without compromising output quality. This includes research into faster text decoding strategies (e.g., speculative, tree-based, and parallel decoding) for LLM/MLLM inference. For Diffusion Models, this involves developing advanced sampling techniques (e.g., novel schedules, consistency models, score distillation) to significantly reduce generation steps. We will also optimize inference pipelines for conditional generation tasks, alongside theoretical analysis of generation speed versus quality trade-offs.
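To illustrate the speculative decoding strategy mentioned above, here is a toy greedy sketch in pure Python: a cheap draft model proposes several tokens, the target model verifies them, and the longest agreeing prefix is accepted, so multiple tokens can be committed per expensive target pass. The `target_next`/`draft_next` callables are stand-ins for real model forward passes, not an API from any library:

```python
def speculative_decode(target_next, draft_next, prompt,
                       max_new: int = 8, gamma: int = 4):
    """Greedy speculative decoding sketch.

    target_next(ctx) / draft_next(ctx): return the next token id for a
    context (one greedy model step). gamma: tokens drafted per round.
    """
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        # 1. Draft model proposes gamma tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(gamma):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target model verifies; keep the longest agreeing prefix.
        accepted, ctx = 0, list(seq)
        for t in proposal:
            if target_next(ctx) == t:
                ctx.append(t)
                accepted += 1
            else:
                break
        seq += proposal[:accepted]
        # 3. Always emit one target token, guaranteeing progress and
        #    output identical to plain greedy decoding from the target.
        seq.append(target_next(seq))
    return seq[:len(prompt) + max_new]

# Toy deterministic "models": next token = previous + 1 (mod 10).
target = lambda ctx: (ctx[-1] + 1) % 10
bad_draft = lambda ctx: 0  # a draft that always disagrees
out_good = speculative_decode(target, target, [0], max_new=5)
out_bad = speculative_decode(target, bad_draft, [0], max_new=5)
```

Note that the output matches plain greedy target decoding whether the draft is accurate or useless; only the number of target passes changes, which is the source of the speedup.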
Rakuten Group, Inc. (TSE: 4755) is a global technology leader in services that empower individuals, communities, businesses and society. Founded in Tokyo in 1997 as an online marketplace, Rakuten has expanded to offer services in e-commerce, fintech, digital content and communications to 2 billion members around the world. The Rakuten Group has more than 30,000 employees, and operations in 30 countries and regions. For more information visit https://global.rakuten.com/corp/.
Take the next step in your career journey