MONAKA
About FUJITSU-MONAKA:
The HPC AI team at FRIPL is dedicated to advancing the capabilities of AI and high-performance computing (HPC) for FUJITSU-MONAKA, the next-generation Arm-architecture-based 2nm CPU. This processor, set to be released in 2027, is designed for data centers with a focus on AI-driven applications such as machine learning, deep learning, real-time big data processing, and large language models (LLMs).
Our team is at the forefront of optimizing AI frameworks to fully harness the power of FUJITSU-MONAKA, ensuring state-of-the-art performance and energy efficiency. We are deeply committed to sustainability, contributing to the realization of a carbon-neutral society through innovative, power-efficient computing solutions.
In collaboration with the open-source software community, we actively contribute to the development of AI-accelerated software stacks tailored for data-intensive workloads. This collaboration ensures that FUJITSU-MONAKA meets the demands of modern computing and aligns with Fujitsu’s vision of creating a more sustainable world through trust and innovation.
The HPC ML team at FRIPL is committed to pushing the boundaries of computational innovation with ML algorithms. We focus on AI/ML framework engineering with advanced optimization for high-performance applications on FUJITSU-MONAKA. Our work spans a variety of cutting-edge technologies and methods to ensure that AI/ML framework engineering is efficient and scalable.
Our teams:
Our team’s expertise spans several key areas:
- Computational Data Science: Our research enhances the computational performance of foundational frameworks such as Scikit-Learn, XGBoost, Statsmodels (time series), and OpenBLAS to maximize efficiency for high-performance data science with large-scale data processing and predictive modelling.
- Advanced Numerical Libraries: Optimizing complex computations with the Scalable Vector Extension (SVE), tailored for advanced ML applications and ensuring the high efficiency and reliability required by modern AI models.
- Scalability and Multithreading: Building scalable AI/ML software with efficient parallelism, reduced synchronization overhead, and improved workload distribution across multiple cores and nodes. Our work with OpenMP, pthreads, TBB, and similar runtimes enables AI models to scale with high performance as data volumes and model complexity grow (a minimal thread-pool sketch follows this list).
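As an illustration of this kind of thread-pool control, the short Python sketch below uses the threadpoolctl utility to inspect and cap the native OpenBLAS/OpenMP thread pools behind NumPy; the matrix size and thread cap are illustrative, not tuned settings.

    import numpy as np
    from threadpoolctl import threadpool_info, threadpool_limits

    # Show the native thread pools in use (e.g. OpenBLAS and its num_threads).
    print(threadpool_info())

    a = np.random.rand(2000, 2000)
    # Temporarily cap the BLAS thread pool for this region of code.
    with threadpool_limits(limits=8):
        b = a @ a  # matrix multiply runs on at most 8 OpenBLAS threads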
Key OSS Contributions & Optimizations:
- PR#2614: Enabled oneDAL and scikit-learn-intelex on Arm, using OpenBLAS as the open-source reference backend to accelerate ML workloads. This was one of the first contributions to the UXL Foundation (see the usage sketch after this list).
- PR#4741: Contributed to scalable AI by enhancing core utilization in OpenBLAS, the math library behind many AI/ML frameworks, across its pthreads and OpenMP threading backends to improve workload distribution on multi-core systems.
- PR#2917: Scalable Vector Extension (SVE) tuning that improves numerical computing performance for AI/ML models, ensuring vectorized computations are fully exploited on the Arm-based FUJITSU-MONAKA.
- PR#2807: Developed SPBLAS (Sparse BLAS) and VSL (Vector Statistical Library) kernels to optimize sparse and vectorized numerical operations, accelerating core ML computations.
- PR#5091: Tuned small-GEMM kernels to boost the efficiency of matrix multiplications, improving execution times for HPC and AI/ML workloads.
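For context, the acceleration from PR#2614 is typically consumed through scikit-learn-intelex's patching API, as in the minimal Python sketch below; the dataset and estimator are illustrative.

    from sklearnex import patch_sklearn
    patch_sklearn()  # route supported estimators through the oneDAL backend

    # After patching, plain scikit-learn code picks up the accelerated backend.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(10_000, 16)
    labels = KMeans(n_clusters=8, n_init=10).fit_predict(X)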
A. Deep Learning Team
As artificial intelligence (AI) continues to drive innovation across sectors, deep learning has emerged as a cornerstone of this transformative era. The HPC Deep Learning Team at FRIPL is at the forefront of technological innovation, tasked with the critical mission of advancing the capabilities and performance of deep learning frameworks for FUJITSU-MONAKA. Our ongoing research aims to accelerate the inference and fine-tuning performance of deep learning models on CPUs through efficient AI framework engineering.
Our team’s expertise spans several key areas:
- Foundational Frameworks: Our research enhances the computational performance of AI frameworks such as PyTorch, TensorFlow, JAX, and ONNX to maximize performance for deep learning applications at scale, serving complex problems around the globe.
- Advanced Compute Libraries: Optimizing complex computations for advanced DL algorithms and models by leveraging Arm SIMD extensions such as the Scalable Vector Extension (SVE), ensuring high accuracy and reliability with enhanced performance.
- Scalability and Multithreading: Building scalable AI software with efficient parallelism, reduced synchronization overhead, and improved workload distribution across multiple cores and nodes. We work with the OpenMP and TBB threading backends to scale DL workloads to high CPU core counts (a minimal sketch follows this list).
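A minimal sketch, using PyTorch's public threading API, of the intra-op/inter-op parallelism this work targets; the thread counts and tensor sizes are illustrative.

    import torch

    torch.set_num_threads(48)         # intra-op: threads used inside one operator
    torch.set_num_interop_threads(2)  # inter-op: operators that may run concurrently

    x = torch.randn(2048, 2048)
    y = torch.randn(2048, 2048)
    with torch.inference_mode():
        z = x @ y  # the matmul is split across the intra-op thread pool
    print(torch.__config__.parallel_info())  # reports the active threading backend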
Key OSS Contributions & Optimizations:
- PR#1818: Enabled BRGEMM-based MatMul JIT kernels in oneDNN, accelerating deep learning and generative AI workloads on Arm CPUs.
- PR#119571: Extended the PyTorch vec backend to SVE on Arm. Previously only a NEON vec backend was available for Arm CPUs; with this work, the latest Arm SIMD extension is supported in PyTorch (see the capability check after this list).
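One quick way to see which vec backend a given PyTorch build dispatched to is the capability query below; the API is available in recent PyTorch releases, and on SVE-enabled Arm builds it reports an SVE capability (the exact string, e.g. "SVE256", depends on build and hardware).

    import torch

    # Reports the SIMD capability the vec backend selected,
    # e.g. "AVX512" on x86 or "SVE256" on SVE-capable Arm builds.
    print(torch.backends.cpu.get_cpu_capability())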
B. Data Platform Team
The Data Platform team primarily focuses on enabling and optimizing various databases and distributed frameworks. Given the rapid development of open-source databases and their growing commercial popularity, our team works across several categories of data platforms, such as relational databases, vector databases, NoSQL databases, graph databases, data lakes, and large-scale real-time data processing and analytics engines, optimizing them for FUJITSU-MONAKA. Our major focus areas are described in the subsequent sections.
Our team’s expertise spans several key areas:
- Relational Databases: PostgreSQL is one of the most popular open-source relational databases. Tuning PostgreSQL for FUJITSU-MONAKA and benchmarking it with industry-standard benchmark setups is a key focus of our team.
- Vector Databases: A vector database stores and manages vector embeddings for efficient similarity search and retrieval, and is widely used in RAG and LLM applications. Milvus is an open-source vector database built for similarity search over large collections of high-dimensional vectors. Our team focuses on efficient use of SVE to optimize Milvus for FUJITSU-MONAKA (a minimal client sketch follows this list).
- NoSQL Databases: NoSQL databases are well suited to managing irregular, loosely connected, or rapidly changing data. MongoDB is one of the most popular NoSQL databases, and our team focuses on optimizing it for FUJITSU-MONAKA.
- Large-Scale Data Processing: Apache Spark is one of the most popular SQL analytics engines for large-scale real-time data processing, and it also supports a range of ML algorithms for training and inference. Our focus is to fine-tune the Spark software stack for better performance, which in turn reduces the carbon footprint of large-scale data processing.
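To make the vector-database workflow concrete, here is a minimal pymilvus sketch of insert-and-search against Milvus; the endpoint, collection name, and dimension are illustrative and assume a running Milvus server.

    import random
    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")
    client.create_collection(collection_name="demo_embeddings", dimension=128)

    # Insert a handful of random embeddings.
    rows = [{"id": i, "vector": [random.random() for _ in range(128)]}
            for i in range(100)]
    client.insert(collection_name="demo_embeddings", data=rows)

    # Similarity search: top-3 nearest neighbours of a query vector.
    query = [[random.random() for _ in range(128)]]
    hits = client.search(collection_name="demo_embeddings", data=query, limit=3)
    print(hits)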
C. LLM Research Team
Harnessing the power of “Generative AI”, the LLM Research Team at FRIPL is driven by a vision to push boundaries and accelerate the inference of large language models on Arm-based CPUs and FUJITSU-MONAKA. The team works with cutting-edge techniques such as quantization, pruning, and knowledge distillation to optimize large models and make CPU inference practical.
Our team’s expertise spans several key areas:
- Foundational Frameworks: Our focus is on optimizing and enhancing the foundational frameworks used for LLMs, such as llama.cpp, PyTorch, JAX, and vLLM, to accelerate inference performance on CPUs.
- Advanced Accelerator Toolkits: Optimizing LLMs is essential to meet compute and memory demands when running inference on CPUs. Our team works with toolkits such as OpenVINO, which provide neural-network compression and quantization for faster inference (a minimal sketch follows this list).
- Scalability and Multithreading: Building scalable AI software that utilizes many cores while maintaining high performance. Our team works with threading backends such as OpenMP and pthreads to enhance the scalability of LLM workloads on Arm CPUs.
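A minimal sketch of the OpenVINO Python flow referenced above: reading a pre-converted, hypothetical quantized IR model and compiling it for the CPU device; the file name is illustrative.

    import openvino as ov

    core = ov.Core()
    model = core.read_model("llm_int8.xml")      # hypothetical quantized IR model
    compiled = core.compile_model(model, "CPU")  # target the CPU plugin

    # A CompiledModel is callable; pass a dict mapping input names to arrays:
    # outputs = compiled({"input_ids": input_ids})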
Key OSS Contributions & Optimizations:
- PR#7433: Implemented SVE SIMD in the dot-product kernels for 8-bit and 4-bit quantized LLM inference in llama.cpp.
- PR#7606: Added OpenMP support for multi-threaded processing in llama.cpp. Previously only pthreads were used; the OpenMP integration improved both performance and scalability at higher core counts.
- PR#9288: Enabled vLLM on Arm CPUs. vLLM is a state-of-the-art LLM serving engine providing continuous batching and PagedAttention; this work brings that support to Arm CPUs (a minimal inference sketch follows this list).
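Tying these contributions together, the sketch below uses the llama-cpp-python bindings to run a 4-bit-quantized GGUF model with an explicit thread count; the model path and thread count are illustrative. On SVE-capable Arm CPUs, calls like this are the path the quantized dot-product kernels (PR#7433) and OpenMP threading (PR#7606) accelerate.

    from llama_cpp import Llama

    # Load a 4-bit-quantized GGUF model (path is illustrative).
    llm = Llama(model_path="./models/llama-2-7b.Q4_0.gguf", n_threads=48)

    out = llm("Q: What is a vector database? A:", max_tokens=64)
    print(out["choices"][0]["text"])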
D. Confidential Computing Team
Our team at FRIPL is dedicated to advancing secure and private computing with Arm Confidential Compute Architecture (CCA) technologies, in which the processor serves as the root of trust, guaranteeing that all data and computation are processed securely. We specialize in building CCA framework SDKs to create confidential AI solutions for FUJITSU-MONAKA, ensuring the protection of sensitive data in untrusted environments. Our team’s expertise spans several key areas:
- Confidential Computing: Evaluating Arm CCA for FUJITSU-MONAKA. Conceptually, CCA works by encrypting each VM's memory with a unique key generated by the processor's hardware; this isolates user VMs not only from each other but also from cloud operators, ensuring that user data and computation remain secure even if the cloud infrastructure is compromised.
- Confidential AI: Our team pioneers the integration of confidential computing with AI, ensuring that sensitive data and models used in AI training and inference stay within secure environments. We work on secure AI frameworks that support multi-party computation, federated learning, and secure model deployment with Arm CCA.
- Arm CCA Platform: We are developing a comprehensive SDK for Arm CCA to build, deploy, and manage confidential applications on FUJITSU-MONAKA. The SDK will provide tools and APIs for creating secure applications, managing attestation processes, and integrating confidential AI capabilities.
Meet our leadership

Dr. Priyanka Sharma
With around 24 years of experience spanning industry, academia, and research, Dr. Priyanka Sharma specializes in leading AI-enabled system design, development, and deployment using core technologies in high-performance computing, machine learning, and deep learning.
Dr. Priyanka Sharma is currently Director of Software Engineering at Fujitsu Research of India (FRIPL), heading the MONAKA R&D Unit (HPC-AI Lab). She is also Vice Chairman (Technical) of the IEEE Industrial Electronics, IEEE Industry Applications, and IEEE Power Electronics Society – Gujarat Section, and an active member of several national-level subject expert committees (for engineering and AI projects) under premier R&D project funding schemes of the Department of Science and Technology, Ministry of Science and Technology.
Prior to joining FRIPL, Priyanka was Vice President of Projects – AI at Samyak and an AI advisor to UK- and India-based start-ups in AI, deep learning, and drug discovery. She was also an AI advisor at the Centre of Excellence on AI, IT, and Cyber Security at a National Defence University in India.
She was an NVIDIA Deep Learning Ambassador for around five years. Her academic affiliations include a full professorship in the Computer Science and Engineering Department at Nirma University for over seven years. She has published over 50 research papers in SCI- and Scopus-indexed international journals, books, and conferences. Priyanka is a passionate traveller and loves writing about life lessons through machine learning.