Compute Infrastructure Engineer - AION AI Cloud Platform

Company: AION

Job Location: Bengaluru, Karnataka, India

Job Type: Full-time (Hybrid)

Date Posted: April 06, 2025

About AION

AION is building the next generation of AI cloud platforms, transforming high-performance computing (HPC) through its decentralized AI cloud. Purpose-built for bare-metal performance, AION democratizes access to compute power for AI training, fine-tuning, inference, data labeling, and beyond.

By leveraging underutilized resources such as idle GPUs and data centers, AION provides a scalable, cost-effective, and sustainable solution tailored for developers, researchers, and enterprises. The platform's innovative Proof of Compute Contribution (PoCC) protocol rewards contributors based on performance, creating a transparent and efficient ecosystem.

Integrated with Tether (USD₮ & USD₮0) for stability and regulatory clarity, AION eliminates volatility, ensuring predictable costs and seamless transactions. With cutting-edge partnerships and a USD-backed economy, AION is pioneering the commoditization of high-performance compute, empowering global innovation and bridging the AI wealth gap.

Led by high-pedigree founders with previous exits, AION is well funded by major VCs and has strategic global partnerships. Headquartered in the US with a global presence, the company is building its initial core team in India.

Who you are

You are a visionary infrastructure architect passionate about democratizing AI compute at global scale. You thrive on solving complex technical challenges that create elegant, accessible systems from intricate infrastructure. With deep expertise in secure multi-tenancy environments, you understand how to design and implement comprehensive isolation guarantees across hardware, network, and storage layers for both VM and container workloads.

You're excited to join an ambitious AI infrastructure startup at the ground floor, where your work will directly unlock siloed compute resources and remove barriers limiting AI advancement. You have the technical depth to architect platform systems that seamlessly connect compute providers with AI engineers while maintaining robust security foundations that scale to serve diverse client requirements and compliance needs.

You're motivated by the opportunity to build something transformative: the infrastructure that will make high-performance compute more accessible, affordable, and user-friendly for the next generation of AI innovation.

Technical Skills & Experience

6-10 years of experience in infrastructure engineering with containerization and virtualization (exceptional candidates with different experience profiles will be considered)
Platform Engineering expertise designing self-service infrastructure platforms, creating robust abstractions, and building intuitive provider/user onboarding experiences
Advanced Kubernetes expertise including custom controllers, operators, and API extensions
HPC Systems knowledge with experience in SLURM, job schedulers, and MPI-based workloads
GPU Infrastructure expertise including NVIDIA ecosystem, NCCL tuning, and distributed training optimization
Networking proficiency with CNI plugins, RDMA, SR-IOV, and performance tuning for low latency
API design experience creating coherent, intuitive interfaces for infrastructure consumption
Security experience in multi-tenant isolation, zero-trust networking, and container security
Storage systems knowledge including parallel file systems, CSI drivers, and performance tuning
Programming skills in Go, Python, or Rust; experience with infrastructure-as-code tools like Terraform
Distributed systems experience including fault tolerance design and recovery strategies

Key Responsibilities

Platform Engineering Excellence: Design and implement intuitive, self-service interfaces that enable frictionless onboarding for both compute providers and end-users
Provider Experience: Create automated systems for compute providers to easily integrate their hardware, with clear abstractions that hide underlying complexity
User Experience: Build streamlined interfaces that enable AI engineers to deploy and manage workloads without infrastructure expertise
Dynamic Infrastructure Management: Design systems that seamlessly migrate workloads when underlying hardware becomes unavailable
Multi-tenant Platform Architecture: Build secure isolation mechanisms for running diverse customer workloads on shared infrastructure
Infrastructure as Code: Develop comprehensive APIs and declarative interfaces for infrastructure provisioning
Performance Optimization: Implement networking and compute configurations that maximize performance for distributed AI training
Resource Orchestration: Create abstractions that normalize heterogeneous hardware from different providers into a unified compute platform
Kubernetes & HPC Integration: Design architectures that bridge container-based and traditional HPC environments

Location

Individuals in this role are expected to relocate to Bengaluru, though exceptions can be made. We offer a hybrid working setup with 3 days per week in the office. Employees have the flexibility to work from anywhere for a few months each year.

Why Join AION

Be part of a mission-driven team at the intersection of web3 and AI, tackling some of the most exciting challenges in the industry.
Join the ground floor of an AI startup, with the opportunity to make a significant impact on the company and the industry.
Collaborate with top-tier talent from the tech industry.
Competitive salary and benefits package.
Flexible work environment with opportunities for professional growth and development.

This role offers a unique opportunity for top-tier infrastructure engineers who want to solve some of the most challenging problems in AI compute and make a significant industry impact.
