AI Infrastructure Architect
Job description
- Design a unified AI Infra & Serving platform for composite AI workloads such as LLM training & inference, RLHF, agents, and multimodal processing. The platform will integrate inference, orchestration, and state management, defining the technical evolution path for Serverless AI + Agentic Serving.
- Design a heterogeneous execution framework across CPU/GPU/NPU for agent memory, tool invocation, and long-running multi-turn conversations and tasks. Build an efficient memory/KV-cache/vector-store/logging and state-management subsystem to support agent retrieval, planning, and persistent memory.
- Build a high-performance runtime/framework that defines the next-generation Serverless AI foundation through elastic scaling, cold-start optimization, batch processing, function-based inference, request orchestration, and dynamic decoupled deployment, supporting demanding scenarios such as multi-model, multi-tenant, and high-concurrency serving.
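To make the state-management responsibility above concrete, here is a minimal sketch of a per-session store that persists conversation turns and a small agent "memory" across multi-turn tasks. All names (`SessionStore`, `append_turn`, `recall`) are hypothetical illustrations, not part of any platform mentioned here; a production subsystem would add eviction, persistence, and KV-cache/vector-store backends.

```python
# Hypothetical sketch: per-session state for long-running agent workloads.
# Each session keeps an ordered conversation history plus a key-value
# "memory" so retrieval and planning steps can persist state across turns.
from dataclasses import dataclass, field


@dataclass
class Session:
    turns: list = field(default_factory=list)   # ordered (role, text) pairs
    memory: dict = field(default_factory=dict)  # persistent agent memory


class SessionStore:
    def __init__(self):
        self._sessions = {}  # session_id -> Session

    def append_turn(self, session_id, role, text):
        # Record one conversation turn, creating the session on first use.
        s = self._sessions.setdefault(session_id, Session())
        s.turns.append((role, text))

    def remember(self, session_id, key, value):
        # Write a fact into the session's persistent memory.
        self._sessions.setdefault(session_id, Session()).memory[key] = value

    def recall(self, session_id, key, default=None):
        # Read a fact back; unknown sessions or keys fall back to default.
        s = self._sessions.get(session_id)
        return default if s is None else s.memory.get(key, default)

    def history(self, session_id):
        # Return a copy of the session's conversation so far.
        s = self._sessions.get(session_id)
        return [] if s is None else list(s.turns)
```

The design choice worth noting is that memory is scoped per session rather than global, which is what lets long-running multi-turn tasks resume without re-deriving state.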
Requirements
- Strong foundational knowledge of system architecture, computer architecture, operating systems, and runtime environments;
- Hands-on experience with Serverless architectures and cloud-native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling;
- Familiarity with mainstream inference-serving frameworks (e.g., vLLM, SGLang, Ray Serve); understanding of common inference optimization concepts such as continuous batching, KV-cache reuse, parallelism, and compression/quantization/distillation;
- Proficient with profiling/tracing tools; experienced in analyzing and optimizing system-level bottlenecks in GPU utilization, memory/bandwidth, interconnect fabric, and network/storage paths;
- Proficient in at least one system-level language (e.g., C/C++, Go, Rust) and one scripting language (e.g., Python).
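Among the optimization concepts listed above, continuous batching is easy to illustrate: instead of waiting for an entire batch to finish, the scheduler admits queued requests into the running batch at every decode step and frees slots the moment a request completes. The toy below is a hypothetical sketch of just that scheduling idea; real engines such as vLLM and SGLang add paged KV-caches, preemption, and much more.

```python
# Toy continuous-batching scheduler. Requests are (request_id, num_steps)
# pairs, where num_steps stands in for the number of decode iterations a
# request needs. Returns per-step snapshots of the active batch.
from collections import deque


def continuous_batching(requests, max_batch):
    queue = deque(requests)
    active = {}   # request_id -> remaining decode steps
    trace = []    # (step, sorted active ids) snapshots
    step = 0
    while queue or active:
        # The key idea: admission happens at EVERY step, not once per
        # batch, so freed slots are refilled immediately.
        while queue and len(active) < max_batch:
            rid, steps = queue.popleft()
            active[rid] = steps
        trace.append((step, sorted(active)))
        # One decode step for every active request.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]  # slot freed for the next waiting request
        step += 1
    return trace
```

For example, with `max_batch=2` and requests `("a", 2), ("b", 3), ("c", 1)`, request `c` is admitted the moment `a` finishes, so the batch stays full without waiting for `b`.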