Principal GPU/NPU Software Architect
Role details
Job description
Architect the Co-Processing Driver Stack: Design and implement the low-level driver, compiler, and runtime for a tightly integrated GPU/NPU architecture. Enable efficient memory sharing, low-latency synchronization, and collaborative execution of graphics and neural workloads.
Pioneer Neural Rendering Infrastructure: Develop the core driver and API extensions to support emerging neural rendering techniques, such as neural radiance fields (NeRFs), neural texture compression, AI-based denoising, and AI-powered upscaling within the rendering pipeline.
Drive Hardware-Software Co-Design: Work directly with hardware engineers to influence the design of future GPU/NPU architectures, ensuring the ISA, memory hierarchy, and interconnects are optimized for next-generation graphics and AI workloads.
Performance Analysis & Optimization: Attain unparalleled performance by deep-diving into the hardware pipeline. Identify and eliminate bottlenecks in the interaction between graphics shaders and neural network execution.
Define the Programming Model: Create and document the programming models, APIs, and developer tools that will allow internal and external graphics engineers to leverage the combined power of the GPU and NPU effectively.
This job description is only an outline of the tasks, responsibilities and outcomes required of the role. The jobholder will carry out any other duties as may be reasonably required by his/her line manager. The job description and personal specification may be reviewed on an ongoing basis in accordance with the changing needs of Huawei Research and Development UK Limited.
Requirements
Master's or PhD in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.
10+ years of proven, deep experience in developing low-level GPU drivers, runtimes, or compiler technology for a major mobile GPU architecture.
Expert-level knowledge of modern graphics APIs (Vulkan, DirectX 12) and their compute shader pipelines.
Fluency in C/C++ and a strong understanding of computer architecture.
Must have a proven track record in co-processor design (e.g., GPU/CPU, GPU/DPU) or developing drivers for heterogeneous systems.
Desired
Direct experience with NPU/AI accelerator architecture or driver development. You understand the nuances of mapping neural networks efficiently onto tensor cores.
Hands-on experience implementing or optimizing neural rendering technologies (e.g., NeRF, DLSS/FSR, neural graphics primitives).
Deep understanding of the ML compiler stack (e.g., MLIR, LLVM).
Experience with hardware virtualization (SR-IOV) for GPU/NPU resources.