Systems Engineer / DevOps Engineer

* CDI

Paris, France

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Paris, France

Tech stack

Agile Methodologies

Artificial Intelligence

Amazon Web Services (AWS)

Apache HTTP Server

Systems Engineering

Azure

Ubuntu (Operating System)

CentOS

Cloud Computing

Continuous Integration

Linux

DevOps

InfiniBand

Python

Linux Distribution

Machine Learning

Nginx

Node.js

OpenShift

Red Hat Enterprise Linux - RHEL

Ansible

Prometheus

Simple Object Access Protocol (SOAP)

Systems Architecture

Web Services

CircleCI

Data Logging

Google Cloud Platform

Enterprise Software Applications

Istio

System Availability

Grafana

GitLab CI

Kubernetes

Information Technology

Deployment Automation

Operational Systems

Web Technologies

REST

Terraform

API Management

Docker

ELK

Jenkins

Microservices

Job description

We are seeking a skilled and forward-thinking Systems Engineer with deep expertise in Linux-based operating systems, Red Hat OpenShift, Kubernetes, and cloud-native web services. The ideal candidate will have hands-on experience managing large-scale, GPU-accelerated environments and a strong grasp of DevOps practices. You will play a pivotal role in deploying and maintaining AI/ML infrastructure, ensuring high availability, performance, and security across OpenShift clusters, containerized workloads, and high-throughput networking fabrics such as RoCE and InfiniBand. Your contributions will directly support the scalability and reliability of our AI and data-driven platforms.

* Manage, configure, and optimize Linux/Unix-based operating systems to support enterprise applications and services.
* Design, deploy, and maintain Kubernetes clusters in production, ensuring reliability, scalability, and security.
* Design and manage Red Hat OpenShift clusters with a focus on integrating AI/ML workflows, leveraging platforms such as OpenShift AI, Mistral AI, ClearML, and ZenML.
* Architect and operate OpenShift environments optimized for GPU workloads, leveraging NVIDIA Enterprise, RUN:AI, and related orchestration tools to enable efficient resource allocation and accelerate AI/ML model training and inference at scale.
* Implement, monitor, and troubleshoot web services (REST, SOAP, microservices), ensuring high availability and performance.
* Collaborate with development teams to automate CI/CD pipelines using tools like Jenkins, GitLab CI, or similar.
* Monitor system health and performance metrics; proactively address issues to minimize downtime.
* Implement security best practices for OS, container orchestration, and web services.
* Manage the container lifecycle, including image creation, registry management, and deployment automation.
* Work with cloud platforms (AWS, Azure, GCP) to deploy and maintain infrastructure components.
* Provide support for incident management and root cause analysis.
* Maintain infrastructure as code using tools like Terraform, Ansible, or similar.
* Collaborate with cross-functional teams to enhance overall system architecture and deployment workflows.
* Document system configurations, procedures, and best practices.

Requirements

* Strong experience with operating systems (Linux distributions such as Ubuntu, CentOS, Red Hat, or similar).
* Hands-on experience with Red Hat OpenShift administration, including cluster setup, networking, ingress, storage, and security.
* Experience with GPU resource management in OpenShift, including configuration, scheduling, and monitoring using NVIDIA Enterprise Suite, RUN:AI, and related tools to support high-performance AI/ML workloads.
* Good understanding of web services architecture, API management, and common protocols (HTTP, REST, SOAP).
* Experience with containerization tools like Docker.
* Familiarity with CI/CD pipelines and DevOps tooling (e.g., Jenkins, GitLab CI/CD, CircleCI).
* Basic scripting and automation skills using Shell, Python, or similar languages.
* Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack, etc.).
* Understanding of cloud infrastructure (AWS, Azure, or GCP) and infrastructure as code.
* Ability to troubleshoot and resolve complex infrastructure and deployment issues.
* Strong collaboration, communication, and documentation skills.

Preferred (Bonus) Skills

* Experience with web technologies (Node.js, Nginx, Apache, Traefik, or similar).
* Familiarity with AI and machine learning frameworks and platforms (Red Hat OpenShift AI, Mistral AI, ZenML, ClearML).
* Familiarity with DevOps practices and tools like Helm, Istio, or other service mesh solutions.
* Knowledge of security standards, SSL/TLS, and compliance frameworks.
* Experience working in Agile and CI/CD environments.
* Certification in Kubernetes (CKA/CKAD), Linux (RHCE), or cloud platforms.

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).