Gold Media Tech

Job: L1 Support Engineer

System Administration, HPC Systems, Python
Remote
Full Time

About the Company

Our client is an AI cloud provider. They work with many of the top AI companies in the world, including Poolside, Meta, Modal, Reka, and many more.

About the Role

Their Support Engineers provide top-tier support to customers and make sure the GPU infrastructure is running at peak performance.

At its core, you will have three main responsibilities:

  • Support. This is a client-facing role – you will work closely with customers to make sure they can use the infrastructure to achieve their goals. You will work on everything from GPU debugging and Slurm management to building and improving documentation.
  • Deployment. The client onboards new clusters at least monthly – you will help take bare-metal servers and deploy them for customers as high-performance compute as a service.
  • Automation. Their GPU fleet is large and growing. You will help automate many of their processes and systems so they can keep supporting customers as they continue to scale.

Skills & Experience

  • Experience with system administration or HPC systems.
  • Experience with large-scale workloads using orchestrators such as Slurm or Kubernetes.
  • Experience automating bare-metal machines and containers using tools such as Ansible, Bash, or Python.
  • Strong interest in large-scale GPU systems, including NVIDIA GPUs and InfiniBand networks.
  • Fast learner, adaptable, and passionate about their mission!