High Performance Computing (HPC) Linux Administrator
Terrestrial Energy
Interested in helping us transform thermal and electric energy?
Become a part of the leading Generation-IV nuclear plant development team.
Terrestrial Energy is developing, for near-term commercial operation, a zero-emissions cogeneration plant for global industry using its proprietary Integral Molten Salt Reactor (IMSR) fission technology in an innovative, small and modular plant design. The IMSR is a non-Light Water Reactor of the Generation IV class that operates at the high temperature required for broad industrial relevance with transformative economic potential. The IMSR plant is capable of grid-based electric power generation and industrial cogeneration in many energy-intensive industries, including petrochemical and chemical synthesis for hydrogen and ammonia production. The IMSR plant offers a nearly 50 percent improvement in efficiency of electric power generation compared to Light Water Reactor nuclear plants. Its industrial cogeneration capability brings industrial competitiveness, energy security, and zero-emissions industrial production to today’s markets. The IMSR plant’s use of existing industrial materials, components, and fuels supports its near-term deployment, setting the stage for a rapid global decarbonization of the primary energy system. To execute this plan, we are now looking to add talented people to the team, each of whom will:
- improve our team by adding diverse perspectives and innovative ways of problem solving
- have demonstrated exceptional results in engineering projects
- be a team player with the ability to collaborate closely and interact with other groups
- be flexible and adaptable to change
- have a skill set and experience that relate to the following role:
The HPC Linux Administrator, under the direction of the IT Infrastructure Manager, is primarily responsible for building and maintaining a High Performance Computing RHEL Linux environment for Terrestrial Energy's various engineering teams.
As a High-Performance Computing (HPC) Linux Administrator at Terrestrial Energy, you will play a crucial role in ensuring the stability, security, and performance of our physical and virtual Linux-based HPC environments. The role involves supporting and optimizing our RHEL HPC infrastructure, working with researchers and engineers to design and implement computational workflows, and troubleshooting system performance issues to ensure optimal operation. As a specialist in this role, you will be responsible for deploying, maintaining, and optimizing HPC hardware and software environments. You will collaborate with diverse teams, ensuring that the HPC systems meet the computational needs of research, simulation, and other high-performance tasks.
Other responsibilities include:
- Deploy, configure, and maintain HPC systems (RHEL), including clusters, servers, storage systems, and networking hardware
- Analyze and optimize system performance, including job scheduling, resource allocation, and load balancing
- Implement and maintain high-performance software libraries, tools, and frameworks such as MPI, OpenMP, CUDA, and/or other parallel programming environments.
- Monitor server health, resource utilization, and system logs to troubleshoot and resolve issues related to system performance, hardware/software failures and network problems
- Work closely with engineering teams to optimize computational workflows and advise on best practices for scaling simulations and computations.
- Maintain detailed documentation for systems, configurations, and workflows. Provide training to end-users to help optimize usage of HPC resources.
- Keep up-to-date with the latest industry trends, tools and technologies in high-performance computing and parallel processing
- Participate in on-call and in-office support rotations, and provide 24/7 support for critical issues.
Core Competencies
- Highly motivated, with strong attention to detail
- Hands-on, problem-solving approach to managing and troubleshooting complex technical environments
- Strong understanding of programming fundamentals with a passion for coding
- Excellent verbal and written communication
- Results-oriented critical thinker with excellent planning and organizational skills
- Excellent troubleshooting and problem-solving skills.
- Team player with the ability to collaborate and interact with internal and external clients
- Flexible and adaptable to change
- Proactive, addressing potential problems before they occur
Requirements
- Degree or diploma in Computer Science, Mathematics, or Engineering, or equivalent experience through more than 5 years of systems administration in a UNIX/Linux or HPC environment
- Red Hat Certified System Administrator (RHCSA) certification, the entry-level certification for Red Hat Enterprise Linux (RHEL), and/or Red Hat Certified Engineer (RHCE) certification
- 3+ years of proven, hands-on experience in Linux/UNIX systems administration, preferably in a large-scale computing environment
- Proven experience managing an HPC grid with Slurm or an equivalent scheduler
- Experience with programming languages commonly used in HPC such as C, C++, Python, Fortran, or similar.
- Expert knowledge of RHEL and other Linux-based operating systems
- Knowledge of network protocols and interconnects (e.g., InfiniBand, Ethernet).
- Knowledge of enterprise storage systems, including parallel file systems
- Good understanding of virtualization and virtual machine technologies such as VMware vSphere and Hyper-V
Assets
- Hands-on experience managing HPC workload management systems such as Slurm, SGE, Moab/Torque, or equivalent schedulers
- Experience supporting large storage infrastructure devices (SAN/NAS) and a good understanding of file systems such as ZFS and GPFS
- Knowledge of modern CPU and GPU architectures and optimization techniques for these systems
- Good understanding and experience with data management at scale, including performance, backups, archive, and monitoring
- Experience maintaining application tools and databases such as MySQL and PostgreSQL
- Experience with open source infrastructure systems such as OpenLDAP, NFS, OpenZFS, and 2FA systems
- Experience with container management and orchestration for HPC workloads (e.g., Kubernetes, Singularity).
Benefits
- Extended Healthcare Plan
- A vacation policy designed to support your work-life balance
- Employee Assistance Program (EAP) available to you and your family
- Wellness Subsidy
- Annual Performance Review
- Paid Volunteer Days – A chance to give back!
- Career development opportunities
Please submit a resume and cover letter.
Candidates must be legally authorized to work in Canada without the need for sponsorship for employment visa status.
Terrestrial Energy is an Equal Opportunity Employer – Minority / Female / Disability / Gender Identity / Sexual Orientation / Age. The Company encourages applications from all qualified individuals.
If you require accommodation during the application or interview process, please advise us as soon as possible so appropriate arrangements can be made. If you require technical support in a format that is accessible to you, please contact Accessibility@terrestrialenergy.com