Benchmarking and System Validation Software Engineer Job at Enfabrica, Raleigh, NC

  • Enfabrica
  • Raleigh, NC

Job Description

Summary

We are seeking a talented Software Engineer to join our Durham, North Carolina team focused on Benchmarking, System Validation and Test Automation for large-scale distributed systems. In this role, you will write applications that benchmark next-generation computing infrastructure at performance and scale with real-world Machine Learning workloads, and build system topologies to validate our customer use cases.

 

Roles and Responsibilities:

  • Model and benchmark large-scale Machine Learning workloads
  • Characterize performance of distributed deep learning applications with data and model parallelism, and model sharding across devices and memories
  • Write applications, libraries and kernel modules that stress I/O technology capabilities, including NCCL and CUDA GPU technology
  • Develop low-level software applications to test I/O performance of next-gen compute systems
  • Validate customer use cases using our technology, and assist with such deployments
  • Implement broad System and Solution Level testing
  • Create White Papers that showcase Data Center I/O technology

Desired Knowledge and Skill Set:

  • Hands-on experience with ML Collective Communication and CUDA programming
  • Hands-on experience with ML frameworks such as PyTorch and TensorFlow
  • Familiarity with standard Machine Learning workload benchmarks for Training and Inference
  • Strong coding skills in multiple languages such as Python, C and C++
  • Background in low-level I/O performance analysis of networking and server systems
  • Good knowledge of TCP/IP and performance of other networking protocols
  • Detailed understanding of server components and applicable drivers for CPUs, memory, GPUs, networking devices and storage
  • Experience validating large-scale Data Center networking and server solutions
  • Working knowledge of high-performance communication technologies like MPI, InfiniBand, RDMA, GPUDirect and NVLink is desirable
  • Linux systems knowledge
  • 5+ years of software development experience working closely with hardware

This role requires the employee to be on-site in the Raleigh, North Carolina office. There is no hybrid work option.

About Us 

Enfabrica is on a mission to revolutionize AI compute systems and infrastructure at scale through the development of superior-scaling networking silicon and software, which we call the Accelerated Compute Fabric. Founded and led by an executive team assembled from first-class semiconductor and distributed systems/software companies throughout the industry, Enfabrica sets itself apart from other startups with a very strong engineering pedigree, a proven track record of delivering, deploying and scaling products in data center production environments, and significant investor support for our ambitious journey! With our differentiated approach to solving the I/O bottlenecks in distributed AI and accelerated compute clusters, Enfabrica is unleashing the revolution in next-gen computing fabrics.

Job Tags

Full time
