Senior DevOps Engineer
Company: NVIDIA
Location: Santa Clara
Posted on: June 2, 2025
Job Description:
NVIDIA is seeking a passionate, motivated, and technical
Architect/Engineer to join its dynamic and fast-paced
Infrastructure, Planning and Processes organization, where you will
work as a Principal DevOps & SRE Engineer supporting the design and
implementation of AI tool solutions on Kubernetes for the company's
Cloud Platform. The position is part of a fast-paced crew that
develops and maintains sophisticated build & test environments for a
multitude of hardware platforms, both NVIDIA GPUs and Tegra
Processors, along with various operating systems
(Windows/Linux/Android). The team works with other business units
within NVIDIA Software, such as Graphics Processors, Mobile
Processors, Deep Learning, Artificial Intelligence, Robotics, and
Autonomous Cars, to cater to their infrastructure and systems needs.

What you'll be doing:
- Craft the overall architecture for integrating coding assistance &
  Trustworthy AI tools into the existing infrastructure, ensuring
  alignment with reliable, scalable, and secure standard
  methodologies. Design for scalability, ensuring the implementation
  can support current and future workloads without degrading system
  performance.
- Identify and automate repetitive or toilsome production tasks
  related to code deployment, validation, and review, leveraging
  coding assistance tools to improve operational efficiency.
- Implement robust monitoring and observability for coding
  assistance/Trustworthy AI tools & application services, ensuring
  their availability and performance within the production
  environment.
- Integrate security best practices throughout the development
  lifecycle, ensuring coding assistance tools do not introduce
  vulnerabilities or compliance risks.
- Collaborate closely with software engineers, product teams, and
  security teams to align the coding assistance/Trustworthy AI tools'
  capabilities with organizational goals and developer needs.
- Establish feedback mechanisms to gather insights from developers
  and product/engineering teams on the effectiveness of coding
  assistance/Trustworthy AI tools, iterating on integrations and
  configurations for continuous improvement.
- Maintain comprehensive documentation for architecture decisions,
  integration processes, operational runbooks, and troubleshooting
  guides.

What we need to see:
- Kubernetes domain expertise with extensive experience building
  scalable, resilient platforms in both public and private cloud,
  capable of providing platform engineering / architecture standard
  methodologies (including experience with architecting and
  implementing the overall platform, administration & configuration,
  orchestration, security, and monitoring ecosystem).
- Experience maintaining cloud infrastructure (on-prem & CSP) and
  highly available production environments.
- Strong programming background in Python and/or similar scripting
  languages.
- Excellent problem-solving, communication, and teamwork skills.
- Strong understanding of the architectural requirements and
  development processes involved in building reliable, robust,
  scalable data products and pipelines.
- Demonstrated ability to automate processes using Continuous
  Integration/Continuous Delivery (CI/CD) tools.
- Proficiency with Configuration as Code and infrastructure-as-code
  tools such as Ansible, Puppet, Chef, and Terraform.
- Strong background with GitLab, GitHub, Perforce, Jenkins, and/or
  other CI/CD systems & Artifactory.
- Experience with data analytics/visualization & monitoring tools
  like Kibana, Grafana, Splunk, Zabbix, Prometheus, and/or similar
  systems.
- Experience with databases, both SQL (MySQL) and NoSQL
  (Elasticsearch/MongoDB/Cassandra).
- 10+ years of proven experience, with a Bachelor's or Master's
  degree in Computer Science, Software Engineering, or equivalent
  experience.

Ways to stand out from the crowd:
- Solid understanding of containerization and microservices
  architecture. Certified Kubernetes Administrator (CKA), Certified
  Kubernetes Security Specialist (CKS) & Certified Kubernetes
  Application Developer (CKAD) preferred.
- Prior experience implementing and managing Trustworthy AI tools
  (QuantPi, Credo AI, Armilla AI), coding assistance AI tools
  (Cursor, Sourcegraph Cody) & code review AI tools (CodeRabbit).
- Thrives in a multi-tasking environment with constantly evolving
  priorities.
- Ability to break complex problems down into simple sub-problems
  and then reuse available solutions to implement most of those.
  Ability to design simple systems that can work efficiently without
  needing much support.
- Prior experience with a large-scale operations team. Experience
  with using and improving data centers. Background in computer
  algorithms and the ability to choose the best possible algorithms
  to meet the scaling challenge.

With competitive salaries and a generous benefits package, we are
widely considered to be one of the technology world's most desirable
employers. We have some of the most forward-thinking and hardworking
people in the world working for us and, due to outstanding growth,
our exclusive engineering teams are rapidly growing. If you're a
creative and autonomous engineer with a real passion for technology,
we want to hear from you.

The base salary range is 168,000 USD - 333,500 USD. Your base salary
will be determined based on your location, experience, and the pay
of employees in
similar positions.