The LLM Operations Engineer serves as the DevOps specialist within our AI team, focusing on
managing the operational aspects of our AI platform, particularly our Large Language Models
(LLMs) that power our client's One Intelligence. You will build and maintain robust LLM
workflows, implement monitoring systems, integrate feedback loops, and optimize the
performance of our AI solutions. You'll work closely with AI Engineers and Product Owners to
ensure our AI systems are reliable, secure, observable, and continuously improving.
Advantages
Preferred Qualifications
Experience with prompt engineering and testing tools like Promptfoo
Familiarity with vector databases and retrieval-augmented generation (RAG) systems
Knowledge of serverless architectures and event-driven systems
Experience with Amazon Bedrock Guardrails for LLM security
Background in data engineering or machine learning operations
Understanding of financial systems and data security requirements in the finance industry
Familiarity with implementing technical solutions to meet compliance requirements
outlined in SOC 2, ISAE 3402, and ISO 27001
Responsibilities
What You Will Do
Design, implement, and maintain LLM operations workflows using tools like Langfuse to
monitor performance, track usage, and create feedback loops for continuous improvement
Develop and maintain infrastructure-as-code for AI deployments using Terraform and
AWS services (Lambda, SQS, API Gateway, OpenSearch, CloudWatch)
Build and enhance monitoring, logging, and alerting systems to ensure optimal
performance and reliability of our LLM infrastructure
Collaborate with AI engineers to design and implement evaluation frameworks (including
LLM-as-judge systems) to measure and improve model performance
Manage prompt versioning, testing, and deployment pipelines through Concourse CI/CD
and custom tooling
Implement and maintain security guardrails for LLM interactions, ensuring compliance
with best practices
Create comprehensive documentation for LLM operations, including runbooks for
production incidents
Participate in on-call rotations to support mission-critical AI systems
Drive innovation in LLM operations by researching and implementing best practices and
emerging tools in the rapidly evolving GenAI space
Qualifications
Required Qualifications
3+ years of experience in DevOps, SRE, or similar roles, with at least 1 year specifically
working with LLMs or AI systems in production
Strong hands-on experience with AWS cloud services, particularly Bedrock, Lambda,
SQS, API Gateway, OpenSearch, and CloudWatch
Experience with infrastructure-as-code using Terraform, CloudFormation, or similar tools
Proficiency in Python and experience building automation tooling and pipelines
Familiarity with LangOps platforms such as Langfuse for LLM observability and
evaluation
Experience with CI/CD pipelines using Concourse or similar tools
Knowledge of logging, monitoring, and alerting systems
Understanding of security best practices for AI systems, including prompt injection
mitigation techniques
Excellent troubleshooting and problem-solving skills
Strong communication skills and ability to work effectively with cross-functional teams
Must be legally entitled to work in the country where the role is located
Summary
To succeed in this role, you will need a combination of experience, technology skills, personal
qualities, and education.
Randstad Canada is committed to fostering a workforce reflective of all peoples of Canada. As a result, we are committed to developing and implementing strategies to increase equity, diversity and inclusion within the workplace by examining our internal policies, practices, and systems throughout the entire lifecycle of our workforce, including its recruitment, retention and advancement for all employees. In addition to our deep commitment to respecting human rights, we are dedicated to positive actions to effect change to ensure everyone has full participation in the workforce free from any barriers, systemic or otherwise, especially equity-seeking groups who are usually underrepresented in Canada's workforce, including those who identify as women or non-binary/gender non-conforming; Indigenous or Aboriginal Peoples; persons with disabilities (visible or invisible); and members of visible minorities, racialized groups and the LGBTQ2+ community.
Randstad Canada is committed to creating and maintaining an inclusive and accessible workplace for all its candidates and employees by supporting their accessibility and accommodation needs throughout the employment lifecycle. We ask that all job applicants identify any accommodation requirements by sending an email to accessibility@randstad.ca to ensure their ability to fully participate in the interview process.