ABOUT THE POSITION
We are looking for a Senior AI Engineer to help design and deliver agentic AI systems that power R&D tooling for video game asset pipelines and production workflows. You will help shape the technical direction of our internal agent platform and drive engineering practices around agent loops, memory, evaluation, and safe deployment of LLM-driven applications.
This is a senior, hands-on individual contributor role: you will write code, help design the agentic architecture, and partner with stakeholders across studios to turn emerging AI capabilities into production-grade tools.
Agent platform
- Design and build key parts of our internal agent libraries - the core abstractions and developer ergonomics that let teams across the company build agents quickly and consistently.
- Help shape the architecture of our central agent runtime - the runtime, registry, and observability surface where agents are deployed, monitored, and governed.
- Help define and evolve the agent loop / harness: prompt orchestration, tool invocation, sub-agent delegation, and recovery behavior.
- Bring in reference patterns from the broader ecosystem (e.g. open-source agent loops and harness projects) and adapt them to our use cases.
Agent loop & harness engineering
- Drive prompting strategy at scale: system prompt design, guardrails, mitigation of context poisoning and pollution, and management of hyperparameters (context window sizing, lost-in-the-middle effects, temperature, top-k).
- Design tool interfaces for agents: MCP servers, structured inputs/outputs for context, and sub-agent composition patterns.
- Advocate best practices for typed agent frameworks, with first-class observability and telemetry baked into every agent.
- Evaluate and integrate local LLM options where latency, cost, or data-residency requirements demand it.
Agent memory
- Design and build the key parts of the memory layer used across our agents: conversation history management, context chaining, and episodic memory.
- Help define the boundary between short-term working context and long-term persistent memory, including decay/retention policies.
- Apply RBAC and tenant isolation to memory so agents can be safely shared across teams and projects.
Test- and eval-driven development
- Build out the evaluation discipline for agentic systems: golden traces, regression evals, offline + online metrics, and red-team prompts.
- Build the harnesses and CI gates that let us iterate on prompts, models, and tools with confidence.
- Uphold evals as the unit of progress - no agent change ships without a measurable signal.
Backend & platform foundations
- Design and build scalable backend services and secure RESTful APIs in Python (FastAPI), with strong data modeling across relational and non-relational stores.
- Enforce authentication/authorization (RBAC), input validation, and robust error handling for agent-facing endpoints.
- Implement caching, queues, and vector storage where the agent workload requires it.
Quality, delivery & collaboration
- Drive performance tuning, code reviews, and technical documentation within your area of the AI platform.
- Maintain CI/CD with Git/GitLab and Docker; ensure reproducible local-dev and deployment pipelines.
- Partner with UI/UX, production, SRE, IT, and game-team stakeholders to translate workflows into agentic solutions.
- Contribute to architectural decisions and share agentic-systems expertise with peers.
- Work within agile methodologies.
Foundation (must-have software-engineering baseline)
- 3+ years of professional experience building production applications, with recent depth in AI/LLM-based systems.
- Strong proficiency in Python is required for our stack (FastAPI, Pydantic, SQLAlchemy or equivalent); additional proficiency in TypeScript or JavaScript is a plus.
- Solid database skills across relational (PostgreSQL) and non-relational systems (e.g. MongoDB, vector databases); familiar with caching/queues (Redis) where applicable.
- Working knowledge of RBAC, authn/authz patterns, and secure API design.
- Comfortable with Git, GitLab CI/CD, and Docker/containers.
- Proven testing mindset and experience with automated test suites (e.g. pytest).
Agent loop / harness engineering
- Demonstrated experience designing and operating agent loops in production - not just prompt-tuning a chatbot.
- Deep, practical understanding of prompting: guardrails, context poisoning/pollution, and the hyperparameters that govern model behavior (context window size, lost-in-the-middle effects, temperature, top-k).
- Hands-on experience integrating tools into agents: MCP, structured I/O for context, and sub-agent orchestration.
- Experience with an agent development framework - LangChain, LangGraph, Claude Agent SDK, Pydantic AI, or comparable.
- Strong instincts for observability and telemetry in non-deterministic systems.
Agent memory
- Practical experience implementing memory for agents: history compaction, context chaining, episodic memory, and short-term vs long-term separation.
- Familiarity with retention/decay strategies and applying RBAC to multi-tenant memory.
Evaluation & quality
- Experience with test- and eval-driven development for LLM systems: building eval sets, regression suites, and CI gates around model/prompt changes.
Communication
- Strong written and verbal English is a MUST, and full fluency is a significant plus given our globally distributed teams.
- Comfortable communicating technical decisions and tradeoffs across cross-functional stakeholders.
Nice to have
- Experience running local LLMs (e.g. via vLLM, Ollama, llama.cpp) and reasoning about the cost/latency/quality tradeoffs vs hosted models.
- Contributions to or familiarity with open-source agent harnesses (e.g. OpenCode, OpenClaw).
- Experience with agent development frameworks (LangChain/LangGraph/Claude Agent SDK/Pydantic AI) beyond prototype stage.



