CAPCO POLAND
Location: Warsaw, Poland
Preferred work model: 3 days per week in the office
• Trailblazers in banking, payments, capital markets, wealth, and asset management
• Champions of an agile, nimble, and innovative work environment
• Dedicated to building a team of top-notch professionals who share our drive and vision
ROLE OVERVIEW
We’re looking for a detail-oriented QA / AI QA Tester with experience testing LLM- or agent-based systems, or strong QA experience with a focus on AI. You bring a structured approach to designing test cases and evaluation datasets, understand AI quality metrics, and are passionate about improving the reliability, stability, and overall quality of enterprise AI solutions.
Fluency in Polish is mandatory.
KEY RESPONSIBILITIES:
- Design and maintain business test suites (functional, scenario-based, regression) for the Master Agent and domain agents.
- Build evaluation datasets (PL/EN, domain-specific), including positive/negative queries, edge cases, and out-of-scope scenarios.
- Perform response quality evaluation using metrics such as:
  - Accuracy
  - Top-k recall
  - Groundedness
  - Hallucination rate
  - Refusal policy compliance
- Conduct PII and compliance testing: validation of masking, anonymization, and sensitive data handling.
- Test guardrails, including:
  - Undesired output handling
  - Prompt security testing
  - “I don’t know” policy enforcement
- Perform performance and resilience testing: latency, SLA compliance, pipeline stability.
- Validate conversational UX (conversation flow, intent recognition, fallback handling, language detection).
- Test integrations with:
  - Copilot Studio
  - Azure AI Search
  - Azure OpenAI / Foundry
  - Document Intelligence
  - SharePoint Online
- Analyze logs and telemetry (App Insights, Log Analytics) and identify anomalies.
- Document test results and recommendations, and ensure traceability of test cases.
- Support AI and domain teams in diagnosing defects, data drift, and quality regressions.
- Participate in periodic knowledge quality reviews and verify compliance with KM governance rules.
KEY TECHNOLOGIES USED BY THE TEAM:
- Copilot Studio (knowledge agents)
- Azure AI Search, Azure OpenAI / Foundry
- Document Intelligence (OCR, table extraction)
- SharePoint Online (knowledge sources)
- App Insights + Log Analytics (telemetry)
- Python (pandas, requests)
- GitHub Actions (CI/CD)
- BigQuery / Looker (analytics)
SKILLS & EXPERIENCE TO GET THE JOB DONE:
- Experience in testing LLM-based or agent-based systems, or classical QA experience with a strong interest in transitioning to AI QA.
- Ability to design business scenarios, test cases, and evaluation datasets.
- Basic Python skills (pandas, REST APIs, simple evaluation scripts).
- Familiarity with Copilot Studio and integration with domain agents.
- Basic knowledge of Azure AI Search, SharePoint Online, and Document Intelligence (ability to interpret OCR/DI outputs).
- Understanding of automated evaluation methods (LLM scoring, auxiliary models, benchmark evaluation).
Nice to have:
- Experience with multicloud testing (GCP BigQuery/Looker, Azure, optionally Fabric).
- Experience with Document Intelligence in the context of OCR and table extraction quality assessment.
- Experience working with GitHub Actions (CI) and automated testing pipelines.
- Basic understanding of the Model Context Protocol (MCP) in agent-based systems.
- Experience in data drift analysis and automated evaluation frameworks.
IMPORTANT
- Fluent Polish (spoken and written) – mandatory.
- Good command of English for documentation and collaboration.
- Availability for hybrid work: 3 days per week on-site at the office in Warsaw, with the remaining days remote.
ONLINE RECRUITMENT PROCESS STEPS
- Screening call with the Recruiter
- Hiring Manager Technical Interview
- Client Interview
- Feedback/Offer
We offer a flexible collaboration model based on a B2B contract, with the opportunity to work on diverse projects.

