Company Introduction – T-Systems ICT India Pvt. Ltd.
T-Systems Information and Communication Technology India Private Limited (T-Systems ICT India Pvt. Ltd.) is a certified Great Place To Work®, proudly recognized for its strong people-first culture and commitment to employee excellence. As a wholly owned subsidiary of T-Systems International GmbH, T-Systems India operates out of Pune and Bangalore, with a dynamic team of over 4,200 professionals delivering high-value IT services to group customers worldwide.
For over 25 years, T-Systems International GmbH has been at the forefront of digital innovation, driving progress and fostering digital optimism. As a leading European IT services provider and a proud part of Deutsche Telekom, T-Systems delivers transformative digitalization projects backed by deep expertise in consulting, cloud, AI, cybersecurity, and connectivity. With a global workforce of 26,000 employees across 26 countries, we set industry benchmarks in efficiency, sovereignty, security, and reliability, empowering organizations to unlock their full digital potential. With annual revenues exceeding EUR 4.0 billion (2024), T-Systems stands as one of Europe's foremost digital transformation partners, committed to shaping the future of enterprise technology.
T-Systems India plays a key role in this global vision by delivering integrated, end-to-end IT solutions and sector-specific software that drive transformation across industries, including automotive, manufacturing, logistics, transportation, healthcare, and the public sector.
Job Description
1. Platform Operations & Technical Ownership
3rd-Level Technical Support & Troubleshooting (Key Knowledge Resource)
- Acts as the primary 3rd-level contact for:
  - Wazuh SIEM
  - PostgreSQL
  - MinIO S3 object storage
  - DNS infrastructure
  - Remote platform access / bastion systems
  - Linux OS (SUSE, RHEL, Ubuntu)
  - NSX-T networking and firewalling
  - SUSE Manager
- Performs in-depth root-cause analyses, including multi-system debugging.
- Handles cross-team, business-critical incidents requiring broad platform knowledge.
Capacity & Performance Management
- End-to-end responsibility for FCI and Kubernetes cluster capacity management.
- Continuous assessment of resource utilization, trends, and scaling requirements.
Platform Stability & Reliability
- Drives improvements in platform stability and deployment reliability.
- Optimizes operational models and CI/CD processes.
- Ensures smooth transitions from project delivery to stable operations.
2. Platform Engineering & Automation
- Prepares, designs, and executes Proofs of Concept (PoCs) for:
  - Ansible / AWX to enable automated deployments and configuration management.
  - Oracle-related technologies, including integration and migration scenarios.
- Develops automation strategies and contributes reusable modules and deployment templates.
- Defines technical standards for automated operations.
3. Security, Compliance & Governance
Audit Management & Collaboration with Auditors
- Designs, reviews, and explains technical audit controls to internal and external auditors.
- Coordinates audit activities for both platform and application-related topics.
Security-Driven Engineering
- Embeds security controls into automated deployment workflows.
- Creates and maintains compliance policies and technical guardrails.
Wazuh SIEM Responsibility
- Designs, maintains, and operates the Wazuh security platform.
- Develops use cases, alerts, dashboards, and security incident processes.
- Troubleshoots performance issues, agent behavior, and platform scalability.
4. Collaboration, Stakeholder Management & Enablement
- Coordinates work packages across AO teams, development teams, and infrastructure units.
- Works closely with software teams to onboard applications onto the platform.
- Supports service portfolio development and provides technical input for presales activities.
- Shares best practices and mentors engineers regarding platform processes and tools.
5. Architecture, Design & Technology Evaluation
- Executes PoCs and evaluates new platform components.
- Defines integration strategies for new technologies in alignment with architecture standards.
- Creates reference architectures, deployment blueprints, and operational concepts.
- Evaluates solutions based on scalability, resilience, security, and cost efficiency.
6. Project Involvement
Project: Icinga Replacement
- Coordinates work and dependencies with classic AO teams.
- Supports AO teams in deploying and configuring exporters/agents on legacy VMs.
- Standardizes client-side configurations and data mappings.
- Implements standardized dashboards for platform service observability.
- Defines monitoring and alerting for existing components and applications.
- Performs advanced troubleshooting, including:
  - Missing or incomplete metrics
  - High scrape latency
  - Time-series cardinality challenges
  - Kubernetes monitoring (Prometheus Operator, ServiceMonitor/PodMonitor resources)
Project: MIF
- Analyzes the existing application architecture and its components.
- Conducts a PoC for Cognos.
- Supports DB2 → PostgreSQL migration, including data validation, performance assessment, and migration tooling.
7. Technical Skills & Competencies
Linux Platform Engineering & Operations
- Advanced administration of enterprise-grade Linux systems (SUSE, RHEL, Ubuntu, and hardened distributions).
- Deep OS-level troubleshooting (CPU, memory, I/O bottlenecks, process diagnostics).
- Service lifecycle management using systemd, including journald log analysis.
- Kernel parameter tuning, optimization, and performance diagnostics.
- Host-level incident investigation and forensic log analysis.
- Definition and execution of patching and lifecycle management strategies.
- Filesystem operations and troubleshooting (LVM, XFS, ext4, mount and I/O issues).
- User and remote access configuration, including SSH hardening and bastion host concepts.
Kubernetes Platform Operations
- Operational support for Kubernetes clusters across control plane and worker nodes.
- Troubleshooting pod failures, scheduling issues, container crashes, and resource exhaustion.
- Debugging of networking-related problems (CNI layers, service routing, DNS resolution).
- Management of persistent volumes, storage classes, and dynamic provisioning behaviors.
- Resource forecasting and capacity planning for cluster growth (CPU, memory, storage).
- Execution and validation of Kubernetes cluster upgrades.
- Operational support for multi-cluster and multi-environment setups.
- Analysis of Kubernetes system logs (kube-apiserver, kubelet, kube-controller-manager).
- Maintenance and enhancement of the Kubernetes stack, including version upgrades and feature adoption.
Observability & Security Platform (Wazuh)
- Design, deployment, and operational management of the Wazuh SIEM platform.
- Full lifecycle management of Wazuh agents, including policy enforcement and tuning.
- Troubleshooting log ingestion pipelines, decoders, enrichment rules, and alert logic.
- Integration of Wazuh with platform services and infrastructure.
- Analysis of security alerts and support of incident investigations.
- Performance optimization of SIEM components to ensure reliable event processing.
- Maintenance of compliance dashboards and generation of audit-relevant evidence.
- Continuous improvement of the Wazuh stack through upgrades, new features, and configuration optimization.
Observability & Monitoring Platform (Prometheus / Grafana / Alerting)
- Deployment, configuration, and operations of Prometheus-based monitoring stacks (standalone and Kubernetes-integrated).
- Administration of scraping configurations, service discovery rules, and target troubleshooting.
- Design and maintenance of recording rules and alert rules for platform components.
- Alert noise reduction through tuning and improved signal quality.
- Integration and troubleshooting of exporters (node, database, Kubernetes, etc.).
- Resolution of metric gaps, scrape latency issues, and cardinality-related performance problems.
- Capacity planning for Prometheus TSDB retention, storage requirements, and query performance.
- Development and lifecycle management of Grafana dashboards for platform and infrastructure services.
- Troubleshooting dashboard performance, data source connectivity, and visualization accuracy.
- Implementation of standardized dashboard templates across platform services.
- Integration of alerting workflows into incident management systems.
- Definition of platform SLIs/SLOs and reliability indicators.
- Correlation of metrics and logs (including Wazuh and OS logs) for root-cause analysis.
- Support and lifecycle management of Kubernetes monitoring components (Prometheus Operator, ServiceMonitor/PodMonitor).
- Validation of monitoring coverage for newly onboarded components and applications.
Database Platform Operations (PostgreSQL / Oracle PoC)
- Operational management of PostgreSQL clusters across environments.
- Monitoring key metrics (connections, locks, long-running queries, replication lag).
- Backup, restore, and disaster recovery validation.
- Growth and capacity planning for compute and storage layers.
- Support for database failover scenarios and resilience testing.
- Preparation and execution of Oracle-related proofs of concept.
- Evaluation of database deployment models (VM-based, containerized, or managed).
- Maintenance and enhancement of the database stack, including upgrades and feature adoption.
Object Storage Platform (MinIO / S3 APIs)
- Deployment and operations of MinIO-based object storage clusters.
- Troubleshooting of S3 API access, authentication, and compatibility issues.
- Monitoring capacity usage, planning storage expansions, and scaling clusters.
- Configuration of lifecycle policies, data retention, and archival strategies.
- Integration of MinIO with platform workloads, CI/CD, and backup systems.
- Performance analysis and troubleshooting of replication and erasure coding.
Networking & Firewall Operations (VMware NSX-T)
- Operational support of software-defined networking environments using NSX-T.
- Troubleshooting of routing issues, overlay networking, and cross-segment connectivity.
- Management of distributed firewall policies and micro-segmentation rules.
- Support for load balancers, service exposure, and virtual networking components.
- Administration of DNS infrastructure (zones, records, service discovery).
- Throughput, latency, and capacity analysis for critical network paths.
Remote Platform Access & Identity Integration
- Design and support of secure remote access solutions using Apache Guacamole and Entra ID.
- Troubleshooting identity flows, authentication chains, and access control policies.
- Integration with enterprise identity providers using OIDC and directory services.
- Implementation of secure access patterns for administrators and application teams.
Automation & Platform Engineering (Ansible / AWX)
- Preparation and execution of Ansible and AWX proofs of concept.
- Development of automation playbooks for platform configuration, provisioning, and lifecycle tasks.
- Integration of configuration management workflows into operational routines.
- Evaluation and optimization of automated operational processes.
- Automated deployment validation and configuration compliance checks.
Incident Management & Reliability Engineering
- 3rd-level escalation point for complex incidents across infrastructure and platform services.
- Root cause analysis using logs, metrics, and system-level diagnostics.
- Coordination of incident response across multiple technical domains.
- Identification and remediation of recurring incident patterns.
- Implementation of platform stabilization and hardening measures.
- Transition of engineered solutions into long-term operational models.
Security, Compliance & Audit Support
- Design and discussion of audit controls with internal and external auditors.
- Preparation of audit evidence for platform and application compliance.
- Integration of security controls and guardrails into automated deployment workflows.
- Maintenance of compliance-sensitive configuration baselines.
- Support for remediation of audit findings and compliance gaps.
Architecture & Technology Evaluation
- Execution of proofs of concept for emerging technologies and platform components.
- Assessment of scalability, resilience, operational complexity, and security posture.
- Creation of technical blueprints and reference architectures.
- Definition of integration strategies for new services within existing platform ecosystems.
- Evaluation of cost efficiency, maintainability, and operational impact of architectural decisions.
Collaboration & Platform Enablement
- Coordination of cross-team technical work packages across operations and engineering units.
- Support for application onboarding to shared platform services.
- Documentation of platform standards, operational procedures, and best practices.
- Contribution to presales discussions and service portfolio evolution.
- Delivery of knowledge transfer and enablement sessions for operations and development teams.
Additional Information
Please note: Fraudulent job postings and job scams are increasingly common. Beware of misleading advertisements and fraudulent communications issuing 'offer letters' on behalf of T-Systems in exchange for a fee. Please look for an authentic T-Systems email ID: [email protected].
Stay vigilant. Protect yourself from recruitment fraud!
To learn more, please visit: Fraud Alert
Location: T-Systems ICT India Pvt. Ltd. office, Balewadi, Pune, Maharashtra, India


