Sr AI ML Engineer

India, Tamil Nadu, Chennai

Full-time

Posted on: 18 hours ago

What We'll Bring:
Lead the design and delivery of enterprise-scale AI/GenAI solutions (LLM apps, RAG pipelines, real-time processing, cloud-native services) across a polyglot stack (Python + Java).
Own the technical roadmap from concept to deployment, ensuring scalability, performance, security, and responsible AI (fairness, transparency, compliance).
Serve as a trusted technical leader, mentoring engineers, data scientists, and architects; define architecture standards, patterns, and best practices across teams.
Drive PoCs and technical evaluations of emerging AI/GenAI technologies (including LangChain/LangGraph & LangChain4j, DJL, ONNX Runtime Java), aligning innovations with business outcomes.
Bridge business stakeholders and engineering, translating complex requirements into robust designs and measurable impact.

What You'll Bring:

Architecture & Delivery
  • Architect end-to-end AI platforms integrating LLMs, RAG, streaming, vector search, and CI/CD—implemented via Python services and Java microservices (Spring Boot/Quarkus/Micronaut).
  • Define standards for REST/gRPC APIs, OAuth2/OIDC security, observability (Micrometer, OpenTelemetry), and SLIs/SLOs.
  • Establish coding, versioning, monitoring, and governance standards for ML systems; champion reproducibility (MLflow/DVC) and model registries.

LLM & RAG Engineering
  • Lead LLM fine‑tuning/evaluation/deployment; design retrieval pipelines using Elasticsearch/OpenSearch/Vespa and vector stores (pgvector, Pinecone, Weaviate) with Java and Python clients.
  • Build LangChain4j pipelines (prompts, tools, agents) and interoperable services that consume Python-hosted model endpoints via REST/gRPC.
  • Optimize embeddings, chunking, retrieval/ranking for latency, precision, and cost; implement caching, batching, and circuit breakers.

Platforms & Cloud
  • GCP is a must-have skill, with familiarity in AWS/Azure; 2+ years with CI/CD pipelines and 3+ years with Docker/Kubernetes.
  • Guide deployments on AWS/GCP/Azure using Docker/Kubernetes, Helm, service mesh (Istio/Linkerd), and managed ML services (SageMaker, Vertex AI, Azure ML).
  • Use DJL (Deep Java Library) and ONNX Runtime Java for on‑JVM inference where appropriate; integrate Spark/Databricks MLlib for large‑scale pipelines.

Leadership & Collaboration
  • Mentor engineers and architects; contribute reusable assets, reference implementations, and accelerators.
  • Engage vendors/partners; participate in industry forums; advocate responsible AI and internal knowledge-sharing.

Impact You'll Make:

Technical Expertise (Python + Java)
  • Expert Python with PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers.
  • Advanced Java (Java 8+), Spring Boot/Quarkus/Micronaut, Vert.x/Netty for high‑throughput services; concurrency, GC tuning, and performance engineering.
  • GenAI frameworks: LangChain/LangGraph (Python) and LangChain4j (Java) for agents, tools, and RAG workflows.
  • JVM ML/Inference: DJL, ONNX Runtime Java, TensorFlow Java; integration with Spark/Databricks MLlib.
  • APIs & Data: FastAPI/Flask (Python) and Spring Boot (Java); SQL/NoSQL (PostgreSQL, MongoDB, Cassandra), JPA/Hibernate, Redis.
  • Search & Vector: Elasticsearch/OpenSearch/Lucene, pgvector/Pinecone/Weaviate with Java/Python SDKs.
  • Streaming & Messaging: Kafka, gRPC, event‑driven patterns.
  • Agentic AI development skills: LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, Spring AI (Java), MCP (Python/Java), LlamaIndex, RAG with Pinecone/Milvus/Weaviate/Qdrant/Chroma, vLLM, Ollama, Ray Serve, Langfuse, TruLens, MLflow, Python, Java, SQL + Vector DBs.
  • GCP Vertex AI, Google ADK, and GCP AI skills.

MLOps & Cloud
  • MLflow/DVC, model versioning/monitoring, CI/CD (Jenkins/GitHub Actions/Azure DevOps), Maven/Gradle, Terraform.
  • Containers & Orchestration: Docker, Kubernetes, KServe/Seldon Core, Helm; cloud services (AWS/GCP/Azure).

Analytical & Leadership
  • Strong statistics, hypothesis testing, experimental design; A/B testing frameworks.
  • Proven track record leading AI/ML teams/projects end‑to‑end; excellent stakeholder communication.

Preferred/Nice-to-have
  • Reinforcement learning, meta‑learning, unsupervised learning.
  • Contributions to the AI/ML community (OSS, publications, talks).
  • Experience with Databricks, OpenTelemetry, service mesh, Vault/Secrets.

This is a hybrid position and involves regular performance of job responsibilities virtually as well as in-person at an assigned TU office location for a minimum of two days a week.

TransUnion Job Title

Sr Developer, Applications Development