Gen AI/LLM Engineer (5+ Years)

Noida, Uttar Pradesh, India

Full-time

Posted: 14 hours ago

We are a leading consulting firm in the Enterprise Generative AI and Large Language Model (LLM) services sector, delivering production-grade LLM solutions, retrieval-augmented systems, and custom generative AI products for enterprise clients across domains. The team builds secure, scalable, low-latency inference services and automates model lifecycle workflows for both on-prem and cloud deployments.

Position: LLM Engineer — On-site (India). We are hiring an experienced LLM engineer to design, fine-tune, and deploy LLM-based solutions that power search, summarization, agents, and domain-specific assistants.

Role & Responsibilities
  • Design, fine-tune, and validate LLMs for production use-cases—instruction tuning, supervised fine-tuning, and parameter-efficient tuning (LoRA/adapters).
  • Implement retrieval-augmented generation (RAG) pipelines: embeddings, vector search, chunking, and context assembly for high-recall responses.
  • Optimize inference for latency and cost: quantization, model pruning, batching, and deployment with optimized runtimes (CUDA, Triton, bitsandbytes where applicable).
  • Build backend services and APIs to serve LLM inference and orchestration using containerized deployments (Docker/Kubernetes) and CI/CD pipelines.
  • Collaborate with product, data engineering, and ML teams to integrate LLMs into production flows, monitor model performance, and set up automated retraining/rollbacks.
  • Create reproducible training pipelines, implement evaluation suites, and produce documentation and runbooks for model governance and observability.
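The RAG responsibility above (chunking, embeddings, vector search, context assembly) can be illustrated with a minimal, dependency-free sketch. This is a toy: `embed` is a bag-of-words stand-in for a learned embedding model, and the linear scan stands in for a vector database such as FAISS.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 30, overlap: int = 5) -> list[str]:
    """Split text into overlapping character chunks for indexing."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline uses a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query; the top-k become the context."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


docs = "LLMs generate text. Vector search finds relevant chunks. Docker packages services."
chunks = chunk(docs)
context = retrieve("find relevant chunks with vector search", chunks, k=1)
```

In production the retrieved context is assembled into the prompt sent to the LLM; recall is tuned via chunk size, overlap, and `k`.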

Skills & Qualifications: Must-Have
  • 4+ years of hands-on experience working with LLMs or advanced NLP models in production contexts.
  • Proficiency in Python for ML engineering and model development.
  • Experience with PyTorch and Hugging Face Transformers for training and fine-tuning.
  • Practical experience implementing RAG and vector search using tools like FAISS or similar vector databases.
  • Familiarity with LangChain (or equivalent orchestration) and integration with LLM APIs (OpenAI, Anthropic, etc.).
  • Experience containerizing and deploying ML services using Docker; familiarity with Kubernetes is a plus.
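The parameter-efficient tuning named above (LoRA) reduces to adding a low-rank trainable update to a frozen weight matrix, W' = W + B·A with B of shape (d, r) and A of shape (r, k) for small rank r. A dependency-free sketch of that arithmetic follows; a real setup would use PyTorch with a library such as Hugging Face PEFT rather than this toy.

```python
def matmul(X, Y):
    """Naive matrix multiply for small lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_update(W, B, A, alpha=1.0):
    """Effective weight W' = W + alpha * (B @ A); only B and A are trained."""
    delta = matmul(B, A)
    return [[w + alpha * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]


# Frozen 2x2 weight; rank-1 adapters B (2x1) and A (1x2) hold the only
# trainable parameters -- 4 values instead of the full 4-entry dense update.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
A = [[2.0, 0.0]]
W_eff = lora_update(W, B, A)
```

The savings scale with matrix size: for a d x k layer, LoRA trains r*(d + k) parameters instead of d*k.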

Preferred
  • Experience with inference optimizations: quantization (bitsandbytes), Triton, or GPU-accelerated serving.
  • Exposure to distributed training frameworks (DeepSpeed) and cloud MLOps platforms (SageMaker, Azure ML, GCP AI Platform).
  • Knowledge of monitoring, logging, and model-evaluation frameworks for production LLMs (MLflow, Prometheus, Grafana).
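The quantization item above can be pictured as a symmetric int8 round-trip: pick one scale per tensor, map floats into [-127, 127], and multiply back at inference time. This pure-Python sketch shows only the idea; production serving would use a library such as bitsandbytes, not this toy.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: map floats to [-127, 127] via one scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 0.8]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
```

Storing one byte per weight instead of four (fp32) cuts memory and bandwidth roughly 4x, at the cost of small rounding error bounded by half the scale.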

Benefits & Culture Highlights
  • Collaborative, engineering-driven culture with strong focus on ownership and rapid iteration.
  • Opportunity to build end-to-end LLM products for enterprise clients and influence architecture decisions.
  • On-site role with hands-on access to GPU infrastructure and cross-functional product teams.

Skills: python, docker, llm, agentic, pytorch, cuda