Gen AI/LLM Engineer (5+ Years)

Noida, Uttar Pradesh, India

Full-time

Posted: 14 hours ago

We are a leading consulting firm in the Enterprise Generative AI and Large Language Model (LLM) services sector, delivering production-grade LLM solutions, retrieval-augmented systems, and custom generative AI products for enterprise clients across domains. The team builds secure, scalable, low-latency inference services and automates model lifecycle workflows for both on-prem and cloud deployments.

Position: LLM Engineer — On-site (India). We are hiring an experienced LLM engineer to design, fine-tune, and deploy LLM-based solutions that power search, summarization, agents, and domain-specific assistants.

Role & Responsibilities
  • Design, fine-tune, and validate LLMs for production use-cases—instruction tuning, supervised fine-tuning, and parameter-efficient tuning (LoRA/adapters).
  • Implement retrieval-augmented generation (RAG) pipelines: embeddings, vector search, chunking, and context assembly for high-recall responses.
  • Optimize inference for latency and cost: quantization, model pruning, batching, and deployment with optimized runtimes (CUDA, Triton, bitsandbytes where applicable).
  • Build backend services and APIs to serve LLM inference and orchestration using containerized deployments (Docker/Kubernetes) and CI/CD pipelines.
  • Collaborate with product, data engineering, and ML teams to integrate LLMs into production flows, monitor model performance, and set up automated retraining/rollbacks.
  • Create reproducible training pipelines, implement evaluation suites, and produce documentation and runbooks for model governance and observability.
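The RAG responsibility above (chunking, embeddings, vector search, context assembly) can be illustrated with a minimal, dependency-free sketch. This is a toy: `embed` is a bag-of-words stand-in for a learned embedding model, and the linear scan stands in for a vector database such as FAISS.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 30, overlap: int = 5) -> list[str]:
    """Split text into overlapping character chunks for indexing."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline uses a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query; the top-k become the context."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


docs = "LLMs generate text. Vector search finds relevant chunks. Docker packages services."
chunks = chunk(docs)
context = retrieve("find relevant chunks with vector search", chunks, k=1)
```

In production the retrieved context is assembled into the prompt sent to the LLM; recall is tuned via chunk size, overlap, and `k`.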

Skills & Qualifications: Must-Have
  • 4+ years of hands-on experience working with LLMs or advanced NLP models in production contexts.
  • Proficiency in Python for ML engineering and model development.
  • Experience with PyTorch and Hugging Face Transformers for training and fine-tuning.
  • Practical experience implementing RAG and vector search using tools like FAISS or similar vector databases.
  • Familiarity with LangChain (or equivalent orchestration) and integration with LLM APIs (OpenAI, Anthropic, etc.).
  • Experience containerizing and deploying ML services using Docker; familiarity with Kubernetes is a plus.
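The parameter-efficient tuning named above (LoRA) reduces to adding a low-rank trainable update to a frozen weight matrix, W' = W + B·A with B of shape (d, r) and A of shape (r, k) for small rank r. A dependency-free sketch of that arithmetic follows; a real setup would use PyTorch with a library such as Hugging Face PEFT rather than this toy.

```python
def matmul(X, Y):
    """Naive matrix multiply for small lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_update(W, B, A, alpha=1.0):
    """Effective weight W' = W + alpha * (B @ A); only B and A are trained."""
    delta = matmul(B, A)
    return [[w + alpha * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]


# Frozen 2x2 weight; rank-1 adapters B (2x1) and A (1x2) hold the only
# trainable parameters -- 4 values instead of the full 4-entry dense update.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
A = [[2.0, 0.0]]
W_eff = lora_update(W, B, A)
```

The savings scale with matrix size: for a d x k layer, LoRA trains r*(d + k) parameters instead of d*k.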

Preferred
  • Experience with inference optimizations: quantization (bitsandbytes), Triton, or GPU-accelerated serving.
  • Exposure to distributed training frameworks (DeepSpeed) and cloud MLOps platforms (SageMaker, Azure ML, GCP AI Platform).
  • Knowledge of monitoring, logging, and model-evaluation frameworks for production LLMs (MLflow, Prometheus, Grafana).
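The quantization item above can be pictured as a symmetric int8 round-trip: pick one scale per tensor, map floats into [-127, 127], and multiply back at inference time. This pure-Python sketch shows only the idea; production serving would use a library such as bitsandbytes, not this toy.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: map floats to [-127, 127] via one scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 0.8]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
```

Storing one byte per weight instead of four (fp32) cuts memory and bandwidth roughly 4x, at the cost of small rounding error bounded by half the scale.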

Benefits & Culture Highlights
  • Collaborative, engineering-driven culture with strong focus on ownership and rapid iteration.
  • Opportunity to build end-to-end LLM products for enterprise clients and influence architecture decisions.
  • On-site role with hands-on access to GPU infrastructure and cross-functional product teams.

Skills: python, docker, llm, agentic, pytorch, cuda