This role is for one of Weekday's clients.
Salary range: Rs 30,00,000 - Rs 40,00,000 (i.e., INR 30-40 LPA)
Min Experience: 2 years
Location: Bangalore
Job Type: Full-time
We are looking for a skilled and driven NLP Engineer to help scale, optimize, and deploy large language model (LLM)-based solutions within the healthcare domain. Your primary focus will be on building and maintaining production-ready, end-to-end NLP systems—covering backend architecture, inference optimization, and efficient model deployment pipelines. While opportunities exist for fine-tuning LLMs for specific use cases, the core responsibility is ensuring these models run efficiently, reliably, and at scale in production environments.
Additionally, you will develop NLP pipelines leveraging pre-trained LLMs and embedding models, including retrieval-augmented generation (RAG) systems and agentic NLP solutions that integrate multiple models and data sources for real-time, context-aware processing.
Key Responsibilities
Production-Grade NLP Systems
- Design and implement scalable, efficient NLP pipelines using LLMs and embedding models.
- Integrate RAG and agentic components to enhance NLP capabilities and adaptability.
Inference Optimization & Deployment
- Optimize model inference performance, reduce latency, and improve throughput using frameworks such as vLLM, TensorRT, and Ray.
- Implement best practices for containerization, CI/CD, monitoring, and observability to ensure stable, production-ready deployments.
Occasional Model Adaptation
- Assist with fine-tuning or adapting LLMs for specific healthcare applications, ensuring scalability and efficiency.
Collaboration & Continuous Improvement
- Work closely with NLP researchers, backend engineers, product managers, and frontend developers to build high-quality NLP solutions.
- Participate in code reviews and architectural discussions, and stay up to date on emerging NLP and LLM optimization techniques.
Requirements (Must-Haves!)
- Bachelor's or Master's degree in Computer Science or a related field.
- 2+ years of experience (or 1+ year with an advanced degree) in building and deploying ML/NLP systems using Python.
- Hands-on experience with NLP frameworks (e.g., spaCy, Hugging Face Transformers, LangChain) and deep learning libraries (e.g., PyTorch).
- Strong background in designing, implementing, and maintaining scalable backend architectures for NLP/LLM-based applications.
- Experience working with large datasets, including data cleaning, preprocessing, and structuring.
- Proficiency in containerization, CI/CD, and version control for production-grade deployments.
- Expertise in LLM inference optimization using vLLM, TensorRT, Ray, etc.
- Practical knowledge of deploying NLP models in production, including load balancing and latency reduction.
Preferred (Nice-to-Have!)
- Experience in building RAG pipelines and integrating embedding models into NLP workflows.
- Familiarity with agentic systems that leverage multiple models for dynamic, context-aware NLP solutions.
- Knowledge of prompt engineering, model fine-tuning, and large-scale inference optimization for LLMs.
"The best way to predict the future is to create it." – Peter Drucker