This role is for one of Weekday's clients.
Salary range: Rs 30,00,000 - Rs 40,00,000 (i.e., INR 30-40 LPA)
Min Experience: 2 years
Location: Bangalore
Job Type: Full-time
We are looking for a skilled and driven NLP Engineer to help scale, optimize, and deploy large language model (LLM)-based solutions within the healthcare domain. Your primary focus will be on building and maintaining production-ready, end-to-end NLP systems—covering backend architecture, inference optimization, and efficient model deployment pipelines. While opportunities exist for fine-tuning LLMs for specific use cases, the core responsibility is ensuring these models run efficiently, reliably, and at scale in production environments.
Additionally, you will develop NLP pipelines leveraging pre-trained LLMs and embedding models, including retrieval-augmented generation (RAG) systems and agentic NLP solutions that integrate multiple models and data sources for real-time, context-aware processing.
Key Responsibilities
Production-Grade NLP Systems
- Design and implement scalable, efficient NLP pipelines using LLMs and embedding models.
- Integrate RAG and agentic components to enhance NLP capabilities and adaptability.
Inference Optimization & Deployment
- Optimize model inference performance, reduce latency, and improve throughput using frameworks such as vLLM, TensorRT, and Ray.
- Implement best practices for containerization, CI/CD, monitoring, and observability to ensure stable, production-ready deployments.
Occasional Model Adaptation
- Assist with fine-tuning or adapting LLMs for specific healthcare applications, ensuring scalability and efficiency.
Collaboration & Continuous Improvement
- Work closely with NLP researchers, backend engineers, product managers, and frontend developers to build high-quality NLP solutions.
- Participate in code reviews and architectural discussions, and stay up to date on emerging NLP and LLM optimization techniques.
Requirements (Must-Haves!)
- Bachelor's or Master's degree in Computer Science or a related field.
- 2+ years of experience (or 1+ year with an advanced degree) in building and deploying ML/NLP systems using Python.
- Hands-on experience with NLP frameworks (e.g., spaCy, Hugging Face Transformers, LangChain) and deep learning libraries (e.g., PyTorch).
- Strong background in designing, implementing, and maintaining scalable backend architectures for NLP/LLM-based applications.
- Experience working with large datasets, including data cleaning, preprocessing, and structuring.
- Proficiency in containerization, CI/CD, and version control for production-grade deployments.
- Expertise in LLM inference optimization using vLLM, TensorRT, Ray, etc.
- Practical knowledge of deploying NLP models in production, including load balancing and latency reduction.
Preferred (Nice-to-Have!)
- Experience in building RAG pipelines and integrating embedding models into NLP workflows.
- Familiarity with agentic systems that leverage multiple models for dynamic, context-aware NLP solutions.
- Knowledge of prompt engineering, model fine-tuning, and large-scale inference optimization for LLMs.
"The best way to predict the future is to create it." – Peter Drucker