LangChain is an open-source framework designed to simplify the development of applications based on Large Language Models (LLMs). It provides a modular architecture that makes it easy to integrate language models, prompts, memory mechanisms, external APIs, and diverse data sources, enabling the creation of organized, flexible, and extensible solutions.
The goal of this post is to build a chatbot using LangChain. To get there, the post covers the following topics:
- Key concepts of LangChain
- Environment setup
- Chatbot creation
- Memory testing
- Chatbot testing
Key concepts of LangChain
LangChain provides several important functionalities, organized into modules, each responsible for handling a specific part of building LLM-based applications:
- LLM Abstractions: LangChain provides abstract classes that unify access to different language model providers (OpenAI, Anthropic, etc.). This allows developers to switch between providers or specific models with minimal code changes, while keeping the same usage interface.
- Prompts: LangChain provides utilities for creating, managing, and reusing prompts, including parameterized templates, variable validation, and serialization mechanisms. This makes it easier to maintain and evolve the instructions sent to LLMs over time.
- Agents: LangChain has native support for agents, which are components capable of reasoning, selecting tools, and executing actions dynamically. This enables the creation of agentic applications in which the model decides, at runtime, which actions to execute based on the user’s input.
- Tools: LangChain provides support for defining and integrating tools that can be used by agents to execute external actions, such as calling APIs, accessing databases, or performing calculations.
- Chains: Chains are one of the core concepts of LangChain and represent the composition of multiple components into a sequence of operations, where the output of one step becomes the input of the next. Each step may involve a call to an LLM, the use of a prompt template, a data transformation, or even another chain.
- Memory: LangChain provides memory components that allow chains and agents to maintain context across interactions, storing inputs, outputs, and intermediate states.
In short, LangChain offers many other functionalities beyond the modules mentioned above, all aimed at making it easier to develop LLM-based applications.
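Before moving on, here is a minimal sketch that ties a few of these concepts together: a prompt template, a chat model, and an output parser composed into a chain. It assumes the langchain-openai package is installed and OPENAI_API_KEY is set.
from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Prompt template with a single variable.
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")

# Chat model initialized through the provider-agnostic helper.
llm = init_chat_model(model="gpt-5-mini", model_provider="openai", temperature=0)

# Chain: prompt -> LLM -> plain string output.
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "LangChain is a framework for building LLM-based applications."}))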
Environment setup
To build the chatbot with LangChain, Python 3.12.9 was used, along with the dependencies declared in the project's requirements.txt file.
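A minimal requirements.txt covering the imports used in this post would look like the following (package versions omitted; the exact dependency list of the original project may differ):
python-dotenv
langchain
langchain-core
langchain-openai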
In addition, OpenAI was used as the LLM provider, so the OPENAI_API_KEY environment variable must be defined in the .env file.
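For reference, the .env file only needs a single entry (placeholder value shown):
OPENAI_API_KEY=your-openai-api-key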
Chatbot creation
Below is the model.py file, which is responsible for initializing the LLM, managing memory, and returning the model’s response based on the user’s question.
from datetime import datetime

from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser

# Load environment variables (e.g., OPENAI_API_KEY) from the .env file.
load_dotenv()


class LLMModel:
    def __init__(self, model_name: str, model_provider: str, temperature: float = 0):
        # Initialize the chat model through LangChain's provider-agnostic helper.
        self.llm = init_chat_model(
            model=model_name,
            model_provider=model_provider,
            temperature=temperature
        )

        system_prompt = (
            "You are a helpful and reliable assistant whose goal is to answer any user question clearly and accurately.\n"
            "If the question does not depend on recent, current, or time-sensitive information, respond directly.\n"
            "If the question depends on events, rankings, titles, records, or facts that may have changed after your "
            "knowledge cutoff, you must not guess or assume. In such cases, clearly state that you do not have up-to-date "
            "information to answer with certainty.\n"
            "Always prioritize correctness, clarity, and objectivity in your responses, and always reply in the same language "
            "used by the user.\n"
            "Do not add unsolicited explanations, examples, lists, or follow-up questions unless explicitly requested.\n\n"
            f"Context: Today's date is {datetime.now().strftime('%Y-%m-%d')}."
        )

        # The message list acts as the conversation memory; it starts with the system prompt.
        self._messages = [SystemMessage(content=system_prompt)]

    def run(self, query: str):
        # Record the user's question in the history.
        self._messages.append(HumanMessage(content=query))

        # Chain: LLM call followed by conversion of the output to a plain string.
        chain = self.llm | StrOutputParser()
        response = chain.invoke(input=self._messages)

        # Store the assistant's reply so future turns keep the full context.
        self._messages.append(AIMessage(content=response))
        return response
The class constructor (__init__) is responsible for initializing the LLM and the memory:
- init_chat_model: allows a LangChain Chat Model to be initialized in a simplified way by specifying only the model name, provider, and parameters such as temperature. A Chat Model in LangChain is designed to work with a sequence of structured messages, enabling the development of conversational applications in an organized and extensible way. In this context, init_chat_model acts as an abstraction layer over different chat model implementations, such as ChatOpenAI, ChatAnthropic, ChatOllama, among others. This allows developers to switch between different LLM providers while keeping the same usage interface, without directly instantiating each specific class.
- self._messages: is a list of LangChain messages that represents the conversation context. Each message corresponds to a unit of context provided to the LLM and has a conceptual role (system, user, assistant, etc.) along with associated content. In LangChain, these roles are represented by specific classes such as SystemMessage, HumanMessage, and AIMessage. The list is initialized with a SystemMessage, which defines the instructions and overall behavior of the model throughout the conversation.
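As a small illustration of this provider abstraction, the following sketch initializes two chat models through the same interface (the Anthropic model name is a placeholder, and the corresponding provider packages, such as langchain-openai and langchain-anthropic, are assumed to be installed):
from langchain.chat_models import init_chat_model

# Same call and same usage interface for different providers;
# only the model name and provider identifiers change.
openai_llm = init_chat_model(model="gpt-5-mini", model_provider="openai", temperature=0)
anthropic_llm = init_chat_model(model="<anthropic-model-name>", model_provider="anthropic", temperature=0)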
The run method is responsible for actually executing the model:
- First, the user input is added to the message list as a HumanMessage, ensuring that the conversation history is preserved.
- Next, a chain is created. As discussed earlier, a chain represents a sequence of operations in which the output of one step becomes the input of the next. In this case, the chain consists of the LLM execution followed by the application of the StrOutputParser, which converts the model output into a string.
- The chain is executed using the invoke method, with the message list passed as input.
- Finally, the assistant's response is added to the message list as an AIMessage, allowing the context to be preserved for subsequent chat interactions.
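For intuition, the chain built with the pipe operator is roughly equivalent to invoking the two steps by hand, as in the following sketch:
from langchain_core.output_parsers import StrOutputParser

def answer(llm, messages):
    # Step by step, this is what (llm | StrOutputParser()).invoke(messages) does:
    ai_message = llm.invoke(messages)            # the chat model returns an AIMessage
    return StrOutputParser().invoke(ai_message)  # the parser extracts its string content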
Below is the main.py file, which is responsible for actually running the chatbot.
from model import LLMModel

# Instantiate the assistant with gpt-5-mini served by OpenAI.
llm = LLMModel(
    model_name="gpt-5-mini",
    model_provider="openai",
)

if __name__ == "__main__":
    # Simple terminal chat loop.
    while True:
        query = input("Ask a question (type 'exit' to quit): ")
        if query.strip().lower() == "exit":
            break

        response = llm.run(query)
        print("Assistant: " + response + "\n------------------------")
Initially, the LLMModel class is instantiated using the gpt-5-mini model as the LLM. Next, an infinite loop (while True) is started, simulating a simple chat interface in the terminal. At each iteration, the user provides a question, which is sent to the run method responsible for executing the interaction flow with the LLM and returning the assistant’s response. The response is then displayed in the terminal. The loop is terminated when the user types "exit", ending the chatbot execution.
It is worth noting that this chatbot was intentionally kept simple: it does not use tools, which could make the assistant's responses more robust.
Memory testing
The goal of this section is to evaluate the assistant’s memory behavior by verifying whether it can use information mentioned earlier in the conversation. To do so, we analyze whether the message history is correctly included in the model calls, enabling more context-aware responses.
To do this, a simple test was performed: first, I informed the assistant of my name, and in the following iteration, I asked what my name was. The result can be seen below.
Ask a question (type 'exit' to quit): Hello, my name is Edvaldo Melo
Assistant: Hello, Edvaldo Melo — nice to meet you.
------------------------
Ask a question (type 'exit' to quit): What's my name?
Assistant: Your name is Edvaldo Melo.
------------------------
Ask a question (type 'exit' to quit): exit
Note that the assistant was able to correctly respond with my name. To inspect the messages being sent, you can add print(llm._messages) after the loop to check exactly what is passed to the assistant. For readability, the output of this print statement was reformatted; the list of messages sent in this test is shown below.
[SystemMessage(content="You are a helpful and reliable assistant whose goal is to answer any user question clearly and accurately.\nIf the question does not depend on recent, current, or time-sensitive information, respond directly.\nIf the question depends on events, rankings, titles, records, or facts that may have changed after your knowledge cutoff, you must not guess or assume. In such cases, clearly state that you do not have up-to-date information to answer with certainty.\nAlways prioritize correctness, clarity, and objectivity in your responses, and always reply in the same language used by the user.\nDo not add unsolicited explanations, examples, lists, or follow-up questions unless explicitly requested.\n\nContext: Today's date is 2026-01-30.", additional_kwargs={}, response_metadata={}),
HumanMessage(content='Hello, my name is Edvaldo Melo', additional_kwargs={}, response_metadata={}),
AIMessage(content='Hello, Edvaldo Melo — nice to meet you.', additional_kwargs={}, response_metadata={}, tool_calls=[], invalid_tool_calls=[]),
HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}),
AIMessage(content='Your name is Edvaldo Melo.', additional_kwargs={}, response_metadata={}, tool_calls=[], invalid_tool_calls=[])]
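If a more readable printout than the raw list above is needed, a small loop over the history can be used, as in this sketch:
# Print each message's role (its class name) and content on its own line.
for message in llm._messages:
    print(f"{type(message).__name__}: {message.content}")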
Chatbot testing
The goal of this section is to simulate a natural conversation with the assistant, evaluating its responses based on the user input and the conversation history. The simulation below consists of a conversation between the assistant and a data scientist.
Ask a question (type 'exit' to quit): Hello, I’m a data scientist
Assistant: Hi — nice to meet you. How can I help with your data‑science work?
------------------------
Ask a question (type 'exit' to quit): If my work often involves training and evaluating models, what skills would be most important for me to have?
Assistant: Good question — for work that focuses on training and evaluating models, the most important skills fall into a few clear categories:
Foundations
- Probability & statistics: hypothesis testing, distributions, confidence intervals, Bayesian basics.
- Linear algebra & calculus: for understanding model behavior and optimization.
- Optimization: gradient descent variants, convex vs nonconvex optimization, regularization.
Modeling & evaluation
- Machine‑learning algorithms: supervised (linear models, trees/ensembles, neural networks), unsupervised, and basics of deep learning.
- Model evaluation & metrics: selecting and interpreting appropriate metrics (precision/recall, ROC/AUC, calibration, regression metrics) and understanding class imbalance.
- Cross‑validation & experiment design: proper CV strategies, nested CV, avoiding leakage.
- Bias–variance tradeoff and diagnostic techniques: learning curves, residual analysis, error decomposition.
- Uncertainty estimation & calibration: Bayesian approaches, ensembles, conformal prediction.
Data & features
- Data cleaning and preprocessing: handling missing values, outliers, scaling, encoding categorical variables.
- Feature engineering and selection: domain features, interaction terms, dimensionality reduction (PCA), feature importance.
Practical ML engineering
- Programming & libraries: Python/R, scikit‑learn, pandas, NumPy, PyTorch/TensorFlow, XGBoost/LightGBM/CatBoost.
- Experiment tracking & reproducibility: MLflow, DVC, version control, deterministic training, seed management.
- Hyperparameter tuning: grid/random search, Bayesian optimization, early stopping.
- Deployment & production concerns: model packaging, APIs, latency/throughput constraints, offline vs online inference.
- Scaling & performance: GPU usage, distributed training, Spark/Dask for large datasets.
Monitoring, safety & governance
- Model monitoring & drift detection: data and concept drift, performance monitoring in production.
- Interpretability & fairness: SHAP/LIME, counterfactuals, bias detection and mitigation.
- Privacy & compliance: data anonymization, differential privacy basics where relevant.
Soft skills & domain
- Experimental mindset and critical thinking: designing robust experiments and questioning results.
- Communication: explaining model behavior and metrics to nontechnical stakeholders.
- Domain knowledge: translating business objectives to modeling goals and metrics.
If you want, I can prioritize this list for a specific role (research vs. ML engineer vs. applied data scientist) or suggest a learning path/resources for the top items.
------------------------
Ask a question (type 'exit' to quit): Now imagine that I’m spending more time putting those models into production. What parts of my daily work would likely change?
Assistant: Short answer: your focus shifts from model research and offline evaluation to reliability, automation, observability, latency/throughput constraints, and cross‑team coordination. The day‑to‑day becomes more operational and engineering‑heavy, with fewer quick experiments and more emphasis on safety, reproducibility, and running systems.
What changes — by area
- Priorities and success metrics
- From offline CV/holdout metrics to production KPIs: business impact, live accuracy, precision/recall on production labels, latency, throughput, error rates, cost, and SLAs.
- Greater emphasis on robustness, safety, fairness, and regulatory compliance.
- Daily tasks and cadence
- More time on deployments (CI/CD pipelines, canaries, rollbacks), production validation, and reviewing/approving releases.
- More time monitoring model health, interpreting alerts, and triaging incidents (data drift, model degradation, feature pipeline failures).
- Slower experiment cadence — experiments must consider production constraints, A/B testing, and release plans.
- More code reviews, documentation, and runbooks.
- Testing and validation
- Writing unit/integration tests for preprocessing, inference, and feature pipelines; end‑to‑end tests for model behavior.
- Shadow runs, canary tests, and online A/B experiments before full rollout.
- Validation of imported training data vs production features (data contracts).
- Automation and pipelines
- Building and maintaining automated training / retraining pipelines, scheduling, and dependency management (Airflow/Argo).
- Automating model packaging (Docker), model registry usage, and deployment orchestration (Kubernetes/serving infra).
- Infra, tooling and ops
- Frequent interaction with infra/DevOps: containerization, orchestration, autoscaling, GPUs, cost monitoring.
- Using/maintaining feature stores, model registries, inference servers, and CI for ML (MLflow, Feast/Tecton, Sagemaker, KFServing, BentoML, etc.).
- Logging, metrics, tracing for inference (Prometheus/Grafana, ELK, Sentry).
- Data engineering & contracts
- Stronger collaboration with data engineers to ensure reliable feature pipelines, schemas, and upstream SLAs.
- Defining and enforcing data contracts and feature validation to prevent silent breakages.
- Monitoring, drift detection & retraining
- Setting up and responding to data drift, concept drift, and calibration monitors; deciding retraining triggers (time vs performance).
- Establishing rolling windows, labels collection pipelines, and pipelines to incorporate feedback data.
- Reliability, performance and cost
- Optimizing latency and throughput (batch vs online inference), quantization/serving optimizations, autoscaling policies.
- Tracking and optimizing inference cost and resource usage.
- Governance, security, and compliance
- Producing model cards, lineage, provenance, and audit trails.
- Ensuring privacy, access controls, and adherence to regulatory requirements.
- Collaboration and communication
- More frequent syncs with product, SRE, legal, and business stakeholders to align on rollout plans, risk, and KPIs.
- Preparing clearer handoffs, runbooks, and post‑mortems.
Typical daily checklist (example)
- Review production model dashboards and recent alerts.
- Validate recent inference logs / sample predictions for obvious failures.
- Triage any incidents or anomalies; coordinate fixes/rollbacks if needed.
- Merge and review deployment PRs or CI run results.
- Work on automation: pipelines, tests, or infra improvements.
- Sync with data engineering/SRE/product on upcoming rollouts or feature changes.
- Plan or monitor A/B tests and scheduled retraining jobs.
Net effect on skills you’ll use more
- Engineering: CI/CD, containerization, orchestration, logging/monitoring, scalable inference.
- Software best practices: testing, observability, reliability engineering.
- Systems thinking: optimization for latency/cost/throughput and fault tolerance.
- Communication and process: runbooks, incident management, cross‑team coordination.
What you’ll do less
- Rapid exploratory modeling and many short experiments; deep offline-only algorithmic tinkering shifts to scheduled R&D sprints or a separate research track.
If you want, I can map this to a week‑by‑week routine (what to schedule daily vs weekly vs monthly), or produce a short production‑readiness checklist you can use before every rollout.
------------------------
Ask a question (type 'exit' to quit): If you were advising me on my next career step, what should I focus on improving?
Assistant: Good question — since you’re moving toward production work, focus on skills that increase reliability, scale, and business impact. Below are prioritized areas and concrete ways to improve them, plus a short 6‑month plan and measurable outcomes you can use to show progress.
Top priorities (why and how to improve)
1. Software engineering fundamentals
- Why: Production systems require maintainable, testable code.
- Improve by: writing unit/integration tests, using type hints, performing regular code reviews, practicing refactoring, and contributing to shared libraries.
2. CI/CD and reproducibility
- Why: Reliable deployments and reproducible experiments are core to production ML.
- Improve by: building automated pipelines (GitHub Actions/Jenkins/GitLab CI), packaging models with Docker, using model registries (MLflow) and experiment tracking.
3. Data engineering & feature reliability
- Why: Most production failures stem from bad or changing data.
- Improve by: learning feature-store concepts (Feast/Tecton), schema validation, data contracts, and implementing monitoring/alerts for pipeline failures.
4. Model serving & scalable inference
- Why: Deployment constraints (latency, throughput, cost) determine model viability.
- Improve by: deploying models with K8s, serverless, or inference servers (Triton/Bento/Seldon), measuring latency and throughput, and applying optimizations (batching, quantization).
5. Monitoring, observability, and drift detection
- Why: Detecting degradation early prevents business impact.
- Improve by: instrumenting metrics (Prometheus/Grafana), logging/trace systems, establishing baselines, and building automated drift/alerting pipelines.
6. Testing, validation & rollout strategies
- Why: Safe rollouts reduce risk and outages.
- Improve by: implementing shadow testing, canary rollouts, A/B testing frameworks, and end-to-end integration tests for feature pipelines + model.
7. Reliability, incident response & SRE practices
- Why: You’ll be on-call and need to restore service quickly.
- Improve by: learning incident response, runbooks, SLA/SLO concepts, post-mortem practices, and chaos experiments.
8. Cost, performance engineering & system design
- Why: Production must be cost-effective and resilient.
- Improve by: profiling inference, capacity planning, autoscaling policies, and basic distributed-systems design.
9. Governance, security & compliance
- Why: Models in production often require audits and privacy safeguards.
- Improve by: documenting lineage, creating model cards, practicing access control, and understanding basic privacy tools (PII handling, differential privacy where applicable).
10. Communication & product sense
- Why: You’ll need to translate technical tradeoffs into business impact.
- Improve by: practicing stakeholder briefings, building dashboards tied to KPIs, and writing clear runbooks/docs.
Concrete projects to practice
- Build and deploy an end-to-end pipeline: ingestion → feature store → training → model registry → CI/CD → serving + monitoring.
- Add automated data and prediction validation plus a drift detector.
- Reduce inference latency/cost for a model (quantization, batching, autoscaling) and measure improvements.
- Lead an on-call rotation or run a post‑mortem for a production incident.
- Create model cards, lineage reports, and runbooks for a deployed model.
6‑month improvement plan (practical, focused)
- Months 0–2: Strengthen engineering basics (tests, code reviews, Docker). Set up CI for a small model.
- Months 2–4: Implement an automated pipeline with a model registry and deploy a model to a real serving stack. Add logging and basic metrics.
- Months 4–6: Add drift detection, canary deployments / A/B testing, cost/performance optimizations, and write runbooks. Lead a rollout and a post‑deployment review.
Measurable outcomes to track
- Number of models in automated deployment pipeline.
- Mean inference latency and 95th percentile latency before/after optimization.
- Uptime/SLO compliance and mean time to recovery (MTTR).
- Percentage of production features covered by schema validation / tests.
- Number of incidents and post‑mortems completed; actions implemented.
- Cost per prediction and overall inference spend reduction.
Career path tie‑ins (what to emphasize next)
- Move toward ML Engineer / MLOps: emphasize infra, CI/CD, serving, and SRE skills.
- Move toward Platform/Architect: emphasize automation, multi-team APIs, and system design.
- Move to management: emphasize cross‑team coordination, hiring, and stakeholder communication while keeping technical credibility.
If you’d like, I can: (a) tailor the 6‑month plan to your current stack/tools, or (b) list targeted resources/tutorials for any specific area above.
------------------------
Ask a question (type 'exit' to quit): exit
The goal of this article was to demonstrate, in a practical way, how to build a simple chatbot using LangChain. However, it is important to highlight that this chatbot can be improved in several aspects. One of the main points is memory management, since as the conversation evolves, the message list grows continuously, which can impact both performance and cost. In addition, the chatbot does not have access to the external world, such as internet search tools or API integrations. Another relevant improvement would be providing context that is not part of the model’s knowledge base through Retrieval-Augmented Generation (RAG) techniques, enabling more contextualized responses.
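As an illustration of the memory-management point, one simple mitigation (a sketch, not part of the chatbot above) is to cap the history before each call, keeping the system message plus only the most recent messages:
# Sketch: bound the context by keeping the SystemMessage plus the last N messages.
MAX_HISTORY = 20  # hypothetical limit on non-system messages

def trim_history(messages, max_history=MAX_HISTORY):
    system_message, rest = messages[0], messages[1:]
    return [system_message] + rest[-max_history:]

# Inside LLMModel.run, this could be applied before chain.invoke:
# self._messages = trim_history(self._messages)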
