Chat models
Overviewβ
Large Language Models (LLMs) are advanced machine learning models that excel in a wide range of language-related tasks such as text generation, translation, summarization, question answering, and more, without needing task-specific fine tuning for every scenario.
Modern LLMs are typically accessed through a chat model interface that takes a list of messages as input and returns a message as output.
The newest generation of chat models offer additional capabilities:
- Tool calling: Many popular chat models offer a native tool calling API. This API allows developers to build rich applications that enable LLMs to interact with external services, APIs, and databases. Tool calling can also be used to extract structured information from unstructured data and perform various other tasks.
- Structured output: A technique to make a chat model respond in a structured format, such as JSON that matches a given schema.
- Multimodality: The ability to work with data other than text; for example, images, audio, and video.
Featuresβ
LangChain provides a consistent interface for working with chat models from different providers while offering additional features for monitoring, debugging, and optimizing the performance of applications that use LLMs.
- Integrations with many chat model providers (e.g., Anthropic, OpenAI, Ollama, Microsoft Azure, Google Vertex, Amazon Bedrock, Hugging Face, Cohere, Groq). Please see chat model integrations for an up-to-date list of supported models.
- Use either LangChain's messages format or OpenAI format.
- Standard tool calling API: standard interface for binding tools to models, accessing tool call requests made by models, and sending tool results back to the model.
- Standard API for structuring outputs via the
with_structured_output
method. - Provides support for async programming, efficient batching, a rich streaming API.
- Integration with LangSmith for monitoring and debugging production-grade applications based on LLMs.
- Additional features like standardized token usage, rate limiting, caching and more.
Integrationsβ
LangChain has many chat model integrations that allow you to use a wide variety of models from different providers.
These integrations are one of two types:
- Official models: These are models that are officially supported by LangChain and/or model provider. You can find these models in the
langchain-<provider>
packages. - Community models: There are models that are mostly contributed and supported by the community. You can find these models in the
langchain-community
package.
LangChain chat models are named with a convention that prefixes "Chat" to their class names (e.g., ChatOllama
, ChatAnthropic
, ChatOpenAI
, etc.).
Please review the chat model integrations for a list of supported models.
Models that do not include the prefix "Chat" in their name or include "LLM" as a suffix in their name typically refer to older models that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output.
Interfaceβ
LangChain chat models implement the BaseChatModel interface. Because BaseChatModel
also implements the Runnable Interface, chat models support a standard streaming interface, async programming, optimized batching, and more. Please see the Runnable Interface for more details.
Many of the key methods of chat models operate on messages as input and return messages as output.
Chat models offer a standard set of parameters that can be used to configure the model. These parameters are typically used to control the behavior of the model, such as the temperature of the output, the maximum number of tokens in the response, and the maximum time to wait for a response. Please see the standard parameters section for more details.
In documentation, we will often use the terms "LLM" and "Chat Model" interchangeably. This is because most modern LLMs are exposed to users via a chat model interface.
However, LangChain also has implementations of older LLMs that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output. These models are typically named without the "Chat" prefix (e.g., Ollama
, Anthropic
, OpenAI
, etc.).
These models implement the BaseLLM interface and may be named with the "LLM" suffix (e.g., OllamaLLM
, AnthropicLLM
, OpenAILLM
, etc.). Generally, users should not use these models.
Key methodsβ
The key methods of a chat model are:
- invoke: The primary method for interacting with a chat model. It takes a list of messages as input and returns a list of messages as output.
- stream: A method that allows you to stream the output of a chat model as it is generated.
- batch: A method that allows you to batch multiple requests to a chat model together for more efficient processing.
- bind_tools: A method that allows you to bind a tool to a chat model for use in the model's execution context.
- with_structured_output: A wrapper around the
invoke
method for models that natively support structured output.
Other important methods can be found in the BaseChatModel API Reference.
Inputs and outputsβ
Modern LLMs are typically accessed through a chat model interface that takes messages as input and returns messages as output. Messages are typically associated with a role (e.g., "system", "human", "assistant") and one or more content blocks that contain text or potentially multimodal data (e.g., images, audio, video).
LangChain supports two message formats to interact with chat models:
- LangChain Message Format: LangChain's own message format, which is used by default and is used internally by LangChain.
- OpenAI's Message Format: OpenAI's message format.
Standard parametersβ
Many chat models have standardized parameters that can be used to configure the model:
Parameter | Description |
---|---|
model | The name or identifier of the specific AI model you want to use (e.g., "gpt-3.5-turbo" or "gpt-4" ). |
temperature | Controls the randomness of the model's output. A higher value (e.g., 1.0) makes responses more creative, while a lower value (e.g., 0.0) makes them more deterministic and focused. |
timeout | The maximum time (in seconds) to wait for a response from the model before canceling the request. Ensures the request doesnβt hang indefinitely. |
max_tokens | Limits the total number of tokens (words and punctuation) in the response. This controls how long the output can be. |
stop | Specifies stop sequences that indicate when the model should stop generating tokens. For example, you might use specific strings to signal the end of a response. |
max_retries | The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits. |
api_key | The API key required for authenticating with the model provider. This is usually issued when you sign up for access to the model. |
base_url | The URL of the API endpoint where requests are sent. This is typically provided by the model's provider and is necessary for directing your requests. |
rate_limiter | An optional BaseRateLimiter to space out requests to avoid exceeding rate limits. See rate-limiting below for more details. |
Some important things to note:
- Standard parameters only apply to model providers that expose parameters with the intended functionality. For example, some providers do not expose a configuration for maximum output tokens, so max_tokens can't be supported on these.
- Standard parameters are currently only enforced on integrations that have their own integration packages (e.g.
langchain-openai
,langchain-anthropic
, etc.), they're not enforced on models inlangchain-community
.
Chat models also accept other parameters that are specific to that integration. To find all the parameters supported by a Chat model head to the their respective API reference for that model.