This package contains the Python client for interacting with the LangSmith platform.
To install:
```bash
pip install -U langsmith
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=ls_...
```
Then trace:
```python
import openai
from langsmith.wrappers import wrap_openai
from langsmith import traceable

# Auto-trace LLM calls in-context
client = wrap_openai(openai.Client())

@traceable  # Auto-trace this function
def pipeline(user_input: str):
    result = client.chat.completions.create(
        messages=[{"role": "user", "content": user_input}],
        model="gpt-3.5-turbo"
    )
    return result.choices[0].message.content

pipeline("Hello, world!")
```
See the resulting nested trace 🌐 here.
LangSmith helps you and your team develop and evaluate language models and intelligent agents. It is compatible with any LLM application.
Cookbook: For tutorials on how to get more value out of LangSmith, check out the LangSmith Cookbook repo.
A typical workflow looks like:
We'll walk through these steps in more detail below.
Sign up for LangSmith using your GitHub or Discord account, or an email address and password. If you sign up with an email, make sure to verify your email address before logging in.
Then, create a unique API key on the Settings Page, which is found in the menu at the top right corner of the page.
> [!NOTE]
> Save the API key in a secure location. It will not be shown again.
You can log traces natively using the LangSmith SDK or within your LangChain application.
LangSmith seamlessly integrates with the Python LangChain library to record traces from your LLM applications.
Tracing can be activated by setting the following environment variables or by manually specifying the LangChainTracer.
```python
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
# os.environ["LANGSMITH_ENDPOINT"] = "https://eu.api.smith.langchain.com"  # If signed up in the EU region
os.environ["LANGSMITH_API_KEY"] = "<YOUR-LANGSMITH-API-KEY>"
# os.environ["LANGSMITH_PROJECT"] = "My Project Name"  # Optional: "default" is used if not set
# os.environ["LANGSMITH_WORKSPACE_ID"] = "<YOUR-WORKSPACE-ID>"  # Required for org-scoped API keys
```
Tip: Projects are groups of traces. All runs are logged to a project. If not specified, the project is set to `default`.
If the environment variables are correctly set, your application will automatically connect to the LangSmith platform.
```python
from langchain_core.runnables import chain

@chain
def add_val(x: dict) -> dict:
    return {"val": x["val"] + 1}

add_val({"val": 1})
```
You can still use the LangSmith development platform without depending on any LangChain code.
```python
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGSMITH_API_KEY"] = "<YOUR-LANGSMITH-API-KEY>"
# os.environ["LANGSMITH_PROJECT"] = "My Project Name"  # Optional: "default" is used if not set
```
The easiest way to log traces using the SDK is via the @traceable decorator. Below is an example.
```python
from datetime import datetime

import openai
from langsmith import traceable
from langsmith.wrappers import wrap_openai

client = wrap_openai(openai.Client())

@traceable
def argument_generator(query: str, additional_description: str = "") -> str:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a debater making an argument on a topic."
             f"{additional_description}"
             f" The current time is {datetime.now()}"},
            {"role": "user", "content": f"The discussion topic is {query}"}
        ]
    ).choices[0].message.content

@traceable
def argument_chain(query: str, additional_description: str = "") -> str:
    argument = argument_generator(query, additional_description)
    # ... Do other processing or call other functions...
    return argument

argument_chain("Why is blue better than orange?")
```
Alternatively, you can manually log events using the Client directly or using a RunTree, which is what the traceable decorator is meant to manage for you!
A RunTree tracks your application. Each RunTree object is required to have a name and run_type. These and other important attributes are as follows:

- `name`: str - used to identify the component's purpose
- `run_type`: str - Currently one of "llm", "chain", or "tool"; more options will be added in the future
- `inputs`: dict - the inputs to the component
- `outputs`: Optional[dict] - the (optional) returned values from the component
- `error`: Optional[str] - Any error messages that may have arisen during the call

```python
from langsmith.run_trees import RunTree

parent_run = RunTree(
    name="My Chat Bot",
    run_type="chain",
    inputs={"text": "Summarize this morning's meetings."},
    # project_name="Defaults to the LANGSMITH_PROJECT env var"
)
parent_run.post()

# .. My Chat Bot calls an LLM
child_llm_run = parent_run.create_child(
    name="My Proprietary LLM",
    run_type="llm",
    inputs={
        "prompts": [
            "You are an AI Assistant. The time is XYZ."
            " Summarize this morning's meetings."
        ]
    },
)
child_llm_run.post()
child_llm_run.end(
    outputs={
        "generations": [
            "I should use the transcript_loader tool"
            " to fetch meeting_transcripts from XYZ"
        ]
    }
)
child_llm_run.patch()

# .. My Chat Bot takes the LLM output and calls
# a tool / function for fetching transcripts ..
child_tool_run = parent_run.create_child(
    name="transcript_loader",
    run_type="tool",
    inputs={"date": "XYZ", "content_type": "meeting_transcripts"},
)
child_tool_run.post()

# The tool returns meeting notes to the chat bot
child_tool_run.end(outputs={"meetings": ["Meeting1 notes.."]})
child_tool_run.patch()

child_chain_run = parent_run.create_child(
    name="Unreliable Component",
    run_type="tool",
    inputs={"input": "Summarize these notes..."},
)
child_chain_run.post()

try:
    # .... the component does work
    raise ValueError("Something went wrong")
    child_chain_run.end(outputs={"output": "foo"})
    child_chain_run.patch()
except Exception as e:
    child_chain_run.end(error=f"I errored again {e}")
    child_chain_run.patch()

# .. The chat agent recovers
parent_run.end(outputs={"output": ["The meeting notes are as follows:..."]})
res = parent_run.patch()
res.result()
```
Once your runs are stored in LangSmith, you can convert them into a dataset. For this example, we will do so using the Client, but you can also do this using the web interface, as explained in the LangSmith docs.
```python
from langsmith import Client

client = Client()
dataset_name = "Example Dataset"

# We will only use examples from the top level AgentExecutor run here,
# and exclude runs that errored.
runs = client.list_runs(
    project_name="my_project",
    execution_order=1,
    error=False,
)
dataset = client.create_dataset(dataset_name, description="An example dataset")
for run in runs:
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,
        dataset_id=dataset.id,
    )
```
Check out the LangSmith Testing & Evaluation docs for up-to-date workflows.
For generating automated feedback on individual runs, you can run evaluations directly using the LangSmith client.
```python
from typing import Optional

from langsmith.evaluation import StringEvaluator


def jaccard_chars(output: str, answer: str) -> float:
    """Naive Jaccard similarity between two strings."""
    prediction_chars = set(output.strip().lower())
    answer_chars = set(answer.strip().lower())
    intersection = prediction_chars.intersection(answer_chars)
    union = prediction_chars.union(answer_chars)
    return len(intersection) / len(union)


def grader(run_input: str, run_output: str, answer: Optional[str]) -> dict:
    """Compute the score and/or label for this run."""
    if answer is None:
        value = "AMBIGUOUS"
        score = 0.5
    else:
        score = jaccard_chars(run_output, answer)
        value = "CORRECT" if score > 0.9 else "INCORRECT"
    return dict(score=score, value=value)


evaluator = StringEvaluator(evaluation_name="Jaccard", grading_function=grader)

runs = client.list_runs(
    project_name="my_project",
    execution_order=1,
    error=False,
)
for run in runs:
    client.evaluate_run(run, evaluator)
```
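As a quick sanity check, the grading logic above can be exercised on sample strings without any logged runs. The snippet below restates the same functions in self-contained form; the sample inputs are invented for illustration.

```python
# Standalone check of the grading logic, independent of the LangSmith API.
from typing import Optional


def jaccard_chars(output: str, answer: str) -> float:
    """Naive Jaccard similarity between the character sets of two strings."""
    prediction_chars = set(output.strip().lower())
    answer_chars = set(answer.strip().lower())
    return len(prediction_chars & answer_chars) / len(prediction_chars | answer_chars)


def grader(run_input: str, run_output: str, answer: Optional[str]) -> dict:
    """Score a run output against a reference answer, if one exists."""
    if answer is None:
        return dict(score=0.5, value="AMBIGUOUS")
    score = jaccard_chars(run_output, answer)
    return dict(score=score, value="CORRECT" if score > 0.9 else "INCORRECT")


print(grader("q", "hello", "hello"))  # identical character sets -> CORRECT
print(grader("q", "abc", None))       # no reference answer -> AMBIGUOUS
```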
LangSmith easily integrates with your favorite LLM framework.
We provide a convenient wrapper for the OpenAI SDK.
In order to use, you first need to set your LangSmith API key.
```bash
export LANGSMITH_API_KEY=<your-api-key>
```
Next, you will need to install the LangSmith SDK:
```bash
pip install -U langsmith
```
After that, you can wrap the OpenAI client:
```python
from openai import OpenAI
from langsmith import wrappers

client = wrappers.wrap_openai(OpenAI())
```
Now, you can use the OpenAI client as you normally would, but now everything is logged to LangSmith!
```python
client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
```
Oftentimes, you use the OpenAI client inside of other functions.
You can get nested traces by using this wrapped client and decorating those functions with @traceable.
See this documentation for more information on how to use this decorator.
```python
from langsmith import traceable

@traceable(name="Call OpenAI")
def my_function(text: str):
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Say {text}"}],
    )

my_function("hello world")
```
We provide a convenient integration with Instructor, largely by virtue of it essentially just using the OpenAI SDK.
In order to use, you first need to set your LangSmith API key.
```bash
export LANGSMITH_API_KEY=<your-api-key>
```
Next, you will need to install the LangSmith SDK:
```bash
pip install -U langsmith
```
After that, you can wrap the OpenAI client:
```python
from openai import OpenAI
from langsmith import wrappers

client = wrappers.wrap_openai(OpenAI())
```
After this, you can patch the OpenAI client using instructor:
```python
import instructor

client = instructor.patch(OpenAI())
```
Now, you can use instructor as you normally would, but now everything is logged to LangSmith!
```python
from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract Jason is 25 years old"},
    ]
)
```
Oftentimes, you use instructor inside of other functions.
You can get nested traces by using this wrapped client and decorating those functions with @traceable.
See this documentation for more information on how to use this decorator.
```python
from langsmith import traceable


@traceable()
def my_function(text: str) -> UserDetail:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserDetail,
        messages=[
            {"role": "user", "content": f"Extract {text}"},
        ]
    )


my_function("Jason is 25 years old")
```
The LangSmith pytest plugin lets Python developers define their datasets and evaluations as pytest test cases. See online docs for more information.
This plugin is installed as part of the LangSmith SDK, and is enabled by default. See also official pytest docs: How to install and use plugins
To learn more about the LangSmith platform, check out the docs.
The LangSmith SDK is licensed under the MIT License.
The copyright information for certain dependencies is reproduced in their corresponding COPYRIGHT.txt files in this repo.