Built-in Middleware

TanStack AI ships ready-made middleware so you don't have to hand-roll the common cases. Each one is an ordinary ChatMiddleware — drop it into the middleware array of any chat() call. This page documents every built-in.

| Middleware | Import | What it does |
| --- | --- | --- |
| toolCacheMiddleware | @tanstack/ai/middlewares | Cache tool-call results by name + arguments |
| contentGuardMiddleware | @tanstack/ai/middlewares | Redact / transform / block streamed text content |
| otelMiddleware | @tanstack/ai/middlewares/otel | Emit OpenTelemetry spans + GenAI metrics |

toolCacheMiddleware and contentGuardMiddleware are exported from the main @tanstack/ai/middlewares barrel. otelMiddleware lives on its own subpath (@tanstack/ai/middlewares/otel) so that importing the barrel never eagerly pulls in @opentelemetry/api (an optional peer dependency).

toolCacheMiddleware

Caches tool call results based on tool name and arguments. When a tool is called with the same name and arguments as a previous call, the cached result is returned immediately without re-executing the tool.

ts
import { chat } from "@tanstack/ai";
import { openaiText } from "@tanstack/ai-openai"; // adapter package; swap in the provider you use
import { toolCacheMiddleware } from "@tanstack/ai/middlewares";

const stream = chat({
  adapter: openaiText("gpt-5.5"),
  messages,
  tools: [weatherTool, stockTool],
  middleware: [
    toolCacheMiddleware({
      ttl: 60_000, // Cache entries expire after 60 seconds
      maxSize: 50, // Keep at most 50 entries (LRU eviction)
      toolNames: ["getWeather"], // Only cache specific tools
    }),
  ],
});

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| maxSize | number | 100 | Maximum cache entries. The least recently used entry is evicted first (LRU). Only applies to the default in-memory storage. |
| ttl | number | Infinity | Time-to-live in milliseconds. Expired entries are not served. |
| toolNames | string[] | All tools | Only cache these tools. Others pass through. |
| keyFn | (toolName, args) => string | JSON.stringify([toolName, args]) | Custom cache key derivation. |
| storage | ToolCacheStorage | In-memory Map | Custom storage backend. When provided, maxSize is ignored; the storage manages its own capacity. |

Behaviors:

  • Only successful tool calls are cached — errors are never stored

  • Cache hits trigger { type: 'skip', result } via onBeforeToolCall

  • LRU eviction: when maxSize is reached, the least recently used entry is removed (default storage only)

  • Cache hits refresh the entry's LRU position (moved to most-recently-used)
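
The eviction and expiry behaviors listed above can be pictured with a small, self-contained sketch. It is not the library's implementation: `createLruTtlCache` and its `now` parameter are hypothetical names used only for illustration, and the strict "older than ttl" comparison is an assumption. The sketch relies on the fact that a JavaScript Map iterates in insertion order, so re-inserting an entry on every hit keeps the least recently used key at the front.

```typescript
// Illustrative sketch only; not the actual toolCacheMiddleware implementation.
interface Entry {
  result: unknown;
  timestamp: number; // milliseconds since epoch when the entry was stored
}

function createLruTtlCache(maxSize = 100, ttl = Infinity) {
  // Map preserves insertion order, so the first key is the least recently used.
  const entries = new Map<string, Entry>();

  return {
    get(key: string, now: number = Date.now()): Entry | undefined {
      const entry = entries.get(key);
      if (entry === undefined) return undefined;
      // Assumption: an entry is expired once it is older than `ttl`.
      if (now - entry.timestamp > ttl) {
        entries.delete(key);
        return undefined;
      }
      // A hit refreshes the LRU position: move the entry to the back.
      entries.delete(key);
      entries.set(key, entry);
      return entry;
    },

    set(key: string, result: unknown, now: number = Date.now()): void {
      // Overwriting an existing key should not trigger an eviction.
      entries.delete(key);
      if (entries.size >= maxSize) {
        const leastRecent = entries.keys().next().value;
        if (leastRecent !== undefined) entries.delete(leastRecent);
      }
      entries.set(key, { result, timestamp: now });
    },
  };
}
```

Passing `now` explicitly keeps the sketch deterministic; real code would normally rely on `Date.now()`.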

Custom key function, useful when you want to ignore certain arguments:

ts
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

toolCacheMiddleware({
  keyFn: (toolName, args) => {
    // Ignore pagination, cache by query only. `args` is `unknown`, so
    // narrow it with a type guard before destructuring.
    if (!isRecord(args)) return JSON.stringify([toolName, args]);
    const { page, ...rest } = args;
    return JSON.stringify([toolName, rest]);
  },
});
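
One more case where a custom key can help: JSON.stringify serializes object keys in insertion order, so two argument objects with the same fields in a different order produce different default keys. The sketch below shows one possible order-insensitive key function. `sortKeys` and `stableKey` are hypothetical helper names, and the `(toolName: string, args: unknown)` parameter types are an assumption based on the example above.

```typescript
// Hypothetical helper: recursively rebuild objects with their keys sorted,
// so that serialization no longer depends on insertion order.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(record).sort()) {
      sorted[key] = sortKeys(record[key]);
    }
    return sorted;
  }
  return value; // primitives are returned as-is
}

// Candidate keyFn: same shape as the default key, but with sorted object keys.
function stableKey(toolName: string, args: unknown): string {
  return JSON.stringify([toolName, sortKeys(args)]);
}
```

Under those assumptions it could be passed as `toolCacheMiddleware({ keyFn: stableKey })`.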

Custom Storage

By default the cache lives in-memory and is scoped to a single toolCacheMiddleware() instance. Pass a storage option to use an external backend like Redis, localStorage, or a database. This also enables sharing a cache across multiple chat() calls.

The storage interface:

ts
// Implement this interface (exported from `@tanstack/ai/middlewares`):
interface ToolCacheStorage {
  getItem: (key: string) => ToolCacheEntry | undefined | Promise<ToolCacheEntry | undefined>;
  setItem: (key: string, value: ToolCacheEntry) => void | Promise<void>;
  deleteItem: (key: string) => void | Promise<void>;
}

// ToolCacheEntry is { result: unknown; timestamp: number }

All methods may return a Promise for async backends. The middleware handles TTL checking — your storage just needs to store and retrieve entries.

Redis example:

ts
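import { createClient } from "redis";
import { chat } from "@tanstack/ai";
import { toolCacheMiddleware, type ToolCacheStorage } from "@tanstack/ai/middlewares";

// node-redis v4+ clients must be connected before issuing commands.
const redis = createClient();
await redis.connect();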

const redisStorage: ToolCacheStorage = {
  getItem: async (key) => {
    const raw = await redis.get(`tool-cache:${key}`);
    return raw ? JSON.parse(raw) : undefined;
  },
  setItem: async (key, value) => {
    await redis.set(`tool-cache:${key}`, JSON.stringify(value));
  },
  deleteItem: async (key) => {
    await redis.del(`tool-cache:${key}`);
  },
};

const stream = chat({
  adapter,
  messages,
  tools: [weatherTool],
  middleware: [toolCacheMiddleware({ storage: redisStorage, ttl: 60_000 })],
});
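
The same shape works for synchronous string-based backends such as localStorage. The following is a minimal sketch under stated assumptions: the two storage interfaces are copied locally from the definition above so the block is self-contained, while `StringStore`, `createStringStorage`, and `createMemoryStore` are hypothetical names introduced for illustration. `StringStore` is the subset of the Web Storage API the adapter relies on, and the in-memory store stands in for localStorage outside the browser.

```typescript
// Local copies of the shapes documented above, so this sketch runs on its own.
interface ToolCacheEntry {
  result: unknown;
  timestamp: number;
}
interface ToolCacheStorage {
  getItem: (key: string) => ToolCacheEntry | undefined | Promise<ToolCacheEntry | undefined>;
  setItem: (key: string, value: ToolCacheEntry) => void | Promise<void>;
  deleteItem: (key: string) => void | Promise<void>;
}

// Hypothetical: the subset of the Web Storage API (localStorage) used here.
interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

function createStringStorage(store: StringStore, prefix = "tool-cache:"): ToolCacheStorage {
  return {
    getItem: (key) => {
      const raw = store.getItem(prefix + key);
      if (raw === null) return undefined;
      try {
        return JSON.parse(raw) as ToolCacheEntry;
      } catch {
        // Treat a corrupt entry as a cache miss and clean it up.
        store.removeItem(prefix + key);
        return undefined;
      }
    },
    setItem: (key, value) => {
      store.setItem(prefix + key, JSON.stringify(value));
    },
    deleteItem: (key) => {
      store.removeItem(prefix + key);
    },
  };
}

// In-memory stand-in for localStorage, for non-browser environments and tests.
function createMemoryStore(): StringStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
    removeItem: (key) => {
      data.delete(key);
    },
  };
}
```

In a browser, `createStringStorage(window.localStorage)` would be the natural call. Because entries round-trip through JSON, this approach only suits tools whose results are JSON-serializable.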

Sharing a cache across requests:

ts
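// Create the backing Map and the storage once, at module scope, and reuse them across chat() calls
const globalCache = new Map<string, { result: unknown; timestamp: number }>();
const sharedStorage: ToolCacheStorage = {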
  getItem: (key) => globalCache.get(key),
  setItem: (key, value) => { globalCache.set(key, value); },
  deleteItem: (key) => { globalCache.delete(key); },
};

// Every request handled by this endpoint shares the same cache
app.post("/api/chat", async (req) => {
  const stream = chat({
    adapter,
    messages: req.body.messages,
    tools: [weatherTool],
    middleware: [toolCacheMiddleware({ storage: sharedStorage })],
  });
  return toServerSentEventsResponse(stream);
});

contentGuardMiddleware

Filters or transforms streamed text content as it flows through onChunk. Use it to redact sensitive data (SSNs, emails, API keys), enforce a profanity filter, or rewrite text on the fly. Rules are applied to TEXT_MESSAGE_CONTENT chunks; all other chunk types pass through untouched.

ts
import { chat } from "@tanstack/ai";
import { openaiText } from "@tanstack/ai-openai"; // adapter package; swap in the provider you use
import { contentGuardMiddleware } from "@tanstack/ai/middlewares";

const stream = chat({
  adapter: openaiText("gpt-5.5"),
  messages,
  middleware: [
    contentGuardMiddleware({
      rules: [
        // Regex + replacement
        { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[SSN REDACTED]" },
        // Custom transform function
        { fn: (text) => text.replaceAll("badword", "****") },
      ],
      strategy: "buffered",
    }),
  ],
});

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| rules | ContentGuardRule[] | | Required. Applied in order; each rule receives the previous rule's output. A rule is either { pattern: RegExp; replacement: string } or { fn: (text: string) => string }. |
| strategy | 'delta' \| 'buffered' | 'buffered' | How content is matched. See below. |
| bufferSize | number | 50 | (Buffered only) Characters held back before emitting, so patterns spanning chunk boundaries still match. Set it ≥ the longest pattern you expect. Flushed at stream end. |
| blockOnMatch | boolean | false | When true, drop the entire chunk if any rule changes the content (instead of emitting the filtered version). |
| onFiltered | (info: ContentFilteredInfo) => void | | Callback fired whenever a rule changes content. Receives { messageId, original, filtered, strategy }. |

Matching strategies:

  • 'buffered' (default) — Accumulates content and applies rules to the settled portion, holding back a bufferSize look-behind window so a pattern split across two chunks ("...123-45" then "-6789...") is still caught. The buffer is flushed when the message or run ends. Use this for anything that can span deltas — which is most redaction.

  • 'delta' — Applies rules to each delta in isolation as it arrives. Fastest and lowest-latency, but a pattern split across a chunk boundary may slip through. Use only when your patterns are guaranteed to fit within a single delta.
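
To make the difference between the two strategies concrete, here is a simplified, self-contained sketch. It is not the middleware's code: `redactDelta` and `redactBuffered` are hypothetical functions that process an array of deltas and return the concatenated output, whereas a real stream would emit each settled slice as it becomes available. The buffered variant re-applies the rule to the held-back text on every delta and ignores edge cases such as a word boundary falling exactly at the cut point.

```typescript
// Illustrative sketch only; not the contentGuardMiddleware implementation.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;
const SSN_REPLACEMENT = "[SSN REDACTED]";

// 'delta' style: each delta is filtered in isolation.
function redactDelta(deltas: string[]): string {
  return deltas.map((delta) => delta.replace(SSN_PATTERN, SSN_REPLACEMENT)).join("");
}

// 'buffered' style: keep the last `bufferSize` characters back so a pattern
// that starts in one delta and ends in the next can still match.
function redactBuffered(deltas: string[], bufferSize: number): string {
  let pending = ""; // text that has not been emitted yet
  let emitted = "";
  for (const delta of deltas) {
    pending = (pending + delta).replace(SSN_PATTERN, SSN_REPLACEMENT);
    const settledLength = Math.max(0, pending.length - bufferSize);
    emitted += pending.slice(0, settledLength); // safe to emit now
    pending = pending.slice(settledLength); // held back for the next delta
  }
  // End of stream: flush whatever is still held back.
  return emitted + pending;
}
```

With the deltas `"SSN is 123-45"` and `"-6789 ok"`, the delta variant lets the number through unchanged, while the buffered variant with a `bufferSize` of 11 (the length of the pattern) redacts it.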

Behaviors:

  • Only TEXT_MESSAGE_CONTENT chunks are inspected; every other chunk type passes through.

  • A rule that doesn't change the text is a no-op — the chunk passes through unchanged.

  • With blockOnMatch: true, a matched chunk is dropped entirely (returns null from onChunk) rather than emitting the redacted text.

  • The onFiltered callback is for observability/audit — it fires with the before/after text but does not alter what is emitted.
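
The rule chaining and blocking behaviors above can be sketched in a few lines. This is an illustration under assumptions, not the library's code: the `ContentGuardRule` type is copied locally from the options table, `applyRules` and `guardChunk` are hypothetical names, and the callback here receives only `original` and `filtered` rather than the full info object.

```typescript
// Local copy of the rule shape described in the options table.
type ContentGuardRule =
  | { pattern: RegExp; replacement: string }
  | { fn: (text: string) => string };

// Rules run in order; each one receives the previous rule's output.
function applyRules(text: string, rules: ContentGuardRule[]): string {
  return rules.reduce(
    (current, rule) =>
      "fn" in rule ? rule.fn(current) : current.replace(rule.pattern, rule.replacement),
    text,
  );
}

// Returns the text to emit, or null when the chunk should be dropped.
function guardChunk(
  text: string,
  rules: ContentGuardRule[],
  blockOnMatch: boolean,
  onFiltered?: (info: { original: string; filtered: string }) => void,
): string | null {
  const filtered = applyRules(text, rules);
  if (filtered === text) return text; // no rule changed anything: pass through
  onFiltered?.({ original: text, filtered }); // observability only
  return blockOnMatch ? null : filtered;
}
```

Comparing the filtered text with the original is one simple way to decide whether any rule "matched"; it also means a rule that rewrites text to an identical string counts as a no-op.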

otelMiddleware

Emits vendor-neutral OpenTelemetry traces and metrics for every chat() call — a root span per call, a child span per agent-loop iteration, and a grandchild span per tool execution, all tagged with GenAI semantic-convention attributes.

ts
import { chat } from "@tanstack/ai";
import { openaiText } from "@tanstack/ai-openai"; // adapter package; swap in the provider you use
import { otelMiddleware } from "@tanstack/ai/middlewares/otel";
import { trace, metrics } from "@opentelemetry/api";

const otel = otelMiddleware({
  tracer: trace.getTracer("my-app"),
  meter: metrics.getMeter("my-app"), // optional — enables GenAI histograms
});

const result = await chat({
  adapter: openaiText("gpt-5.5"),
  messages,
  middleware: [otel],
});

otelMiddleware has its own configuration surface (content capture, redaction, span-name formatting, attribute enrichment, lifecycle callbacks) and requires the optional @opentelemetry/api peer dependency. See the dedicated OpenTelemetry guide for full setup, the span/metric catalogue, and all options.

Writing your own

These built-ins are just ChatMiddleware objects — nothing about them is privileged. To build your own, see the Middleware guide for the full hook reference, the context object, and composition rules.
