Fits at the API layer
Add semantic caching where request behavior is visible, testable, and easier to reason about.
Velcio
Velcio builds small, practical infrastructure tools for teams shipping AI
features in production. Start with fastapi-semcache,
semantic caching middleware for FastAPI and LLM endpoints.
Open source first. Product focused. Built for developers who want clear behavior and low operational weight.
Current product
fastapi-semcache helps reduce repeated model work by
reusing responses when requests are semantically similar enough to
serve from cache. It fits at the API layer, where the behavior is
visible, testable, and easier to reason about.
Reuse responses for semantically similar requests instead of paying for near-duplicates again.
Start with a few expensive endpoints and expand only where hit quality justifies it.
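One way to picture an incremental rollout is an explicit opt-in list of paths plus per-endpoint hit counters. This is a minimal sketch and not the fastapi-semcache API: the paths, `CACHED_PATHS`, `EndpointStats`, `should_cache`, and `record` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical opt-in list: only these paths go through the semantic cache.
CACHED_PATHS = {"/v1/answer", "/v1/summarize"}


@dataclass
class EndpointStats:
    """Hit and miss counters for one endpoint."""
    hits: int = 0
    misses: int = 0

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Per-path counters, created lazily on first use.
stats: Dict[str, EndpointStats] = {}


def should_cache(path: str) -> bool:
    """Return True only for endpoints that were explicitly opted in."""
    return path in CACHED_PATHS


def record(path: str, hit: bool) -> None:
    """Count a cache hit or miss for the given path."""
    entry = stats.setdefault(path, EndpointStats())
    if hit:
        entry.hits += 1
    else:
        entry.misses += 1
```

Hit rate is only a proxy here. Judging hit quality also means checking that the reused responses were actually appropriate for the requests that received them.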
How it works
The mechanism is simple: compare incoming requests to prior ones, reuse when the match is strong enough, and fall through when it is not.
01. Traffic reaches the middleware before your FastAPI handler or LLM-backed endpoint runs.
02. The middleware compares the incoming request to prior ones and looks for a strong semantic match.
03. On a good match, the middleware returns the cached response. Otherwise the full request executes and the result is stored for later reuse.
This gives you a concrete control point for latency, cost, and repeat traffic without turning your application into a black box.
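The three steps above can be sketched in plain Python. This is an illustration of the general technique, not the fastapi-semcache implementation: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `SemanticCache`, `handle`, and the 0.8 threshold are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real setup would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed cutoff for a "strong enough" match
        self.entries: List[Tuple[Counter, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the best stored response if its similarity clears the threshold."""
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((toy_embed(prompt), response))


def handle(cache: SemanticCache, prompt: str,
           run_model: Callable[[str], str]) -> Tuple[str, bool]:
    """Serve from cache on a strong match; otherwise run the model and store."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, True
    response = run_model(prompt)
    cache.store(prompt, response)
    return response, False
```

A linear scan over stored entries is fine for a sketch. With real traffic the comparison would typically run against a vector index, and the threshold would need tuning per endpoint.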
Use cases
Repeated prompts, paraphrased follow-ups, and common support questions are natural candidates for semantic reuse.
Knowledge assistants and workflow helpers often see the same intent phrased many different ways.
If repeated work is driving latency or spend, semantic caching gives you an explicit place to control it.
Proof and trust
Velcio aims to earn trust through mechanism, scope, and readable source, not inflated claims.
The source is public, the behavior is inspectable, and adoption does not depend on a hosted control plane.
This is a focused tool for semantic caching, not a vague platform promise.
For Postgres teams, pgvector keeps the storage model close to infrastructure you already understand.
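For a sense of what a pgvector-backed lookup can look like, here is a minimal sketch. The table name `semcache_entries`, its columns, and the 384-dimension size are assumptions for illustration, not the schema fastapi-semcache uses. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives cosine similarity. The `%(query)s` placeholders follow the psycopg named-parameter style.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical schema: one row per cached request/response pair.
CREATE_TABLE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS semcache_entries (
    id bigserial PRIMARY KEY,
    prompt text NOT NULL,
    response text NOT NULL,
    embedding vector(384) NOT NULL
);
"""

# Nearest stored entry by cosine distance, with similarity for thresholding.
NEAREST_SQL = """
SELECT response, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM semcache_entries
ORDER BY embedding <=> %(query)s::vector
LIMIT 1;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format numbers as a pgvector text literal such as '[0.5,1.0,-2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def nearest_query(embedding: Sequence[float]) -> Tuple[str, Dict[str, str]]:
    """Return the SQL and parameters for a nearest-neighbor lookup."""
    return NEAREST_SQL, {"query": to_vector_literal(embedding)}
```

The caller would execute the returned SQL with a Postgres driver and compare `similarity` against its own threshold before serving the cached response.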
More tools coming
Velcio is building a focused set of infrastructure tools for AI applications. The goal is not to replace your stack. The goal is to make critical pieces of it simpler.