Velcio

Open-source-first infrastructure primitives for AI applications.

Velcio builds small, practical infrastructure tools for teams shipping AI features in production. Start with fastapi-semcache, semantic caching middleware for FastAPI and LLM endpoints.

Open source first. Product focused. Built for developers who want clear behavior and low operational weight.

Current product

Semantic caching for FastAPI and LLM endpoints.

fastapi-semcache reduces repeated model work by serving a cached response when a new request is semantically close enough to one it has already answered.

Fits at the API layer

Add semantic caching where request behavior is visible, testable, and easier to reason about.

Reduces repeated model work

Reuse responses for semantically similar requests instead of paying for near-duplicates again.

Adopts incrementally

Start with a few expensive endpoints and expand only where hit quality justifies it.

How it works

Same endpoint, fewer unnecessary model calls.

The mechanism is simple: compare incoming requests to prior ones, reuse when the match is strong enough, and fall through when it is not.

01

Receive the request

Traffic reaches the middleware before your FastAPI handler or LLM-backed endpoint runs.

02

Check for a close match

The middleware compares the incoming request to prior ones and looks for a strong semantic match.

03

Reuse or fall through

Return a cached response on a good match, or execute the full request and store the result for later reuse.

This gives you a concrete control point for latency, cost, and repeat traffic without turning your application into a black box.
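The three steps above can be sketched in a few lines of plain Python. This is an illustration of the general technique under stated assumptions, not the fastapi-semcache API: the names `SemanticCache`, `cached_call`, and `embed` are hypothetical, the 0.8 threshold is arbitrary, and the bag-of-words `embed` function is a toy stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by meaning rather than exact text."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value; tune per endpoint
        self.entries: list[tuple[Counter, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        """Step 02: return the best stored response if it is close enough."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt: str, response: str) -> None:
        """Remember a freshly computed response for later reuse."""
        self.entries.append((embed(prompt), response))


def cached_call(
    cache: SemanticCache, prompt: str, handler: Callable[[str], str]
) -> tuple[str, bool]:
    """Step 03: reuse on a good match, otherwise run the handler and store."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, True  # cache hit: the handler never runs
    response = handler(prompt)  # cache miss: do the full (expensive) work
    cache.store(prompt, response)
    return response, False
```

In this sketch, `handler` plays the role of the LLM-backed endpoint. A real deployment would swap the toy `embed` for an actual embedding model and the in-memory list for a persistent vector store.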

Use cases

Useful anywhere requests repeat in meaning.

Chat and assistant APIs

Repeated prompts, paraphrased follow-ups, and common support questions are natural candidates for semantic reuse.

Internal AI tools

Knowledge assistants and workflow helpers often see the same intent phrased many different ways.

Cost-sensitive LLM endpoints

If repeated work is driving latency or spend, semantic caching gives you an explicit place to control it.

Proof and trust

Built to be inspected.

Velcio aims to earn trust through mechanism, scope, and readable source, not inflated claims.

Open source first

The source is public, the behavior is inspectable, and adoption does not depend on a hosted control plane.

Narrow by design

This is a focused tool for semantic caching, not a vague platform promise.

Familiar infrastructure

For Postgres teams, pgvector keeps the storage model close to infrastructure you already understand.
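As a rough picture of what a pgvector-backed lookup can look like, the sketch below builds a nearest-neighbor query using pgvector's `<=>` cosine-distance operator. The table name `semcache_entries`, its columns, and the `build_lookup_query` helper are hypothetical and are not taken from fastapi-semcache; the `%(name)s` placeholders assume a psycopg-style driver.

```python
def build_lookup_query(similarity_threshold: float) -> tuple[str, dict]:
    """Build a parameterized nearest-neighbor query for a pgvector table.

    pgvector's `<=>` operator returns cosine distance, which is
    1 - cosine similarity, so a similarity threshold is converted
    into a maximum distance before it is passed to the query.
    """
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")

    max_distance = 1.0 - similarity_threshold

    # Hypothetical schema: semcache_entries(embedding vector, response text).
    # The caller supplies the query embedding as the `query` parameter.
    sql = (
        "SELECT response, embedding <=> %(query)s AS distance "
        "FROM semcache_entries "
        "WHERE embedding <=> %(query)s <= %(max_distance)s "
        "ORDER BY embedding <=> %(query)s "
        "LIMIT 1"
    )
    return sql, {"max_distance": max_distance}
```

A row coming back means the closest stored request is within the allowed distance and its response can be reused. No row means the request falls through to the handler.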

More tools coming

fastapi-semcache is the first primitive, not the whole story.

Velcio is building a focused set of infrastructure tools for AI applications. The goal is not to replace your stack. The goal is to make critical pieces of it simpler.

Small surface area

Clear operational value

Fits existing stacks