# Introducing Coregit
Two developers from Kazakhstan built a Git API that commits 1,000 files in a single call, where GitHub's API needs 105 calls for just 100 files. It's 3.6x faster, handles 15,000 requests per hour instead of 500, and runs on Cloudflare Workers with zero servers to manage.
This is Coregit — a full reimplementation of Git's object model in TypeScript, designed from the ground up for AI agents. Not a GitHub wrapper. Not a proxy.
## Why Git needs a new API
Every AI coding agent needs to persist code. Today, that means wrangling the GitHub Contents API — an API designed for human developers browsing repositories in a web UI.

The problem isn't just rate limits. It's the fundamental architecture. To commit even a single file atomically, an agent drops down to GitHub's Git Database API and makes five sequential calls:
1. `GET /repos/:owner/:repo/git/ref/heads/main` — get current HEAD
2. `POST /repos/:owner/:repo/git/blobs` — create blob object
3. `POST /repos/:owner/:repo/git/trees` — create tree with new blob
4. `POST /repos/:owner/:repo/git/commits` — create commit pointing to tree
5. `PATCH /repos/:owner/:repo/git/refs/heads/main` — update branch ref
For 100 files, that's 105 sequential API calls. Each one crosses the network, hits authentication, passes through load balancers, and waits for a response. The latency scales linearly with file count.
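For concreteness, here is a sketch of that five-call sequence against GitHub's Git Database API. The `call` helper is hypothetical (an authenticated `fetch` wrapper returning parsed JSON), and the tree step omits `base_tree`, which a robust client would supply to preserve untouched files:

```typescript
// Sketch of GitHub's five-call single-file commit sequence. `call` is a
// hypothetical transport helper (method, path, JSON body) -> parsed JSON,
// standing in for authenticated fetch() against api.github.com.
type Call = (method: string, path: string, body?: unknown) => Promise<any>;

async function commitOneFile(
  call: Call,
  repo: string, // "owner/name"
  branch: string,
  path: string,
  content: string,
  message: string,
): Promise<string> {
  // 1. GET the branch ref to learn the current HEAD commit SHA.
  const ref = await call("GET", `/repos/${repo}/git/ref/heads/${branch}`);
  const parent: string = ref.object.sha;

  // 2. POST a blob holding the new file content.
  const blob = await call("POST", `/repos/${repo}/git/blobs`, {
    content,
    encoding: "utf-8",
  });

  // 3. POST a tree containing the blob. (Preserving untouched files would
  //    additionally require a base_tree, i.e. the parent commit's tree SHA.)
  const tree = await call("POST", `/repos/${repo}/git/trees`, {
    tree: [{ path, mode: "100644", type: "blob", sha: blob.sha }],
  });

  // 4. POST the commit pointing at the new tree, with HEAD as its parent.
  const commit = await call("POST", `/repos/${repo}/git/commits`, {
    message,
    tree: tree.sha,
    parents: [parent],
  });

  // 5. PATCH the branch ref to the new commit; only now is the change live.
  await call("PATCH", `/repos/${repo}/git/refs/heads/${branch}`, {
    sha: commit.sha,
  });
  return commit.sha;
}
```

Each call consumes the previous response, so none of the round trips can be overlapped; that serialization is what Coregit collapses into one request.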
We asked: what if Git hosting was designed for agents from day one?
## Git, reimplemented for V8 isolates
Coregit doesn't shell out to git. It doesn't use libgit2. The entire Git object model — blobs, trees, commits, tags, packfiles, refs — is implemented from scratch in TypeScript.
Why? Because Cloudflare Workers are V8 isolates. No filesystem. No native bindings. No subprocess calls. By reimplementing Git in pure TypeScript, we get sub-millisecond cold starts, global edge deployment, and zero infrastructure to manage.
The R2 key structure mirrors Git's own object addressing:
```
{orgId}/{repoSlug}/
  objects/{sha[0:2]}/{sha[2:40]}   ← loose objects (zlib-compressed)
  pack/{packId}.pack               ← packfiles for clone/fetch
  refs/heads/{branch}              ← branch refs
  HEAD                             ← symbolic ref
```

Every Git object is content-addressed by SHA-1 hash — identical to what `git` itself produces. You can `git clone` a Coregit repo, push to it, and everything works because the underlying data format is the same.
## One API call, 1,000 files
Coregit's commit endpoint accepts up to 1,000 file changes in a single request:
```typescript
import { createCoregitClient } from '@coregit/sdk'

const cg = createCoregitClient({ apiKey: 'cgk_live_...' })

await cg.commits.create('my-repo', {
  branch: 'main',
  message: 'feat: scaffold project',
  changes: [
    { path: 'src/app.ts', content: 'export default function App() {}' },
    { path: 'src/utils.ts', content: 'export const sum = (a, b) => a + b' },
    // ... up to 1,000 files
  ]
})
```

Under the hood, this single call does what would take 105+ GitHub API calls:
- Resolves the branch ref to the current HEAD commit SHA
- Flattens the current tree into a `Map<path, {sha, mode}>` (cached by immutable commit SHA)
- Applies all changes — create, edit, delete, rename
- Builds new Git tree objects bottom-up with parallel subtree construction
- Creates the commit object with proper parent chain
- Updates the branch ref atomically via compare-and-swap
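The bottom-up tree build starts from that flattened map. A simplified sketch of the grouping step (our own illustration, not Coregit's code) that nests a flat path map into per-directory subtrees, ready to be serialized deepest-first:

```typescript
// Sketch (illustrative, not Coregit's actual code): nest a flat
// Map<path, {mode, sha}> into per-directory subtrees, the shape from which
// Git tree objects are then serialized deepest-first.
type Entry = { mode: string; sha: string };
type TreeNode = Map<string, Entry | TreeNode>;

function nest(flat: Map<string, Entry>): TreeNode {
  const root: TreeNode = new Map();
  for (const [path, entry] of flat) {
    const parts = path.split("/");
    let dir = root;
    // Walk (creating as needed) the directory chain for this path.
    for (const part of parts.slice(0, -1)) {
      if (!dir.has(part)) dir.set(part, new Map());
      dir = dir.get(part) as TreeNode;
    }
    dir.set(parts[parts.length - 1], entry);
  }
  return root;
}
```

Because sibling subtrees are independent, their serialization can proceed in parallel, which is what the "parallel subtree construction" above refers to.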
The "edit" action supports surgical modifications: range-based line replacement or old_string/new_string find-replace — so agents don't need to send entire file contents for small changes.
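The post doesn't spell out the matching rules, but find-replace edits in agent tooling typically require the target string to match exactly once, so an edit cannot silently land in the wrong place. A sketch of those assumed semantics:

```typescript
// Sketch of assumed old_string/new_string semantics: the match must be
// unique so the edit cannot land in the wrong place. (Coregit's real
// matching rules are not documented in this post; this is our reading.)
function applyEdit(file: string, oldStr: string, newStr: string): string {
  const first = file.indexOf(oldStr);
  if (first === -1) throw new Error("old_string not found");
  if (file.indexOf(oldStr, first + 1) !== -1) {
    throw new Error("old_string is ambiguous; provide more context");
  }
  return file.slice(0, first) + newStr + file.slice(first + oldStr.length);
}
```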
## Fire-and-forget writes
The commit endpoint returns a 201 with the commit SHA before all work is complete. Here's the trick: object writes go to a Durable Object (RepoHotDO) that acknowledges in ~2ms. The R2 durability write fires in parallel but is never awaited — it completes 200-500ms later, after the response is already on the wire.
```
Client ← 201 { sha, tree_sha }   ← ~2ms after DO ack
   ↓ (background)
R2 write completes               ← 200-500ms later
Semantic indexing queued         ← via ctx.waitUntil()
Usage tracking recorded          ← fire-and-forget
Cache warming                    ← .catch(() => {})
```

The Durable Object flushes accumulated objects to R2 every 30 seconds via an alarm. If the buffer exceeds 2,000 objects, it triggers backpressure (HTTP 507) and falls back to direct R2 writes.
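Stripped of Workers specifics, the pattern is: await only the fast acknowledgement, hand the durability write to the background, then respond. A generic sketch, with `background` standing in for `ctx.waitUntil`:

```typescript
// Generic sketch of acknowledge-then-persist. `ack` is the fast path (the
// DO acknowledgement, ~2ms in the article), `persist` the slow durability
// write (R2, 200-500ms), and `background` schedules work past the response;
// on Workers that role is played by ctx.waitUntil.
async function handleCommit(
  ack: () => Promise<void>,
  persist: () => Promise<void>,
  background: (p: Promise<void>) => void,
): Promise<{ status: number }> {
  await ack(); // wait only for the in-memory acknowledgement
  background(persist()); // start the durability write, never await it
  return { status: 201 }; // response leaves before persistence completes
}
```

The trade-off is a durability window: if the isolate dies before the background write lands, the Durable Object's buffered copy is what saves the data.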
With a session (X-Session-Id), writes are even more aggressive: objects stay exclusively in SessionDO until the session closes. An agent doing 50 sequential commits in a session pays ~2ms per object write instead of ~200ms — that's the difference between seconds and minutes of cumulative write latency.
## Six-layer cache hierarchy
Reads in Coregit pass through six cache layers, ordered from fastest and most ephemeral to slowest and most durable:
| Layer | Latency | What it caches |
|---|---|---|
| In-memory Map | 0ms | Per-request, capped at 32MB |
| Durable Object (RepoHotDO) | ~2ms | Unflushed writes from recent commits |
| KV | ~5ms | Refs, tree flattening, embeddings |
| Edge Cache API | <5ms | Immutable git objects (1-year TTL) |
| Hyperdrive | ~10ms | PostgreSQL query results |
| R2 | 50-200ms | Final durability layer |
Git objects are immutable and content-addressed — the same SHA always maps to the same bytes. This means cache invalidation is essentially free. Once an object is cached at the edge, it stays valid forever.
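A sketch of what such a read path looks like in principle, with Map-backed layers standing in for the DO/KV/R2 tiers:

```typescript
// Sketch: read-through lookup across an ordered cache hierarchy. Because
// Git objects are immutable and keyed by SHA, backfilled copies never go
// stale, so no invalidation path is needed.
type Layer = {
  name: string;
  get: (sha: string) => Promise<Uint8Array | null>;
  put?: (sha: string, bytes: Uint8Array) => Promise<void>;
};

async function readObject(sha: string, layers: Layer[]): Promise<Uint8Array> {
  for (let i = 0; i < layers.length; i++) {
    const hit = await layers[i].get(sha);
    if (hit) {
      // Promote the object into every faster layer we fell through.
      for (let j = 0; j < i; j++) await layers[j].put?.(sha, hit);
      return hit;
    }
  }
  throw new Error(`object ${sha} not found in any layer`);
}
```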
## Zero-Wait Protocol
AI agents make many sequential API calls — read a file, think, edit, commit, read again. Each call normally requires full authentication (hash the key, check KV cache, maybe hit the database).
Coregit's Zero-Wait Protocol eliminates this overhead:
```bash
# Open a session (auth validated once)
SESSION=$(curl -X POST https://api.coregit.dev/v1/session/open \
  -H "x-api-key: cgk_live_..." | jq -r '.sessionId')

# All subsequent requests skip auth DB/KV lookups
curl https://api.coregit.dev/v1/repos/my-repo/files/src/app.ts \
  -H "X-Session-Id: $SESSION"
```

Session auth is validated against a Durable Object in <1ms (warm). Writes are deferred to the session — flushed to R2 only on close or after 30 minutes of inactivity. For a typical agent workflow of 50+ API calls, this saves hundreds of milliseconds in cumulative auth overhead.
## AI-native search
Coregit includes semantic code search powered by Voyage AI (voyage-code-3) embeddings and Pinecone vector storage.
```bash
cgt semantic-search my-project "how does authentication work"
```

The search pipeline is more sophisticated than a simple vector lookup:
- Embed the query via Voyage AI (1024-dimensional vector)
- Over-fetch 150 candidates from Pinecone (more than you need)
- Post-filter by tree membership — only return results that exist in the target commit's tree (version-aware search)
- Rerank top 30 via Voyage `rerank-2.5` for relevance
- MMR diversification (lambda = 0.3) — penalize same-file duplicates so results span the codebase
- Return top-k with optional context expansion (±20 surrounding lines)
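The diversification step can be pictured with a tiny MMR sketch. Here, same-file membership stands in for pairwise similarity, a simplification of what a production reranker would compute from embeddings:

```typescript
// Sketch of MMR (maximal marginal relevance) selection with lambda = 0.3.
// Same-file membership stands in for pairwise similarity; a production
// pipeline would use embedding cosine similarity instead.
type Hit = { id: string; file: string; score: number };

function mmr(candidates: Hit[], k: number, lambda = 0.3): Hit[] {
  const selected: Hit[] = [];
  const pool = [...candidates];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestVal = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      // Relevance minus redundancy: a low lambda weights diversity heavily.
      const redundant = selected.some((s) => s.file === pool[i].file) ? 1 : 0;
      const val = lambda * pool[i].score - (1 - lambda) * redundant;
      if (val > bestVal) {
        bestVal = val;
        bestIdx = i;
      }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With lambda = 0.3 a strong second hit from an already-selected file loses to a weaker hit from elsewhere, which is exactly the "span the codebase" behavior described above.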
Embeddings are content-addressed: vectors are keyed by blob SHA, so identical code across branches is never re-embedded. The embedding cache lives in KV — SHA-256 of text content maps to the precomputed vector.
Combined with Tree-sitter code graph analysis (30+ languages), you get function call graphs, dependency tracking, and cross-file relationship mapping — all via API.
## Git Smart HTTP — real Git, not a wrapper
Coregit implements the full Git Smart HTTP transport protocol:
```bash
# Clone works out of the box
git clone https://api.coregit.dev/git/org/repo.git

# Push works too
git push origin main
```

The packfile parser handles all Git object types, including OFS_DELTA and REF_DELTA compression. Push triggers automatic export to GitHub/GitLab if sync is configured — so you can use Coregit as the write layer and mirror to GitHub for visibility.
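The wire format underneath is simple enough to show: Smart HTTP frames every line as a pkt-line, a 4-hex-digit length prefix (counting itself) followed by the payload, with `0000` as the section terminator. A sketch:

```typescript
// Sketch: pkt-line framing from Git's Smart HTTP transport. Each line
// carries a 4-digit lowercase-hex prefix giving the total line length,
// including the prefix itself. (String length equals byte length only for
// ASCII payloads; a real implementation counts UTF-8 bytes.)
function pktLine(payload: string): string {
  return (payload.length + 4).toString(16).padStart(4, "0") + payload;
}

const FLUSH = "0000"; // flush-pkt: zero-length marker that ends a section
```

For example, the first line of an `info/refs` advertisement, `# service=git-upload-pack\n`, frames to `001e# service=git-upload-pack\n`.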
## Open source
Coregit is open source under AGPL-3.0. The entire codebase — Git implementation, caching layers, search pipeline, auth system — is public at github.com/coregit-inc/coregit-api.
Self-host on Cloudflare Workers with your own R2, KV, and Durable Objects. Or use our hosted service at coregit.dev.
## Get started
```bash
npx coregit-wizard@latest
```

The wizard creates your account and API keys and configures your agent — no browser needed.