
Introducing Coregit

Alemzhan Jakipov

Two developers from Kazakhstan built a Git API that commits up to 1,000 files in a single call — on GitHub's API, committing just 100 files takes 105 sequential calls. It's 3.6x faster, handles 15,000 requests per hour instead of 500, and runs on Cloudflare Workers with zero servers to manage.

This is Coregit — a full reimplementation of Git's object model in TypeScript, designed from the ground up for AI agents. Not a GitHub wrapper. Not a proxy.

Why Git needs a new API

Every AI coding agent needs to persist code. Today, that means wrangling the GitHub Content API — an API designed for human developers browsing repositories in a web UI.

The problem isn't just rate limits. It's the architecture itself. To commit a single file on GitHub, an agent makes 5 sequential API calls:

  1. GET /repos/:owner/:repo/git/ref/heads/main — get current HEAD
  2. POST /repos/:owner/:repo/git/blobs — create blob object
  3. POST /repos/:owner/:repo/git/trees — create tree with new blob
  4. POST /repos/:owner/:repo/git/commits — create commit pointing to tree
  5. PATCH /repos/:owner/:repo/git/refs/heads/main — update branch ref

For 100 files, that's 105 sequential API calls. Each one crosses the network, hits authentication, passes through load balancers, and waits for a response. The latency scales linearly with file count.
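The arithmetic behind these counts can be sketched as a simple model. The five-call bookkeeping overhead below (ref read, base-tree read, tree create, commit create, ref update) is an assumption chosen to reproduce the numbers in the text, not GitHub's exact accounting:

```typescript
// Rough call-count model for the GitHub Content API flow described above.
// Assumes one blob-creation POST per file plus five bookkeeping calls
// (ref read, base-tree read, tree create, commit create, ref update).
// Single-file commits can skip the base-tree read, giving the 5-call
// sequence listed earlier.
function githubCallCount(fileCount: number): number {
  const bookkeeping = 5;
  return fileCount + bookkeeping;
}

function coregitCallCount(fileCount: number): number {
  // One batched commit request per 1,000 files.
  return Math.ceil(fileCount / 1000);
}

console.log(githubCallCount(100)); // 105
console.log(coregitCallCount(100)); // 1
```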

We asked: what if Git hosting was designed for agents from day one?

Git, reimplemented for V8 isolates

Coregit doesn't shell out to git. It doesn't use libgit2. The entire Git object model — blobs, trees, commits, tags, packfiles, refs — is implemented from scratch in TypeScript.

Why? Because Cloudflare Workers are V8 isolates. No filesystem. No native bindings. No subprocess calls. By reimplementing Git in pure TypeScript, we get sub-millisecond cold starts, global edge deployment, and zero infrastructure to manage.

The R2 key structure mirrors Git's own object addressing:

{orgId}/{repoSlug}/
  objects/{sha[0:2]}/{sha[2:40]}   ← loose objects (zlib-compressed)
  pack/{packId}.pack               ← packfiles for clone/fetch
  refs/heads/{branch}              ← branch refs
  HEAD                             ← symbolic ref

Every git object is content-addressed by SHA-1 hash — identical to what git itself produces. You can git clone a Coregit repo, push to it, and everything works because the underlying data format is the same.
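As a quick illustration of that content addressing: a blob's id is the SHA-1 of the header `blob <byte length>\0` followed by the raw content. The `looseObjectKey` helper below is a hypothetical sketch of how the R2 key layout above could be derived, not Coregit's actual code:

```typescript
import { createHash } from "node:crypto";

// Git's content addressing: SHA-1 over "blob <len>\0" + content.
function blobSha(content: string): string {
  const body = Buffer.from(content, "utf8");
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// Illustrative helper: derive the loose-object R2 key from a SHA.
function looseObjectKey(orgId: string, repoSlug: string, sha: string): string {
  return `${orgId}/${repoSlug}/objects/${sha.slice(0, 2)}/${sha.slice(2)}`;
}

const sha = blobSha("hello\n");
console.log(sha); // ce013625030ba8dba906f756967f9e9ca394464a — same as `git hash-object`
console.log(looseObjectKey("acme", "demo", sha));
```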

One API call, 1000 files

Coregit's commit endpoint accepts up to 1,000 file changes in a single request:

import { createCoregitClient } from '@coregit/sdk'

const cg = createCoregitClient({ apiKey: 'cgk_live_...' })

await cg.commits.create('my-repo', {
  branch: 'main',
  message: 'feat: scaffold project',
  changes: [
    { path: 'src/app.ts', content: 'export default function App() {}' },
    { path: 'src/utils.ts', content: 'export const sum = (a, b) => a + b' },
    // ... up to 1000 files
  ]
})

Under the hood, this single call does what would take 105+ GitHub API calls:

  1. Resolves the branch ref to the current HEAD commit SHA
  2. Flattens the current tree into a Map<path, {sha, mode}> (cached by immutable commit SHA)
  3. Applies all changes — create, edit, delete, rename
  4. Builds new Git tree objects bottom-up with parallel subtree construction
  5. Creates the commit object with proper parent chain
  6. Updates the branch ref atomically via compare-and-swap
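Steps 2 and 3 can be sketched as a pure function over the flattened tree map. The `Change` shape here is illustrative, not Coregit's actual request schema:

```typescript
type TreeEntry = { sha: string; mode: string };
type Change =
  | { action: "create" | "edit"; path: string; sha: string }
  | { action: "delete"; path: string }
  | { action: "rename"; path: string; newPath: string };

// Apply a change list to the flattened Map<path, {sha, mode}>.
// The flattened tree is cached per immutable commit SHA, so we copy
// before mutating rather than editing the cached map in place.
function applyChanges(tree: Map<string, TreeEntry>, changes: Change[]): Map<string, TreeEntry> {
  const next = new Map(tree);
  for (const c of changes) {
    if (c.action === "delete") {
      next.delete(c.path);
    } else if (c.action === "rename") {
      const entry = next.get(c.path);
      if (entry) {
        next.delete(c.path);
        next.set(c.newPath, entry);
      }
    } else {
      next.set(c.path, { sha: c.sha, mode: "100644" });
    }
  }
  return next;
}
```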

The "edit" action supports surgical modifications: range-based line replacement or old_string/new_string find-replace — so agents don't need to send entire file contents for small changes.
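A minimal sketch of the two edit styles (function names here are hypothetical; the real endpoint applies these server-side):

```typescript
// old_string/new_string find-replace: swap the first occurrence.
function findReplaceEdit(content: string, oldString: string, newString: string): string {
  const i = content.indexOf(oldString);
  if (i === -1) throw new Error("old_string not found in file");
  return content.slice(0, i) + newString + content.slice(i + oldString.length);
}

// Range-based line replacement: 1-indexed, inclusive range.
function lineRangeEdit(content: string, startLine: number, endLine: number, replacement: string): string {
  const lines = content.split("\n");
  lines.splice(startLine - 1, endLine - startLine + 1, replacement);
  return lines.join("\n");
}
```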

Fire-and-forget writes

The commit endpoint returns a 201 with the commit SHA before all work is complete. Here's the trick: object writes go to a Durable Object (RepoHotDO) that acknowledges in ~2ms. The R2 durability write fires in parallel but is never awaited — it completes 200-500ms later, after the response is already on the wire.

Client ← 201 { sha, tree_sha }     ← ~2ms after DO ack
         ↓ (background)
         R2 write completes          ← 200-500ms later
         Semantic indexing queued     ← via ctx.waitUntil()
         Usage tracking recorded      ← fire-and-forget
         Cache warming                ← .catch(() => {})

The Durable Object flushes accumulated objects to R2 every 30 seconds via an alarm. If the buffer exceeds 2,000 objects, it triggers backpressure (HTTP 507) and falls back to direct R2 writes.
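The buffering and backpressure behavior can be sketched like this; class and method names are illustrative, not Coregit's internals:

```typescript
// Acknowledge writes immediately, flush in batches, signal backpressure
// past a threshold (2,000 objects / HTTP 507 in the text).
class BufferedObjectStore {
  private buffer = new Map<string, Uint8Array>();
  constructor(
    private flushToR2: (objects: Map<string, Uint8Array>) => Promise<void>,
    private maxBuffered = 2000,
  ) {}

  // Returns an HTTP-style status: 200 for a buffered ack,
  // 507 when full (the caller falls back to a direct R2 write).
  put(sha: string, bytes: Uint8Array): number {
    if (this.buffer.size >= this.maxBuffered) return 507;
    this.buffer.set(sha, bytes);
    return 200;
  }

  // Called by the Durable Object alarm (every 30 seconds in the text).
  async flush(): Promise<number> {
    const batch = this.buffer;
    this.buffer = new Map();
    await this.flushToR2(batch);
    return batch.size;
  }
}
```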

With a session (X-Session-Id), writes are even more aggressive: objects stay exclusively in SessionDO until the session closes. An agent doing 50 sequential commits in a session pays ~2ms per object write instead of ~200ms — that's the difference between seconds and minutes of cumulative write latency.

Six-layer cache hierarchy

Reads in Coregit pass through six cache layers, ordered from fastest and least durable to slowest and most durable:

Layer                        Latency     What it caches
In-memory Map                0ms         Per-request, capped at 32MB
Durable Object (RepoHotDO)   ~2ms        Unflushed writes from recent commits
KV                           ~5ms        Refs, tree flattening, embeddings
Edge Cache API               <5ms        Immutable git objects (1-year TTL)
Hyperdrive                   ~10ms       PostgreSQL query results
R2                           50-200ms    Final durability layer

Git objects are immutable and content-addressed — the same SHA always maps to the same bytes. This means cache invalidation is essentially free. Once an object is cached at the edge, it stays valid forever.

Zero-Wait Protocol

AI agents make many sequential API calls — read a file, think, edit, commit, read again. Each call normally requires full authentication (hash the key, check KV cache, maybe hit the database).

Coregit's Zero-Wait Protocol eliminates this overhead:

# Open a session (auth validated once)
SESSION=$(curl -X POST https://api.coregit.dev/v1/session/open \
  -H "x-api-key: cgk_live_..." | jq -r '.sessionId')

# All subsequent requests skip auth DB/KV lookups
curl https://api.coregit.dev/v1/repos/my-repo/files/src/app.ts \
  -H "X-Session-Id: $SESSION"

Session auth is validated against a Durable Object in <1ms (warm). Writes are deferred to the session — flushed to R2 only on close or after 30 minutes of inactivity. For a typical agent workflow of 50+ API calls, this saves hundreds of milliseconds in cumulative auth overhead.

Semantic code search

Coregit includes semantic code search powered by Voyage AI (voyage-code-3) embeddings and Pinecone vector storage.

cgt semantic-search my-project "how does authentication work"

The search pipeline is more sophisticated than a simple vector lookup:

  1. Embed the query via Voyage AI (1024-dimensional vector)
  2. Over-fetch 150 candidates from Pinecone (more than you need)
  3. Post-filter by tree membership — only return results that exist in the target commit's tree (version-aware search)
  4. Rerank top 30 via Voyage rerank-2.5 for relevance
  5. MMR diversification (lambda=0.3) — penalize same-file duplicates so results span the codebase
  6. Return top-k with optional context expansion (±20 surrounding lines)
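Step 5 can be sketched as greedy MMR selection where the similarity term is simply a same-file penalty; the `Hit` shape and scoring details here are illustrative:

```typescript
type Hit = { file: string; score: number };

// Greedy MMR: pick the candidate maximizing
//   lambda * relevance - (1 - lambda) * similarityToSelected,
// where "similarity" is 1 if a result from the same file was already
// picked, else 0. lambda = 0.3 as in the text, so duplicates from an
// already-represented file are heavily penalized.
function mmrDiversify(hits: Hit[], k: number, lambda = 0.3): Hit[] {
  const selected: Hit[] = [];
  const pool = [...hits];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      const sameFile = selected.some((s) => s.file === pool[i].file) ? 1 : 0;
      const score = lambda * pool[i].score - (1 - lambda) * sameFile;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```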

Embeddings are content-addressed: vectors are keyed by blob SHA, so identical code across branches is never re-embedded. The embedding cache lives in KV — SHA-256 of text content maps to the precomputed vector.
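A minimal sketch of that cache, with a stand-in `embed` function in place of the Voyage AI call:

```typescript
import { createHash } from "node:crypto";

// Content-addressed embedding cache: the key is a hash of the text, so
// identical code (across branches or commits) is embedded exactly once.
class EmbeddingCache {
  private kv = new Map<string, number[]>();
  embedCalls = 0;

  getEmbedding(text: string, embed: (t: string) => number[]): number[] {
    const key = createHash("sha256").update(text).digest("hex");
    const cached = this.kv.get(key);
    if (cached) return cached; // never stale: same content, same vector
    this.embedCalls++;
    const vector = embed(text);
    this.kv.set(key, vector);
    return vector;
  }
}
```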

Combined with Tree-sitter code graph analysis (30+ languages), you get function call graphs, dependency tracking, and cross-file relationship mapping — all via API.

Git Smart HTTP — real Git, not a wrapper

Coregit implements the full Git Smart HTTP transport protocol:

# Clone works out of the box
git clone https://api.coregit.dev/git/org/repo.git

# Push works too
git push origin main

The packfile parser handles all Git object types including OFS_DELTA and REF_DELTA compression. Push triggers automatic export to GitHub/GitLab if sync is configured — so you can use Coregit as the write layer and mirror to GitHub for visibility.
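For reference, applying a Git delta payload (the format shared by OFS_DELTA and REF_DELTA entries) looks roughly like this sketch:

```typescript
// A delta payload starts with two varint sizes (base and target), then a
// stream of ops: commands with the MSB set copy a range from the base
// object; commands without it insert that many literal bytes.
function applyDelta(base: Uint8Array, delta: Uint8Array): Uint8Array {
  let pos = 0;
  const readVarint = (): number => {
    let value = 0, shift = 0, byte: number;
    do {
      byte = delta[pos++];
      value |= (byte & 0x7f) << shift;
      shift += 7;
    } while (byte & 0x80);
    return value;
  };
  readVarint(); // expected base size (unchecked in this sketch)
  const targetSize = readVarint();
  const out = new Uint8Array(targetSize);
  let outPos = 0;
  while (pos < delta.length) {
    const cmd = delta[pos++];
    if (cmd & 0x80) {
      // Copy op: low 4 bits select offset bytes, next 3 bits select size
      // bytes; both are assembled little-endian. size 0 means 0x10000.
      let offset = 0, size = 0;
      for (let i = 0; i < 4; i++) if (cmd & (1 << i)) offset |= delta[pos++] << (8 * i);
      for (let i = 0; i < 3; i++) if (cmd & (1 << (4 + i))) size |= delta[pos++] << (8 * i);
      if (size === 0) size = 0x10000;
      out.set(base.subarray(offset, offset + size), outPos);
      outPos += size;
    } else {
      // Insert op: cmd itself is the literal byte count.
      out.set(delta.subarray(pos, pos + cmd), outPos);
      pos += cmd;
      outPos += cmd;
    }
  }
  return out;
}
```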

Open source

Coregit is open source under AGPL-3.0. The entire codebase — Git implementation, caching layers, search pipeline, auth system — is public at github.com/coregit-inc/coregit-api.

Self-host on Cloudflare Workers with your own R2, KV, and Durable Objects. Or use our hosted service at coregit.dev.

Get started

npx coregit-wizard@latest

The wizard creates your account, API keys, and configures your agent — no browser needed.
