
Introducing Coregit

Alemzhan Jakipov

Two developers from Kazakhstan built a Git API that commits up to 1,000 files in a single call — on GitHub's API, committing just 100 files takes 105 sequential calls. It's 3.6x faster, handles 15,000 requests per hour instead of 500, and runs on Cloudflare Workers with zero servers to manage.

This is Coregit — a full reimplementation of Git's object model in TypeScript, designed from the ground up for AI agents. Not a GitHub wrapper. Not a proxy.

Why Git needs a new API

Every AI coding agent needs to persist code. Today, that means wrangling the GitHub Content API — an API designed for human developers browsing repositories in a web UI.

The problem isn't just rate limits. It's the architecture itself. To commit a single file on GitHub, an agent makes 5 sequential API calls:

  1. GET /repos/:owner/:repo/git/ref/heads/main — get current HEAD
  2. POST /repos/:owner/:repo/git/blobs — create blob object
  3. POST /repos/:owner/:repo/git/trees — create tree with new blob
  4. POST /repos/:owner/:repo/git/commits — create commit pointing to tree
  5. PATCH /repos/:owner/:repo/git/refs/heads/main — update branch ref

For 100 files, that's 105 sequential API calls. Each one crosses the network, hits authentication, passes through load balancers, and waits for a response. The latency scales linearly with file count.
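The arithmetic behind these counts can be sketched as a simple model. The five-call bookkeeping overhead below (ref read, base-tree read, tree create, commit create, ref update) is an assumption chosen to reproduce the numbers in the text, not GitHub's exact accounting:

```typescript
// Rough call-count model for the GitHub Content API flow described above.
// Assumes one blob-creation POST per file plus five bookkeeping calls
// (ref read, base-tree read, tree create, commit create, ref update).
// Single-file commits can skip the base-tree read, giving the 5-call
// sequence listed earlier.
function githubCallCount(fileCount: number): number {
  const bookkeeping = 5;
  return fileCount + bookkeeping;
}

function coregitCallCount(fileCount: number): number {
  // One batched commit request per 1,000 files.
  return Math.ceil(fileCount / 1000);
}

console.log(githubCallCount(100)); // 105
console.log(coregitCallCount(100)); // 1
```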

We asked: what if Git hosting was designed for agents from day one?

Git, reimplemented for V8 isolates

Coregit doesn't shell out to git. It doesn't use libgit2. The entire Git object model — blobs, trees, commits, tags, packfiles, refs — is implemented from scratch in TypeScript.

Why? Because Cloudflare Workers are V8 isolates. No filesystem. No native bindings. No subprocess calls. By reimplementing Git in pure TypeScript, we get sub-millisecond cold starts, global edge deployment, and zero infrastructure to manage.

The R2 key structure mirrors Git's own object addressing:

{orgId}/{repoSlug}/
  objects/{sha[0:2]}/{sha[2:40]}   ← loose objects (zlib-compressed)
  pack/{packId}.pack               ← packfiles for clone/fetch
  refs/heads/{branch}              ← branch refs
  HEAD                             ← symbolic ref

Every git object is content-addressed by SHA-1 hash — identical to what git itself produces. You can git clone a Coregit repo, push to it, and everything works because the underlying data format is the same.
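As a quick illustration of that content addressing: a blob's id is the SHA-1 of the header `blob <byte length>\0` followed by the raw content. The `looseObjectKey` helper below is a hypothetical sketch of how the R2 key layout above could be derived, not Coregit's actual code:

```typescript
import { createHash } from "node:crypto";

// Git's content addressing: SHA-1 over "blob <len>\0" + content.
function blobSha(content: string): string {
  const body = Buffer.from(content, "utf8");
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// Illustrative helper: derive the loose-object R2 key from a SHA.
function looseObjectKey(orgId: string, repoSlug: string, sha: string): string {
  return `${orgId}/${repoSlug}/objects/${sha.slice(0, 2)}/${sha.slice(2)}`;
}

const sha = blobSha("hello\n");
console.log(sha); // ce013625030ba8dba906f756967f9e9ca394464a — same as `git hash-object`
console.log(looseObjectKey("acme", "demo", sha));
```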

One API call, 1000 files

Coregit's commit endpoint accepts up to 1,000 file changes in a single request:

import { createCoregitClient } from '@coregit/sdk'

const cg = createCoregitClient({ apiKey: 'cgk_live_...' })

await cg.commits.create('my-repo', {
  branch: 'main',
  message: 'feat: scaffold project',
  changes: [
    { path: 'src/app.ts', content: 'export default function App() {}' },
    { path: 'src/utils.ts', content: 'export const sum = (a, b) => a + b' },
    // ... up to 1000 files
  ]
})

Under the hood, this single call does what would take 105+ GitHub API calls:

  1. Resolves the branch ref to the current HEAD commit SHA
  2. Flattens the current tree into a Map<path, {sha, mode}> (cached by immutable commit SHA)
  3. Applies all changes — create, edit, delete, rename
  4. Builds new Git tree objects bottom-up with parallel subtree construction
  5. Creates the commit object with proper parent chain
  6. Updates the branch ref atomically via compare-and-swap
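Steps 2 and 3 can be sketched as a pure function over the flattened tree map. The `Change` shape here is illustrative, not Coregit's actual request schema:

```typescript
type TreeEntry = { sha: string; mode: string };
type Change =
  | { action: "create" | "edit"; path: string; sha: string }
  | { action: "delete"; path: string }
  | { action: "rename"; path: string; newPath: string };

// Apply a change list to the flattened Map<path, {sha, mode}>.
// The flattened tree is cached per immutable commit SHA, so we copy
// before mutating rather than editing the cached map in place.
function applyChanges(tree: Map<string, TreeEntry>, changes: Change[]): Map<string, TreeEntry> {
  const next = new Map(tree);
  for (const c of changes) {
    if (c.action === "delete") {
      next.delete(c.path);
    } else if (c.action === "rename") {
      const entry = next.get(c.path);
      if (entry) {
        next.delete(c.path);
        next.set(c.newPath, entry);
      }
    } else {
      next.set(c.path, { sha: c.sha, mode: "100644" });
    }
  }
  return next;
}
```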

The "edit" action supports surgical modifications: range-based line replacement or old_string/new_string find-replace — so agents don't need to send entire file contents for small changes.
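A minimal sketch of the two edit styles (function names here are hypothetical; the real endpoint applies these server-side):

```typescript
// old_string/new_string find-replace: swap the first occurrence.
function findReplaceEdit(content: string, oldString: string, newString: string): string {
  const i = content.indexOf(oldString);
  if (i === -1) throw new Error("old_string not found in file");
  return content.slice(0, i) + newString + content.slice(i + oldString.length);
}

// Range-based line replacement: 1-indexed, inclusive range.
function lineRangeEdit(content: string, startLine: number, endLine: number, replacement: string): string {
  const lines = content.split("\n");
  lines.splice(startLine - 1, endLine - startLine + 1, replacement);
  return lines.join("\n");
}
```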

Fire-and-forget writes

The commit endpoint returns a 201 with the commit SHA before all work is complete. Here's the trick: object writes go to a Durable Object (RepoHotDO) that acknowledges in ~2ms. The R2 durability write fires in parallel but is never awaited — it completes 200-500ms later, after the response is already on the wire.

Client ← 201 { sha, tree_sha }     ← ~2ms after DO ack
         ↓ (background)
         R2 write completes          ← 200-500ms later
         Semantic indexing queued     ← via ctx.waitUntil()
         Usage tracking recorded      ← fire-and-forget
         Cache warming                ← .catch(() => {})

The Durable Object flushes accumulated objects to R2 every 30 seconds via an alarm. If the buffer exceeds 2,000 objects, it triggers backpressure (HTTP 507) and falls back to direct R2 writes.
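The buffering and backpressure behavior can be sketched like this; class and method names are illustrative, not Coregit's internals:

```typescript
// Acknowledge writes immediately, flush in batches, signal backpressure
// past a threshold (2,000 objects / HTTP 507 in the text).
class BufferedObjectStore {
  private buffer = new Map<string, Uint8Array>();
  constructor(
    private flushToR2: (objects: Map<string, Uint8Array>) => Promise<void>,
    private maxBuffered = 2000,
  ) {}

  // Returns an HTTP-style status: 200 for a buffered ack,
  // 507 when full (the caller falls back to a direct R2 write).
  put(sha: string, bytes: Uint8Array): number {
    if (this.buffer.size >= this.maxBuffered) return 507;
    this.buffer.set(sha, bytes);
    return 200;
  }

  // Called by the Durable Object alarm (every 30 seconds in the text).
  async flush(): Promise<number> {
    const batch = this.buffer;
    this.buffer = new Map();
    await this.flushToR2(batch);
    return batch.size;
  }
}
```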

With a session (X-Session-Id), writes are even more aggressive: objects stay exclusively in SessionDO until the session closes. An agent doing 50 sequential commits in a session pays ~2ms per object write instead of ~200ms — that's the difference between seconds and minutes of cumulative write latency.

Six-layer cache hierarchy

Reads in Coregit pass through six cache layers, ordered from fastest and least durable to slowest and most durable:

Layer                        Latency     What it caches
In-memory Map                0ms         Per-request, capped at 32MB
Durable Object (RepoHotDO)   ~2ms        Unflushed writes from recent commits
KV                           ~5ms        Refs, tree flattening, embeddings
Edge Cache API               <5ms        Immutable git objects (1-year TTL)
Hyperdrive                   ~10ms       PostgreSQL query results
R2                           50-200ms    Final durability layer

Git objects are immutable and content-addressed — the same SHA always maps to the same bytes. This means cache invalidation is essentially free. Once an object is cached at the edge, it stays valid forever.

Zero-Wait Protocol

AI agents make many sequential API calls — read a file, think, edit, commit, read again. Each call normally requires full authentication (hash the key, check KV cache, maybe hit the database).

Coregit's Zero-Wait Protocol eliminates this overhead:

# Open a session (auth validated once)
SESSION=$(curl -X POST https://api.coregit.dev/v1/session/open \
  -H "x-api-key: cgk_live_..." | jq -r '.sessionId')

# All subsequent requests skip auth DB/KV lookups
curl https://api.coregit.dev/v1/repos/my-repo/files/src/app.ts \
  -H "X-Session-Id: $SESSION"

Session auth is validated against a Durable Object in <1ms (warm). Writes are deferred to the session — flushed to R2 only on close or after 30 minutes of inactivity. For a typical agent workflow of 50+ API calls, this saves hundreds of milliseconds in cumulative auth overhead.

Semantic code search

Coregit includes semantic code search powered by Voyage AI (voyage-code-3) embeddings and Pinecone vector storage.

cgt semantic-search my-project "how does authentication work"

The search pipeline is more sophisticated than a simple vector lookup:

  1. Embed the query via Voyage AI (1024-dimensional vector)
  2. Over-fetch 150 candidates from Pinecone (more than you need)
  3. Post-filter by tree membership — only return results that exist in the target commit's tree (version-aware search)
  4. Rerank top 30 via Voyage rerank-2.5 for relevance
  5. MMR diversification (lambda=0.3) — penalize same-file duplicates so results span the codebase
  6. Return top-k with optional context expansion (±20 surrounding lines)
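Step 5 can be sketched as greedy MMR selection where the similarity term is simply a same-file penalty; the `Hit` shape and scoring details here are illustrative:

```typescript
type Hit = { file: string; score: number };

// Greedy MMR: pick the candidate maximizing
//   lambda * relevance - (1 - lambda) * similarityToSelected,
// where "similarity" is 1 if a result from the same file was already
// picked, else 0. lambda = 0.3 as in the text, so duplicates from an
// already-represented file are heavily penalized.
function mmrDiversify(hits: Hit[], k: number, lambda = 0.3): Hit[] {
  const selected: Hit[] = [];
  const pool = [...hits];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      const sameFile = selected.some((s) => s.file === pool[i].file) ? 1 : 0;
      const score = lambda * pool[i].score - (1 - lambda) * sameFile;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```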

Embeddings are content-addressed: vectors are keyed by blob SHA, so identical code across branches is never re-embedded. The embedding cache lives in KV — SHA-256 of text content maps to the precomputed vector.
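A minimal sketch of that cache, with a stand-in `embed` function in place of the Voyage AI call:

```typescript
import { createHash } from "node:crypto";

// Content-addressed embedding cache: the key is a hash of the text, so
// identical code (across branches or commits) is embedded exactly once.
class EmbeddingCache {
  private kv = new Map<string, number[]>();
  embedCalls = 0;

  getEmbedding(text: string, embed: (t: string) => number[]): number[] {
    const key = createHash("sha256").update(text).digest("hex");
    const cached = this.kv.get(key);
    if (cached) return cached; // never stale: same content, same vector
    this.embedCalls++;
    const vector = embed(text);
    this.kv.set(key, vector);
    return vector;
  }
}
```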

Combined with Tree-sitter code graph analysis (30+ languages), you get function call graphs, dependency tracking, and cross-file relationship mapping — all via API.

Git Smart HTTP — real Git, not a wrapper

Coregit implements the full Git Smart HTTP transport protocol:

# Clone works out of the box
git clone https://api.coregit.dev/git/org/repo.git

# Push works too
git push origin main

The packfile parser handles all Git object types including OFS_DELTA and REF_DELTA compression. Push triggers automatic export to GitHub/GitLab if sync is configured — so you can use Coregit as the write layer and mirror to GitHub for visibility.
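For reference, applying a Git delta payload (the format shared by OFS_DELTA and REF_DELTA entries) looks roughly like this sketch:

```typescript
// A delta payload starts with two varint sizes (base and target), then a
// stream of ops: commands with the MSB set copy a range from the base
// object; commands without it insert that many literal bytes.
function applyDelta(base: Uint8Array, delta: Uint8Array): Uint8Array {
  let pos = 0;
  const readVarint = (): number => {
    let value = 0, shift = 0, byte: number;
    do {
      byte = delta[pos++];
      value |= (byte & 0x7f) << shift;
      shift += 7;
    } while (byte & 0x80);
    return value;
  };
  readVarint(); // expected base size (unchecked in this sketch)
  const targetSize = readVarint();
  const out = new Uint8Array(targetSize);
  let outPos = 0;
  while (pos < delta.length) {
    const cmd = delta[pos++];
    if (cmd & 0x80) {
      // Copy op: low 4 bits select offset bytes, next 3 bits select size
      // bytes; both are assembled little-endian. size 0 means 0x10000.
      let offset = 0, size = 0;
      for (let i = 0; i < 4; i++) if (cmd & (1 << i)) offset |= delta[pos++] << (8 * i);
      for (let i = 0; i < 3; i++) if (cmd & (1 << (4 + i))) size |= delta[pos++] << (8 * i);
      if (size === 0) size = 0x10000;
      out.set(base.subarray(offset, offset + size), outPos);
      outPos += size;
    } else {
      // Insert op: cmd itself is the literal byte count.
      out.set(delta.subarray(pos, pos + cmd), outPos);
      pos += cmd;
      outPos += cmd;
    }
  }
  return out;
}
```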

Open source

Coregit is open source under AGPL-3.0. The entire codebase — Git implementation, caching layers, search pipeline, auth system — is public at github.com/coregit-inc/coregit-api.

Self-host on Cloudflare Workers with your own R2, KV, and Durable Objects. Or use our hosted service at coregit.dev.

Get started

npx coregit-wizard@latest

The wizard creates your account, API keys, and configures your agent — no browser needed.
