The Lean AI Build Stack a 2-3 Person Venture Runs
The lean AI build stack a 2-3 person venture actually runs, layer by layer, with a build-vs-buy rule for each. Skip the plumbing, own the moat.
A 2-3 person venture's lean AI build stack rents almost everything and owns exactly one layer: the moat. Rent the base models, the vector storage, the hosting, and the observability. Build the data, the evals, and the vertical workflow logic, because those are the only layers a competitor cannot buy off a shelf. That single rule carries the whole playbook. Buy the commodity, own the moat, wait on the speculative.
This is a build-vs-buy decision, not a tooling review. It resolves cleanly in 2026 for one reason. Inference cost has collapsed, so intelligence is now a metered utility you rent by the token instead of a capital project you fund. Avante Ventures builds every company this way on purpose, which is how three people ship a real AI product without a Series A.
Build vs buy: the only stack question that matters early
A 2-3 person team has one scarce resource, and it is not money. It is engineer-attention. Every layer the team decides to build is attention pulled off the one layer that becomes defensible. So the question is never build versus buy in the abstract. It is which specific layers are commodities to rent and which single layer is the moat to own.
The rule is blunt on purpose. Buy the commodity, own the moat, wait on the speculative. Base models, vector storage, hosting, queues, and observability are commodities that vendors run better and cheaper than any three-person team will. The proprietary dataset, the domain evals, and the vertical workflow logic are the moat. Fine-tuning, a self-hosted model, and a custom orchestration framework are speculative until a rented equivalent measurably breaks under your load.
There is a fast test for any layer you are tempted to build. If it would cost the same to run at ten customers or ten thousand, and a vendor already sells it as a metered service, building it is the wrong call. Ownership only earns its cost on the layer that gets more valuable the more your specific customers use it.
- Buy the commodity. Base models by API, managed Postgres for retrieval, hosting, queues, observability. Real vendors, falling prices, no advantage in running them yourself.
- Own the moat. The proprietary data your workflow generates, the domain eval set, and the vertical logic that encodes how your market actually works.
- Wait on the speculative. Fine-tuning, self-hosting a model, a bespoke framework. Waiting here is a decision, not indecision.
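The fast test and the three-way rule can be written as a small decision function. This is a minimal sketch, assuming each layer can be described by three yes/no answers; the `Layer` fields and the `verdict` function are hypothetical names for illustration, and treating "wait" as the fallback for anything that is neither clearly commodity nor clearly moat is an interpretation, not a formal part of the rule.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    cost_flat_with_scale: bool     # costs about the same at ten customers or ten thousand
    sold_as_metered_service: bool  # a vendor already rents it by usage
    value_grows_with_usage: bool   # gets more valuable the more *your* customers use it


def verdict(layer: Layer) -> str:
    """Apply the buy / build / wait rule to one layer (illustrative heuristic)."""
    if layer.value_grows_with_usage:
        return "build"  # the moat: compounds with your customers' usage
    if layer.cost_flat_with_scale and layer.sold_as_metered_service:
        return "buy"    # commodity: rent it by the unit
    return "wait"       # speculative: revisit once a rented option measurably breaks


# Illustrative inputs, not measurements.
for layer in [
    Layer("vector storage", True, True, False),
    Layer("domain eval set", False, False, True),
    Layer("bespoke orchestration framework", False, False, False),
]:
    print(f"{layer.name}: {verdict(layer)}")
# vector storage: buy
# domain eval set: build
# bespoke orchestration framework: wait
```

The ordering is deliberate: the "value grows with usage" check runs first, so a layer that compounds is never rented just because a vendor also sells something similar.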
The cost of LLM inference for a fixed level of capability fell from about $60 per million tokens in 2021 to about $0.06 by 2024, a factor of roughly 1,000 in three years, and drops about 10x a year for an equivalent model.
— a16z, LLMflation, 2024
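As a quick consistency check on those figures, a 1,000x fall over three years is the same as roughly 10x a year compounded, where $r$ is the yearly price-reduction factor:

```latex
\frac{\$60}{\$0.06} = 1000 = r^{3} \quad\Longrightarrow\quad r = 1000^{1/3} = 10 \text{ per year}
```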
The lean AI build stack, layer by layer
Here is the stack a lean team can assemble this week, walked from the bottom up. Apply the same buy-build-wait rule to each layer. When you are done, the only code the team truly owns is the code that compounds.
The discipline is to keep every commodity layer swappable and every owned layer deep. Rent thin, build thick.
- Model layer. Rent frontier and small models through an API. Route the easy calls to a small model and reserve a frontier model for the hard ones. Hide the provider behind one internal interface so switching stays a config change.
- Retrieval and data layer. Start with managed Postgres and the pgvector extension, not a dedicated vector database. It keeps your relational data and your embeddings in one system you already run. Add a specialized store only when a measured limit forces it.
- Application and hosting layer. Rent serverless hosting and a managed queue. Ship the vertical workflow, the part that encodes how your domain actually works, as your own code.
- Observability and evals layer. Rent the logging and tracing. Build the evals yourself, because a domain eval set is a proprietary asset, not a commodity.
- Feedback layer. Instrument the product so every expert correction is captured as a labeled example. That capture is what later turns usage into a fundable dataset.
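The "one internal interface" idea from the model layer can be sketched as a Python `Protocol` plus a config-driven factory. This is a minimal sketch under stated assumptions: `ChatModel`, `EchoSmallModel`, `EchoFrontierModel`, `PROVIDERS`, and `build_model` are hypothetical stand-ins, not any vendor's SDK; in a real codebase each adapter class would wrap an actual provider client.

```python
from typing import Callable, Dict, Protocol


class ChatModel(Protocol):
    """The one internal interface the rest of the codebase depends on."""

    def complete(self, prompt: str) -> str: ...


class EchoSmallModel:
    # Hypothetical stand-in for a cheap, small hosted model.
    def complete(self, prompt: str) -> str:
        return f"[small] {prompt}"


class EchoFrontierModel:
    # Hypothetical stand-in for a frontier hosted model.
    def complete(self, prompt: str) -> str:
        return f"[frontier] {prompt}"


# Registry of adapters: adding or swapping a provider means editing one entry.
PROVIDERS: Dict[str, Callable[[], ChatModel]] = {
    "small": EchoSmallModel,
    "frontier": EchoFrontierModel,
}


def build_model(config: Dict[str, str]) -> ChatModel:
    """Pick the provider from config, so switching is a config change."""
    return PROVIDERS[config["model_provider"]]()


model = build_model({"model_provider": "small"})
print(model.complete("Summarize this invoice"))  # [small] Summarize this invoice
```

Application code only ever sees `ChatModel`, so moving to a cheaper provider touches the registry and the config value, not the workflow logic.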
Where to buy, where to build, where to wait
Stated flatly, so a founder can apply it in an afternoon. Three columns, and most of the stack lands in the first one.
- Buy now. Base models by API. Managed Postgres with pgvector. Serverless hosting. Managed queues. Observability, tracing, and auth. Commodities with real vendors and prices that fall every quarter.
- Build now. The proprietary dataset and its capture pipeline. The domain eval set. The vertical workflow logic that encodes 10+ years of operator scar tissue. This column is the moat.
- Wait. Fine-tuning, a self-hosted model, a bespoke orchestration framework, a custom vector engine. Each is justified only when a rented equivalent measurably breaks. Until then, waiting keeps runway on the moat.
The single move that matters. Keep a written list of what you rent and what you own, and defend the border. Every quarter something on the rent side will tempt a rebuild. Resist it unless a number, not a hunch, says the vendor broke.
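The written rent/own list can live in the repo as data, with the rebuild trigger expressed as a number rather than a hunch. A minimal sketch, assuming hypothetical layer keys, p95 latency as the example metric, and a made-up SLO threshold; cost per request or error rate would work the same way. The only point it encodes is that a rented layer crosses the border when a measurement breaches an agreed limit.

```python
from statistics import quantiles
from typing import List

# The border, written down: what we rent, what we own, what we deliberately wait on.
STACK = {
    "base_models": "rent",
    "vector_storage": "rent",
    "hosting": "rent",
    "queues": "rent",
    "observability": "rent",
    "dataset": "own",
    "domain_evals": "own",
    "workflow_logic": "own",
    "fine_tuning": "wait",
    "self_hosted_model": "wait",
}


def p95(samples: List[float]) -> float:
    """95th percentile of measured values (needs at least two samples)."""
    return quantiles(samples, n=100)[94]


def rebuild_justified(layer: str, latencies_ms: List[float], slo_ms: float) -> bool:
    """A rented layer becomes a rebuild candidate only when measured p95 breaches the SLO."""
    if STACK.get(layer) != "rent":
        return False
    return p95(latencies_ms) > slo_ms


healthy = [120.0] * 95 + [180.0] * 5
degraded = [400.0] * 100
print(rebuild_justified("vector_storage", healthy, slo_ms=250.0))   # False
print(rebuild_justified("vector_storage", degraded, slo_ms=250.0))  # True
```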
Keeping inference cost off your critical path
Inference cost is a solved problem for a lean team, and treating it as a crisis is its own failure mode. Falling token prices are the reason a 2-3 person venture can deploy a real AI product without a Series A. The market is already cutting this cost by an order of magnitude a year, so do not spend engineer-months chasing it.
Epoch AI, measuring the price to hit a fixed benchmark, finds declines between 9x and 900x per year across tasks, with a median near 50x, and the price to match GPT-4-level performance on PhD-level science questions fell about 40x a year. Building your own inference to save money is building a depreciating asset.
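A rough way to see the depreciation, assuming for simplicity that the price falls by a constant factor $r$ each year, so $P(t) = P_0 / r^{t}$ with $t$ in years. At the roughly 40x a year Epoch reports for GPT-4-level science performance, six months alone cuts the rented price by more than 6x, which is the moving target any self-built inference saving has to beat:

```latex
P(t) = \frac{P_0}{r^{t}}, \qquad r = 40,\ t = \tfrac{1}{2}
\quad\Longrightarrow\quad P\!\left(\tfrac{1}{2}\right) = \frac{P_0}{\sqrt{40}} \approx \frac{P_0}{6.3}
```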
You get the savings without a fine-tuning project through three plain moves.
- Route by difficulty. A cheap model for the easy 80 percent of calls, a frontier model for the hard 20 percent.
- Cache aggressively. Deduplicate repeated prompts and reuse retrieved context so you stop paying twice for the same answer.
- Keep the provider swappable. When a cheaper model clears your evals, switching should be a config change, not a rewrite.
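The first two moves can live in one thin wrapper. A minimal sketch, assuming a hypothetical `is_hard` heuristic (a character cutoff and a keyword check, both invented for illustration) and stub model functions in place of real provider calls. It only deduplicates exact-repeat prompts; reusing retrieved context, eviction, and TTLs are left out. In practice the routing rule would be tuned against your eval set.

```python
import hashlib
from typing import Callable, Dict

ModelFn = Callable[[str], str]


class RoutedClient:
    """Route by difficulty and cache exact-repeat prompts (illustrative only)."""

    def __init__(self, small: ModelFn, frontier: ModelFn, max_easy_chars: int = 500):
        self.small = small
        self.frontier = frontier
        self.max_easy_chars = max_easy_chars  # hypothetical cutoff
        self.cache: Dict[str, str] = {}
        self.paid_calls = 0

    def is_hard(self, prompt: str) -> bool:
        # Hypothetical heuristic: long prompts or explicit reasoning requests go to the frontier model.
        return len(prompt) > self.max_easy_chars or "step by step" in prompt.lower()

    def complete(self, prompt: str) -> str:
        tier = "frontier" if self.is_hard(prompt) else "small"
        key = hashlib.sha256(f"{tier}:{prompt}".encode()).hexdigest()
        if key in self.cache:  # same prompt, same tier: do not pay twice
            return self.cache[key]
        model = self.frontier if tier == "frontier" else self.small
        answer = model(prompt)
        self.paid_calls += 1
        self.cache[key] = answer
        return answer


client = RoutedClient(small=lambda p: f"small:{p}", frontier=lambda p: f"frontier:{p}")
client.complete("Classify this ticket")
client.complete("Classify this ticket")  # served from cache, no second charge
client.complete("Explain step by step why the claim was denied")
print(client.paid_calls)  # 2
```

Because `small` and `frontier` are plain callables, they can be the adapters from a swappable provider registry, which keeps the third move intact.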
The price to reach a fixed performance bar has fallen between 9x and 900x per year across tasks, with a median around 50x per year. GPT-4-level performance on PhD-level science questions got about 40x cheaper each year.
— Epoch AI, 2025
The one layer worth owning: your data and evals
The moat is never the model, and it is never the infrastructure. Every competitor can call the same API and rent the same managed Postgres, so rented layers cannot be a source of advantage. The durable asset is the proprietary data your workflow generates and the domain evals that prove your system is getting better at the specific job.
This is the copilot-to-data-to-fund flywheel stated as a stack decision. Build a copilot to generate proprietary data, then use that data to raise and deploy capital. Every expert correction captured is a labeled example a competitor cannot purchase, because it is produced inside a workflow the competitor does not run. Over quarters, the rented stack stays flat and the owned layer pulls ahead.
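Capturing corrections can start as logging every case where an expert changed the model's draft. A minimal sketch, assuming a hypothetical `LabeledExample` record, invented task names, and JSON Lines as the storage format; a real pipeline would add IDs and timestamps. This version keeps only corrections, though an accepted draft could also be logged as a positive label.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class LabeledExample:
    task: str           # hypothetical task name, e.g. "classify_invoice"
    model_input: str
    model_output: str   # the copilot's draft
    expert_output: str  # the expert's corrected answer is the label


def capture(task: str, model_input: str, model_output: str,
            expert_output: str) -> Optional[LabeledExample]:
    """Return a labeled example only when the expert actually changed the draft."""
    if expert_output.strip() == model_output.strip():
        return None
    return LabeledExample(task, model_input, model_output, expert_output)


def to_jsonl(examples: List[LabeledExample]) -> str:
    """Serialize to JSON Lines, one labeled example per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)


log = [
    capture("classify_invoice", "Invoice #12: consulting hours", "goods", "services"),
    capture("classify_invoice", "Invoice #13: steel pipe", "goods", "goods"),
]
dataset = [e for e in log if e is not None]
print(len(dataset))  # 1
```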
Owning the evals matters more than it looks. Without a domain eval set, a team cannot even tell whether a cheaper model is good enough to switch to, so it overpays for the frontier model out of fear. Domain-specific evals are both the moat and the instrument that lets you ride the falling price curve safely. Build the tests only your data can pass.
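Once the eval set exists, "is the cheaper model good enough?" becomes a mechanical check. A minimal sketch, assuming exact-match scoring, a hypothetical 0.95 pass bar with a small allowed drop versus the incumbent, and stub models; real domain evals often need rubric-based or graded scoring rather than string equality.

```python
from typing import Callable, List, Tuple

EvalCase = Tuple[str, str]  # (input, expected output)
Model = Callable[[str], str]


def score(model: Model, cases: List[EvalCase]) -> float:
    """Fraction of eval cases the model answers exactly right."""
    passed = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return passed / len(cases)


def can_switch(candidate: Model, incumbent: Model, cases: List[EvalCase],
               bar: float = 0.95, max_drop: float = 0.02) -> bool:
    """Adopt the cheaper candidate only if it clears the bar and barely regresses."""
    cand = score(candidate, cases)
    return cand >= bar and cand >= score(incumbent, cases) - max_drop


CASES: List[EvalCase] = [
    ("Is a consulting invoice goods or services?", "services"),
    ("Is a pallet of screws goods or services?", "goods"),
    ("Is a plumbing repair goods or services?", "services"),
]
ANSWERS = dict(CASES)


def frontier_stub(prompt: str) -> str:
    return ANSWERS[prompt]  # stand-in that answers every case correctly


def cheap_stub(prompt: str) -> str:
    return "goods"  # stand-in that always answers "goods"


print(can_switch(cheap_stub, frontier_stub, CASES))     # False (1 of 3 correct)
print(can_switch(frontier_stub, frontier_stub, CASES))  # True
```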
Failure modes: premature infrastructure
The honest failure mode is premature infrastructure. A tiny team rebuilds a vector database, a model gateway, or an orchestration layer that a vendor would run for a fraction of the cost, and burns its runway on plumbing instead of on the one layer that becomes the moat.
The numbers are unkind to the do-it-yourself instinct. For a modest one-million-vector workload, a reliable self-hosted pgvector setup runs roughly $385 to $915 a month, and the dominant line item is not compute. It is engineer-hours for setup, index tuning, backups, failover, and on-call, often $320 to $720 of that total, against about $99 a month for a managed equivalent. Database work is invisible when it succeeds and catastrophic when it fails, so the variance, not the average month, is what a three-person team cannot afford.
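Working the quoted ranges through (plain arithmetic on the figures above, in USD): self-hosting costs roughly 3.9x to 9.2x the managed price, or about $3,400 to $9,800 more per year.

```latex
\begin{aligned}
\frac{385}{99} &\approx 3.9, & \frac{915}{99} &\approx 9.2 \\
12 \times (385 - 99) &= 3{,}432, & 12 \times (915 - 99) &= 9{,}792 \quad \text{per year}
\end{aligned}
```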
- Premature infrastructure. Building a vector engine, a model gateway, or an orchestration framework before a rented one has measurably broken.
- Fine-tuning too early. Spending weeks to save tokens when token prices fall 10x a year and prompt-and-retrieve would have cleared the bar.
- Owning the wrong layer. Investing in the commodity plumbing and treating the proprietary data and evals as an afterthought.
- Vendor lock-in by neglect. Renting is correct, but wiring one provider so deeply that a 40x-cheaper option cannot be adopted without a rewrite.
- No eval set. Without domain evals you cannot ride the falling price curve, so you either overpay for the frontier model or ship regressions.
A reliable self-hosted vector setup for a one-million-vector workload runs about $385 to $915 a month, dominated by $320 to $720 of engineer-hours, versus roughly $99 a month for a managed equivalent.
— Rivestack, pgvector total cost of ownership
How Avante solves plumbing once across ventures
Avante Ventures treats the build stack as a Build-stage decision with a Compound-stage payoff. The six-stage system runs Research, Partner, Build, Traction, Revenue, Compound, and the studio makes the rent-versus-own call once, then reuses it across the portfolio. Solving company plumbing once routes roughly $300K-$500K of effective capital per venture into product and traction rather than overhead.
That is the arithmetic behind building 3-4 ventures a year on a shared stack while deploying $500K-$1.5M per venture. The Brazil and LATAM context sharpens it. Services account for roughly 70% of Brazilian GDP with low software penetration, so the addressable verticals are enormous, and a domain operator with 10+ years of Brazilian-market scar tissue is the one who knows which workflow logic is worth owning. The full argument is in the essay Why Avante builds as a studio.
AI infrastructure is now cheap enough to deploy without a Series A, so model access is not the scarce asset. The scarce asset is the proprietary data and evals the owned layer creates while the rest of the market rebuilds plumbing it could have rented. The teams still tuning their own vector index in 2027 will be renting the same intelligence as everyone else and calling their plumbing a moat. The teams that owned the data will be raising on it.
Frequently asked questions
- What is the lean AI build stack a small team should run?
- The lean AI build stack rents the commodity layers and owns exactly one layer, the moat. Rent base models by API, managed Postgres with pgvector for retrieval, serverless hosting, and observability. Own the proprietary data, the domain evals, and the vertical workflow logic, because those are the only layers a competitor cannot buy.
- Should a startup build or buy its AI infrastructure?
- Buy the commodity, own the moat, wait on the speculative. Base models, vector storage, hosting, and observability are commodities a vendor runs cheaper than a small team can, so rent them. The data, evals, and vertical logic are the moat, so build those. Anything speculative like fine-tuning waits until a rented option measurably breaks.
- Why not self-host a vector database to save money?
- Because the cost of self-hosting is engineer-hours, not compute. A reliable one-million-vector setup runs about $385 to $915 a month, mostly $320 to $720 in engineer time for tuning, backups, and on-call, versus about $99 for a managed equivalent. For a 2-3 person team, that attention is better spent on the moat.
- Can a 2-3 person team deploy an AI product without a Series A?
- Yes. The cost of LLM inference for a fixed capability fell roughly 1,000x from 2021 to 2024 and drops about 10x a year, so intelligence is now a metered utility you rent by the token. AI infrastructure is cheap enough to deploy without a Series A. The scarce asset is the proprietary data and evals, not model access.
- What is the biggest mistake in an early AI stack?
- Premature infrastructure. A tiny team rebuilds a vector database, a model gateway, or an orchestration layer a vendor would run for a fraction of the cost, and burns its runway on plumbing instead of the one layer that becomes the moat. Own the data and evals. Rent the rest until a number says the vendor broke.