Knitting an AI-Native Software Team (Part 3)
Authored By: Sai Sreenivas Kodur
Building Machines That Build the Machine - Reliable Outcomes With Unreliable Parts
I borrowed this mental model from manufacturing, where the real leverage isn't in making the end product, but in making the machine that makes the product better. Applied to software, this means our focus shouldn't be on using AI to write code faster. It should be on building the machine - the workflows, context, and systems - that uses AI to build the end software.
But here's the crucial insight from years of distributed systems work: we're building this reliable machine using unreliable components. AI models hallucinate, generate incorrect code about 30% of the time, and introduce security vulnerabilities in nearly half of what they produce. Yet we can still build reliable systems with them. How? A few principles can be borrowed from our experience of building distributed systems out of unreliable computers: composition, redundancy, and error correction (with some caveats).
Improving the reliability of the overall AI workflow is also still an active area of research, and it involves solving problems such as:
Designing an evaluation process that works both at a single step and end-to-end across the workflow
Using the right set of benchmarks, and, if needed, creating one that correctly represents our domain's use cases
Creating tight loops to collect high-quality feedback with humans in the loop
Designing the right level of autonomy at each step for your AI Agent.
Composition in Practice: A Real Workflow Example
Caveat: The composition pattern makes a critical assumption: errors at each workflow step are isolated and independent, not cascading in unexpected ways. While this often holds for failures in distributed systems, it can break down for logical mistakes that compound through the AI workflow. This often requires deeper analysis (the holy grail of building reliable AI agents) - you need to validate with real examples from your codebase and benchmark on representative samples of your actual workflows. Only after seeing the pattern work repeatedly on your specific problems, coding standards, and business logic can you have confidence in these reliability numbers.
Think about distributed systems. No single server has 99.999% uptime, but we achieve that reliability through redundancy and health checks. Similarly, no single AI generation is reliable, but we can compose multiple AI operations with verification steps to achieve reliability. The key is making sure these operations do not share the same failure modes - their errors must not overlap.
If our coding AI has a 30% error rate and our code review AI has a 20% error rate, individually they're unreliable. But when we compose them - having the review AI check the coding AI's work - assuming the errors in these individual steps are isolated, the combined error rate drops to 6% (0.06 = 0.3 x 0.2). Add a third verification layer, and we push reliability even higher.
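To make the arithmetic concrete, here's a minimal sketch in Python. It is purely illustrative: the 25% error rate for the third layer is an assumed number, and independence of errors is assumed throughout.

```python
# Illustrative only: combined error rate under the assumption that each
# verification layer fails independently of the others.
def composed_error_rate(error_rates):
    """Probability that every layer misses an error (all must fail together)."""
    combined = 1.0
    for rate in error_rates:
        combined *= rate
    return combined

print(composed_error_rate([0.30, 0.20]))        # ~0.06  -> ~94% reliable
print(composed_error_rate([0.30, 0.20, 0.25]))  # ~0.015 -> ~98.5% reliable
```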
We need to build what I call "error-squashing workflows". Just as important is knowing how to quickly evaluate when the "error-squashing" assumption does not hold in a given workflow.
A Practical Workflow Example
Let me show you how we envision building reliable, repeatable workflows from unreliable components with an example: the Data Integration with a New Vendor workflow.
Start with context gathering (90% reliable) - extract vendor API documentation, authentication patterns, rate limits, and data schemas, then map our internal data models to vendor fields. This is fairly reliable, but less so than pure code dependencies since vendor docs are often incomplete.
Next, AI generates the integration code (65% reliable) - lower than typical because each vendor has unique quirks, undocumented behaviors, and business logic requirements.
A different AI model reviews for common integration pitfalls like missing error handling, retry logic, and data validation (catches 75% of errors), bringing combined reliability to ~91.2%.
Test with sandbox data and validate transformations (catches 85% of remaining errors), pushing reliability to ~98.7%.
Run production data samples through the pipeline to catch edge cases in real data formats (catches 80% of remaining errors), reaching ~99.7%.
Finally, human review for business logic correctness and compliance requirements achieves 99.9%+ reliability.
Vendor integrations have more unknowns and business context, so we need more verification layers and the initial AI generation is less reliable. But by composing multiple verification steps, we still achieve production-grade reliability.
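Using the illustrative numbers above, here's a small sketch of how the residual error shrinks at each verification layer, again assuming the layers fail independently of each other:

```python
# Illustrative only: residual error after each verification layer, using the
# example numbers from the vendor-integration workflow above.
initial_reliability = 0.65            # AI-generated integration code
catch_rates = {
    "AI code review":          0.75,  # catches 75% of remaining errors
    "Sandbox data tests":      0.85,
    "Production data samples": 0.80,
}

residual_error = 1.0 - initial_reliability
for step, catch_rate in catch_rates.items():
    residual_error *= (1.0 - catch_rate)
    print(f"{step:>24}: reliability ~{(1.0 - residual_error) * 100:.1f}%")
# Prints ~91.2%, ~98.7%, ~99.7%; the final human review pushes this past 99.9%.
```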
A few other example workflows:
Python package upgrade
Adding a new stage into CI/CD
Writing an SQL query from a natural language task description
Converting a meeting recording into notes and creating followup tickets
The key: this isn't a one-time prompt. It's a repeatable, reliable workflow that can be triggered automatically. Many such workflows come together to build individual components (i.e., features), which become part of your end machine - the software product used by your end customers.
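To make "triggered automatically" concrete, here's a hypothetical sketch of such a workflow runner - every function passed in is a placeholder, not a real framework - that chains a generation step with independent verification layers and escalates to a human only when the automated layers keep rejecting the output:

```python
# Hypothetical sketch of an "error-squashing" workflow runner: generate,
# verify with independent layers, retry, and escalate to a human only when
# the automated layers keep rejecting the output.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class StepResult:
    ok: bool
    notes: str = ""

def run_workflow(generate: Callable[[], str],
                 verifiers: List[Callable[[str], StepResult]],
                 escalate: Callable[[str, List[StepResult]], None],
                 max_attempts: int = 3) -> Optional[str]:
    """Generate an artifact, pass it through each verification layer, and
    retry or hand off to a human if any layer rejects it."""
    artifact, results = "", []
    for _ in range(max_attempts):
        artifact = generate()
        results = [verify(artifact) for verify in verifiers]
        if all(result.ok for result in results):
            return artifact              # every layer agreed; ship it
    escalate(artifact, results)          # automated layers kept disagreeing
    return None
```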
Context Engineering as the Path to Reliability
The single most important factor in building reliable AI systems from unreliable components isn't better models - it's better context engineering. This is the discipline of providing the right information, in the right format, at the right time, to maximize the probability of correct output.
Here's my vision for how context management could dramatically improve reliability (a small sketch of these operations follows the list below):
WRITE: Maintain persistent context outside the AI's working memory - like giving it a notebook. Any new information needed in decision making is recorded into this persistent memory.
SELECT: Build semantic search over codebases, documentation, meeting recordings, and operating procedures. Precise retrieval can reduce hallucination.
COMPRESS: Distill information while preserving critical decisions. Less noise means fewer errors.
ISOLATE: Build and run only specialized sub-agents with focused contexts. Compartmentalization prevents error cascades. This is a key factor for “error-squashing workflows”.
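Here's a minimal, hypothetical sketch of these four operations. The class and everything it wires together (store, index, summarizer) are placeholders, not a specific library:

```python
# Hypothetical sketch of the four context operations; every name here is a
# placeholder rather than a real library API.
class ContextManager:
    def __init__(self, store, index, summarizer):
        self.store = store            # persistent key-value store - the "notebook"
        self.index = index            # semantic search over code, docs, meetings, runbooks
        self.summarizer = summarizer  # model or heuristic that distills text

    def write(self, key: str, note: str) -> None:
        """WRITE: record decisions and new information outside the model's working memory."""
        self.store[key] = note

    def select(self, query: str, k: int = 5) -> list:
        """SELECT: retrieve only the passages relevant to the current step."""
        return self.index.search(query, top_k=k)

    def compress(self, passages: list, budget_tokens: int) -> str:
        """COMPRESS: distill retrieved material while preserving critical decisions."""
        return self.summarizer(passages, max_tokens=budget_tokens)

    def isolate(self, task: str, query: str) -> str:
        """ISOLATE: build a focused context for a single specialized sub-agent."""
        focused = self.compress(self.select(query), budget_tokens=2000)
        return f"{focused}\n\nTask: {task}"
```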
The impact of proper context engineering could be transformative - potentially achieving 90% better performance from the same AI model. It's not about perfect AI; it's about building systems that compose unreliable components into reliable outcomes.
The Division of Labor: Two Distinct Speeds of Development
Making this work requires a fundamental reorganization of how we think about development teams. We need developers working at two distinctly different speeds, and this isn't a temporary phase - it's the new permanent structure of AI-native teams.
Think of it this way: in traditional development, everyone worked at roughly the same pace because everyone was constrained by typing speed and human cognition. Now, we need to deliberately split our teams into two groups operating at completely different velocities.
The first group are what I call "context engineers." These developers work slowly and deliberately - they're the architects of the machine itself. Their job is to take our institutional knowledge, our architectural decisions, our business rules, and encode them into reusable, reliable context that AI can consume. They're not just writing documentation; they're creating the semantic infrastructure that makes AI useful. When a context engineer encodes how we handle database migrations or why we chose a particular authentication pattern, that context might be used thousands of times by AI systems. It needs to be perfect, unambiguous, and constantly updated as our system evolves.
The second group are the developers using these AI workflows to build features at unprecedented speed. They can move fast precisely because they're not building the machine - they're operating it. When they need to upgrade a package, they trigger a workflow. When they need to generate an API endpoint, they invoke a pattern. They're working within the guardrails and context that the first group has carefully constructed.
This division isn't about seniority or skill level - it's about the nature of the work. A context engineer working on encoding our authentication patterns needs to think like a compiler writer: their work will run countless times, so every edge case matters. Meanwhile, a developer using these patterns to build customer features needs to think like a product builder: combining reliable components to create value quickly.
What's fascinating is that the same person might operate at both speeds during a single day. In the morning, they might spend three hours carefully crafting context for a new microservice pattern. In the afternoon, they might use AI workflows to rapidly implement five new features. The key is recognizing which mode you're in and adjusting your pace accordingly.
The Premium on Human Skills
The paradox of AI-native development is that as coding - the last mile activity we've historically obsessed over - becomes automated, uniquely human skills become exponentially more valuable. But they're not the skills we traditionally valued in engineers.
Clear Communication: The New Fundamental
Communication isn't just important - it's the new fundamental skill. When you're setting context for an AI system, every ambiguity multiplies into errors. I've seen perfectly capable engineers fail with AI because they can't articulate what they want with legal-document-level precision. The developers who succeed are those who can express not just what they want, but why they want it, what constraints exist, and what the success criteria are.
Customer Empathy and Business Context
Here's what AI fundamentally cannot do: understand why your customer is frustrated. AI doesn't know that your enterprise client values audit trails over performance, or that your users will abandon the product if onboarding takes more than 3 minutes. The developers who deeply understand customer pain points and business objectives can guide AI to build solutions that actually matter. This isn't just product thinking - it's the ability to infuse every technical decision with customer and business context that AI will never independently possess.
Structured Thinking and Problem Decomposition
The ability to break down complex problems into clear, logical steps has become invaluable. AI can handle complexity, but only if you can structure it properly. This means writing detailed specifications that capture not just the happy path, but all the nuances. When a developer can decompose a vague requirement like "make it less memory intensive" into "reduce garbage collection pressure by introducing read memory buffers," AI becomes incredibly powerful.
First-Principles Reasoning
The combination of first-principles thinking and fundamental questioning is perhaps the most underrated skill in AI-native development. It's not just about your ability to reason from fundamentals - it's about using those principles to interrogate the AI's reasoning process itself. When you ask AI fundamental questions like "What assumptions are you making about our data consistency model?" or "What are the core constraints you're optimizing for?", you're forcing it to expose its hidden assumptions and reasoning chains.
This is incredibly powerful because AI often makes implicit assumptions based on common patterns it has seen. By grounding the AI in first principles through systematic questioning, you can validate and correct its foundational thinking before any code is generated. For instance, when designing a distributed system, asking "Why are you choosing eventual consistency over strong consistency?" forces the AI to articulate its assumptions about CAP theorem trade-offs, network partitions, and your specific use case. Once you've corrected any flawed assumptions at this fundamental level, everything that follows - the implementation, the error handling, the optimization strategies - will be built on solid ground. The developers who excel at this aren't just good at thinking from first principles; they're skilled at making AI think and reason using the same set of assumptions and constraints needed for problem solving.
Edge Case Thinking Early
Here's what separates great AI-native developers from good ones: thinking about edge cases before the first line of code is generated. AI tends to solve for the happy path unless explicitly told otherwise. The developers who succeed are those who can articulate upfront: "What happens when the network fails mid-transaction? What if two users update simultaneously? What about timezone boundaries?" This proactive edge case thinking prevents the exponential complexity that comes from retrofitting error handling into AI-generated code.
Creating Learning Artifacts for In-Context Learning
This is a superpower I didn't anticipate: the ability to create artifacts that teach AI your patterns. When you need to introduce a new design pattern to your codebase, the developers who can create minimal, clear examples that demonstrate the pattern are invaluable.
For instance, when we wanted to introduce a new event-driven pattern, one developer created a perfect reference implementation - just 50 lines of code that captured every nuance of how we wanted events handled, validated, and logged. That artifact became the template for hundreds of AI-generated implementations. It's like being able to teach by example at scale.
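To illustrate the idea (this is not that actual artifact, just a hypothetical sketch of what such a teaching example might look like), a minimal event-handling reference could be as small as this:

```python
# Hypothetical sketch of a teaching artifact for an event-driven pattern: a tiny
# reference implementation showing how events are validated, handled, and logged.
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone

logger = logging.getLogger("events")

@dataclass(frozen=True)
class Event:
    name: str
    payload: dict
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def validate(event: Event) -> None:
    """Every event must carry a non-empty name and a dict payload."""
    if not event.name:
        raise ValueError("event name is required")
    if not isinstance(event.payload, dict):
        raise TypeError("event payload must be a dict")

def handle(event: Event, handlers: dict) -> None:
    """Validate, dispatch to the registered handler, and log the outcome."""
    validate(event)
    handler = handlers.get(event.name)
    if handler is None:
        logger.warning("no handler registered for %s", event.name)
        return
    handler(event)
    logger.info("handled %s at %s", event.name, event.occurred_at.isoformat())
```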
Explicit Design Choices
AI is terrible at making design decisions but excellent at implementing them. The developers who explicitly document design choices - "We use the dependency injection design principle because...", "We use async implementations in the code because...", "We chose FastAPI over Django because..." - enable AI to maintain consistency across the entire codebase. This isn't just documentation; it's encoding your architectural philosophy in a way AI can apply repeatedly.
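For illustration, here's a hypothetical sketch of what such decisions might look like when encoded as structured data an AI workflow can retrieve - the schema and entries are assumptions, not a standard:

```python
# Hypothetical sketch: design decisions encoded as structured data that AI
# tooling can retrieve and apply consistently. The schema is an assumption.
DESIGN_DECISIONS = [
    {
        "id": "ADR-012",
        "decision": "Use FastAPI instead of Django for new services",
        "because": "async-first request handling and typed request/response models",
        "consequences": ["all endpoints use async def", "every payload is defined by a typed model"],
    },
    {
        "id": "ADR-019",
        "decision": "Use constructor-based dependency injection",
        "because": "it keeps components testable and makes wiring explicit",
        "consequences": ["no module-level singletons", "dependencies are passed in, never imported"],
    },
]
```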
Clear, Actionable Feedback
The ability to give precise feedback has become crucial. When AI generates code that's almost right but not quite, can you articulate exactly what's wrong? "The error handling is too generic" is useless feedback for AI. "Each error should include the specific failure detail, the user context, and concrete retry instructions" is actionable. The best developers can review AI output and provide feedback that's specific enough for AI to correct course.
Human-AI-Human Collaboration
Perhaps the most undervalued skill: orchestrating collaboration between humans and AI systems. This means knowing when to hand off to AI, when to take back control, and how to structure work so multiple humans and AI agents can contribute effectively. It's like being a conductor where half your orchestra is human and half is artificial - you need to understand the strengths and limitations of each to create harmony.
Strategically Leveraging Asynchronous and Synchronous Communication
The ability to leverage both asynchronous and synchronous communication strategies becomes critical when human communication and time are the bottleneck. In a world where AI can generate solutions in minutes but humans need hours to evaluate them, we must judiciously allocate our scarce communication bandwidth. Synchronous communication - our most expensive resource - should be reserved for high-stakes alignment: building consensus on architectural decisions, resolving conflicts, and establishing shared mental models. A 30-minute architecture discussion can prevent weeks of rework when AI implements the wrong interpretation at scale.
Asynchronous communication becomes the workhorse for everything else: code reviews where reviewers need time to understand AI-generated logic, documentation updates, and the bookkeeping that keeps everyone aligned without real-time interaction. The best teams create async artifacts from every sync discussion - structured decision documents that AI can reference, ensuring expensive human alignment conversations become reusable context. The developers who excel understand that when human consensus-building is your system's bottleneck, you architect around it just as carefully as you would any other constrained resource.
I've seen developers who were mediocre coders become exceptional AI-native engineers because they mastered these human skills. Conversely, I've seen brilliant algorithmic minds struggle because they can't structure their thoughts in a way that leverages AI tools to their full extent.
Meanwhile, traditional skills are becoming commoditized. Language syntax expertise? Nearly worthless when AI can translate between languages instantly. Framework memorization? Pointless when AI knows every framework better than any human could. Even algorithm implementation - once the hallmark of a good engineer - is now just a prompt away.
The engineers thriving in this new world aren't necessarily the ones who could implement quicksort from memory. They're the ones who can clearly articulate why quicksort isn't the right choice for this particular data distribution, document that decision, and create examples that ensure AI makes the right choice every time.
The Path Forward - Practical Steps for Teams
Based on my journey and the patterns I'm seeing across the industry, here's my framework for building an AI-native software team:
Phase 1: Foundation
Start with platform engineering, not AI adoption. Build your context infrastructure. Document your architectural decisions in machine-readable formats. Create your guardrails and quality gates. This feels slow, but it's the only way to move fast later.
Phase 2: Workflow Identification
Identify your repeated patterns. What does your team do over and over? These are your candidates for AI workflows. Start with the simplest, most constrained tasks - they're easier to make reliable.
Phase 3: Context Engineering
Build your context management systems. Create semantic search over your codebase and documents. Build your scratchpads and state management. Train a dedicated context engineering team - they're your new compiler writers.
Phase 4: Workflow Implementation
Start building your composable workflows with error-squashing built in. Each should be reliable, repeatable, and verifiable. Measure not just success rate but also context efficiency - how much context does it need to succeed? (A small measurement sketch follows the phases below.)
Phase 5: Acceleration
Only now do you accelerate. With foundations in place, context managed, and workflows built, you can move at AI speed without creating chaos.
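As referenced under Phase 4, here's a minimal sketch of tracking those two measurements - success rate and context efficiency - with assumed field names:

```python
# Hypothetical sketch of the two Phase 4 measurements: workflow success rate and
# context efficiency (how much context a workflow consumes per successful run).
from dataclasses import dataclass
from typing import List

@dataclass
class WorkflowRun:
    succeeded: bool
    context_tokens: int   # tokens of context supplied to the model(s) in this run

def success_rate(runs: List[WorkflowRun]) -> float:
    return sum(run.succeeded for run in runs) / len(runs)

def context_per_success(runs: List[WorkflowRun]) -> float:
    successes = [run for run in runs if run.succeeded]
    return sum(run.context_tokens for run in successes) / len(successes)
```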
The Reality Check
Let me be clear about current limitations. Individual AI operations still have error rates around 30% for complex tasks. Security vulnerabilities appear in about 40% of AI-generated code if you don't have proper guardrails. These aren't solved problems.
But here's what I've learned: these limitations are manageable if you build the right machine. It's not about perfect AI - it's about building systems that compose unreliable components into reliable outcomes. We've done this with distributed systems for decades. Now we're doing it with AI.
The teams that will win in this new era aren't those with the best AI models or the most sophisticated prompts. They're the teams that understand this fundamental shift: we're not using AI to write code faster, we're building machines that use AI to build software. The bottleneck isn't coding speed; it's human communication and architectural clarity. And the key to success isn't just moving fast - it's building the right foundations before you accelerate.
Conclusion: The New Software Engineering Paradigm
We're living through a fundamental inversion of software development. The constraints we optimized for over decades have dissolved, revealing that they were never the real constraints at all.
The successful AI-native teams I see aren't trying to make AI better at coding - they're reorganizing their entire approach around three core principles:
The bottleneck has shifted from code generation to human communication and context management
Platform engineering and foundations must be frontloaded before acceleration
Reliable systems can be built from unreliable AI components through proper composition
This isn't a small shift. It's a complete reimagining of how we build software. And for those who understand and adapt to these new patterns, the opportunity is enormous. We're not just writing code faster - we're building machines that build software. And that changes everything.
The question for your team isn't whether to adopt AI - that ship has sailed. The question is whether you'll try to use AI within your existing paradigm (and fail), or whether you'll recognize that the paradigm itself has changed and reorganize accordingly.
The choice, and the opportunity, is yours.