2025 Hypergrowth Engineering Summit: Engineering the AI Frontier

For engineering leaders, the landscape is transforming beneath their feet. AI isn’t simply another tool in the toolkit—it’s fundamentally rewriting the rules of how we conceive, build, and ship software. The very definitions of “developer,” “application,” and “engineering organization” are evolving in real time, opening pathways to build superior software at unprecedented speed and deliver intelligent capabilities that once lived only in science fiction.

Yet with this extraordinary potential comes equally extraordinary complexity. The challenge facing engineering managers today isn’t choosing between embracing or resisting AI—it’s learning to wield its transformative power while keeping their teams adaptable, productive, and motivated. Success lies not in the technology itself, but in how thoughtfully leaders navigate its potential.

It was against this backdrop of unprecedented transformation that we hosted our 4th annual Hypergrowth Engineering Summit in San Francisco. The Summit explored the intersection of engineers creating AI agents while simultaneously being enhanced by them, aiming to define new playbooks for harnessing the power of AI while enabling secure and responsible adoption at scale.

The Summit brought together over 230 CTOs, VPEs, CISOs, CIOs, and Heads of AI spanning both innovative startups and industry-leading enterprises. The program featured keynote presentations, fireside chats, panel discussions, and intimate networking opportunities, serving as a vital forum for technical leaders to share best practices and explore trends shaping our industry.

We were honored to be joined by a luminary speaker lineup, including Andrew Ng (Founder @ Deeplearning.ai, Chairman @ Landing.ai), Brandon Yang (Co-Founder @ Cartesia), Vish T.R. (CTO & Co-Founder @ Glean), Russell Kaplan (President @ Cognition Labs), Mike Hamilton (CIO @ Cloudflare), Kim Huffman (CIO @ Workiva), Zack Lipton (CTO & Co-Founder @ Abridge), Bryan Wise (CIO @ 6sense), Abi Noda (CEO & Co-founder @ DX), Brad Jones (CISO @ Snowflake), Susan Chiang (CISO @ Headway), Ethan Dixon (Applied AI Lead @ Anthropic), Jerry Liu (CEO & Founder @ LlamaIndex), Tony Stoyanov (CTO & Co-Founder @ Elise AI), Kayla Williams (Field CISO @ Cyera) and Piyush Mangalick (Head of Core AI Applied Eng @ Microsoft).

This recap distills key insights from the Summit, organized around four core themes:

  • AI-Driven Developer Productivity: How autonomous AI agents are reshaping the entire software development lifecycle, from coding to deployment
  • Architecting AI Agents: How engineering leaders are building reliable, secure AI systems—leveraging advanced context engineering for better accuracy, open standards like MCP for seamless integration, and robust security guardrails to mitigate cyber risks
  • Scaling AI-Centric Platform Engineering: How engineering platforms are balancing centralized governance with democratized access to cutting-edge tooling
  • Defining & Executing an AI Product Strategy: How to build defensible, user-centered AI products by prioritizing high-impact use cases, delivering transparent and trustworthy experiences, and creating intuitive multi-modal interfaces

Read on as we explore each of these areas in more detail.

AI-Driven Developer Productivity

With the recent wave of enhanced developer tools, we are observing a notable vibe shift among engineering leaders – from initially questioning whether these AI tools were ready for primetime, to now strategizing how best to integrate them into their teams and maximize their potential. As the next wave of development tools matures, we’re quickly approaching an era of fully automated, closed-loop software development, enabling continuous automation across the entire software development lifecycle (SDLC) – from ingesting requirements and mockups, to generating code and tests, to user-driven feedback loops that detect soft bugs and feature gaps in the product roadmap, to production monitoring and downstream incident response.

The Rise of Fully Autonomous Software Engineering Agents

As the vision of fully autonomous application development comes into focus, Russell Kaplan (President @ Cognition Labs) shared his view on the “emerging third wave” of AI developer tools. Unlike earlier waves of autocomplete utilities or AI IDEs, autonomous agents such as Cognition’s Devin can execute tasks end-to-end over hours or even days and return a complete unit of work.

This creates a powerful complementary dynamic: human engineers concentrate on critical design decisions and high-risk tasks, while AI agents work asynchronously in parallel—resolving Jira tickets, triaging bugs, and clearing backlogs of supporting work that would otherwise consume valuable human time. “We’re seeing this new programming paradigm emerge, where the human engineer becomes an architect manager of an infinite army of AI agents,” Russell explained, “and it’s your job to decompose complex problems into many small ones, fire off an army of agents to go solve them, and then review and combine the work.” This evolution is fundamentally reshaping what it means to be a “good developer.”

Russell Kaplan, President @ Cognition Labs
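
To make this pattern concrete, here’s a minimal sketch of the decompose / fan-out / review loop described above. Everything in it is hypothetical: `run_agent` stands in for whatever agent API a team uses, and is not Cognition’s SDK.

```python
import asyncio

async def run_agent(task: str) -> str:
    """Hypothetical stand-in for dispatching a task to an autonomous coding
    agent and awaiting its completed unit of work; not a real vendor SDK."""
    await asyncio.sleep(0.1)  # simulates hours of asynchronous agent work
    return f"PR resolving: {task}"

def human_review(unit_of_work: str) -> bool:
    """The human 'architect manager' reviews each completed unit of work."""
    return unit_of_work.startswith("PR")

async def main() -> None:
    # Decompose one complex problem into many small, independent tasks...
    subtasks = ["triage login bug", "resolve stale Jira ticket", "add missing tests"]
    # ...fan out one agent per task in parallel, then review and combine.
    results = await asyncio.gather(*(run_agent(t) for t in subtasks))
    approved = [r for r in results if human_review(r)]
    print(f"{len(approved)}/{len(subtasks)} units of work approved for merge")

asyncio.run(main())
```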

Measuring ROI of AI Developer Tools

Measuring and optimizing developer productivity requires new KPIs and benchmarks to help organizations make sense of their AI adoption journey. Abi Noda (CEO & Co-founder @ DX) highlighted the need to supplement traditional metrics like throughput and defect rates with both qualitative measures of developer sentiment and quantitative indicators such as time savings and the percentage of PRs that are AI-assisted. These metrics offer engineering leaders a realistic, data-driven view of adoption, helping to establish benchmarks and drive meaningful improvements in productivity, while surfacing a more compelling story of AI impact for leadership.
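
As a toy illustration, here’s one way such quantitative indicators might be computed. The data shape and fields are invented for the example, not DX’s methodology:

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    ai_assisted: bool        # e.g. flagged via tooling telemetry or surveys
    cycle_time_hours: float  # open-to-merge time

# Hypothetical PR records; in practice these would come from your VCS
# and developer-experience surveys, not hard-coded values.
prs = [
    PullRequest(True, 6.0),
    PullRequest(False, 14.0),
    PullRequest(True, 8.0),
    PullRequest(True, 5.0),
]

assisted = [p for p in prs if p.ai_assisted]
pct_assisted = 100 * len(assisted) / len(prs)
avg_cycle = sum(p.cycle_time_hours for p in assisted) / len(assisted)
print(f"AI-assisted PRs: {pct_assisted:.0f}% | avg cycle time: {avg_cycle:.1f}h")
```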

Engineering leaders find themselves caught between two realities: sky-high expectations driven by AI productivity gains splashed across headlines, and the more modest improvements they’re witnessing within their own teams. Even when AI accelerates inner loop activities like coding, outer loop friction points—code reviews, deployment processes, stakeholder alignment—can neutralize much of those gains. As Abi explained, these AI tools are not yet “eliminating the human bottlenecks in software development, which often make up the majority of the friction.”

Fireside Chat: Abi Noda, CEO & Co-founder @ DX; Anders Ranum, Partner @ Sapphire Ventures

Overcoming Organizational Friction

Bryan Wise (CIO @ 6sense) captured this human-centric challenge during our Enterprise CIO panel: “you’ve got this sort of middle ground right now where, as tech leaders, you see where these processes are going to have to change – but, change management with humans is always the hardest part…” The tension is real: leaders can envision automated, closed-loop workflows, but must navigate the messy realities of organizational transformation. As new tools and frameworks flood the market, scaling success requires more than just technology adoption – it demands careful tool selection, reliable measurement, realistic benchmarking, and thoughtful orchestration of human and AI capabilities. Ultimately, realizing AI’s transformative potential depends as much on skillful change management as on the technology itself.

Panelists: Mike Hamilton, CIO @ Cloudflare; Kim Huffman, CIO @ Workiva; Bryan Wise, CIO @ 6sense | Moderator: Jai Das, President & Partner, Sapphire Ventures

Architecting and Deploying AI Agents

Much of AI’s impact on software engineering has been to augment the development of ‘traditional’ application workloads (deterministic systems with predefined pathways and predictable inputs and outputs). But in parallel, a fundamentally new category of application is emerging: AI agents that can think, adapt, and perform complex tasks autonomously.

As building agents has become a top priority for engineering leaders, their teams have been tasked with developing expertise and fluency across a multifaceted stack. Agent engineering incorporates many concepts from traditional DevOps, but reimagined through the lens of this new style of application development (e.g. SDET practices evolve into evals). Navigating this new frontier requires a fresh set of playbooks, governing everything from model selection and underlying agent framework design, to context engineering and data retrieval, tool use, open standard protocols, robust evaluation and more. In the following section, we explore some of these key building blocks for architecting and deploying AI agents, highlighting insights and best practices shared by leading voices shaping the future of agentic systems.

Context Engineering for More Reliable Agent Performance

One of the most critical aspects of AI engineering is to ensure models are equipped with the appropriate context, instructions, and tools to execute tasks effectively. Jerry Liu (CEO & Founder @ LlamaIndex) emphasized the importance of advanced context engineering to address model “mistakes” and unlock more reliable behavior. Emerging strategies, such as fine-tuned embedding models, advanced chunking techniques, and classification and reranking methods have been shown to significantly improve data retrieval and RAG performance. The future of context engineering may even see agents autonomously retrieve and organize their own data, a concept referred to as “agentic retrieval.”

Jerry Liu, CEO & Founder @ LlamaIndex
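
A minimal sketch of the retrieve-then-rerank pattern behind many of these strategies, with made-up embeddings and a stubbed reranker standing in for a fine-tuned embedding model and a cross-encoder:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy corpus of pre-chunked documents with invented embedding vectors;
# in practice these come from a (possibly fine-tuned) embedding model.
CHUNKS: dict[str, list[float]] = {
    "refund policy": [0.9, 0.1, 0.0],
    "api rate limits": [0.1, 0.8, 0.2],
    "sso setup guide": [0.0, 0.3, 0.9],
}

def rerank_score(query: str, chunk: str) -> float:
    """Stub for a stronger second-stage scorer (e.g. a cross-encoder)."""
    return sum(1 for word in query.split() if word in chunk)

def retrieve(query: str, query_emb: list[float], top_k: int = 2) -> list[str]:
    # Stage 1: cheap vector similarity narrows the candidate set.
    candidates = sorted(CHUNKS, key=lambda c: cosine(CHUNKS[c], query_emb),
                        reverse=True)[:top_k]
    # Stage 2: an expensive reranker reorders only the short list.
    return sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)

print(retrieve("how do rate limits work", [0.2, 0.9, 0.1]))
```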

Open Standards Such as MCP for Agent Interoperability

As agents interact with an increasingly diverse ecosystem of data sources, tools, and AI services, standardizing connectivity becomes critical. This is where de facto standards such as Anthropic’s Model Context Protocol (MCP) prove essential. Ethan Dixon (Applied AI Lead @ Anthropic) described MCP as a “USB-C adapter” for LLMs—creating a consistent interface between models and external systems, whether they be internal tools, knowledge bases, or third-party APIs. By establishing a universal “handshake” protocol between models and the digital world, MCP tackles a fundamental constraint: LLMs are only as powerful as the information and tools they can reliably access. Without such standards, each integration becomes a custom engineering effort, creating fragmentation and hindering supportability.

Ethan Dixon, Applied AI Lead @ Anthropic

We’re excited to watch the continued evolution of open standards like MCP, particularly as the community begins to tackle key questions around discoverability and authentication, ultimately laying the groundwork for more composable, secure, and scalable agent ecosystems.
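
For a sense of what building against MCP looks like, here’s a minimal server exposing one made-up internal tool via the official Python SDK’s FastMCP helper (as of this writing, `pip install mcp`); the tool itself is purely illustrative:

```python
from mcp.server.fastmcp import FastMCP

# One MCP server can expose many internal tools to any MCP-capable client.
mcp = FastMCP("internal-tools")

@mcp.tool()
def lookup_customer(customer_id: str) -> str:
    """Fetch a customer record from an internal system (stubbed here)."""
    return f"Customer {customer_id}: status=active, plan=enterprise"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, the standard "handshake"
```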

Evaluation Frameworks and Testing Best Practices

Testing and evaluation remain a cornerstone of agent development, and the first step is often a workload-agnostic discovery phase. This initial period of experimentation is not tied to a specific task, but instead focused on evaluating the foundational models themselves to find the one most fit for purpose—balancing performance against your unique latency requirements and spend thresholds. This helps teams build familiarity with the various model families, observing each one’s preferences in communication style and other intricacies beyond what a provider shares in documentation. As Tony Stoyanov (CTO & Co-Founder @ Elise AI) noted, “some of them work better with different data formats,” highlighting that even seemingly small changes in how input is structured or how output is requested can significantly impact accuracy. This early insight is critical not only for selecting the right model but also for designing effective prompting strategies and optimizing retrieval.

Tony Stoyanov, CTO & Co-Founder @ Elise AI

Following the initial experimentation phase, Tony recommends categorizing evals into easy, medium, and hard use cases. For easy and medium evals, the goal is to automate as many of the lower-complexity tests as possible via the CI/CD pipeline. To ensure confidence in automation, he suggests a simple benchmark: “a good rule of thumb we found is you should run that test 100 times” to confirm it’s stable and reliable.
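
A minimal sketch of that rule of thumb as a CI gate; `call_agent` is a stand-in for the real agent client, and the flake rate here is simulated:

```python
import random

def call_agent(prompt: str) -> str:
    """Stand-in for invoking the real agent; replace with your client."""
    return "Paris" if random.random() > 0.02 else "Lyon"  # simulated flake

def sample_eval() -> bool:
    """A low-complexity eval: canned question, string check on the answer."""
    return "paris" in call_agent("What is the capital of France?").lower()

def is_stable(eval_fn, n: int = 100, required: float = 0.99) -> bool:
    """Per the 'run that test 100 times' rule of thumb: only promote an
    eval into the CI/CD pipeline once its pass rate proves stable."""
    passes = sum(eval_fn() for _ in range(n))
    print(f"{passes}/{n} passed")
    return passes / n >= required

print("promote to CI:", is_stable(sample_eval))
```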

Not all test cases are suitable for auto-grading, however; more complex evaluations may require a human-in-the-loop approach, particularly when working across different modalities. For example, due to the subjective nature of certain voice evals (e.g. assessing tone, personality), Tony shared that such tests “almost always involve the product manager… they’re the closest to having the best taste about what the behavior should be.”

With any approach, maintaining comprehensive snapshots of ‘agent-accessible’ data becomes critical for debugging. When production monitoring catches errors, teams need the ability to replay scenarios and retrace decision paths step-by-step. And as agents tackle tasks over longer time horizons – potentially involving 15-20 separate actions in a single workflow – visibility becomes paramount. Teams require “a comprehensive understanding of how well each action was performed,” insights into any unnecessary or redundant steps, and a complete audit trail of task execution from start to finish. Without this level of observability, diagnosing agent behavior and improving performance becomes nearly impossible.
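
One simple way such an audit trail might be structured; the record fields below are illustrative, not a specific vendor’s schema:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ActionRecord:
    step: int
    action: str
    inputs: dict       # snapshot of the agent-accessible data at this step
    output: str
    latency_ms: float

@dataclass
class Trace:
    task_id: str
    records: list = field(default_factory=list)

    def log(self, **kw) -> None:
        self.records.append(ActionRecord(step=len(self.records) + 1, **kw))

    def dump(self) -> str:
        """Persist the full audit trail so a failed run can be replayed
        and each decision retraced step-by-step."""
        return json.dumps({"task": self.task_id,
                           "actions": [asdict(r) for r in self.records]}, indent=2)

trace = Trace("ticket-42")
trace.log(action="search_kb", inputs={"query": "refund"}, output="3 hits", latency_ms=120.0)
trace.log(action="draft_reply", inputs={"hits": 3}, output="draft v1", latency_ms=800.0)
print(trace.dump())
```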

Securing AI Workloads

As the universe of agent use cases continues to expand, so too do the associated cyber risks. On our Securing Agents panel, three enterprise CISOs took the stage to share strategies for securing these workloads. Critical areas explored included untrusted software supply chains, excess agent privileges leading to system exposure and data leakage, and possible hallucinations with reputational consequences, to name a few. If you are interested in diving into these topics in more depth, we encourage you to check back in a few weeks as our team will be publishing a ‘Cyber for AI’ blog detailing these very challenges and some associated best practices in more detail.

Panelists: Brad Jones, CISO @ Snowflake; Susan Chiang, CISO @ Headway; Kayla Williams, CISO @ Cyera | Moderator: Casber Wang, Growth Partner @ Sapphire Ventures

Fine-Tuning Models Through Continuous Feedback Loops

Piyush Mangalick (Head of Core AI Applied Engineering @ Microsoft) championed the concept of an “AI signals loop” for continuous improvement—embedding observability into production agents, systematically labeling inputs and outputs to understand agent behavior, and leveraging this production-generated data to iteratively fine-tune LLMs. This creates a feedback loop where real-world usage directly drives model improvements.
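
A minimal sketch of what such a signals loop might look like in code; the log format, file name, and labels are assumptions for illustration:

```python
import json

SIGNALS_LOG = "signals.jsonl"  # hypothetical production signals store

def record_signal(prompt: str, response: str, label: str) -> None:
    """Labels can come from users (thumbs up/down), reviewers, or heuristics."""
    with open(SIGNALS_LOG, "a") as f:
        f.write(json.dumps({"prompt": prompt, "response": response,
                            "label": label}) + "\n")

def build_finetune_set() -> list[dict]:
    """Keep only positively labeled production examples as fine-tuning data."""
    with open(SIGNALS_LOG) as f:
        rows = [json.loads(line) for line in f]
    return [{"prompt": r["prompt"], "completion": r["response"]}
            for r in rows if r["label"] == "good"]

record_signal("Summarize this ticket...", "Customer reports...", "good")
print(len(build_finetune_set()), "examples ready for fine-tuning")
```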

Scaling AI-Centric Platform Engineering

As the ecosystem evolves, both the required skill sets and supporting infrastructure to build with AI are in constant flux. The platform tooling and talent needed at the prototype stage can differ dramatically from what’s required for scaled production. This dynamic demands more frequent and fluid resource recalibration across organizations, with leaders constantly balancing distributed innovation against centralized quality control.

Empowering Product Teams to Build and Deploy

Zack Lipton (CTO & Co-Founder @ Abridge) challenged the conventional wisdom that AI capabilities should be siloed within dedicated teams: “I think too many of us have maybe a bit of a bias to think the AI capability should be built by the dedicated AI team.” He argues that the critical knowledge is shifting—from technical implementation expertise to the statistical thinking needed to properly evaluate features, assess potential risks and harms, and architect organizational structures “where all of your product engineers are empowered to be able to build features and to develop workflows, and be able to ship them.” Success lies not in centralizing AI development, but in democratizing its use.

Fireside Chat: Zack Lipton, CTO & Co-Founder @ Abridge; Cathy Gao, Partner @ Sapphire Ventures

Vish T.R. (CTO & Co-Founder @ Glean) echoed this sentiment, sharing “we want our entire team to be building AI features… measuring their quality… and improving them.” Glean utilizes their central team of AI experts to enable a distributed development model: “the central AI team provides the primitives, things like eval, infrastructure, things like recommendation engine, like prompt-tuning, all of those services… to the application teams. And then the application teams are able to build products and features based on that.”

Fireside Chat: Vish T.R., CTO & Co-Founder @ Glean; Rami Branitzky, Partner, Portfolio Growth @ Sapphire Ventures
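
As a hypothetical sketch of this division of labor (the primitives and names below are illustrative, not Glean’s actual stack), the central team owns the interface and product teams compose it:

```python
from typing import Protocol

class AIPlatform(Protocol):
    """Primitives the central AI team owns; application teams consume them."""
    def retrieve(self, query: str, top_k: int) -> list[str]: ...
    def get_prompt(self, name: str) -> str: ...

class InMemoryPlatform:
    """Trivial stand-in implementation for the sketch."""
    def retrieve(self, query: str, top_k: int) -> list[str]:
        return ["doc snippet 1", "doc snippet 2"][:top_k]
    def get_prompt(self, name: str) -> str:
        return f"[{name}] Summarize the context below:"

def product_feature(platform: AIPlatform, user_query: str) -> str:
    # A product engineer composes primitives instead of rebuilding them.
    context = platform.retrieve(user_query, top_k=2)
    return platform.get_prompt("summarize_v2") + "\n" + "\n".join(context)

print(product_feature(InMemoryPlatform(), "quarterly revenue"))
```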

Recalibrating Platform Strategies as Products Scale

Companies at various stages naturally have different concerns – the reach and responsibility of the platform team should be tailored to a specific organization’s needs and maturity level, recalibrating when necessary. For example, a more scaled product team may explore infrastructure-level optimizations to improve margins; however, Zack cautioned against a common pitfall for early startups: worrying about cost and scale before it’s actually a problem. “I’d say that early days of a company probably should be 100% driven by what’s giving you the best product experience.” Once operating at scale with ample data and sufficient resources, you can then think about further optimizations such as “fine-tuning or distilling those models.”

Furthermore, early-stage startups should prioritize flexibility to fuel rapid iteration and experimentation, but must recognize that unchecked flexibility can spawn duplication and inefficiencies as teams grow. This evolution is natural: as businesses scale, transitioning toward a more centralized platform stack coupled with structured guardrails helps ensure quality, efficiency, and reliability while preventing the fragmentation that emerges when divergent paths are pursued.

Defining & Executing an AI Product Strategy

As organizations increasingly bring AI-driven products and features to market, defining a clear, actionable product strategy is essential – from thoughtfully prioritizing which features and enhancements to pursue, to identifying defensible moats in an increasingly competitive landscape.

Designing Trustworthy and Transparent AI-Driven User Experiences

Crafting the ideal ‘UX of AI’ requires balancing seamless, multi-modal interaction with transparency and user control. The best AI products know when to hide and when to reveal themselves. For routine tasks (scheduling meetings, organizing files, filtering notifications), AI works best when invisible, quietly automating workflows without interrupting the user’s flow. But for high-stakes decisions like approving financial transactions or crafting sensitive customer responses, transparency becomes essential. Users need to visualize the agent’s reasoning, understand its confidence levels, and retain meaningful control over outcomes. Best-in-class products don’t just surface AI involvement – they expose step-by-step logic, highlight uncertainty, and provide clear override mechanisms. This transparency doesn’t slow users down; it builds the trust necessary for AI to handle increasingly critical tasks.
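
One way this routing logic might be sketched, with invented fields and thresholds purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class AgentDecision:
    action: str
    high_stakes: bool      # e.g. financial approvals, sensitive replies
    confidence: float      # uncertainty, surfaced rather than hidden
    reasoning: list[str]   # step-by-step logic exposed on demand

def execute(d: AgentDecision, threshold: float = 0.9) -> str:
    # Routine, confident work runs invisibly; anything high-stakes or
    # uncertain is surfaced with its reasoning and a clear override path.
    if not d.high_stakes and d.confidence >= threshold:
        return f"auto-executed: {d.action}"
    steps = "; ".join(d.reasoning)
    return f"AWAITING APPROVAL ({d.confidence:.0%} confident): {d.action} [{steps}]"

print(execute(AgentDecision("approve $12k refund", True, 0.81,
                            ["policy allows refunds under $15k", "4-year customer"])))
```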

Multi-Modal Interfaces as the Next UX Frontier

Optimizing Voice Agents

Achieving a best-in-class voice agent requires overcoming significant technical hurdles. The traditional three-stage pipeline (speech-to-text, then LLM processing, then text-to-speech) creates inherent latency challenges that can break conversational flow. Beyond pipeline optimization, teams must also generate realistic-sounding voices that convey appropriate tone and emotion, all while maintaining response times under 500 milliseconds to preserve the natural rhythm of human conversation. The complexity multiplies when considering edge cases: handling interruptions, managing background noise, adapting to accents and speaking styles, and gracefully recovering from misunderstandings. Success demands not just technical excellence in each component, but seamless orchestration across the entire voice stack.

Andrew Ng (Founder @ Deeplearning.ai, Chairman @ Landing.ai) highlighted the importance of thoughtfully masking latency challenges. He explained how his team borrows natural stalling techniques commonly used by humans—such as injecting “That’s a good question” or “Yeah, I could probably help you with that” into the conversation—to buy the agent critical processing time.
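
Putting the pipeline and the latency-masking trick together, here’s a toy sketch with stubbed speech-to-text, LLM, and text-to-speech stages and made-up timings; it speaks a filler phrase whenever the model blows a 500 ms budget:

```python
import asyncio, time

async def stt(audio: str) -> str:
    await asyncio.sleep(0.05); return audio                  # stub speech-to-text

async def llm(text: str) -> str:
    await asyncio.sleep(0.6); return f"Answer to: {text}"    # slow reasoning step

async def tts(text: str) -> None:
    await asyncio.sleep(0.05); print(f"[speak] {text}")      # stub text-to-speech

async def handle_turn(audio: str, budget_ms: float = 500) -> None:
    start = time.monotonic()
    text = await stt(audio)
    answer_task = asyncio.create_task(llm(text))
    # Mask latency: if the LLM exceeds the response budget, speak a natural
    # stalling phrase (the technique Andrew described) while it keeps working.
    done, _ = await asyncio.wait({answer_task}, timeout=budget_ms / 1000)
    if not done:
        await tts("That's a good question...")
    await tts(await answer_task)
    print(f"turn latency: {(time.monotonic() - start) * 1000:.0f} ms")

asyncio.run(handle_turn("what plans do you offer?"))
```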

Andrew Ng, Founder @ Deeplearning.ai, Chairman @ Landing.ai

Brandon Yang (Co-Founder @ Cartesia) highlighted a fundamental limitation in current voice models: most lack native reasoning capabilities. As Brandon explained, “the big blocker and why most voice agents aren’t built on end-to-end voice models is the reasoning. It’s just not that good. At some point, you care more that your voice agent is smart than exactly how it sounds.”

The next generation of end-to-end voice models promises to solve this by improving both realism and latency through fully integrated processing, where “you’ll have one model that directly takes audio in and predicts the next audio that it should be saying.”

Brandon discussed promising research, including State Space Model (SSM) architectures. Unlike traditional transformer models that scale quadratically with input length, SSMs operate linearly during inference. This architectural advantage could unlock more efficient processing and enable more intelligent, long-context reasoning.
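
To illustrate the architectural contrast: self-attention must revisit all n prior tokens for each new token (O(n) per step even with a KV cache, O(n²) per sequence), while an SSM carries a fixed-size state forward and does constant work per step. A scalar toy version with made-up parameters:

```python
# Toy scalar SSM: the entire history is compressed into one fixed-size
# state, so each inference step is O(1) regardless of sequence length.
A, B, C = 0.9, 0.5, 1.2   # made-up scalar state-space parameters

def ssm_step(state: float, token: float) -> tuple[float, float]:
    state = A * state + B * token      # constant-size state update
    return state, C * state            # output for this step

state, outputs = 0.0, []
for tok in [1.0, 0.0, 2.0, 1.5]:       # toy input sequence
    state, y = ssm_step(state, tok)
    outputs.append(round(y, 3))
print(outputs)
```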

Brandon Yang, Co-Founder @ Cartesia

Conclusion

We’re grateful to all the visionary speakers who shared their expertise and to every engineering leader who joined us for the vibrant exchange of ideas at this year’s Hypergrowth Engineering Summit. The incredible leaders in attendance will continue to collectively shape the next chapter of engineering – one where AI is seamlessly woven into every aspect of our work, guided by the principles and insights we’re developing together as a community.

If you’re building innovative AI-driven tools, tackling challenges aligned with these themes, or simply want to continue the conversation within Sapphire’s engineering community, we invite you to reach out anytime at [email protected].

Legal disclaimer

This article is for informational purposes only. Nothing presented within this article is intended to constitute investment advice, and under no circumstances should any information provided herein be used or considered as an offer to sell or a solicitation of an offer to buy an interest in any investment fund managed by Sapphire. Information provided reflects Sapphire’s views as of a point in time; such views are subject to change at any point, and Sapphire shall not be obligated to provide notice of any change. Various quotes set forth herein are from members of current and former Sapphire portfolio companies and other third parties with whom Sapphire interacts. Statements made by such individuals reflect their thoughts and opinions only; Sapphire may have different views as compared to such parties, and makes no representation or guarantee around any claims made by such parties. Companies mentioned in this article are a representative sample of portfolio companies in which Sapphire has invested that the author believes fit the objective criteria stated in the commentary; they do not reflect all investments made by Sapphire. A complete alphabetical list of investments made by Sapphire’s Growth strategy is available here. No assumptions should be made that investments listed above were or will be profitable. Due to various risks and uncertainties, actual events, results or the actual experience may differ materially from those reflected or contemplated in these statements. Nothing contained herein may be relied upon as a guarantee or assurance as to the future success of any particular company. Past performance is not indicative of future results.