Engineering leaders are navigating an exciting yet complex time. The emergence of generative AI has created a new landscape of platforms and creative possibilities, seemingly overnight. Yet economic uncertainty is challenging leaders to do more with less and strike a delicate balance between cost optimization, continuity of operations and team morale. Startups are having to adapt product strategies on the fly, introducing Gen AI-powered functionality to outpace competitors, with fewer resources at their disposal.
We at Sapphire Ventures see this moment as an opportunity to define a new set of engineering playbooks that harness the power of AI innovations while maintaining a careful focus on R&D efficiency. We believe the key ingredient to achieving this is community: connecting like-minded leaders to share best practices, refine reference architectures and ultimately become better engineers, together.
With this commitment in mind, we were thrilled to host our second annual Hypergrowth Engineering Summit earlier this summer in San Francisco. The event brought together more than 165 VPEs and CTOs representing over 140 venture-backed startups and digital giants for a day of panel discussions, TED-style talks and in-person networking (and of course, whiskey!). We were truly humbled by the lineup of luminaries who shared their time and expertise, including Inés Sombra (VP of Eng at Fastly), Vaibhav Nivargi (CTO & Co-founder at Moveworks, a Sapphire portfolio company), Allan Leinwand (CTO at Webflow), Tony Gentilcore (Head of Product Eng & Co-founder at Glean), Surabhi Gupta (SVP of Eng at Robinhood), Shailesh Kumar (SVP of Eng at ClickUp), Barbara Nelson (VP of Eng at InfluxData, another Sapphire portfolio company), Nora Jones (CEO & Founder at Jeli), and Lukas Biewald (CEO & Founder at Weights & Biases), just to name a few! The Summit was put together by Sapphire’s Engineering Excellence platform, which supports our engineering community with strategy advisory, partner connectivity and peer-to-peer learning.
In this post, we share key lessons from this year's summit, which centered around R&D efficiency, Gen AI product strategy, autonomous DevOps and platform engineering. For a deeper look into the topics covered, check out the video replays of the summit sessions.
Pursuing a Gen AI product strategy
Gen AI has quickly emerged as an almost table-stakes enhancement to any software product. R&D teams must move fast to meet customer demand for integrating novel LLMs into their platforms. But keeping up with the breadth and pace of emerging technologies can feel daunting. Drawing upon his experience building a natural-language AI platform over the last six and a half years, Vaibhav Nivargi (CTO at Moveworks) shared a thoughtful and comprehensive field guide for getting started with LLMs:
Model Selection: Once you’ve chosen a Gen AI-focused product strategy, model selection is a key first step. Options vary by performance, latency, cost, consumption model, modality, licensing, and more. Vaibhav explained that there is no single model to rule them all. “Model selection is nontrivial,” he said. “As you understand the use case, you need to think about modality of data (video, text, images), and theoretical bounds governing how big a model you can train based on the size of the data corpus. If you are sensitive to cost, APIs might grow to be very expensive juxtaposed to hosting your own.” Vaibhav stressed the importance of continually experimenting with new models, given the pace at which they are being released to the community.
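To make the API-versus-self-hosting cost tradeoff concrete, here is a rough back-of-the-envelope sketch comparing per-token API pricing with self-hosted GPU capacity; every figure below is a placeholder assumption, not a vendor quote.

```python
# Back-of-the-envelope cost comparison: hosted API vs. self-hosted inference.
# All prices and volumes are illustrative placeholders.

def api_monthly_cost(requests_per_month: int, tokens_per_request: int,
                     price_per_1k_tokens: float) -> float:
    """Cost of a per-token-billed hosted API."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000 * price_per_1k_tokens

def self_hosted_monthly_cost(gpu_hourly_rate: float, gpu_count: int,
                             hours_per_month: int = 730) -> float:
    """Cost of keeping your own inference GPUs running all month."""
    return gpu_hourly_rate * gpu_count * hours_per_month

api = api_monthly_cost(2_000_000, 1_500, 0.002)    # hypothetical volume and pricing
hosted = self_hosted_monthly_cost(2.50, 4)         # hypothetical GPU rate and fleet size
print(f"API: ${api:,.0f}/month vs. self-hosted: ${hosted:,.0f}/month")
# At low volume the API tends to win; as volume grows, the crossover point shifts.
```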
Stacking Models: Using only one model can produce control and accuracy challenges. It’s important to break down a process or workflow into a series of tasks, each of which can be powered by a purpose-built model. “Some of these are much smaller and barely qualify as a large language model today – but they can be great for tasks like classification or language translation,” Vaibhav said.
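A minimal sketch of what this stacking can look like in code, assuming a hypothetical small intent classifier in front of a larger generative model (the call_llm helper is a stand-in, not Moveworks' actual pipeline):

```python
# Stacking purpose-built models: a small, cheap classifier scopes the task
# before a larger generative model is invoked. Both models are stand-ins.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a larger generative model."""
    return f"[LLM response to: {prompt}]"

def classify_intent(message: str) -> str:
    """Stand-in for a small classification model (e.g., a BERT-sized fine-tune)."""
    text = message.lower()
    return "password_reset" if "password" in text and "reset" in text else "general_question"

def handle_request(message: str) -> str:
    intent = classify_intent(message)            # task 1: small, purpose-built model
    if intent == "password_reset":
        return "Here are the steps to reset your password: ..."
    return call_llm(f"Answer this employee question: {message}")  # task 2: larger model

print(handle_request("How do I reset my password?"))
```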
Fine-tuning Models: Changing model weights via established approaches to fine-tuning (e.g., grounding, instruction tuning, parameter-efficient fine-tuning) is an important step towards improving overall model accuracy. Despite the buzz around emerging practices like prompt engineering, Vaibhav shared that, “There’s enough research and understanding now that fine-tuning [on domain-specific data] can exceed the performance of prompt engineering.”
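As one illustration, a parameter-efficient fine-tune (LoRA) can be set up with the open-source peft library in a few lines; the gpt2 base model below is only a small placeholder, and the domain-specific dataset and training loop are omitted from this sketch.

```python
# Parameter-efficient fine-tuning (LoRA) with Hugging Face transformers + peft.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank of the low-rank update matrices
    lora_alpha=16,     # scaling factor for the updates
    lora_dropout=0.05,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will be updated
# From here, train on domain-specific examples with a standard training loop or Trainer.
```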
Infrastructure: KPIs like latency, scalability and time to first token have a direct impact on user experience, and choices made at the infrastructure and runtime layers can influence these factors significantly. On the physical infrastructure side, Vaibhav spoke about GPU shortages and the need to standardize on a common software stack (deep learning, training and orchestration frameworks) that can be easily ported across cloud providers, in order to take advantage of inventory wherever it resides.
Product Design: Vaibhav also stressed the importance of product design and devising thoughtful ways of embedding Gen AI. While natural language might seem like a more intuitive user interface (compared with a CLI or API), it requires thoughtful planning to integrate seamlessly into a product. At Moveworks, the product design team works closely with the ML team. Both rely on user analytics to understand the adoption and usability of new Gen AI features as they are released.
MLOps in the age of Gen AI
The rise of Gen AI is reshaping the MLOps workflow and associated tech stack. Casber Wang (Partner at Sapphire Ventures) hosted a panel with three ML experts – Aditya Bindal (Head of Product at AWS Sagemaker), Lukas Biewald (CEO, Weights & Biases) and Tony Gentilcore (Head of Product Engineering at Glean) – to delve into the evolution of the MLOps ecosystem.
On the data engineering front, transformer architectures have reduced the emphasis on data cleaning, preprocessing pipelines and feature engineering. In the case of LLMs, “feature engineering is more intrinsic to the training itself,” Aditya said. This shift, coupled with the general availability of common LLM APIs, is making the Gen AI stack more approachable for traditional software engineers. Teams can iterate quickly and achieve rapid innovation, without the added cost of specialized infrastructure or a dedicated data science team. This has opened the door to a federated organizational model, where “AI engineers” can be embedded within each feature team rather than being centralized.
The process for experimentation is also evolving. New approaches, such as prompt tuning and chaining, are emerging as effective means of steering models toward more predictable and repeatable outputs. Lukas shared that, “Historically, the experiments you might run when you are training a model might be changing the hyperparameters or changing the input data, whereas when you are using a Gen AI model, it’s often changing the prompt or changing the underlying LLM model… Almost every production application that we see includes quite a bit of chaining.” While the right operational model is still being defined for leveraging these techniques, an exciting ecosystem of frameworks and platforms has emerged to help with prompt versioning, visualizing and tracing LLM chains, and more.
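As a sketch of what tracking prompt experiments might look like, the loop below logs prompt variants and scores to Weights & Biases; call_llm and judge_output are hypothetical stand-ins for a team's own model call and scoring function.

```python
# Logging prompt-variant experiments to Weights & Biases.

import wandb

def call_llm(prompt: str) -> str:
    """Placeholder for the model call being experimented with."""
    return f"[model response to: {prompt}]"

def judge_output(output: str) -> float:
    """Placeholder score; in practice this could be automated checks or human review."""
    return 1.0 if len(output) < 200 else 0.0

prompt_variants = {
    "v1": "Summarize this support ticket in one sentence: {ticket}",
    "v2": "You are a support analyst. Briefly summarize this ticket: {ticket}",
}

for version, template in prompt_variants.items():
    run = wandb.init(project="prompt-experiments", name=f"ticket-summary-{version}",
                     config={"prompt_version": version, "model": "placeholder-llm"})
    output = call_llm(template.format(ticket="Printer has been offline since Monday."))
    wandb.log({"prompt": template, "output": output, "score": judge_output(output)})
    run.finish()
```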
LLMs have also changed model evaluation, and human-in-the-loop feedback (RLHF) has become particularly critical given the complexity and range of outputs an LLM can produce. “In the past, it was much easier for us to have automated eval sets,” Tony said. “Now we are going much more manual.” Manual evaluation can slow down the release process, but humans with domain expertise are difficult to replace and can ultimately produce a more accurate model.
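A minimal sketch of such a human-in-the-loop gate, assuming a tiny eval set and a hypothetical collect_human_rating hook into whatever labeling workflow a team uses:

```python
# Human-in-the-loop evaluation gate before release. The eval set is fabricated
# and collect_human_rating is a stand-in for a real labeling workflow.

import statistics

eval_set = [
    {"prompt": "How do I request a new laptop?"},
    {"prompt": "Summarize our PTO policy."},
]

def collect_human_rating(prompt: str, model_output: str) -> int:
    """A domain expert scores the output from 1 to 5; stubbed here for illustration."""
    return 4

def evaluate(model_fn) -> float:
    ratings = [collect_human_rating(ex["prompt"], model_fn(ex["prompt"])) for ex in eval_set]
    return statistics.mean(ratings)

mean_rating = evaluate(lambda p: f"[model answer to: {p}]")
ready_to_ship = mean_rating >= 4.0   # gate the release on expert judgment
```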
The process for model serving has become more complex. As Aditya explained, “It used to be a lot more straightforward. You had a model artifact, you put it in a model server and deployed. Now you are thinking about GPU capacity, vector DB retrieval during inference, caching prompts and caching context for conversational use cases.” The emergence of LLM architectures is resulting in the inference stack taking on more of this load.
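Here is a simplified sketch of such a request path, with retrieval against a vector store plus caching of retrieved context and conversational history; the vector_store_search and llm_generate helpers are hypothetical stand-ins for real infrastructure.

```python
# Simplified LLM serving path: vector-DB retrieval plus caching during inference.

from functools import lru_cache

def vector_store_search(query: str) -> str:
    return "[documents retrieved from the vector DB]"       # stand-in retrieval client

def llm_generate(prompt: str) -> str:
    return "[answer produced by GPU-backed inference]"       # stand-in model call

@lru_cache(maxsize=1024)
def retrieve_context(query: str) -> str:
    """Repeated queries skip the retrieval hop thanks to the cache."""
    return vector_store_search(query)

conversation_cache: dict[str, str] = {}   # cached context for conversational use cases

def serve(query: str, conversation_id: str) -> str:
    context = retrieve_context(query)
    prior_turns = conversation_cache.get(conversation_id, "")
    prompt = f"{prior_turns}\nContext: {context}\nUser: {query}"
    answer = llm_generate(prompt)
    conversation_cache[conversation_id] = f"{prior_turns}\nUser: {query}\nAssistant: {answer}"
    return answer
```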
The rise of autonomous DevOps
Gen AI is improving not only the functional capabilities of apps, but also the way we build them. There has been a massive proliferation of ML-powered developer tools providing functionality across every phase of the SDLC, including code generation, bug detection, continuous verification and more.
Dave Bullock (CTO at UJET, a Sapphire portfolio company) took the Hypergrowth stage to share the value his teams are already gleaning from tools like GitHub’s Copilot: “The productivity boost you get from having [AI] write a bunch of unit tests or help you find a bug or write code is massive.” Eddie Aftandilian (Principal Researcher at GitHub Next) spoke about how these tools are fundamentally changing the very nature of software development. He noted that AI-assisted developers will become better generalists as their time increasingly shifts from lower-level tasks (e.g., writing individual lines) to higher-order abstractions (e.g., code review and system design).
Despite their early impact, the new breed of autonomous DevOps tools has room to improve. Eddie explained that context windows will get larger, which could eventually allow developers to present an entire repo to a model and have the AI “summarize it for you, show you where to make a change, create architectural diagrams, directly from the source code.” Another area of opportunity is explainability; models are not particularly good at “showing their work.” Emerging approaches like “chain of thought” and “action plan generation” are encouraging models to break down their thinking into steps in order to help engineers spot and correct flaws in reasoning. Finally, models will integrate and take action across the SDLC, moving beyond code generation into writing CI scripts, running builds, validating compilation, execution and performance (and we’ve begun to see this vision take shape with tools like OpenAI’s Code Interpreter).
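As an illustration of chain-of-thought style prompting, the sketch below asks a (hypothetical) model to number its intermediate steps and separate them from its final answer, so an engineer can audit the reasoning; ask_model stands in for any chat-completion call.

```python
# Chain-of-thought style prompt: surface intermediate reasoning steps so that
# flawed logic is easier to spot. ask_model is a hypothetical stand-in.

def ask_model(prompt: str) -> str:
    return "1. ...\n2. ...\n3. ...\nANSWER: looks safe to merge"

COT_TEMPLATE = (
    "You are reviewing a proposed change to a CI pipeline.\n"
    "Think through the impact step by step, numbering each step.\n"
    "Then give your final recommendation on a new line starting with 'ANSWER:'.\n\n"
    "Change description:\n{change}"
)

response = ask_model(COT_TEMPLATE.format(change="Cache node_modules between CI runs"))
reasoning, _, answer = response.partition("ANSWER:")
print("Model reasoning:\n", reasoning.strip())
print("Recommendation:", answer.strip())
```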
Importance of platform engineering
Platform engineering – or investment in common, reusable tools and patterns – plays an essential role in improving overall developer efficiency. A best-in-class platform can reduce the cognitive load associated with tool-hopping and inefficient pipelines, reduce redundant spend and increase developer satisfaction. Though the term is often used loosely to refer to the dev toolchain, platform engineering covers a much wider set of foundational technologies, including data platforms, infrastructure, shared front-end utilities and back-end services.
Commonality is what reduces complexity and drives efficiency. On a panel moderated by Anders Ranum (Partner at Sapphire Ventures), Barbara Nelson (VP of Engineering at InfluxData) explained that, because the company operates across all three major cloud providers, it’s inefficient to “ask feature teams to figure out the complexity of each environment.” Instead, a centralized platform team masters the nuances of each provider and “abstracts that away for the others.”
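A minimal sketch of that kind of abstraction, using object storage as the example; the classes below are illustrative only and not InfluxData's actual design.

```python
# Platform-team abstraction over cloud object storage, so feature teams never
# touch provider-specific SDKs. Classes and wiring are illustrative.

from abc import ABC, abstractmethod

class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class S3Store(ObjectStore):
    def put(self, key: str, data: bytes) -> None: ...   # would wrap boto3
    def get(self, key: str) -> bytes: ...

class GCSStore(ObjectStore):
    def put(self, key: str, data: bytes) -> None: ...   # would wrap google-cloud-storage
    def get(self, key: str) -> bytes: ...

class AzureBlobStore(ObjectStore):
    def put(self, key: str, data: bytes) -> None: ...   # would wrap azure-storage-blob
    def get(self, key: str) -> bytes: ...

def get_object_store(provider: str) -> ObjectStore:
    """Feature teams call this; the platform team owns the per-cloud details."""
    return {"aws": S3Store, "gcp": GCSStore, "azure": AzureBlobStore}[provider]()
```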
Investing in a dedicated team focused on “back of the house” optimizations can be difficult, particularly during a flat market. The time, effort and outcomes can be far removed from tangible, client-facing feature development and bottom-line business value. Panelists suggested that it is critical to continuously monitor and report on the concrete value of the investment, including adoption rates across the R&D org, redundant costs eliminated and service-level metrics.
Finding the right people for the job can be difficult. Shailesh Kumar (SVP of Eng at ClickUp) suggested companies tap “tenured developers and get them to build the platform. They will build the best platform, because they understand the systems very well and the common things that need to be built.” Pulling top talent off product-related endeavors is not for the faint of heart, but can help ensure a proper foundation is laid.
More with less: Engineering efficiency
In addition to best-in-class platforms and AI-driven coding utilities, engineering leaders play a huge role in keeping their teams motivated during downshifts in the market. Allan Leinwand (CTO at Webflow) shared best practices for inspiring engineers: “Doing more with less doesn’t mean, ‘Type faster.’” Allan highlighted the need to satisfy people’s desire to contribute to meaningful work. Managers must demonstrate that “being part of this team means you have an impact at scale,” whether that’s contributing to a critical feature or algorithm or having a global effect across a vibrant community of end users. He also stressed the importance of efficient onboarding, with the goal of every new engineer shipping to production within their first few days on the job. This gives them a sense of empowerment and agency, while getting them up to speed on the codebase faster.
Methods for measuring R&D productivity
With a strong technical platform in place, and the right engineering leadership at the helm, it’s important to establish a consistent method for assessing R&D efficiency. Determining exactly what to measure is a key first step.
On a panel moderated by Shatakshi Mohan (Investor at Sapphire Ventures), Surabhi Gupta (SVP of Eng at Robinhood) suggested that there was no one-size-fits-all efficiency KPI: “Don’t…optimize on a single metric.” Andrew Lau (CEO at Jellyfish) added that you “can’t have one [metric], but you also can’t have 50…Part of this process is choosing what matters to your organization, and this is driven by your technical culture.” In terms of who to measure, Andrew noted that it’s important to start with teams. “Individual metrics are where people get wound up,” he said.
Quality metrics are typically uncontroversial – everyone wants their application to be bug-free and reliable. However, the panel cautioned against placing too much emphasis on velocity and other vanity stats (e.g., throughput, lines of code), which are often red herrings. The panel also highlighted softer, qualitative metrics, such as developer satisfaction, which has been shown to correlate directly with efficiency. Nathen Harvey (DORA Advocate at Google Cloud) talked about the evolution of engineering efficiency frameworks like DORA, SPACE and DevEx, all of which provide a baseline of well-researched metrics proven to reflect efficiency and performance.
Finally, engineering leaders need to effectively report up the chain. With R&D often representing 30% to 40% of OpEx and total headcount, R&D efficiency is facing more scrutiny than ever. Leaders must understand how to bridge the gap between individual feature teams and board-level concerns. “Your role as an engineering leader is also as a translator,” Andrew noted. Common boardroom metrics (e.g., R&D spend as a share of headcount) are useful macro signals, but understanding the total allocation of work (or how much time the team is spending on new features vs. incidents, technical debt and unplanned activities) is an important window into the overall direction of the R&D org. Across that allocation, new feature development should serve as the north star, and engineering leaders must devise ways to map activities to bottom-line business outcomes, such as user retention and expansion.
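As a toy illustration of that “allocation of work” view, the snippet below rolls up fabricated issue-tracker data into percentages by category.

```python
# Toy "allocation of work" rollup: share of engineering time spent on new
# features vs. incidents, tech debt and unplanned work. Sample data is fabricated.

from collections import Counter

tickets = [
    {"category": "feature",   "days": 8},
    {"category": "feature",   "days": 6},
    {"category": "incident",  "days": 2},
    {"category": "tech_debt", "days": 3},
    {"category": "unplanned", "days": 1},
]

effort = Counter()
for ticket in tickets:
    effort[ticket["category"]] += ticket["days"]

total_days = sum(effort.values())
allocation = {category: round(100 * days / total_days) for category, days in effort.items()}
print(allocation)  # {'feature': 70, 'incident': 10, 'tech_debt': 15, 'unplanned': 5}
```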
Rewriting the playbook
The swift emergence of Gen AI has kicked up a dust cloud of new technologies and associated use cases to sift through. R&D leaders are tasked with harnessing the raw power of emerging AI capabilities, while walking the tightrope of an unpredictable economic environment, where scale at all costs has been supplanted by a sharp focus on sustainable growth.
It’s a tall order, but we believe that such moments create the opportunity for a reset on traditional practices and playbooks. And we have faith in the power of communities to help shape these new paradigms.
We wanted to give a special shout-out to our two keynote speakers for their contributions this year. Nora Jones (CEO at Jeli) kicked us off with an insightful talk about the importance of investing in the expertise of your teams, and steps that engineering leaders can take to improve postmortems and more effectively learn from incidents. Inés Sombra (VP of Eng at Fastly) took us home with reflections on her past experiences and best practices honed from building and operating hyperscale systems.
We’re grateful to all the luminaries who shared their expert insights and to all the attendees who contributed to the vibrant exchange of ideas at the Hypergrowth Engineering Summit.
Yet, the conversation doesn’t stop here.
If you’re building innovative infrastructure and DevOps platforms, developing novel approaches to the AI practices discussed here, or just want to get involved in Sapphire’s engineering community, drop me an email: carter@sapphireventures. And remember to save the date for next year’s third annual Hypergrowth Engineering Summit. We can’t wait to see you there!
Legal disclaimer
Disclaimer: Nothing presented within this article is intended to constitute investment advice, and under no circumstances should any information provided herein be used or considered as an offer to sell or a solicitation of an offer to buy an interest in any investment fund managed by Sapphire Ventures, LLC (“Sapphire”). Information provided reflects Sapphire’s views as of a particular point in time; such views are subject to change at any point and Sapphire shall not be obligated to provide notice of any change. Companies mentioned in this article are a representative sample of portfolio companies in which Sapphire has invested and which the author believes fit the objective criteria stated in the commentary; they do not reflect all investments made by Sapphire. A complete alphabetical list of Sapphire’s investments made by its direct growth and sports investing strategies is available here. No assumptions should be made that investments described were or will be profitable. Due to various risks and uncertainties, actual events, results or the actual experience may differ materially from those reflected or contemplated in these statements. Nothing contained in this article may be relied upon as a guarantee or assurance as to the future success of any particular company. Past performance is not indicative of future results.