Generative AI’s Act Two (contributed post)

Posted by admin on September 25, 2023, 14:29:18






Scientists, historians and economists have long studied the optimal conditions that create a Cambrian explosion of innovation. In generative AI, we have reached a modern marvel, our generation’s space race.


This moment has been decades in the making. Six decades of Moore’s Law have given us the compute horsepower to process exaflops of data. Four decades of the internet (accelerated by COVID) have given us trillions of tokens’ worth of training data. Two decades of mobile and cloud computing have given every human a supercomputer in the palm of our hands. In other words, decades of technological progress have accumulated to create the necessary conditions for generative AI to take flight.


ChatGPT’s rise was the spark that lit the fuse, unleashing a density and fervor of innovation that we have not seen in years—perhaps since the early days of the internet. The breathless excitement was especially visceral in “Cerebral Valley,” where AI researchers reached rockstar status and hacker houses were filled to the brim each weekend with new autonomous agents and companionship chatbots. AI researchers transformed from the proverbial “hacker in the garage” to special forces units commanding billions of dollars of compute. The arXiv printing press has become so prolific that researchers have jokingly called for a pause on new publications so they can catch up.

But quickly, AI excitement turned to borderline hysteria. Suddenly, every company was an “AI copilot.” Our inboxes got filled up with undifferentiated pitches for “AI Salesforce” and “AI Adobe” and “AI Instagram.” The $100M pre-product seed round returned. We found ourselves in an unsustainable feeding frenzy of fundraising, talent wars and GPU procurement.

And sure enough, the cracks started to show. Artists and writers and singers challenged the legitimacy of machine-generated IP. Debates over ethics, regulation and looming superintelligence consumed Washington. And perhaps most worryingly, a whisper began to spread within Silicon Valley that generative AI was not actually useful. The products were falling far short of expectations, as evidenced by terrible user retention. End user demand began to plateau for many applications. Was this just another vaporware cycle?


The AI summer of discontent has sent critics gleefully grave dancing, reminiscent of the early days of the internet, where in 1998 one famous economist declared “By 2005, it will become clear that the Internet’s impact on the economy has been no greater than the fax machine’s.”

Make no mistake—despite the noise and the hysteria and the air of uncertainty and discontent, generative AI has already had a more successful start than SaaS, with >$1 billion in revenue from startups alone (it took the SaaS market years, not months, to reach the same scale). Some applications have become household names: ChatGPT became the fastest-growing application with particularly strong product-market fit among students and developers; Midjourney became our collective creative muse and was reported to have reached hundreds of millions of dollars in revenue with a team of just eleven; and Character popularized AI entertainment and companionship and created the consumer “social” application we craved the most—with users spending two hours on average in-app.

Nonetheless, these early signs of success don’t change the reality that a lot of AI companies simply do not have product-market fit or a sustainable competitive advantage, and that the overall ebullience of the AI ecosystem is unsustainable.


Now that the dust has settled for a bit, we thought it would be an opportune moment to zoom out and reflect on generative AI—where we find ourselves today, and where we’re possibly headed.


Towards Act Two


Generative AI’s first year out the gate—“Act 1”—came from the technology-out. We discovered a new “hammer”—foundation models—and unleashed a wave of novelty apps that were lightweight demonstrations of cool new technology.

We now believe the market is entering “Act 2”—which will be from the customer-back. Act 2 will solve human problems end-to-end. These applications are different in nature than the first apps out of the gate. They tend to use foundation models as a piece of a more comprehensive solution rather than the entire solution. They introduce new editing interfaces, making the workflows stickier and the outputs better. They are often multi-modal.

The market is already beginning to transition from “Act 1” to “Act 2.” Examples of companies entering “Act 2” include Harvey, which is building custom LLMs for elite law firms; Glean, which is crawling and indexing our workspaces to make Generative AI more relevant at work; and Character and Ava, which are creating digital companions.

Market Map (figure of companies grouped by market segment, omitted)

Our updated generative AI market map is below.

Unlike last year’s map, we have chosen to organize this map by use case rather than by model modality. This reflects two important thrusts in the market: Generative AI’s evolution from technology hammer to actual use cases and value, and the increasingly multimodal nature of generative AI applications.

In addition, we have included a new LLM developer stack that reflects the compute and tooling vendors that companies are turning to as they build generative AI applications in production.


Revisiting Our Thesis


Our original essay laid out a thesis for the generative AI market opportunity and a hypothesis for how the market would unfold. How did we do?


Here’s what we got wrong:


Things happened quickly. Last year, we anticipated it would be nearly a decade before we had intern-level code generation, Hollywood-quality videos or human quality speech that didn’t sound mechanical. But a quick listen to Eleven Labs’ voices on TikTok or Runway’s AI film festival makes it clear that the future has arrived at warp speed. Even 3D models, gaming and music are becoming good, quickly.

The bottleneck is on the supply side. We did not anticipate the extent to which end user demand would outstrip GPU supply. The bottleneck to many companies’ growth quickly became not customer demand but access to the latest GPUs from Nvidia. Long wait times became the norm, and a simple business model emerged: pay a subscription fee to skip the line and access better models.

Vertical separation hasn’t happened yet. We still believe that there will be a separation between the “application layer” companies and foundation model providers, with model companies specializing in scale and research and application layer companies specializing in product and UI. In reality, that separation hasn’t cleanly happened yet. In fact, the most successful user-facing applications out of the gate have been vertically integrated.

Cutthroat competitive environment and swiftness of the incumbent response. Last year, there were a few overcrowded categories of the competitive landscape (notably image generation and copywriting), but by and large the market was whitespace. Today, many corners of the competitive landscape have more competition than opportunity. The swiftness of the incumbent response, from Google’s Duet and Bard to Adobe’s Firefly—and the willingness of incumbents to finally go “risk on”—has magnified the competitive heat. Even in the foundation model layer, we are seeing customers set up their infrastructure to be agnostic between different vendors.

The moats are in the customers, not the data. We predicted that the best generative AI companies could generate a sustainable competitive advantage through a data flywheel: more usage → more data → better model → more usage. While this is still somewhat true, especially in domains with very specialized and hard-to-get data, the “data moats” are on shaky ground: the data that application companies generate does not create an insurmountable moat, and the next generations of foundation models may very well obliterate any data moats that startups generate. Rather, workflows and user networks seem to be creating more durable sources of competitive advantage.

Here’s what we got right:


Generative AI is a thing. Suddenly, every developer was working on a generative AI application and every enterprise buyer was demanding it. The market even kept the “generative AI” moniker. Talent flowed into the market, as did venture capital dollars. Generative AI even became a pop culture phenomenon in viral videos like “Harry Potter Balenciaga” or the Drake imitation song “Heart on My Sleeve” by Ghostwriter, which has become a chart-topping hit.

The first killer apps emerged. It’s been well documented that ChatGPT was the fastest application to reach 100M MAU, and it did so organically in just 6 weeks. By contrast, Instagram took 2.5 years, WhatsApp took 3.5 years, and YouTube and Facebook took 4 years to reach that level of user demand. But ChatGPT is not an isolated phenomenon. The depth of engagement of Character AI (2-hour average session time), the productivity benefits of GitHub Copilot (55% more efficient), and the monetization path of Midjourney (hundreds of millions of dollars in revenue) all suggest that the first cohort of killer apps has arrived.

Developers are the key. One of the core insights of developer-first companies like Stripe or Unity has been that developer access opens up use cases you could not even imagine. In the last several quarters, we have been pitched everything from music generation communities to AI matchmakers to AI customer support agents.

The form factor is evolving. The first versions of AI applications have largely been autocomplete and first drafts, but these form factors are now growing in complexity. Midjourney’s introduction of camera panning and infilling is a nice illustration of how the generative AI-first user experience has grown richer. Across the board, form factors are evolving from individual to system-level productivity and from human-in-the-loop to execution-oriented agentic systems.

Copyright and ethics and existential dread. The debate has roared on these hot-button topics. Artists and writers and musicians are split, with some creators rightfully outraged that others are profiting off derivative work, and some creators embracing the new AI reality (Grimes’ profit-sharing proposition and James Buckhouse’s optimism about becoming part of the creative genome come to mind). No startup wants to be the Napster or Limewire to the eventual Spotify (h/t Jason Boehmig). The rules are opaque: Japan has declared that content used to train AI has no IP rights, while Europe has proposed heavy-handed regulation.

Where do we stand now? Generative AI’s Value Problem


Generative AI is not lacking in use cases or customer demand. Users crave AI that makes their jobs easier and their work products better, which is why they have flocked to applications in record-setting droves (in spite of a lack of natural distribution).


But do people stick around? Not really. The chart below compares the month-1 mobile app retention of AI-first applications with that of existing companies.

User engagement is also lackluster. Some of the best consumer companies have 60-65% DAU/MAU; WhatsApp’s is 85%. By contrast, generative AI apps have a median of 14% (with the notable exception of Character and the “AI companionship” category). This means that users are not finding enough value in Generative AI products to use them every day yet.
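
To make the DAU/MAU figures above concrete, here is a minimal sketch of how the ratio and a median across apps might be computed. The app names and user counts are made up for illustration and do not come from the chart; the "days active per month" reading assumes DAU is averaged over a 30-day month.

```python
from statistics import median


def stickiness(dau: float, mau: float) -> float:
    """Return DAU/MAU as a fraction between 0 and 1."""
    if mau <= 0:
        raise ValueError("MAU must be positive")
    return dau / mau


def approx_days_active_per_month(ratio: float, days_in_month: int = 30) -> float:
    """Rough reading of DAU/MAU: average active days per monthly user."""
    return ratio * days_in_month


# Hypothetical apps with invented (DAU, MAU) counts, for illustration only.
apps = {
    "app_a": (140_000, 1_000_000),
    "app_b": (60_000, 500_000),
    "app_c": (45_000, 250_000),
}

ratios = {name: stickiness(dau, mau) for name, (dau, mau) in apps.items()}
median_ratio = median(ratios.values())

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: DAU/MAU = {ratio:.0%}")
    print(f"median DAU/MAU = {median_ratio:.0%}")
    print(f"approx. active days per month at the median: "
          f"{approx_days_active_per_month(median_ratio):.1f}")
```

Read this way, a 14% median corresponds to a typical monthly user showing up on roughly four days out of thirty, while a ratio of 85% corresponds to use on most days.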

In short, generative AI’s biggest problem is not finding use cases or demand or distribution, it is proving value. As our colleague David Cahn writes, “the $200B question is: What are you going to use all this infrastructure to do? How is it going to change people’s lives?” The path to building enduring businesses will require fixing the retention problem and generating deep enough value for customers that they stick and become daily active users.

Let’s not despair. Generative AI is still in its “awkward teenage years.” There are glimpses of brilliance, and when the products fall short of expectations the failures are often reliable, repeatable and fixable. Our work is cut out for us.

Act Two: A Shared Playbook


Founders are embarking on the hard work of prompt engineering, fine tuning and dataset curation to make their AI products *good*. Brick by brick, they are building flashy demos into whole product experiences. And meanwhile, the foundation model substrate continues to brim with research and innovation.


A shared playbook is developing as companies figure out the path to enduring value. We now have shared techniques to make models useful, as well as emerging UI paradigms that will shape generative AI’s second act.


The Model Development Stack


Emerging reasoning techniques like chain-of-thought, tree-of-thought and Reflexion are improving models’ ability to perform richer, more complex reasoning tasks, closing the gap between customer expectations and model capabilities. Developers are using frameworks like LangChain to invoke and debug more complex multi-chain sequences.
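
As a rough illustration of chain-of-thought prompting, the sketch below builds a few-shot prompt that shows worked reasoning before each answer, then pulls the final answer out of a completion. It does not call any model or framework: the helper names, the prompt layout and the sample completion are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Each example is (question, worked reasoning, final answer).
Example = Tuple[str, str, str]


def build_cot_prompt(question: str, examples: List[Example]) -> str:
    """Assemble a few-shot prompt whose examples show reasoning before the answer."""
    parts = [
        f"Q: {q}\nReasoning: {reasoning}\nAnswer: {answer}"
        for q, reasoning, answer in examples
    ]
    # End with an open "Reasoning:" cue so the model continues step by step.
    parts.append(f"Q: {question}\nReasoning: Let's think step by step.")
    return "\n\n".join(parts)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if there is none."""
    for line in reversed(completion.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    examples = [
        ("A team has 3 squads of 4 engineers. How many engineers?",
         "There are 3 squads and each has 4 engineers, so 3 * 4 = 12.",
         "12"),
    ]
    prompt = build_cot_prompt("A rack holds 8 GPUs. How many GPUs in 5 racks?", examples)
    print(prompt)

    # Hand-written stand-in for a model completion (no model is called here).
    sample_completion = "Each rack holds 8 GPUs and there are 5 racks, so 5 * 8 = 40.\nAnswer: 40"
    print(extract_answer(sample_completion))
```

In practice the prompt would be sent to whichever model a team uses, and the parsing step is what lets the application consume only the final answer while the intermediate reasoning stays available for debugging.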

Transfer learning techniques like RLHF and fine-tuning are becoming more accessible, especially with the recent availability of fine-tuning for GPT-3.5 and Llama-2, which means that companies can adapt foundation models to their specific domains and improve from user feedback. Developers are downloading open-source models from Hugging Face and fine-tuning them to achieve quality performance.

Retrieval-augmented generation is bringing in context about the business or the user, reducing hallucinations and increasing truthfulness and usefulness. Vector databases from companies like Pinecone have become the infrastructure backbone for RAG.
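
The retrieval step can be sketched in a few lines. The toy example below is an assumption-laden stand-in: it scores documents with bag-of-words cosine similarity instead of learned embeddings, keeps the documents in a Python list instead of a vector database, and stops at building the prompt rather than calling a model. The sample documents and function names are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Put the retrieved context ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Invented business documents, for illustration only.
DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available by email on weekdays.",
    "The premium plan includes priority GPU access.",
]

if __name__ == "__main__":
    print(build_prompt("How many days do I have to get a refund?", DOCS, k=1))
```

Grounding the prompt in retrieved text is what gives the model business-specific facts to draw on; a production system would swap the word-count scoring for embedding search over a vector index.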

New developer tools and application frameworks are giving companies reusable building blocks to create more advanced AI applications and helping developers evaluate, improve and monitor the performance of AI models in production, including LLMOps tools like LangSmith and Weights & Biases.

AI-first infrastructure companies like CoreWeave, Lambda Labs, Foundry, Replicate and Modal are unbundling the public clouds and providing what AI companies need most: plentiful GPUs at a reasonable cost, available on-demand and highly scalable, with a nice PaaS developer experience.

Together, these techniques should close the expectations vs reality gap for models as the underlying foundation models simultaneously improve. But making the models great is only half the battle. The playbook for a generative AI-first user experience is evolving as well:


Emerging Product Blueprints


Generative interfaces. A text-based conversational user experience is the default interface on top of an LLM. Gradually, newer form factors are entering the arsenal, from Perplexity’s generative user interfaces to new modalities like human-sounding voices from Inflection AI.

New editing experiences: from Copilot to Director’s Mode. As we advance from zero-shot to ask-and-adjust (h/t Zach Lloyd), generative AI companies are inventing a new set of knobs and switches that look very different from traditional editing workflows. Midjourney’s new panning commands and Runway’s Director’s Mode create new camera-like editing experiences. Eleven Labs is making it possible to manipulate voices through prompting.

Increasingly sophisticated agentic systems. Generative AI applications are increasingly not just autocomplete or first drafts for human review; they now have the autonomy to problem-solve, access external tools and solve problems end-to-end on our behalf. We are steadily progressing from level 0 to level 5 autonomy.

System-wide optimization. Rather than embed in a single human user’s workflow and make that individual more efficient, some companies are directly tackling the system-wide optimization problem. Can you pick off a chunk of support tickets or pull requests and autonomously solve them, thereby making the whole system more effective?


Parting Thoughts


As we approach the frontier paradox and as the novelty of transformers and diffusion models dies down, the nature of the generative AI market is evolving. Hype and flash are giving way to real value and whole product experiences.


At Sequoia we remain steadfast believers in generative AI. The necessary conditions for this market to take flight have accumulated over the span of decades, and the market is finally here. The emergence of killer applications and the sheer magnitude of end user demand has deepened our conviction in the market.


However, Amara’s Law—the phenomenon that we tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run—is running its course. We are applying patience and judgment in our investment decisions, with careful attention to how founders are solving the value problem. The shared playbook companies are using to push the boundaries on model performance and product experiences gives us optimism on generative AI’s second act.


If you are building in the AI market with an eye towards value and whole product experiences, we would love to hear from you. Please email Sonya ([email protected]) and Pat ([email protected]). Our third coauthor does not have an email address yet, sadly :-).

Originally published on the WeChat official account KK安全说: Generative AI’s Act Two (contributed post)
