The 2021 machine learning, AI, and data landscape

It has been a year of multiple threads and stories intertwining.

One story has been the maturation of the ecosystem, with market leaders reaching large scale and ramping up their ambitions for global market domination, in particular through increasingly broad product offerings. Some of those companies, such as Snowflake, have been thriving in public markets (see our MAD Public Company Index), and a number of others (Databricks, Dataiku, DataRobot, etc.) have raised very large (or in the case of Databricks, gigantic) rounds at multi-billion valuations and are knocking on the IPO door (see our Emerging MAD company Index).

But at the other end of the spectrum, this year has also seen the rapid emergence of a whole new generation of data and ML startups. Whether they were founded a few years or a few months ago, many experienced a growth spurt in the past year or so. Part of it is due to a rabid VC funding environment and part of it, more fundamentally, is due to inflection points in the market.

In the past year, there’s been less headline-grabbing discussion of futuristic applications of AI (self-driving vehicles, etc.), and a bit less AI hype as a result. Regardless, data and ML/AI-driven application companies have continued to thrive, particularly those focused on enterprise use cases. Meanwhile, a lot of the action has been happening behind the scenes on the data and ML infrastructure side, with entirely new categories (data observability, reverse ETL, metrics stores, etc.) appearing or drastically accelerating.

To keep track of this evolution, this is our eighth annual landscape and “state of the union” of the data and AI ecosystem, coauthored this year with my FirstMark colleague John Wu. (For anyone interested, here are the prior versions: 2012, 2014, 2016, 2017, 2018, 2019: Part I and Part II, and 2020.)

For those who have remarked over the years how insanely busy the chart is, you’ll love our new acronym: Machine learning, Artificial intelligence, and Data (MAD) — this is now officially the MAD landscape!

We’ve learned over the years that those posts are read by a broad group of people, so we have tried to provide a little bit for everyone — a macro view that will hopefully be interesting and approachable to most, and then a slightly more granular overview of trends in data infrastructure and ML/AI for people with a deeper familiarity with the industry.

Quick notes:

  • My colleague John and I are early-stage VCs at FirstMark, and we invest very actively in the data/AI space. Our portfolio companies are noted with an (*) in this post.

Let’s dig in.

The macro view: Making sense of the ecosystem’s complexity

Let’s start with a high-level view of the market. As the number of companies in the space keeps increasing every year, the inevitable questions are: Why is this happening? How long can it keep going? Will the industry go through a wave of consolidation?

Rewind: The megatrend

Readers of prior versions of this landscape will know that we are relentlessly bullish on the data and AI ecosystem.

As we said in prior years, the fundamental trend is that every company is becoming not just a software company, but also a data company.

Historically, and still today in many organizations, data has meant transactional data stored in relational databases, and perhaps a few dashboards for basic analysis of what happened to the business in recent months.

But companies are now marching towards a world where data and artificial intelligence are embedded in myriad internal processes and external applications, both for analytical and operational purposes. This is the beginning of the era of the intelligent, automated enterprise — where company metrics are available in real time, mortgage applications get automatically processed, AI chatbots provide customer support 24/7, churn is predicted, cyber threats are detected in real time, and supply chains automatically adjust to demand fluctuations.

This fundamental evolution has been powered by dramatic advances in underlying technology — in particular, a symbiotic relationship between data infrastructure on the one hand and machine learning and AI on the other.

Both areas have had their own separate history and constituencies, but have increasingly operated in lockstep over the past few years. The first wave of innovation was the “Big Data” era, in the early 2010s, where innovation focused on building technologies to harness the massive amounts of digital data created every day. Then, it turned out that if you applied big data to some decade-old AI algorithms (deep learning), you got amazing results, and that triggered the whole current wave of excitement around AI. In turn, AI became a major driver for the development of data infrastructure: If we can build all those applications with AI, then we’re going to need better data infrastructure — and so on and so forth.

Fast-forward to 2021: The terms themselves (big data, AI, etc.) have experienced the ups and downs of the hype cycle, and today you hear a lot of conversations around automation, but fundamentally this is all the same megatrend.