The report is best read on a desktop to navigate the interactive data visualizations. If you are on mobile, you can read more about the findings and the critical technical roles for AI productization here.
A recent Technology Quarterly by The Economist was subtitled: “After years of hype, many people feel AI has failed to deliver”. It considered two core problems: access to the required data, and algorithms that weren’t yet all that smart. This is unsurprising, since so much of the communication and media about artificial intelligence (AI) and machine learning (ML) promised magic algorithms if given the right data.
However, this perspective is only partly true. Much of AI’s unrealized potential can be achieved with currently available data and algorithms, given enough of another scarce resource to apply them: talent. Our previous estimates of the talent pool have focused on AI researchers, those who are making the algorithms smarter and working on ever larger data sets, and this focus perhaps added to the illusion that such researchers were all that was needed.
Taking AI from research to real-world impact is a long value chain that depends on a range of skill sets and experience. It is common to see people who can, and do, fill multiple roles, as some of the rarer skills are needed across the value chain. Even so, we think it is useful to categorize and explore these roles separately to better understand what it takes to build and run AI solutions,1 and what access to that talent looks like.
This year, we’ve added estimates of several other critical specialized technical roles in the value chain of developing an AI product: ML engineering, technical implementation, and data architecture. We measured the size of the available talent pool for industry through self-reported data on social media, and demand via the monthly total job postings for the same roles.
We also expanded our view of those contributing to AI research, moving from conferences as a proxy, whose limited seats don’t capture the full growth of the ecosystem, to arXiv. ArXiv is where researchers pre-publish their papers2 and is perhaps the closest thing there is to a census of AI research. It also gives a much broader view of AI growth by including papers on applied methods.
Here are the brief highlights:
The total number of authors publishing AI papers each year on arXiv has grown by an average of 52.69% per year since 2007. Last year saw 58,000 authors, and we estimate the total will reach 86,000 by year-end. Across the four industry roles we tracked, we estimate a talent pool of about 478,000 people (for more details, see section ”The Specialized Technical Roles of the AI Value Chain” in the report).
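The author figures above can be sanity-checked with a few lines of arithmetic. A minimal sketch, assuming nothing beyond the 58,000 and 86,000 counts quoted in the text (the function names are our own illustration, not the report’s methodology):

```python
def implied_growth(previous: int, projected: int) -> float:
    """Year-over-year growth rate implied by two consecutive annual counts."""
    return projected / previous - 1

def project(count: int, rate: float, years: int) -> float:
    """Compound a fixed yearly growth rate forward from a starting count."""
    return count * (1 + rate) ** years

# 58,000 authors last year, 86,000 estimated by year-end (from the report)
rate = implied_growth(58_000, 86_000)
print(f"implied year-over-year growth: {rate:.1%}")  # roughly 48%
```

Note that the year-end estimate implies roughly 48% growth for the current year, somewhat below the 52.69% long-run average, which is an average of yearly growth rates rather than a rate every single year must match.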
The national rankings are in the two graphs below, and should be viewed with an important note on methodology: we have assigned location based on the location of the headquarters of the organization the author is affiliated with, in order to emphasize where the intellectual property (IP) is owned. This choice gives significant weight to countries like the United States whose Big Tech companies have AI research labs all over the world.
As we have in the past, we continue with several quantitative analyses of the pool of research talent, this time using the data from arXiv rather than conferences (for full detail, see report section “The People Making AI Smarter”).
1 – Talent remains global and mobile (at least pre-pandemic)
Our analysis shows that at least for the people contributing to research, the talent pool truly is global. Collaborations continue across the world, except for the Global South, whose ecosystems are much earlier in their development. Ireland outranks everyone with an average of over 15 collaborations per author, while most countries sit around 4-6.
Migration may be a dated metric given the pandemic and remote work, but it does help to show which countries (or companies) have more pull versus those that seem to supply others, as well as those that seem to be more insular, with relatively few people coming and going. Of course, the US has by far the biggest pull of any country. This may be partly undone by new visa policies, creating a big opportunity for its neighbors, but the many international labs of its big tech companies may mitigate some of this.
2 – The gender balance has seen little improvement
Gender3 in arXiv is even less balanced than what we had seen when looking at conferences, and has barely moved since 2007, going only from 12% to 15% today.4 The ratios do vary significantly from country to country, but the pandemic has disproportionately affected the output of female researchers, and so this year could be a step backward overall. Another factor is who continues to contribute to publishing, which our self-representation data sets help shed light on.
3 – Few of those in industry appear to do full-time fundamental research
On job-related social media we see only about 4,100 people presenting themselves as professional researchers for industry, perhaps indicating that many who publish work primarily as engineers or in other roles and do research part-time. We have also found this to be anecdotally true among our peers. While this may be out of a preference for applied work, we have found that the proportion of demand for pure researchers in industry is equally small (about 1.7% of job postings) and that pure research work in industry isn’t widely available.5 Adding data on graduation rates and on employment records in academia and private research labs, and understanding the average publishing lifespan and output of a researcher, will help answer more clearly how industry impacts the output of research.
This phenomenon of people splitting time between applied work and research could further affect the gender balance in research contributions. Some reporting has shown that women are less likely to get jobs that allow them to continue contributing to research. With the challenge of getting jobs doing pure research after graduation, this puts even more pressure on incoming cohorts to include high numbers of women to affect the overall ratio of women’s contributions to research.
4 – Demand for new roles has been steady, but saw a big dip in 2020
Unfortunately, data on demand is limited at this stage, but we can get some insight from tracking monthly job postings. What we have found is that the proportionate mix of specialized engineering, technical implementation, and research roles in demand closely matches the mix of supply: about 61% for implementation roles building the software around AI capabilities, 38% for AI engineering roles building the core AI capabilities, and 1% for researchers. We do not know the aggregate amounts, but the monthly flows were growing steadily in 2019, at around 2-6% per month depending on job title. Unsurprisingly, we also saw 20-30% drops in demand for the relevant job titles during 2020, but both 2019 and 2020 show significant outliers among those entering the scene and persisting through the pandemic.
5 – Emerging markets are showing signs of catching up
Brazil and India are growing hotspots based on supply of the roles we tracked. They are perhaps normally discounted given the emphasis the industry puts on fundamental research, but these and other emerging markets (especially in South America) have a “leapfrog” opportunity of jumping right into implementation, as AI tooling becomes democratized and the training needed is more accessible via online courses. In African countries such as Nigeria, Ethiopia, and Rwanda, there is a new approach to the pre-existing push for digitalization that bakes in AI at its core.
Talent as friction to AI success
Whether the full potential of AI is over-hyped is for another discussion, but we can say that practical AI success is not just a formula of high-level experts and access to the right data. The AI industry initially focused on the very high-level experts because only they had the know-how to apply the emerging techniques to novel areas. Now there is a recognition that the dynamics of this new technology require more than just engineers and people who can build nice models in order to deploy it effectively.
AI is a new generation of software that adapts to the data fed to it; it is coded with data rather than logical rules. Traditional software is static by comparison, and AI needs a new ecosystem of support and infrastructure not only to be built but also to be governed once it is deployed. For AI to work at scale, lots of new talent is needed for engineering, building infrastructure, developing new business models, and monitoring objectives.
A survey by the European Commission recently found that businesses identified access to the right skill sets as the number one impediment to adopting AI. Our Global AI Talent Report is just a first look at these professional roles and at how mismatches in demand and supply may continue. A lot remains to be done to paint a full picture.
As AI matures it will become more pervasive. We will see new specialized roles emerge for managing the new dynamics of AI, but eventually everyone will need to update their digital skills to collaborate with this new technology. Already, we have seen that most people can grasp the concept of an AI-powered recommendation algorithm and adjust their behavior to affect its output. However, people have very limited choice and only blunt tools for manipulating an algorithm to their needs. When tooling and skill sets standardize along the value chain, choice of and access to AI technology will vastly increase, engendering far more innovation than we have yet seen with AI software.
To get there, we face the challenge of bridging the gap between proof-of-concept in the lab and real-world deployment. Researchers and engineers play an important role right now in helping close that gap, but they cannot do it alone. They, and the institutions that train them, need to focus on standardizing their tools and processes so that others down the value chain can collaborate more easily.
1. Our categories emphasize the aspects of roles in building an AI solution, as opposed to running it, though the needed skill sets could cover both.
2. There is a strong acceptance of pre-publishing in AI, as it is easy to test out methods and see if they are replicable and useful. Granted, we did not sort for popularity as a signifier of impact, but only to see where the volume of research is.
3. The gender measure was based on the names of the authors. We recognize this as a crude measure due to the ambiguity of many names and, of course, the fact that some people may not identify with either gender.
4. At conferences it was only slightly better: last year’s report showed the proportion of women published at the top conferences was 18%.
5. There is also a possibility that most people stay in research in academia and simply do not create job profiles for themselves, and that professional degrees and online training in AI are making up a greater proportion of the supply, thanks in part to being significantly shorter than research-oriented PhD tracks.