Welcome to the MediaGamma blog


Is there an artificial intelligence reproducibility crisis? What does it mean for business?

Posted by Victor Malachard



The widespread use of artificial intelligence (AI) is one of the most important scientific and technological developments of our time. In the business world, AI adoption is accelerating as more and more organisations deploy AI to add value, automate processes and solve specific problems. The use of AI in standard business processes has grown by nearly 25% year-over-year, and by 2030 AI is projected to add up to $13 trillion of value to the global economy.

AI applications that incorporate machine learning (ML) and deep learning (DL) can transform business by improving the quality and speed of decision-making and by helping organisations extract valuable insights from raw data. However, a growing number of experts warn that AI will not reach its full potential until researchers can independently verify the data behind its decisions. In AI research, ‘transparency’ and ‘traceability’ have become buzzwords, and the issue underlying both is reproducibility.


What is reproducibility?

Reproducibility is the ability to repeat an experiment and reach the same results, and it’s a vital component of scientific study. The point of reproducibility isn’t to replicate the results exactly. Given the randomness inherent in neural networks and the variations in code and hardware, an exact replication would prove impossible in the vast majority of cases. Instead, the point of reproducibility is to provide documentation that offers others a basic idea of how they might reach similar conclusions within the context of their own work. Documentation also helps researchers gain a better understanding of their own projects.
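To make the point about randomness concrete, here is a minimal Python sketch of why documenting a detail as small as a random seed matters. It assumes a toy "initialisation" built on the standard library's `random` module rather than any real deep learning framework, and `init_weights` is a hypothetical helper. Recording the seed lets someone else regenerate the same starting point; on its own it does not guarantee identical final results on different hardware or library versions.

```python
import random


def init_weights(n: int, seed: int) -> list[float]:
    """Return n pseudo-random 'weights' drawn from a seeded generator.

    Hypothetical stand-in for a neural network's random initialisation.
    """
    rng = random.Random(seed)  # private generator: no hidden global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# With the seed recorded, anyone can regenerate the same starting point.
run_a = init_weights(5, seed=42)
run_b = init_weights(5, seed=42)
print(run_a == run_b)  # True

# An unrecorded or different seed gives a different starting point.
run_c = init_weights(5, seed=7)
```

Real frameworks add further sources of variation (parallel hardware, data-loading order, library versions), which is why recording the seed is necessary but rarely sufficient.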


However, AI researchers have famously struggled to replicate each other’s results, even with vast financial, human and technical resources. One of the most notable examples was Facebook’s attempt to replicate DeepMind’s AlphaGo, a program built to play the ancient board game Go. Facebook’s team struggled with the lack of available code, as well as the computational demands of running millions of experiments across thousands of devices. The team eventually succeeded. All the same, their struggles highlight an important issue: if a large team with virtually unlimited financial resources found this task so difficult, what chance does a small team have?


Is it actually a crisis?

Many commentators refer to the lack of transparency within AI as a ‘reproducibility crisis’. While ‘crisis’ comes with heavy connotations, many ML experts believe the term is warranted. A survey of ML experts by AI researcher Joelle Pineau found that 35% believe there is a ‘significant’ crisis and 45% believe there is at least a ‘slight’ crisis; only 10% felt there was no crisis at all. Even so, it might be more helpful for the advancement of the field to frame the issue as an opportunity for improvement.

Whether you view this problem as a crisis or an opportunity, the lack of transparency in AI has bred scepticism, and it may even be eroding public confidence in the AI economy. That said, it helps to keep some perspective: data science is not the only field affected. A 2016 survey of more than 1,500 scientific researchers across various fields found that approximately 70% had failed to reproduce other researchers’ experiments. More strikingly, roughly half had failed to reproduce the results of their own experiments.


Why does reproducibility matter?

In all areas of science, reproducibility plays a vital role. It contributes to a common body of knowledge and assures the quality of research. Simply put, when a project is not reproducible, it’s difficult to prove its efficacy. A lack of reproducibility also makes it harder for others to recognise a project’s success, and thus its value. In a business setting, this can make it difficult to secure budget and other forms of organisational support. Businesses should also consider team changes and staff turnover. For instance, if a company’s head of AI moves on to a new job or even just takes family leave, it’s critical that the person filling the role can step in and carry on with the project.

In AI research, reproducibility also enables teams to determine which ML system works best for a specific problem. This knowledge can prove valuable to businesses, especially those with smaller teams and smaller budgets. Different types of projects suit different budgets, and sometimes the simplest, most cost-effective project is the most appropriate, particularly for narrow business problems.

In a broader sense, reproducibility is a critical part of successful AI projects for a few reasons:

  • It allows researchers to verify previous findings independently
  • It creates a solid foundation on which researchers can base future projects and potential breakthroughs
  • It establishes standards and baselines by which future progress can be measured

Looking ahead, reproducibility is crucial to the future of the AI field because it helps with:

  • Identifying new research areas
  • Investigating algorithms as they grow and change
  • Making research more effective


Factors driving the reproducibility crisis

We already know that reproducibility is inextricably linked with traceability and transparency. But what’s actually driving AI’s reproducibility crisis? Here are some of the key factors:

  • Missing information: An analysis of 30 AI papers found that most of the projects were hard to reproduce because of missing information about methodologies (including study parameters, data sets and implementation details) as well as oversimplified assumptions. Exacerbating this, subtle tweaks often go completely unreported in publications. Google research engineer Pete Warden stresses that many ML and DL practitioners find it challenging to record the countless steps involved in building an ML model. He even goes so far as to call ML “the worst environment I’ve ever found for collaborating and keeping track of changes.” But why is this the case? To answer that, let’s look at the next point.
  • Complexities: AI is complicated. The neural networks used in DL are so opaque that they are often referred to as “black boxes”. By their very nature, DL experiments are multifaceted and ever-growing, involving many layers of processes (training runs, software updates, file changes, algorithm adjustments, and so on). Given this level of complexity, it’s easy to see how even the most detail-oriented researchers might fail to record a step, making it hard for the next researcher to reproduce the results. It also makes it difficult to pinpoint the root cause of differences between expected and actual results. Even if everything is thoroughly documented, other issues may arise. For instance, ML frameworks often prioritise performance over exact numeric determinism, so some variation in the final results is almost inevitable.
  • A lack of universal standards: It’s difficult to imagine scientists in other fields advancing their work without agreed-upon units of measure. Yet at present there are no universal standards governing data capture, curation or processing techniques, and these are what ultimately give meaning to AI experiments. The lack of universal benchmarks or best practices for implementing and recording processes can be a substantial barrier to innovation, especially given the number of iterations involved in developing ML tools.
  • Market incentives: AI research is exciting, and so it makes sense that parent companies often encourage AI labs to “shoot for the moon”. There’s value in achieving buzzworthy results, and many companies also wish to make their projects difficult for competitors to copy. If a company invests its resources in developing a tool, it doesn’t want to give its process (and thus its ROI) away for free. Unfortunately, all of this means that researchers often prioritise their research outputs over documenting their methodology.
  • Proprietary and sensitive data concerns: For some AI teams, proprietary data and code can prove a challenge to transparency and thus reproducibility. Companies working with sensitive health data, for instance, might be legally and ethically unable to share vital components of their research.
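The point that ML frameworks trade exact numeric determinism for performance can be illustrated without any framework at all. The minimal Python sketch below uses only built-in floats; the shuffle is an assumption standing in for the way a parallel computation may add the same numbers in a different order from run to run, and `sequential_sum` is a hypothetical helper.

```python
import math
import random


def sequential_sum(values: list[float]) -> float:
    """Add values strictly left to right, as a single-threaded loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Floating-point addition is not associative: regrouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# A parallel reduction may combine the same numbers in a different order,
# which we imitate here by shuffling before summing.
rng = random.Random(0)
values = [rng.uniform(-1e6, 1e6) for _ in range(10_000)]
shuffled = values[:]
rng.shuffle(shuffled)
drift = abs(sequential_sum(values) - sequential_sum(shuffled))
reference = math.fsum(values)  # correctly rounded sum, independent of order
print(f"drift between orderings: {drift:.3e}")
print(f"correctly rounded reference: {reference:.6f}")
```

Some frameworks offer settings that give up speed in exchange for deterministic behaviour; whether such settings were enabled is itself a detail worth documenting.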

What does reproducibility mean for businesses?

Although some organisations might believe that holding their secrets close gives them a competitive advantage, an increasing number of experts are arguing that reproducibility drives innovation. Complete and traceable data chains can help researchers understand the past failures and successes of their own projects and those of their peers. Contributing to the larger body of knowledge in this way can help lay a smoother, stronger foundation for future successes. Proponents of transparency contend that shared knowledge benefits everyone.

Along the same lines, a lack of reproducibility can negatively affect innovation. Technology is a fast-moving field, and without reproducibility, many businesses have found that building a new AI application or updating an existing one means starting from scratch. Due to the high costs of AI projects, retracing steps can prove detrimental to a company’s bottom line. It also means your AI team is spending valuable time redoing something that has already been done, instead of breaking new ground. 

The ability to measure improvement is just as crucial to business as it is to science. As we already know, a lack of reproducibility makes it challenging to measure progress. Unable to reproduce an earlier success, an AI team will almost inevitably struggle to demonstrate the value of their project to key stakeholders. Again, this provides a barrier to justifying a project’s budget and illustrating its potential value within the organisation.


How can businesses help solve the crisis?

Remember: although experts have framed this as a crisis, businesses can also view the reproducibility issue as an opportunity. As Marcus Munafò, chair of the UK Reproducibility Network’s steering committee, puts it: “We’re now at the point of thinking, what could we change about the ways in which we work that might improve the quality of what we do?”

As more businesses adopt and deploy AI, they should consider the value of pushing for the documentation that reproducibility requires. As a business, choosing to prioritise reproducibility can help you remain at the cutting edge. Being aware of changing best practices is a great first step. For instance, a number of experts are calling for universal reproducibility frameworks, consisting of:

  • Complete documentation
  • Clear limits of inference governing autonomous data analysis, prioritising quality over quantity when it comes to training data and study parameters
  • Training data sets put into context and designed within clear parameters

In both academia and the private sector, AI researchers are taking steps to make the field more transparent. The NeurIPS conference now asks researchers to submit a reproducibility checklist, developed by McGill University’s Joelle Pineau, covering items that researchers often omitted from their publications, such as the number of models trained, the computing power used and key information about source code and data sets. In 2019, the Allen Institute for AI released a paper that expanded on the reproducibility checklist and proposed that researchers report more data about their experiments. Researchers at Google have devised model cards to document how ML systems have been tested, including results that indicate potential bias.
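The checklist items described above (number of models trained, computing power used, source code and data sets) lend themselves to a structured record saved alongside each experiment. Here is one hypothetical way to do this in Python using only the standard library. The `RunRecord` class, its field names and the example values are illustrative assumptions, not the format of any published checklist.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical per-experiment record of checklist-style details."""

    experiment: str
    code_version: str       # e.g. a version-control commit hash (placeholder here)
    dataset_sha256: str     # fingerprint of the exact training data used
    random_seed: int
    hyperparameters: dict
    models_trained: int     # how many models were trained to get the result
    compute: str            # free-text description of the computing power used
    environment: dict = field(default_factory=lambda: {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    })

    def to_json(self) -> str:
        """Serialise the record so it can be stored next to the results."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def sha256_of_bytes(data: bytes) -> str:
    """Fingerprint data so a later reader can check they have the same copy."""
    return hashlib.sha256(data).hexdigest()


# Illustrative placeholder values only.
record = RunRecord(
    experiment="example-baseline",
    code_version="<commit hash>",
    dataset_sha256=sha256_of_bytes(b"customer_id,churned\n1,0\n2,1\n"),
    random_seed=42,
    hyperparameters={"learning_rate": 0.01, "epochs": 20},
    models_trained=12,
    compute="1 CPU, 10 minutes",
)
print(record.to_json())
```

Storing the record as plain JSON keeps it readable by people and tools alike, which matters when the person reproducing the work is a new team member rather than the original author.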

Although moving towards reproducibility is a wise choice for future-proofing your business and your AI projects, most experts in the field do recognise that it can be difficult to make generalisations about how researchers should report results. After all, these generalisations could conceal the complexities that colour how researchers choose the best models. That said, for most businesses, making every attempt to build AI projects on verifiable and trustworthy foundations is the best, and most valuable, way forward.


The future is traceable

Reproducibility allows businesses to save valuable time and resources, learn from past successes and mistakes, and avoid the costly process of retracing steps each time an AI application requires an update. Making AI projects more transparent also makes it significantly easier to prove a project’s value to key stakeholders within an organisation. When it comes to adopting a policy of reproducibility, keeping up with changing best practices and standards within the industry is a great place to start.

However, for AI to reach its full potential, a balance must exist between adhering to reproducibility standards and allowing AI teams the freedom to do their work and experiment with different methods. One of the best ways to strike this balance is to work with a consultancy, which can help you manage all aspects of your AI projects, including research design and documentation. The future is traceable, and working with experts can help ensure that your AI projects are too.



Victor Malachard
Executive Chairman
