Data scientists will have a bumper year in 2023 as governments invest heavily in applying AI and algorithms to public policy. The European Commission has committed €1.3 billion ($1.38 billion) to research and innovation under the Digital Europe Programme. The UK government is putting £117 million ($143.6 million) into funding PhDs in AI, and it is already in the second year of its 10-year plan to “make Britain a global AI superpower.” Ongoing initiatives include the National Health Service’s use of AI to identify abnormalities in CT scans and the Department for Work and Pensions’ efforts to detect fraud in universal credit applications.
While the promise of these technologies is exciting, the new tools will only be useful if the data that feeds into them is accurate and complete. However, in 2023 most government data will still be inaccurate or full of holes.
For instance, outside of a census year, the UK doesn’t have accurate data on the size of its population, the scale of immigration, or the nature of inequality affecting groups like ethnic minorities and the LGBTQ+ community. More than 15 percent of the land in England and Wales remains unregistered, meaning that we still don’t know who owns large swaths of the country. The UK Statistics Authority has stripped recorded crime figures of their “national statistics” status because the measures used to track crime were so inaccurate. Similarly, there’s still no agreement on how to estimate poverty in the country, making the problem harder to tackle.
Bad data has also been responsible for a raft of major policy mishaps, wasted public funds, and harm to people’s lives. Bad data is why people in the UK have been wrongly deported and accused of being illegal immigrants, as happened during the Windrush scandal. Bad data was behind a childcare benefits scandal in the Netherlands, where benefit claimants were wrongly accused of fraud because a government algorithm had been programmed to identify people with dual nationalities as more likely to commit the crime.
The reality is, when it comes to collecting and analyzing national statistics, many governments around the world are severely underresourced. Globally, one in four children “don’t exist”: their births were never registered. Only eight of the 54 countries in Africa have fully accurate mortality figures. Large parts of the globe remain digitally unmapped; in India, only 21 percent of the road network exists in digital format. Over half of the world’s countries still don’t have any recent data on eight of the 17 sustainable development goals, the targets for the improvement of people’s lives that all UN countries have agreed to try to achieve by 2030. Without data, progress is impossible.
The promise of AI and big data analytics in fields like health care will be severely diluted if existing government data is outdated and of poor quality. Private intent data—such as from mobile phones and internet traffic—can plug some gaps, as it did for governments during the Covid-19 pandemic. But private companies’ data is itself flawed and generated without the transparency and accountability government data promises. For instance, when the Israeli government started using mobile phone records to track people’s movements to better understand the spread of Covid-19, its supreme court ruled the initiative a breach of privacy.
That said, 2023 will see incremental progress. The UK’s NHS has announced a project to address the gaps in its ethnicity data, for instance. The Democratic Republic of Congo will also be conducting its first census since 1984, an arduous task that will produce valuable information on some of the world’s poorest individuals. These are steps in the right direction, but there is a long road ahead.