When FIFA Opened the Data: How the World Cup Is Changing the Way We Understand the Game

For most of my career, the richest data in elite football sat behind closed doors. Tracking systems, multi-camera optical feeds, possession context, off-the-ball movement — the kind of information that actually explains why a match unfolds the way it does — was the private property of a handful of federations and clubs wealthy enough to own it. Everyone else worked with shots, corners, fouls and possession percentage, and squinted to fill in the rest.

That is changing, and the FIFA World Cup has become the most visible stage for the shift. Over the last two tournaments FIFA has done something I think is genuinely important for our field: it has taken some of the most modern performance data ever produced in football and made a meaningful slice of it public. As someone who has spent years arguing that data only becomes knowledge when people are allowed to interrogate it, I find this exciting — so I wanted to write down where this data lives, who is doing interesting things with it, and why the act of sharing matters as much as the numbers themselves.

FIFA’s initiative: Enhanced Football Intelligence

The centrepiece is Enhanced Football Intelligence (EFI) — the set of metrics that first appeared as those small graphics in the corner of the screen during the 2022 World Cup. EFI was built by FIFA’s Football Performance Analysis & Insights team to move us beyond traditional counting stats and toward metrics that describe how a team plays.

What makes EFI different is its source. Rather than relying only on on-the-ball event data, it combines event data with live tracking data from every player on the pitch, captured by a multi-camera optical system. When something happens, you know where all twenty-two players were in relation to it. That positional context is what unlocks metrics you simply cannot derive from a traditional stats sheet, including:

  • Line breaks — how often a pass cuts through an entire defensive unit, and whether it went through, around or over. FIFA’s own analysis shows that the more line breaks a team concedes in midfield, the more games it tends to lose.
  • Ball recovery time — how long, on average, it takes a team to win the ball back after losing it.
  • “In contest” possession — the honest third category that sits between “our ball” and “their ball,” capturing the messy phases when nobody is truly in control.
  • Receptions behind the midfield and defensive lines — where and how players make themselves available between the opponent’s units.
  • Pressure on the ball and forced turnovers — whether a side is genuinely disrupting the opponent or merely looking busy.
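
To make one of these metrics concrete, here is a minimal sketch of how an average ball recovery time could be computed from a possession timeline. The timeline format (time-ordered (seconds, owner) pairs, with None for an “in contest” phase) and the function name are my own hypothetical choices. FIFA’s actual EFI definition, including how stoppages and contested phases are treated, may well differ.

```python
from statistics import mean


def average_recovery_time(timeline, team):
    """Mean seconds between `team` losing the ball and winning it back.

    `timeline` is a hypothetical, time-ordered list of (seconds, owner)
    tuples, where owner is a team name or None for an "in contest" phase.
    """
    recoveries = []
    lost_at = None   # when `team` last lost the ball
    previous = None  # owner at the previous timestamp
    for t, owner in timeline:
        if previous == team and owner != team:
            lost_at = t  # lost to the opponent or into a contested phase
        elif owner == team and lost_at is not None:
            recoveries.append(t - lost_at)  # back in our possession
            lost_at = None
        previous = owner
    return mean(recoveries) if recoveries else None


timeline = [(0, "A"), (10, "B"), (18, "A"), (30, None), (34, "B"), (40, "A")]
print(average_recovery_time(timeline, "A"))  # 9 (recoveries of 8 s and 10 s)
```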

Crucially, FIFA makes this available to the public. The metrics are explained — with video and multilingual PDFs — on the FIFA Training Centre, and snippets of match data are published after games. For the 2026 tournament FIFA has gone further still, layering on two new initiatives: the FIFA Power Rankings, an objective player-rating system scoring every outfield player 0–10 for attacking, creativity and defending using EFI algorithms; and FIFA AI Pro, which gives all 48 teams the same generative-AI tools to explore match data and rebuild moments in 3D — explicitly framed as democratising analytics that used to belong only to the biggest budgets.

Where to find the data and the people working with it

Here are the sites I’d point any coach, analyst or curious fan toward. I’ve grouped them deliberately, because one distinction matters a great deal and is easy to miss: some of these work directly from data officially published by FIFA, while others produce excellent World Cup analysis using third-party providers such as StatsBomb. All are worth your time — but knowing which is which keeps you honest about where a number actually came from.

The official source

FIFA Training Centre

This is the well from which everything else is drawn. Free, open, and aimed at coaches of every level, it hosts the EFI metric explainers, the “Football Language” glossary, video breakdowns, the FIFA Insight interviews with the people who built EFI, and — the part I’d flag for 2026 — the Match Report Hub, a live index of post-match summary reports for every World Cup match, organised by group and added as the tournament unfolds. If you only bookmark one link from this article, make it this one.

Working directly with FIFA’s published reports

Tactics Journal

The sharpest example I’ve seen of someone treating FIFA’s openness as a system rather than a curiosity. Tactics Journal points out that FIFA is publishing a roughly 52-page post-match summary for every group-stage game — formations, pressing phases, line-break tables player-by-player, defensive pressure maps, sprint-zone physical data — all in a consistent structure. Their argument is that because every report follows the same layout, you can parse all of them into a structured, auditable evidence layer for scouting and tournament-wide tactical questions. They call it “tactical infrastructure,” and I think that’s exactly the right way to think about it: the PDF is content; the structured layer you build from it is infrastructure.
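
The “parse once, query many times” idea can be sketched in a few lines. This assumes the text of a report page has already been extracted from the PDF with a PDF library, and it invents a simple row layout (shirt number, player, line breaks attempted, line breaks completed); the real FIFA reports would need their own patterns. Rows that don’t fit are kept aside for a human to check, which is what makes the resulting layer auditable.

```python
import re
from dataclasses import dataclass

# Hypothetical row layout: "<shirt> <player name> <attempted> <completed>"
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


@dataclass
class LineBreakRow:
    match_id: str
    shirt: int
    player: str
    attempted: int
    completed: int


def parse_line_break_table(match_id, lines):
    """Turn text rows extracted from one report into structured records.

    Non-blank rows that do not fit the expected layout are returned
    separately, so they can be audited rather than silently dropped.
    """
    rows, rejected = [], []
    for line in lines:
        m = ROW.match(line)
        if m:
            shirt, player, attempted, completed = m.groups()
            rows.append(LineBreakRow(match_id, int(shirt), player,
                                     int(attempted), int(completed)))
        elif line.strip():
            rejected.append(line)
    return rows, rejected


rows, rejected = parse_line_break_table(
    "M01", ["10 A. Player 12 9", "Line breaks by player", "", "4 B. Example  7 5"]
)
print(len(rows), rejected)  # 2 ['Line breaks by player']
```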

EFI Data Reference

An independent blog that does something quietly useful: it organises and explains the EFI metrics as a reference, pulling together the definitions of phases of play, line heights, team lengths, receptions and the rest into one navigable place. A patient companion to the official material that helps a newcomer go from “what is a line break?” to actually reading a match through that lens.

Doğan Parlak’s open-source EFI implementation

My favourite example of why open methods matter as much as open data. Parlak built an open-source implementation of FIFA’s EFI metrics (data, concept and visualisation layers) with the explicit goal of reproducing FIFA’s match reports and testing whether the published concepts are specified well enough to be rebuilt by an outsider. That is science in the best sense: take the published method, try to recreate it, and flag the ambiguities. For analysts and students it doubles as a practical toolkit for generating EFI-style visualisations. His Master’s thesis on the project is also available to read.
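
Parlak’s question (can an outsider rebuild the published numbers?) ultimately comes down to a comparison like the one sketched here. This is not his code: the metric names, the values and the 5% tolerance are hypothetical, chosen only to show the shape of a reproducibility check.

```python
def compare_to_published(rebuilt, published, tolerance=0.05):
    """Compare rebuilt metric values with the published ones.

    Both arguments map metric name -> value. `tolerance` is a relative
    difference (0.05 = 5%) chosen purely for illustration.
    Returns (matches, mismatches, missing).
    """
    matches, mismatches, missing = [], {}, []
    for name, pub in published.items():
        if name not in rebuilt:
            missing.append(name)  # could not be rebuilt at all
            continue
        ours = rebuilt[name]
        scale = abs(pub) if pub != 0 else 1.0  # avoid dividing by zero
        if abs(ours - pub) / scale <= tolerance:
            matches.append(name)
        else:
            mismatches[name] = (ours, pub)  # candidate specification gap
    return matches, mismatches, missing


published = {"line_breaks": 100, "recovery_time_s": 12.0, "receptions": 40}
rebuilt = {"line_breaks": 103, "recovery_time_s": 15.0}
print(compare_to_published(rebuilt, published))
# (['line_breaks'], {'recovery_time_s': (15.0, 12.0)}, ['receptions'])
```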

The wider analytics and data-journalism ecosystem

These don’t all run on FIFA’s own feed, but they show what a culture of shared football data makes possible — and they’re some of the most engaging World Cup analysis being published right now.

Northeastern Global News — NGN Offside / NetSI Sport

A blog “powered by data science and written by journalists,” produced by Northeastern’s Network Science Institute (the NetSI Sport group led by Brennan Klein). It’s a masterclass in turning event data into narrative: passing networks and passing-cluster maps that fingerprint a team’s style, xG shot maps, dribble-and-carry graphics, and genuinely novel angles like whether the 2026 hydration breaks are changing scoring patterns. Worth knowing that their underlying data comes from Hudl StatsBomb (over 3,400 events per match), not FIFA’s EFI feed — a good illustration of how the official and commercial data worlds sit side by side.

Datawrapper — Data Vis Dispatch

Not a football site at all, but a weekly roundup of the best data visualisations from newsrooms around the world. During the tournament it has been a reliable showcase of World Cup charts: qualification journeys, the evolution of the match ball, and player and game analyses from the likes of Reuters, The New York Times and El País. It is the best place to see how professional data journalists choose to present this kind of information.

Microsoft Fabric Community — FIFA World Cup 2026 Stats Analysis Hub

A reminder that you don’t need a newsroom to do this. This is a community-built interactive Power BI dashboard — one example aimed at World Cup Fantasy players, with a “Stat View” toggle to compare a player’s club versus international form before making transfer decisions — sitting within Fabric’s wider Data Stories Gallery of user-made World Cup dashboards. A nice window into the grassroots, build-it-yourself end of the spectrum.

(Two more worth a look in the same spirit: Tactical Football Analysis for written post-match tactical breakdowns, and Flourish, whose football chart templates are a quick way to build your own tournament visualisations. Also, have a look at this Substack on how to build team-shape visualisations.)

What the analysis actually looks like

The reason this data is worth sharing is that it produces visuals that change how you see a match. A few of the workhorse formats:

  • Pass networks map who connects to whom and where, turning a team’s structure into a readable shape — you can see at a glance whether a side is building through its full-backs, overloading one flank, or bypassing midfield entirely.
  • Tracking heatmaps show where players and teams actually spend their time, exposing the difference between nominal position and real behaviour, and revealing how compact or stretched a side is in and out of possession.
  • xG (expected goals) shot maps put a probability on every chance so you can judge whether a team created genuine danger or simply accumulated low-value shots — useful, as long as you remember it’s a guide, not a verdict.
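
For readers who want to see what sits behind a pass network, here is a minimal Python sketch. The pass format is hypothetical, and placing each player at the average location of their own passes is only one of several conventions (some analysts also include reception locations).

```python
from collections import Counter, defaultdict
from statistics import mean


def pass_network(passes):
    """Average passing position per player and pass counts per player pair.

    `passes` is a hypothetical list of completed passes, each a dict with
    'passer', 'receiver', and 'x', 'y' (where the pass was played from).
    Players who only receive, and never pass, get no node in this sketch.
    """
    positions = defaultdict(list)
    edges = Counter()
    for p in passes:
        positions[p["passer"]].append((p["x"], p["y"]))
        pair = tuple(sorted((p["passer"], p["receiver"])))  # undirected edge
        edges[pair] += 1
    nodes = {
        player: (mean(x for x, _ in pts), mean(y for _, y in pts))
        for player, pts in positions.items()
    }
    return nodes, edges


passes = [
    {"passer": "GK", "receiver": "CB", "x": 5, "y": 34},
    {"passer": "CB", "receiver": "GK", "x": 20, "y": 30},
    {"passer": "CB", "receiver": "CM", "x": 30, "y": 40},
]
nodes, edges = pass_network(passes)
```

The node positions give the “shape” of the team; the edge counts set line thickness when the network is drawn.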

The principle underneath all of them is the same one I keep coming back to: the best statistics don’t make football more complicated, they make it easier to understand. A line-break count, a compactness number or a clean sprint map isn’t there to replace the coach’s eye; it’s there to sharpen the question the coach is already asking, and to give support staff additional information to improve how players are prepared.
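
A compactness number can be as simple as this sketch, which measures how long and wide a team is in one tracking frame and then averages in and out of possession. The coordinate convention and frame format are assumptions, and the bounding rectangle is a deliberately crude stand-in for the more careful shape measures used in EFI.

```python
from statistics import mean


def team_shape(positions):
    """Team length, width and bounding-box area for one tracking frame.

    `positions` are (x, y) outfield-player coordinates in metres, x along
    the pitch and y across it. The bounding rectangle is a simplification;
    convex-hull area is a common alternative.
    """
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    length, width = max(xs) - min(xs), max(ys) - min(ys)
    return {"length": length, "width": width, "area": length * width}


def shape_by_phase(frames):
    """Average length and width in and out of possession.

    `frames` is a hypothetical list of (in_possession, positions) pairs.
    """
    summary = {}
    for phase, label in ((True, "in"), (False, "out")):
        shapes = [team_shape(pos) for poss, pos in frames if poss == phase]
        if shapes:
            summary[label] = {k: mean(s[k] for s in shapes) for k in ("length", "width")}
    return summary
```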

The other half of the story: injuries and player health

It would be a strange omission, for me especially, to write about World Cup data and talk only about what happens when players are on the pitch. The same tournament that generates all this performance data also generates a great deal of information about the cost of playing it — and FIFA itself frames its data mission broadly, as unlocking the potential of video and data to drive technical development and education, not just tactical analysis.

This World Cup has put player load firmly in the spotlight. The expansion to 48 teams and 104 matches, layered on top of an already congested club calendar, has prompted warnings from sports-medicine specialists that fatigue and tight scheduling are pushing injury rates up, with the knee particularly exposed to the constant cutting, pivoting and rapid changes of direction the modern game demands. FIFA’s own venue medical staff have pointed to the injuries they see most often in elite players: ankle sprains, and hamstring and calf strains. None of that is new in kind, but the volume and the schedule are. After each tournament, the injury surveillance carried out by the FIFA medical team is written up and published. If you want to read more about our experience at the 2022 FIFA World Cup, see the following papers:

Serner A, Chamari K, Hassanmirzaei B, Moreira F, Bahr R, Massey A, Grimm K, Clarsen B, Tabben M. Time-loss injuries and illnesses at the FIFA world cup Qatar 2022. Sci Med Footb. 2025 Aug;9(3):275-282. doi: 10.1080/24733938.2024.2357568. Epub 2024 Jun 11. PMID: 38860817.

Schumacher YO, Kings D, Whiteley R, Dharman A, Taqtaq G, Mc Court P, Alkhelaifi K, Targett S, Holtzhausen L, Pieles GE, Dzendrowskyj P, Zikria BA, Bordalo M, Al Hussein I, D’Hooghe P, Al-Kuwari A, Cardinale M. Medical services at the FIFA world cup Qatar 2022. Br J Sports Med. 2023 Oct 27;58(1):42–9. doi: 10.1136/bjsports-2023-106855. Epub ahead of print. PMID: 37890964; PMCID: PMC10804010.

Bordalo M, Serner A, Yamashiro E, Al-Musa E, Djadoun MA, Al-Khelaifi K, Schumacher YO, Al-Kuwari AJ, Massey A, D’Hooghe P, Cardinale M. Imaging-detected sports injuries and imaging-guided interventions in athletes during the 2022 FIFA football (soccer) World Cup. Skeletal Radiol. 2025 Apr;54(4):819-828. doi: 10.1007/s00256-023-04451-z. Epub 2023 Sep 16. PMID: 37715819; PMCID: PMC11845536.

Bordalo M, Evans T, Allenjawi S, Targett S, Dzendrowskyj P, Al-Kuwari AJ, Cardinale M, D’Hooghe P. Management of radiology services during the 2022 FIFA football (soccer) World Cup. Skeletal Radiol. 2025 Apr;54(4):647-653. doi: 10.1007/s00256-023-04486-2. Epub 2023 Nov 9. PMID: 37943308; PMCID: PMC11845430.

Alsenoy KV, Raisi LA, Shamsi FA, Thomson A, D’Hooghe P. Service Planning and Provision During Qatar’s 2022 FIFA World Cup and 2023 AFC Asian Cup: Sports Podiatry. J Am Podiatr Med Assoc. 2025 Sep-Oct;115(5):24-176. doi: 10.7547/24-176. PMID: 41166158.

For the public, the most visible “injury data” is the running tracker — outlets such as ESPN, The Independent and others maintain live lists of who is ruled out or racing to be fit, and this tournament has tested several squads hard, with sides like Brazil and the Netherlands losing first-choice players before a ball was kicked. These are journalism rather than open datasets, but they perform a real function: a structured, continuously updated public record of availability.

FIFA has a long history of medical research around its tournaments, and the combination of tracking data (sprint loads, high-intensity distances, accelerations) with injury records is exactly the kind of linkage that could move us from counting injuries to understanding and preventing them. The performance data and the medical data are two halves of the same picture; sharing both is how we protect the players who generate it.
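
To illustrate the kind of linkage I mean, here is a minimal sketch that joins a tracking-derived load measure with injury records and expresses injuries per 1000 hours of match exposure, the usual rate in football injury epidemiology. The player fields and the sprint-distance threshold are hypothetical, and a crude rate comparison like this says nothing about cause: real studies need confidence intervals, adjustment for confounders and far more data than one tournament provides.

```python
def incidence_per_1000h(injuries, exposure_hours):
    """Injuries per 1000 hours of exposure."""
    if exposure_hours <= 0:
        raise ValueError("exposure must be positive")
    return injuries * 1000 / exposure_hours


def incidence_by_sprint_load(players, threshold_m):
    """Compare match injury incidence between higher- and lower-sprint players.

    `players` is a hypothetical list of dicts with 'sprint_m' (mean sprint
    distance per match, metres), 'minutes' (match minutes played) and
    'injuries'. `threshold_m` is an arbitrary cut-off, not a validated one.
    """
    groups = {"high": [0, 0.0], "low": [0, 0.0]}  # [injuries, exposure hours]
    for p in players:
        g = groups["high" if p["sprint_m"] >= threshold_m else "low"]
        g[0] += p["injuries"]
        g[1] += p["minutes"] / 60
    return {name: incidence_per_1000h(inj, hours)
            for name, (inj, hours) in groups.items() if hours > 0}
```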

Why sharing the data matters

I want to close on the part I care about most, because it’s easy to treat “FIFA released some metrics” as a minor technical footnote. It isn’t.

It democratises insight. When tracking-derived data was private, the gap between the richest and poorest programmes was partly a data gap. Publishing EFI, and giving every World Cup team the same AI tools, narrows that gap. Insight stops being a function of budget alone.

It creates a shared language. When a coach in one country and an analyst in another can both point to the same definition of a line break or “in contest” possession, conversations get more precise and more productive. Common metrics are the grammar of a common discussion.

It invites scrutiny and reproducibility. The moment a method is public, people like Doğan Parlak can try to rebuild it, stress-test it, and improve it. That is exactly how a field matures — not by guarding methods, but by exposing them to challenge.

It stimulates new ways of analysing the game. This is the part that excites me as a scientist. Hand the same dataset to a hundred curious people and you will get analyses no single organisation would ever have commissioned. Open data is generative: it produces questions, tools and visualisations that wouldn’t otherwise exist, and the elite game gets smarter as a result.

Football has always been understood through stories and through the eye. What FIFA’s data initiative does is add a third lens — a transparent, shareable, contestable one — and then, remarkably, hand it to everyone. The numbers are interesting. But the decision to share them is what will change how we understand the game at the very highest level. So, well done to my FIFA colleagues for this initiative.

At Aspetar we have produced a special issue of our Journal dedicated to Football and the World Cup.

Protecting Young Athletes: reflections from my talk at the 1st CYSM Conference in London

Last Friday I had the pleasure of speaking in London at the Centre for Youth Sports Medicine (CYSM) Annual Conference 2026, on a topic close to my heart: how we protect young athletes while helping them develop. My talk was titled “Protecting young athletes: performance progressions, realistic expectations and injury prevention,” and what follows is a short summary of what I covered, the science behind it, and a new project I shared at the end.

Young athletes are not mini-adults

The starting point of the talk is a simple but frequently ignored idea: a young athlete is not a scaled-down version of a senior one. Youth development sits on an individually unique and constantly changing base of physical growth, biological maturation and behavioural development. Because of this, success at an early age does not guarantee success at senior level, and selection systems in most sports quietly favour early maturers — children who happen to be bigger and stronger sooner, not necessarily those with the most long-term potential.

This matters because the way we coach, test and select needs to account for where a child actually is in their development, not simply how many years have passed since their birth.

Age is not just a number

A large part of the talk dealt with the difference between three kinds of age:

  • Chronological age — years since birth.
  • Training age — how many years a young person has spent in structured training.
  • Biological age — where the individual actually sits on the maturation curve, based on physiological markers.

Two children of the same chronological age can be years apart biologically. Around the growth spurt, the period of Peak Height Velocity (PHV), this gap becomes especially important for training and injury risk. I walked through the methods we use to assess maturity status and timing, from skeletal age estimation (Tanner–Whitehouse, Greulich–Pyle, Fels) to anthropometric approaches based on height, leg length and body mass. I also flagged an important caution from our own work: prediction methods developed on one population can systematically over- or under-predict in another, so the difference between skeletal and chronological age should be used to put test results into context rather than treated as an absolute truth (for some recent work on this, see the paper by Dr Lorenzo Lolli).
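
As a small illustration of putting results into context, this sketch computes decimal chronological age and classifies maturity timing from the difference between skeletal and chronological age. The ±1-year band is an assumption (it is a commonly used cut-off, but not a universal one), and the skeletal age itself would come from one of the assessment methods above.

```python
from datetime import date


def decimal_age(born, on):
    """Chronological age in decimal years."""
    return (on - born).days / 365.25


def maturity_timing(skeletal_age, chronological_age, band=1.0):
    """Classify maturity timing from skeletal minus chronological age.

    `band` (years) is an assumption: +/-1 year is a commonly used cut-off,
    but the right band depends on the method and population.
    """
    diff = skeletal_age - chronological_age
    if diff > band:
        label = "early"
    elif diff < -band:
        label = "late"
    else:
        label = "on time"
    return round(diff, 2), label


ca = decimal_age(date(2012, 3, 1), date(2026, 3, 1))  # about 14.0 years
print(maturity_timing(15.6, ca))  # (1.6, 'early')
```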

Talent, the relative age effect, and the road from youth to senior

I then turned to talent identification and the relative age effect — the well-documented bias in which athletes born early in the selection year are over-represented in youth squads and ranking lists. Across athletics and team sports such as Italian football, the data show how strongly birth-quarter skews youth selection. Yet relatively younger or later-maturing athletes often progress better when they transition to senior level — the so-called “underdog hypothesis.”
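
The birth-quarter skew behind the relative age effect is easy to check on any squad list. In this sketch the selection year starting in January, the even-births baseline and the example birthdates are all assumptions; a proper analysis would compare against national birth distributions and test the difference statistically.

```python
from collections import Counter
from datetime import date


def birth_quarter(born, cutoff_month=1):
    """Quarter (1-4) of the selection year, which starts in `cutoff_month`."""
    return (born.month - cutoff_month) % 12 // 3 + 1


def relative_age_summary(birthdates, cutoff_month=1):
    """Share of a squad born in each quarter, plus the Q1:Q4 ratio.

    Assumes births are evenly spread across the year in the reference
    population (a simplification), so with no relative age effect each
    share would be about 0.25 and the ratio about 1.
    """
    counts = Counter(birth_quarter(b, cutoff_month) for b in birthdates)
    shares = {q: counts[q] / len(birthdates) for q in range(1, 5)}
    q1_to_q4 = counts[1] / counts[4] if counts[4] else float("inf")
    return shares, q1_to_q4


squad = [date(2008, 1, 5), date(2008, 2, 10), date(2008, 3, 3),
         date(2008, 5, 1), date(2008, 8, 9), date(2008, 11, 30)]
print(relative_age_summary(squad))  # Q1 share 0.5, Q1:Q4 ratio 3.0
```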

This connects to one of the central findings I shared, drawn from analyses of tens of thousands of athletes’ careers: early success is not a prerequisite for success as an adult. In jumping events in track and field, only around 8% of males and 16% of females ranked in the world top 50 at age 16 went on to reach the top 50 as seniors. In sprints, on average only about 17% of men and 21% of women were in the top 50 both as under-18s and as seniors. Different pathways can lead to similar outcomes, and many talented juniors never make the transition — for reasons ranging from maturation and selection bias to injury, burnout, loss of funding, or simply moving to another sport.
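
A figure like “8% of the top 50 at age 16 reached the senior top 50” is simply the overlap between two lists. This sketch uses hypothetical identifiers; in practice, matching athletes across lists (name spellings, changes of nationality) is the hard part.

```python
def retention_rate(junior_top, senior_top):
    """Share of athletes on a junior top list who also make the senior list.

    Both arguments are collections of athlete identifiers, e.g. a world
    top 50 at age 16 and a senior top 50 in the same event.
    """
    junior = set(junior_top)
    if not junior:
        return 0.0
    return len(junior & set(senior_top)) / len(junior)


# Hypothetical lists: 4 of the 50 juniors reach the senior top 50.
juniors = [f"athlete_{i}" for i in range(50)]
seniors = ["athlete_3", "athlete_7", "athlete_11", "athlete_20"] + [f"other_{i}" for i in range(46)]
print(f"{retention_rate(juniors, seniors):.0%}")  # 8%
```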

Injuries and load in young athletes

The final scientific section focused on injury. Young athletes face a distinct set of growth-related conditions: Osgood-Schlatter disease at the knee, Sever’s disease at the heel, gymnast’s wrist, Little League shoulder and elbow, apophysitis around the hip, and more. Our own prospective work in a full-time athletics academy was, to our knowledge, the first to examine growth rates and skeletal maturation as injury risk factors in a large adolescent cohort. We found that rapid growth in stature and leg length, a younger skeletal age and a faster maturity tempo were all associated with an increased risk of bone and growth-plate injuries.
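
Growth rate is usually expressed as an annualised change between repeat measurements. This sketch computes it for stature (leg length works the same way). The 7.5 cm/year flag is an arbitrary illustrative value, not a threshold from our study.

```python
def growth_rates(measurements):
    """Annualised growth (cm/year) between consecutive measurements.

    `measurements` is a list of (decimal_age, height_cm) pairs in age order.
    Returns [((age_from, age_to), rate), ...].
    """
    rates = []
    for (a0, h0), (a1, h1) in zip(measurements, measurements[1:]):
        if a1 <= a0:
            raise ValueError("measurements must be in increasing age order")
        rates.append(((a0, a1), (h1 - h0) / (a1 - a0)))
    return rates


def rapid_growth_periods(measurements, threshold_cm_per_year=7.5):
    """Intervals where growth exceeds an illustrative, hypothetical threshold."""
    return [(span, rate) for span, rate in growth_rates(measurements)
            if rate > threshold_cm_per_year]


heights = [(12.0, 150.0), (12.5, 153.0), (13.0, 158.0)]
print(rapid_growth_periods(heights))  # [((12.5, 13.0), 10.0)]
```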

This raises hard questions about how we manage load. Most studies still quantify training simply as “time,” without accounting for what athletes actually do in that time — two athletes can train for the same number of hours doing completely different work. Better load monitoring, including the thoughtful use of wearables and AI, is one of the areas where I think we can genuinely improve.
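
One widely used way to go beyond “time” is session-RPE (Foster and colleagues): the athlete’s rating of perceived exertion on the CR-10 scale multiplied by session duration. It appears here only to illustrate the point about two athletes doing the same hours. It is one option among several, not a method the talk prescribed, and it remains a proxy with its own limitations.

```python
def session_load(rpe, minutes):
    """Session-RPE load in arbitrary units: CR-10 rating x duration in minutes."""
    if not 0 <= rpe <= 10:
        raise ValueError("CR-10 ratings run from 0 to 10")
    return rpe * minutes


def weekly_summary(sessions):
    """Total hours and total session-RPE load for a list of (rpe, minutes)."""
    hours = sum(minutes for _, minutes in sessions) / 60
    load = sum(session_load(rpe, minutes) for rpe, minutes in sessions)
    return hours, load


# Two hypothetical athletes: same three hours, very different work.
athlete_a = [(3, 60), (3, 60), (4, 60)]
athlete_b = [(7, 60), (8, 60), (6, 60)]
print(weekly_summary(athlete_a))  # (3.0, 600)
print(weekly_summary(athlete_b))  # (3.0, 1260)
```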

The book chapter: The Young Athlete

Much of this material is drawn together in a chapter I co-authored with Gennaro Boccia, Paolo Riccardo Brustio, James Baker and Eirik Halvorsen Wik: Chapter 9, “The Young Athlete,” in the Sports Physician Handbook (Fourth Edition of the FIMS Team Physician Manual), edited by Pitsiladis, Yung, Hutchinson and Pigozzi (Academic Press, 2026, pp. 199–235).

The chapter brings together the themes of the talk into a single reference for clinicians and practitioners: defining the elite young athlete, understanding growth and maturation and why they matter, the relative age effect and the junior-to-senior transition, performance progression and realistic expectations, and the prevention and management of youth injuries and illness — including data from recent Youth Olympic Games.

A thank you to the CYSM, ISEH and the audience

I want to thank the Centre for Youth Sports Medicine and the Institute of Sport, Exercise and Health for hosting me and for the care they put into convening this event. The ISEH has been a genuine centre of excellence for sport and exercise medicine since its creation as a legacy of the 2012 London Olympic Games, and its commitment to the health of young athletes is exactly the kind of leadership this field needs. I was equally grateful for the audience: the questions, the engagement and the obvious dedication of so many practitioners to getting this right. Youth sport sits at the intersection of health, development and performance, and it is genuinely encouraging to see so much interest in protecting the young people at the centre of it.

Bringing athletics data to life: a new (experimental) project

One recurring frustration in this area is that the data needed to put a young athlete’s results into context — how the best in the world actually developed over time — is hard to access and harder to compare against. So I have been building something to help coaches interested in Athletics.

I am sharing here an early look at the Athletics Performance Tracker, an AI-assisted project that brings athletics results databases to life and makes them accessible to everyone. The aim is to let coaches, athletes, parents and researchers explore how performance develops with age and compare a young athlete’s results against those of top-ranked athletes.

It currently covers around 30 World Athletics events with thousands of athletes and tens of thousands of performances, and includes:

  • Development curves showing how the world’s best progress from age 12 to 40, with mean values and confidence intervals across the top 10, 20, 50 or 100 athletes.
  • Year-on-year analysis of absolute and percentage improvements for individual athletes or whole cohorts.
  • World top lists for every event, with comparisons to the previous year.
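
For the curious, the core of a development curve and a year-on-year comparison can be sketched in a few lines. This is not the tracker’s code: the data format is hypothetical, “top N” here simply means the N best marks at each age, and the 95% interval uses a normal approximation that is rough for small groups.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def development_curve(performances, top_n=50):
    """Mean and approximate 95% CI of the best marks at each age.

    `performances` is a hypothetical list of (athlete, age, mark) tuples in
    which a higher mark is better (jumps, throws); for running events the
    sort order would be reversed.
    """
    best = {}
    for athlete, age, mark in performances:
        # keep each athlete's best mark at each age
        best[(athlete, age)] = max(mark, best.get((athlete, age), mark))
    by_age = defaultdict(list)
    for (_, age), mark in best.items():
        by_age[age].append(mark)
    curve = {}
    for age in sorted(by_age):
        marks = sorted(by_age[age], reverse=True)[:top_n]
        m = mean(marks)
        half = 1.96 * stdev(marks) / math.sqrt(len(marks)) if len(marks) > 1 else 0.0
        curve[age] = (m, m - half, m + half)
    return curve


def percent_improvement(previous, current):
    """Year-on-year change as a percentage (positive = better for higher-is-better events)."""
    return (current - previous) / previous * 100
```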

You can explore it here: athletics-tracker.fly.dev.

An important caveat: this is very much an experimental project and still under active development. It is intended for research and exploration, it is not affiliated with World Athletics, and features and data will continue to change. I would welcome feedback as it evolves.

In summary

Early success is not a requirement for success as an adult. Assessing maturity status is key to interpreting training and test results. Training should be progressive, varied and developmentally appropriate, and we need to understand injury patterns — and load — far better than we currently do, with more work needed on young female athletes in particular. Above all, health and longevity in sport should always come before performance at a young age. If we get that balance right, we give more young athletes the chance to fulfil their potential — and to stay healthy doing it.

Pointing the AI toolkit at the Giro d’Italia 2026

In my last post I shared my first proper experiment with AI tools — scraping publicly available muscle-injury data from the major football leagues and pulling it into a dashboard that I could update and share with almost no manual work. I promised I would do more, and it did not take me long to find an excuse. With the Giro d’Italia in full swing, I could not resist pointing the same toolkit at a race I have been following obsessively for weeks.

So this time I used Claude, Claude Cowork and GitHub to build a live dashboard for the 2026 Giro d’Italia. It scrapes publicly available race data, pulls it together, and refreshes itself, and you can explore it directly at the bottom of this post or on its own GitHub page.

How the Giro finished

And what a Giro it turned out to be. Jonas Vingegaard (Visma–Lease a Bike) rode with a control that bordered on the imperious, lighting up the brutal Piancavallo stage to put the result beyond any doubt and then rolling into Rome in the maglia rosa. For a rider who has already won the Tour de France and the Vuelta, this maiden Giro completes the set of all three Grand Tours — a milestone worth pausing on, whatever you make of the strength of the opposition this May.

Behind him, Felix Gall (Decathlon CMA CGM) took second, a little over five minutes back, with Jai Hindley (Red Bull–Bora–hansgrohe) completing a fine return to the podium in third. Thymen Arensman (Netcompany–Ineos) ended up just off the box in fourth.

The minor jerseys produced their own subplots. Paul Magnier (Soudal Quick-Step) was the sprinter of the race and took the cyclamen points jersey, while Giulio Ciccone (Lidl-Trek) went hunting for mountain points with real appetite and claimed the blue. The young riders’ classification went right down to the wire between Afonso Eulálio (Bahrain Victorious) and Davide Piganzoli — I’ll let the dashboard below tell you who held the white jersey in the end, and who took the bunch sprint on the streets of Rome. That, really, is the whole point of the exercise: the numbers update themselves, so I don’t have to.

How I built the data source and dashboard

The workflow was remarkably similar to the football injuries project, which is exactly what I find so interesting about these tools — once you understand the pattern, you can reuse it for almost anything.

I asked Claude and Claude Cowork to gather the publicly available race data — general classification, stage results, the points and mountains battles, rider profiles — and to organise it into something I could actually look at rather than squint at across a dozen browser tabs. The agents then built the dashboard itself, and I hosted the whole thing on GitHub using GitHub Pages, which is free and gives me a clean public link. Because the page lives on GitHub and reads the underlying data, I can refresh it whenever I like, or automate the update entirely, and then simply embed it back here on the blog with a single line of code.

The dashboard is organised into a few tabs: an overview, the stages, the evolution of the GC over the three weeks, the points battle, the individual rider profiles, and — for the data nerds among us — a set of estimated power figures.
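
The “evolution of the GC” view rests on a simple idea: add up stage times and measure each rider’s gap to the leader after every stage. This sketch is not the agent-built dashboard’s code; rider names and times are hypothetical, and real GC rules (bonus seconds, penalties, same-time finishes) are ignored.

```python
from collections import defaultdict


def gc_after_each_stage(stage_times):
    """Cumulative general classification after each stage.

    `stage_times` holds one dict per stage mapping rider -> stage time in
    seconds. Bonus seconds, penalties and same-time rules are ignored; a
    rider is ranked only if they have finished every stage so far.
    """
    totals = defaultdict(int)
    stages_done = defaultdict(int)
    standings = []
    for n, stage in enumerate(stage_times, start=1):
        for rider, seconds in stage.items():
            totals[rider] += seconds
            stages_done[rider] += 1
        in_race = {r: t for r, t in totals.items() if stages_done[r] == n}
        if not in_race:
            standings.append([])
            continue
        leader_time = min(in_race.values())
        standings.append(sorted(((r, t - leader_time) for r, t in in_race.items()),
                                key=lambda item: item[1]))
    return standings


stages = [{"Rider A": 1000, "Rider B": 1010, "Rider C": 1005},
          {"Rider A": 2000, "Rider B": 1980, "Rider C": 2010}]
print(gc_after_each_stage(stages)[-1])
# [('Rider B', 0), ('Rider A', 10), ('Rider C', 25)]
```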

All of this took me a fraction of the time it would have done even a year ago, and with effectively no programming on my part. That is genuinely new, and worth pausing on.

The usual health warning

As I wrote last time, I will keep being honest about the limitations. The data here are scraped from publicly available sources, so their veracity and accuracy are only ever as good as the source — and in cycling, numbers move and get corrected constantly. The estimated power values deserve a particularly large pinch of salt: these are modelled figures derived from public information, not measurements from a calibrated power meter, and anyone who has worked in performance physiology knows how much can hide behind a single wattage number. Treat them as a bit of fun and a conversation starter, not as evidence.
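
To show why modelled power deserves that pinch of salt, here is the basic physics behind many such estimates: gravity, rolling resistance and air drag at a steady speed. Every coefficient is an assumed value (rolling resistance, CdA, air density, drivetrain efficiency), the climb is hypothetical, and wind, drafting and changes of pace are ignored.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def estimated_climb_power(road_distance_m, elevation_gain_m, time_s, total_mass_kg,
                          crr=0.004, cda_m2=0.35, air_density=1.1, drivetrain_eff=0.975):
    """Rough average power (W) for a climb, assuming steady speed and no wind.

    Every default (rolling resistance, CdA, air density, drivetrain
    efficiency) is an assumed value; total mass means rider plus bike.
    """
    v = road_distance_m / time_s                           # mean speed, m/s
    theta = math.asin(elevation_gain_m / road_distance_m)  # mean road angle
    gravity = total_mass_kg * G * math.sin(theta)
    rolling = total_mass_kg * G * math.cos(theta) * crr
    aero = 0.5 * air_density * cda_m2 * v ** 2
    return (gravity + rolling + aero) * v / drivetrain_eff


# Hypothetical climb: 10 km at 8% average, ridden in 30 minutes.
for mass in (68, 70, 72):
    print(mass, round(estimated_climb_power(10_000, 800, 1_800, mass)))
```

In this example each extra kilogram of assumed rider-plus-bike mass adds roughly 4.7 W, and the aerodynamic and rolling assumptions shift the answer too, which is exactly why a single wattage number can hide so much.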

With those caveats firmly in place, I have mostly used this as another chance to learn what these tools can and cannot do — how to gather, share and visualise data quickly, and where the human still very much needs to stay in the loop. Critical thinking, as ever, is key.

Have a play

Here is the dashboard, embedded live. It updates itself, so it should already be showing the final classifications from Rome.

Update

I have updated some of the views: for each individual rider you can now see a summary of their participation and what they did on each stage.