Last modified: 25 October 2017

Global open data

Recently I published a report into how PDFs could work better with data. The responses have been polarised.

Lots of people who rely on documents for their work have been positive. Lots of people who work hard and sacrifice a lot in the cause of open data were extremely angry. One very notable thing was how angry Americans were.

From Britain there was a fear that the report would be misinterpreted. From America, the response was more accusatory. In places it went too far, suggesting I was deliberately undermining the open data community and had sold my honour to Adobe.

I’ve watched enough Leeds United vs. Millwall games to deal with far worse accusations, but it did get me thinking. Why were Americans so much more upset than anyone else?

The world is big

The clue to the answer came from a fantastic comment by Mor Rubinstein at 360 Giving. I’d said in my report that the battle between tables and PDF was won, but she wondered where I’d looked.

The truth is that I work mostly with local open data in Leeds and Birmingham in the UK. I also work with UK national open data, and French national and local open data. I speak with people all around the world too, but I work in Europe.

It’s a decent spread of experience: two levels of government in two of the three Security Council members that are philosophically positive about open data. But my experience is certainly not the world.

So this morning I broadened my outlook. I visited national open data portals in Kenya, Morocco, Chile, Malaysia, the USA, the UK, and France, plus our local open data portal in Leeds. I counted the formats that data was being published in. You can see and add to the spreadsheet yourself, but here’s the summary in a picture.



I didn’t know.

The USA seems to be extremely unusual in the number of PDFs it publishes, and I didn’t know. Like the UK, it's been turning away from the world recently, and fewer and fewer people I work with visit. I haven't for years. I missed this, and I missed Japan.

My report is clear that releasing spreadsheets is much better than printing to PDFs. And for most of the world, I remain convinced that the battle between tables and PDFs is well on the way to being won.

But in the USA it clearly isn’t. I want my report to give countries where this battle is being won the tools that they need to extend open data’s reach into new areas, where documents and not tables remain king. Following the feedback from the USA I’ll make it clearer in my report that the priority there is still to stop printing spreadsheets as PDFs.


There are lots of caveats to this quick investigation. Here are just some:

  1. I’ve done this work in a morning. Other organisations like the Open Knowledge Foundation and the Open Data Barometer have worked for years. I’d welcome their opinions.
  2. There are lots of other sources of open data than national open data portals. I’ve not worked with sub-national and sub-regional government anywhere except in the UK and France. I’ve not looked at FOIs.
  3. Quantity is not a great way to judge release systems. If the most important document is locked up in a PDF, it doesn’t matter if there are a hundred tables of less useful information.
  4. There are PDFs in the French datasets, but they’re locked away in the zips. In the ten examples I looked at, all were visual representations of the included raw data. This wasn’t PDFs instead of data, it was PDFs in addition to data. I still think that can provide value.
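
For anyone who wants to repeat or extend the count, the tally itself is simple. Here is a minimal Python sketch, assuming you have noted down each dataset's format by hand as (portal, format) pairs. The portal names and formats in the sample are made-up placeholders, not the figures from my spreadsheet, and the function names are my own invention for illustration.

```python
from collections import Counter, defaultdict


def normalise(fmt):
    """Lower-case a format label and drop a leading dot, so '.PDF' and 'pdf' match."""
    return fmt.strip().lower().lstrip(".")


def tally_formats(records):
    """Count how often each format appears on each portal.

    `records` is an iterable of (portal, format) pairs.
    """
    counts = defaultdict(Counter)
    for portal, fmt in records:
        counts[portal][normalise(fmt)] += 1
    return counts


def pdf_share(counter):
    """Fraction of a portal's counted files that are PDFs (0.0 if nothing was counted)."""
    total = sum(counter.values())
    return counter["pdf"] / total if total else 0.0


# Placeholder records for illustration only: not real portal data.
sample = [
    ("Portal A", "CSV"),
    ("Portal A", ".pdf"),
    ("Portal A", "XLS"),
    ("Portal A", "csv"),
    ("Portal B", "PDF"),
    ("Portal B", "pdf"),
]

counts = tally_formats(sample)
for portal, counter in counts.items():
    print(portal, dict(counter), f"PDF share: {pdf_share(counter):.0%}")
```

As the third caveat says, a share like this only measures quantity. It says nothing about whether the most important document on a portal is the one locked up in a PDF.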

