Building a really simple page-scraping Chrome extension.

and understanding how it works.

Want to parse the content of a website? More comfortable coding in JavaScript and displaying your results in HTML than using Scrapy at a Python command prompt? A Google Chrome extension might be perfect for you.

Sadly, the best guide to building a simple but functional page-scraping Chrome extension is quite complicated. So I’ve learned from it and written a much simpler Hello World Chrome extension for page scraping.

Download the source code and the packed extension and have a look: it's fewer than 40 lines of code. If you need help installing it, follow Google's instructions. To help with the important part, understanding how it works, I've drawn some pictures.

Get content from a page.

My example gets content from the currently loaded page and displays it in the Chrome extension's popup. Here the active tab is on Nokia's homepage, and that page's title is displayed in my extension's popup.
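A payload script for this can be very small. The sketch below is hypothetical rather than the code in the download: the function name `collectPageData` and the `{ title: ... }` message shape are illustrative choices. The work is done in a function that takes the document as a parameter, so the logic can be checked outside Chrome, and the message is only sent when the script is actually running inside a page.

```javascript
// payload.js: hypothetical sketch, not the downloadable source.
// Runs inside the active tab after the extension injects it.

// Build the message from a document. Taking `doc` as a parameter (instead of
// reading the global `document`) lets the logic run against a stand-in object.
function collectPageData(doc) {
  return { title: doc.title };
}

// Send the message only when running as an injected script in a real page.
if (typeof chrome !== "undefined" && chrome.runtime && typeof document !== "undefined") {
  chrome.runtime.sendMessage(collectPageData(document));
}
```

On Nokia's homepage this would send an object whose `title` property holds that page's title. Using a function declaration rather than `const` also means injecting the script a second time into the same tab does not fail with a redeclaration error.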

Bundle an extension.

There are five important parts to the extension. The first four are the logo, the popup page's HTML file, the popup page's JavaScript file, and the manifest.json file, which tells Chrome how to bundle these files together into an extension.
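A manifest for an extension like this might look something like the following. This is a sketch assuming the current Manifest V3 format, not the file in the download, and the extension name and file names (logo.png, popup.html) are assumptions. The activeTab permission gives the extension temporary access to the tab that is open when the user clicks its toolbar button, and the scripting permission lets it inject payload.js. payload.js does not need to be listed, because it is injected on demand rather than declared as a content script.

```json
{
  "manifest_version": 3,
  "name": "Page title scraper",
  "version": "1.0",
  "action": {
    "default_icon": { "48": "logo.png" },
    "default_popup": "popup.html"
  },
  "permissions": ["activeTab", "scripting"]
}
```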

Inject the payload.

The fifth important part of the extension solves the cross-site access problem. An extension is effectively a little website, and for sensible security reasons scripts from one website can't easily access the content of another. popup.js can read and change the content of popup.html, but it is blocked from accessing the content of the currently loaded web page unless that page specifically allows it, which it almost never will.

Chrome has access to both pages, and you can tell it to inject and run the payload.js script in the current web page. Once injected, payload.js can read and change the content of the active tab and send messages back to popup.js using Chrome's runtime messaging API (chrome.runtime.sendMessage). Since we've set popup.js as a persistent background script in the extension manifest, it will keep listening for messages from payload.js until Chrome closes.
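Here is one way popup.js could inject the payload and listen for its message. It is a sketch against the current Manifest V3 APIs (`chrome.tabs.query`, `chrome.scripting.executeScript`, `chrome.runtime.onMessage`), not the original code: persistent background scripts belong to the older Manifest V2, and V3 has no persistent background pages, so here the listener lives in the popup page and only runs while the popup is open. It assumes popup.html loads popup.js with a `<script>` tag and contains an element with the hypothetical id `pagetitle`, and that payload.js sends an object with a `title` property. The `chrome` and `document` objects are passed in as parameters so the functions can be exercised with stand-ins.

```javascript
// popup.js: hypothetical Manifest V3 sketch, loaded by popup.html.

// Ask Chrome to inject payload.js into the active tab of the current window.
function injectPayload(chromeApi) {
  chromeApi.tabs.query({ active: true, currentWindow: true }, (tabs) => {
    if (tabs.length === 0) return; // no active tab to inject into
    chromeApi.scripting
      .executeScript({ target: { tabId: tabs[0].id }, files: ["payload.js"] })
      .catch((err) => console.error("Injection failed:", err));
  });
}

// Write the title from each payload.js message into the popup.
// "pagetitle" is an assumed element id in popup.html.
function listenForPageData(chromeApi, doc) {
  chromeApi.runtime.onMessage.addListener((message) => {
    if (message && typeof message.title === "string") {
      doc.getElementById("pagetitle").textContent = message.title;
    }
  });
}

// Wire everything up only when running inside the extension's popup.
if (typeof chrome !== "undefined" && chrome.scripting && typeof document !== "undefined") {
  listenForPageData(chrome, document); // listen first so no message is missed
  injectPayload(chrome);
}
```

Registering the listener before injecting avoids missing a message that arrives quickly. Chrome refuses to inject scripts into its own pages, such as chrome://extensions, which is why the injection promise has a catch handler.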

Add more features.

If it all works properly, your extension should display the current tab's title. Once you've seen how it works, you can extend this Hello World extension however you like. The payload.js script can do anything it likes with the current web page, including navigating somewhere else or clicking a link. The Chrome runtime messaging API accepts any JSON-serializable object, so you can easily pass structured data between your extension and the current page.
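As an example of passing structured data, a variant of payload.js could send the page's address and a few of its links as well as its title. Again this is a hypothetical sketch: `collectLinks`, the `maxLinks` limit and the shape of the object are illustrative. Messages are serialized on their way to the popup, so it copies the text and address of each link into plain objects rather than trying to send DOM elements.

```javascript
// payload.js variant: hypothetical sketch that sends an object, not a string.

// Gather the title, the page address and up to `maxLinks` links from `doc`.
function collectLinks(doc, maxLinks = 10) {
  const links = Array.from(doc.querySelectorAll("a[href]"))
    .slice(0, maxLinks)
    // Copy plain strings out of each <a> element so the result is JSON-safe.
    .map((a) => ({ text: a.textContent.trim(), href: a.href }));
  return { title: doc.title, url: doc.location.href, links };
}

// Send the data only when running as an injected script in a real page.
if (typeof chrome !== "undefined" && chrome.runtime && typeof document !== "undefined") {
  chrome.runtime.sendMessage(collectLinks(document));
}
```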

Thanks for reading, and in case you missed the first download link,

Download the source code and the packed extension.