Web Crawling with Node.js: it’s an interesting world

Time for some fun today! I’ll try to scrape a website. I wanted something simple but unique, so I chose to scrape Google search results (oh, the irony!).
I am not at all a JavaScript expert, but picking up Node.js seems like much more fun than the days of Python-based scraping (yes, I am old!). The obvious reason is that JavaScript makes DOM parsing far more convenient, and if you have used one of the gazillion JS-based frameworks, you’ll get the hang of it very fast.
Let’s dive straight into an example using Osmosis (https://github.com/rchipka/node-osmosis), the library I started with and a no-brainer starting point for anyone. We’ll take a very simple example that fires up the Google search URL and then extracts some information about the results.
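Here is a minimal sketch of that flow, assuming the chainable `get` / `find` / `set` / `data` / `error` calls shown in the Osmosis README. The selectors `div.g`, `h3` and `a@href` are my assumptions about Google’s result markup (which changes often), and the function names `buildSearchUrl` and `scrapeGoogle` are hypothetical. The Osmosis module is passed in as a parameter instead of being required inside the function, which keeps the chain easy to check without hitting the network.

```javascript
// Minimal sketch: scrape Google result titles and links with Osmosis.
// ASSUMPTION: the selectors 'div.g', 'h3' and 'a@href' are guesses at
// Google's markup, which changes often; adjust them after inspecting the DOM.
// buildSearchUrl and scrapeGoogle are hypothetical helper names.

// Build the search URL from a plain query string.
function buildSearchUrl(query) {
  const url = new URL('https://www.google.com/search');
  url.searchParams.set('q', query); // URLSearchParams handles the encoding
  return url.toString();
}

// The osmosis module is injected rather than required here, so the
// chain can be exercised without network access.
function scrapeGoogle(osmosis, query, onResult) {
  return osmosis
    .get(buildSearchUrl(query))   // fetch and parse the results page
    .find('div.g')                // one node per organic result (assumed)
    .set({
      title: 'h3',                // result heading (assumed)
      link: 'a@href',             // '@href' reads the attribute value
    })
    .data(onResult)               // called once per matched result
    .error(console.error);        // report fetch/parse errors
}
```

With the package installed (`npm install osmosis`), a call like `scrapeGoogle(require('osmosis'), 'node.js scraping', console.log)` should print one `{ title, link }` object per matched result, provided the selectors still fit Google’s current markup.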
Examining Google Search DOM
This is how the HTML for a Google search result looks. I have cleaned out a lot of things from what actually

Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/VlBhvRp7q-4/

