Basic Web Scraping with Emacs

Web scraping is the extraction of data from web pages. But most web pages aren’t designed to accommodate automated data extraction; instead, they’re designed to be easily read by humans, with colors and fonts and pictures and all sorts of junk. This makes web scraping tricky. There are two predominant techniques for web scraping: HTML parsing and browser automation.
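As a quick illustration of the first technique, here is a minimal HTML-parsing sketch in Node: fetch the page, parse it into a queryable tree, then pull out the bits you want. Cheerio, the placeholder URL, and the h2 selector are my own illustrative assumptions, not anything this article prescribes.

// A minimal sketch of the HTML-parsing approach, assuming Node 18+
// (for the built-in fetch) and the cheerio package.
const cheerio = require('cheerio');

async function scrapeHeadlines(url) {
  const res = await fetch(url);        // download the raw HTML
  const html = await res.text();
  const $ = cheerio.load(html);        // parse it into a queryable tree
  // Pull out the text of every <h2> on the page.
  return $('h2').map((i, el) => $(el).text().trim()).get();
}

scrapeHeadlines('https://example.com') // placeholder URL
  .then((headlines) => console.log(headlines));

Browser automation, the second technique, drives a real browser instead, which is the heavier hammer you reach for when a page builds its content with JavaScript.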

Before going on, I must confess a shameful secret: I don’t understand HTML very well. It’s just too ugly to get me interested. Every so often I’ll try to sit down and read about HTML, and I usually get bored and quit right around the time they get to unordered lists (<ul>). Why couldn’t they just use S-expressions? Do the brackets and explicit close tags actually add anything? Whatever, it doesn’t matter. The bottom line is that I hate dealing with HTML and I’d prefer to avoid it if I possibly can.

So


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/wb2JL12Hqn4/web-scraping.html


Conf-Cal: A Human-Readable Conference Calendar Format and Library


When you run a conference, it can be a pain in the butt to maintain the schedule and generate HTML out of it. This little library lets you keep all the important information in one place, in a human-readable form, with little effort.
It makes sure that all the important data is there:
Name
Day
Location
Google Place ID of the location! (to be used from a map)
Timezone (taken from the geo location using geo-tz)
Rooms
Event Title/Description/Presenter
Automatic Breaks Calculation (times between the slots are automatically breaks)
Automatic Slot Calculation (just enter the times and it can figure out the slots)
Automatically generates IDs for each entry that can be overridden
(to preserve deep links even when data changes)
You can process this format with a very lightweight Node JS library; a rough sketch of the parsing idea follows the example calendar below.
Here is an example calendar:
Mighty Superhero Gathering
on 2019/01/01
at Top of the World#ChIJvZ69FaJU6DkRsrqrBvjcdgU

[Main Room]
10:00-10:20 Opening
10:20-11:00 Doing the right thing by Super Man #keynote

Super Man will talk
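To make the automatic breaks calculation concrete, here is a small sketch of how slot lines like the ones above could be parsed, with any gap between consecutive slots turned into a break. This is my own Node illustration of the idea, not the conf-cal library’s actual code, and the third slot is invented to show a gap.

// Sketch: parse "HH:MM-HH:MM Title" slot lines and infer breaks
// from the gaps between them. Not conf-cal's real implementation.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(':').map(Number);
  return h * 60 + m;
}

function parseSlots(lines) {
  const slots = [];
  for (const line of lines) {
    const match = line.match(/^(\d{2}:\d{2})-(\d{2}:\d{2})\s+(.*)$/);
    if (!match) continue;
    slots.push({ start: toMinutes(match[1]), end: toMinutes(match[2]), title: match[3] });
  }
  // Any gap between consecutive slots becomes a break.
  const withBreaks = [];
  for (let i = 0; i < slots.length; i++) {
    withBreaks.push(slots[i]);
    const next = slots[i + 1];
    if (next && next.start > slots[i].end) {
      withBreaks.push({ start: slots[i].end, end: next.start, title: 'Break' });
    }
  }
  return withBreaks;
}

console.log(parseSlots([
  '10:00-10:20 Opening',
  '10:20-11:00 Doing the right thing by Super Man #keynote',
  '11:20-12:00 Closing', // invented slot: the 20-minute gap becomes a break
]));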


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/dht_Wr_PttE/conf-cal


I made _api, which is an autogenerated CRUD API built on LowDB and ExpressJS

Introduction
_api is an autogenerated CRUD API built on LowDB and ExpressJS. All you need to do is edit a configuration file and you will have a basic CRUD API ready to use!
Foreword
_api came about due to sheer curiosity. It’s important to understand, however, that _api was not built for large-scale applications: its underlying database layer, LowDB, uses plain JSON objects for storage and Lodash for querying. This doesn’t mean that _api isn’t worth your time; the biggest positive I see in _api is the sheer simplicity of both the concept and the code, which simplifies development and maintenance.
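As a sketch of the general idea, a small config can drive which CRUD routes get mounted on an Express app. This is my own illustration: the config shape is invented, an in-memory array stands in for LowDB, and none of the route or parameter names are _api’s actual API.

// Sketch of autogenerated CRUD: a config lists collections, and the
// loop below mounts create/find/remove routes for each one. The config
// shape and the store are invented for illustration, not _api's code.
const express = require('express');

const config = {
  collections: ['posts', 'users'], // hypothetical config file contents
};

const db = {}; // plain in-memory object standing in for LowDB
const app = express();
app.use(express.json());

for (const name of config.collections) {
  db[name] = [];

  // Create: POST /posts with a JSON body
  app.post(`/${name}`, (req, res) => {
    const doc = { id: String(db[name].length + 1), ...req.body }; // naive id, sketch only
    db[name].push(doc);
    res.status(201).json(doc);
  });

  // Find: GET /posts?sortBy=title&limit=10 (sort and slice style operators)
  app.get(`/${name}`, (req, res) => {
    let docs = db[name];
    if (req.query.sortBy) {
      docs = [...docs].sort((a, b) =>
        String(a[req.query.sortBy]).localeCompare(String(b[req.query.sortBy])));
    }
    if (req.query.limit) docs = docs.slice(0, Number(req.query.limit));
    res.json(docs);
  });

  // Remove by id: DELETE /posts/1
  app.delete(`/${name}/:id`, (req, res) => {
    db[name] = db[name].filter((doc) => doc.id !== req.params.id);
    res.status(204).end();
  });
}

app.listen(3000);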
Features
Automatic CRUD API generation through the editing of config files (there are only 3 files).
Comprehensive CRUD operations.
Find endpoint supports data operators such as filter, sort, and slice.
Update and remove endpoints support find by id as well as filter.
Update and remove endpoints support modifying multiple documents through the


Original URL: http://feedproxy.google.com/~r/feedsapi/BwPx/~3/ESAPZjYmuGI/_api

