Asynchronous JavaScript

Outcomes

Understand and use command line options.
Understand and use callbacks to produce asynchronous code.
Understand the JSON data format and know how to convert between it and JavaScript objects.
Understand Screen Scraping

IO is Expensive

Waiting for IO to complete is big waste of resources Three solutions: synchronous processes Apache threads Node

NodeJS Threading Model

NodeJS runs in a single thread JavaScript supports lambda / callbacks Callbacks run in their own threads After callback thread is destroyed

Using Request.

Main methods correspond to HTTP verbs:

request.get(url, callback)
request.put(url, data, callback)
request.post(url, data, callback)
request.del(url, callback)

Be careful, because callbacks are asynchronous

Callbacks

A callback (higher-order) function

Passed around like a variable

a function that is passed to another function as a parameter

the callback function is called (or executed) inside the other Function.

When we pass a callback function as an argument to another function, we are only passing the function definition.

The containing function has the callback function in its parameter as a function definition

The function is not executed in the parameter.

It can execute the callback anytime.

Callbacks are important!

NodeJS runs in a single threaded event loop.

If a long-running operation occurs, the process stops "blocks" until the event has finished.

To prevent blocking operations any long running activities are run in callbacks.

The callback is a function that should be run after the operation is complete.

While it is processing, control is passed back to the main event loop.

Simple GET request with callback:

'use strict'
const request = require('request')
request.get( 'http://api.fixer.io/latest?symbols=GBP', (err, res, body) => {
  if (err) {
  console.log('could not complete request')
  }
  console.log(body)
})

Data Exchange Formats

RESTful APIs send data across the Internet

Needs to be transmitted as text (ASCII/UniCode)

Needs to communicate both the data and its structure.

Variables
Objects
Arrays

Common data exchange formats

XML (Extensible Markup Language)
JSON (JavaScript Object Notation)
YAML (Yet Another Markup Language)
CSV (Comma-Separated Values)

XML Example

<address>
  <org>Coventry University</org>
  <street>4 Gulson Road</street>
  <city>Coventry</city>
  <country>United Kingdom</country>
  <postcode>CV1 5FB</postcode>
</address>

JSON Example

address {
  "org": "Coventry University",
  "street": "4 Gulson Road",
  "city": "Coventry",
  "country": "United Kingdom",
  "postcode": "CV1 5FB",
}

YAML Example

address:
  org: "Coventry University"
  street: "4 Gulson Road"
  city: "Coventry"
  country: "United Kingdom"
  postcode: "CV1 5FB"

CSV Example

"org", "street", "city", "country", "postcode" 
"Coventry University", "4 Gulson Road", "Coventry", "United Kingdom", "CV1 5FB"

Why do we prefer the JSON format?

Text-based
Position independent
Lightweight
Interoperable with JavaScript Objects

Converting to and from JSON

const jsObj = {
firstname: 'John',
lastname: "Doe"
}
const jsonStr = JSON.stringify(jsObj)
const jsonStr2 = JSON.stringify(jsObj, null, 2)
const newObj = JSON.parse(jsonStr)

Screen Scraping

Sometimes called Data Scraping

Extracting data from a human-readable web page

Why use screen scraping?

Some data not available through an API

Usually a last resort

Sometimes companies scrape their own websites!

There are some challenges:

Complex process
Needs deconstructable URLs
Success depends on the DOM not changing
Most search results are paginated

Deconstructable URLs.

To access search results:

Search term needs to be inserted into URL

To access resources:

Product ID needs to be inserted into URL.

Here are some examples:

Amazon Book Search URL (javascript)

https://www.amazon.co.uk/s/ref=nb_sb_noss_2?url=search-alias%3Dstripbooks&field-keywords=javascript
https://www.amazon.co.uk/s/?url=search-alias%3Dstripbooks&field-keywords=javascript

Guardian Bookstore

http://bookshop.theguardian.com/catalogsearch/result/?q=javascript&order=relevance&dir=desc
http://bookshop.theguardian.com/catalogsearch/result/?q=javascript

BBC iPlayer search for history

http://www.bbc.co.uk/iplayer/search?q=history

Accessing resources.

Amazon Books

https://www.amazon.co.uk/JavaScript-Definitive-Guide-Guides/dp/0596805527/ref=sr_1_2?s=books&ie=UTF8&qid=1476384737&sr=1-2&keywords=javascript
https://www.amazon.co.uk/dp/0596805527

http://bookshop.theguardian.com/javascript-patterns.html
But the ISBN is 0596806752

http://www.bbc.co.uk/iplayer/episode/b019c88d/the-grammar-school-a-secret-history-episode-2
http://www.bbc.co.uk/iplayer/episode/b019c88d

Screen scraping techniques

Browse to web page using Google Chrome
Open the Developer tools (elements tab)
Expand DOM structure and see what content it controls
Uniquely identify the data
Extract data using JQuery patterns

Module for screen scraper.

Process is messy

Needs updating when page structure changes

Need to isolate in its own module

Keep public interface simple

Public interface:

Pass a search string and get a JavaScript array in return
Pass a resource identifier and get a JavaScript resource back