Skip to content
Permalink
7b4e193ca6
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
237 lines (169 sloc) 5.32 KB
# Asynchronous JavaScript
Outcomes
- Understand and use command line options.
- Understand and use callbacks to produce asynchronous code.
- Understand the JSON data format and know how to convert between it and JavaScript objects.
- Understand Screen Scraping
## IO is Expensive
Waiting for IO to complete is big waste of resources
Three solutions:
synchronous
processes Apache
threads Node
## NodeJS Threading Model
NodeJS runs in a single thread
JavaScript supports lambda / callbacks
Callbacks run in their own threads
After callback thread is destroyed
Using Request.
Main methods correspond to HTTP verbs:
```
request.get(url, callback)
request.put(url, data, callback)
request.post(url, data, callback)
request.del(url, callback)
```
Be careful, because callbacks are asynchronous
## Callbacks
A callback (higher-order) function
Passed around like a variable
a function that is passed to another function as a parameter
the callback function is called (or executed) inside the other Function.
When we pass a callback function as an argument to another function, we are only passing the function definition.
The containing function has the callback function in its parameter as a function definition
The function is not executed in the parameter.
It can execute the callback anytime.
Callbacks are important!
NodeJS runs in a single threaded event loop.
If a long-running operation occurs, the process stops "blocks" until the event has finished.
To prevent blocking operations any long running activities are run in callbacks.
The callback is a function that should be run after the operation is complete.
While it is processing, control is passed back to the main event loop.
Simple GET request with callback:
```javascript
'use strict'
const request = require('request')
request.get( 'http://api.fixer.io/latest?symbols=GBP', (err, res, body) => {
if (err) {
console.log('could not complete request')
}
console.log(body)
})
```
## Data Exchange Formats
RESTful APIs send data across the Internet
Needs to be transmitted as text (ASCII/UniCode)
Needs to communicate both the data and its structure.
- Variables
- Objects
- Arrays
Common data exchange formats
- XML (Extensible Markup Language)
- JSON (JavaScript Object Notation)
- YAML (Yet Another Markup Language)
- CSV (Comma-Separated Values)
XML Example
```xml
<address>
<org>Coventry University</org>
<street>4 Gulson Road</street>
<city>Coventry</city>
<country>United Kingdom</country>
<postcode>CV1 5FB</postcode>
</address>
```
JSON Example
```json
address {
"org": "Coventry University",
"street": "4 Gulson Road",
"city": "Coventry",
"country": "United Kingdom",
"postcode": "CV1 5FB",
}
```
YAML Example
```yaml
address:
org: "Coventry University"
street: "4 Gulson Road"
city: "Coventry"
country: "United Kingdom"
postcode: "CV1 5FB"
```
CSV Example
```csv
"org", "street", "city", "country", "postcode"
"Coventry University", "4 Gulson Road", "Coventry", "United Kingdom", "CV1 5FB"
```
Why do we prefer the JSON format?
- Text-based
- Position independent
- Lightweight
- Interoperable with JavaScript Objects
Converting to and from JSON
```javascript
const jsObj = {
firstname: 'John',
lastname: "Doe"
}
const jsonStr = JSON.stringify(jsObj)
const jsonStr2 = JSON.stringify(jsObj, null, 2)
const newObj = JSON.parse(jsonStr)
```
## Screen Scraping
Sometimes called Data Scraping
Extracting data from a human-readable web page
Why use screen scraping?
Some data not available through an API
Usually a last resort
Sometimes companies scrape their own websites!
There are some challenges:
- Complex process
- Needs deconstructable URLs
- Success depends on the DOM not changing
- Most search results are paginated
Deconstructable URLs.
To access search results:
- Search term needs to be inserted into URL
To access resources:
- Product ID needs to be inserted into URL.
Here are some examples:
Amazon Book Search URL (javascript)
```
https://www.amazon.co.uk/s/ref=nb_sb_noss_2?url=search-alias%3Dstripbooks&field-keywords=javascript
https://www.amazon.co.uk/s/?url=search-alias%3Dstripbooks&field-keywords=javascript
```
Guardian Bookstore
```
http://bookshop.theguardian.com/catalogsearch/result/?q=javascript&order=relevance&dir=desc
http://bookshop.theguardian.com/catalogsearch/result/?q=javascript
```
BBC iPlayer search for history
```
http://www.bbc.co.uk/iplayer/search?q=history
```
Accessing resources.
Amazon Books
```
https://www.amazon.co.uk/JavaScript-Definitive-Guide-Guides/dp/0596805527/ref=sr_1_2?s=books&ie=UTF8&qid=1476384737&sr=1-2&keywords=javascript
https://www.amazon.co.uk/dp/0596805527
http://bookshop.theguardian.com/javascript-patterns.html
But the ISBN is 0596806752
http://www.bbc.co.uk/iplayer/episode/b019c88d/the-grammar-school-a-secret-history-episode-2
http://www.bbc.co.uk/iplayer/episode/b019c88d
```
Screen scraping techniques
- Browse to web page using Google Chrome
- Open the Developer tools (elements tab)
- Expand DOM structure and see what content it controls
- Uniquely identify the data
- Extract data using JQuery patterns
Module for screen scraper.
Process is messy
Needs updating when page structure changes
Need to isolate in its own module
Keep public interface simple
Public interface:
- Pass a search string and get a JavaScript array in return
- Pass a resource identifier and get a JavaScript resource back