Server-Side

Web Services

HTTP Protocol
More HTTP: Other Features
HTTP Services

Version

v1.3	07 October 2021
v1.2	05 October 2021
v1.1	30 September 2021
v1.0	28 September 2021

Acknowledgments

Thanks to:

Hamzeh Roumani, who has shaped EECS-4413 into a leading hands-on CS course at EECS and who generously shared all of his course materials and, more importantly, his teaching philosophy with me;
Parke Godfrey, my long-suffering Master’s supervisor and mentor; and
Suprakash Datta for giving me this opportunity to teach this course.

HTTP

The Protocol

A human protocol: Hi... Hello... How much is this?... $50... I'll give you $40.

A network protocol: TCP Connection Request. TCP Connection Reply. Get https://www.eecs.yorku.ca/. Returns the contents of the File.

What is a protocol?

A protocol defines the format and order of messages exchanged among network entities, and the actions taken when a message is sent or received.

The Protocol Stack

Hypertext Transfer Protocol (HTTP)

The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for distributed, collaborative, hypermedia information systems. It is used primarily on the World Wide Web (WWW) in the client-server model, where a web browser is the client and communicates with the web server hosting the website. Since the early 1990s, it has been the foundation of data communication on the Web. HTTP is a standardized, stateless protocol, and it can be extended for other purposes through additional request methods, status codes, and headers.

Created by Tim Berners-Lee at CERN (1991)
Standardized and much expanded by the IETF.
HTTP/1.0 (RFC 1945), HTTP/1.1 (RFC 2068, later revised by RFC 2616), HTTP/2 (RFC 7540)
Other related protocols: HTTPS and WebSocket.
Rides on top of the TCP protocol, standard on port 80
TCP provides: reliable, bi-directional, in-order byte stream
Goal: transfer objects between systems
Do not confuse with other WWW concepts:
- HTTP is not page layout language (that is HTML)
- HTTP is not object naming scheme (that is URLs)

HTTP In Operation

HTTP Request

HTTP Request Methods

GET

Used to request a particular resource from the web server, with any parameters passed as a query string (name and value pairs) in the URL part of the request.

Bookmark-able, as the parameters appear in the URL.
Cache-able.
Saved in the browser history if executed using a web browser.
Length restrictions in practice (some browsers and servers limit URLs to roughly 2048 characters; HTTP itself sets no limit).
Cannot be used to send binary data directly.
Should only retrieve data and have no other effect.
Should not be used when communicating sensitive data such as login credentials.
A safe and ideal method to request data only.

HEAD

Identical to the GET method, except that the server returns no response body, only the status line and headers. A HEAD request is useful for checking how a GET request would respond, for example whether the resource exists, before making the actual GET request.

DELETE

Used to delete any specific resource.

TRACE

Used for performing a message loop-back, which tests the path for the target resource. It is useful for debugging purposes.

CONNECT

Used for establishing a tunnel to the server identified by a given URI.

POST

Used to send data to the Web server in the request body of HTTP.

Not bookmark-able, as the data does not appear in the URL.
Not cached by default.
Not saved as history by the web browsers.
No restriction in the protocol on the amount of data to be sent.
Can be used to send ASCII as well as binary data.
Use when communicating sensitive data, such as when submitting an HTML form.
POST alone does not protect the data; security depends on the transport.
By using secure HTTP (HTTPS), the information is encrypted in transit.

PUT

Requests that the target resource create or update its state with the state defined by the representation enclosed in the request. Used to update an existing resource with uploaded content, or to create a new resource if the target resource is not found. The difference between POST and PUT is that PUT requests are idempotent, which means that calling the same PUT request multiple times leaves the resource in the same state as calling it once.

PATCH

Requests that the target resource modifies its state according to the partial update defined in the representation enclosed in the request.

OPTIONS

Used for describing the communication preferences for any target resource.

Sources: Introduction to HTTP (w3schools.in).

Uniform Resource Identifiers (URI)

URIs are commonly known as WWW addresses. They include Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). These are formatted strings that identify a web service, a resource, a website, etc. The scheme and host are case-insensitive; the rest of the URI may be case-sensitive.

URI = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

A standard way to send many name/value pairs in a single string (QUERY_STRING or Form data)
Specified in RFC 2396 ‘Uniform Resource Identifiers (URI): Generic Syntax’.
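
To see how the pieces of the grammar above map onto a concrete address, here is a small sketch using Python's standard urllib.parse module. The path and query in the example URL are made up for illustration; only the host name comes from these slides.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the path and query are invented for this example.
parts = urlsplit("http://www.eecs.yorku.ca:80/course/index.html?key=value&a=1234")

print(parts.scheme)    # the "http" part of the grammar
print(parts.hostname)  # host
print(parts.port)      # port as an int, or None when it is omitted
print(parts.path)      # abs_path
print(parts.query)     # query, without the leading "?"
```

When the port is left out of the URL, parts.port is None and a client falls back to the default port for the scheme.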

Rules of URL-Encoding

Submitted query strings or form data are concatenated into a single string of ampersand-separated (&) name=value pairs, one pair for each form field or query parameter, like this:
```
form_tag_name_1=value_1&form_tag_name_2=value_2&...
```
Spaces in a name or value are replaced by a plus sign (+) or by “%20” (a percent sign followed by 20, the hexadecimal ASCII code for a space). This is because URLs cannot contain spaces and, under METHOD=GET, the form data is supplied in the query string of the URL.
Other reserved characters (e.g., =, &, +) are replaced by a percent sign (%) followed by the two-digit hexadecimal code of the character in the ASCII character set.
Otherwise, it would be hard to distinguish these characters inside a name or value from the separators between the pairs in the first rule above.
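
The rules above can be tried out with Python's standard urllib.parse module. This is only a sketch: the field names and values are hypothetical, chosen so that a space and each of the characters =, & and + appear inside a value.

```python
from urllib.parse import urlencode, parse_qs, quote

# Hypothetical form data containing a space and the characters = & +
fields = {"name": "Jane Doe", "formula": "a=b&c+d"}

# urlencode joins name=value pairs with "&", turns spaces into "+"
# and percent-encodes the other reserved characters.
encoded = urlencode(fields)
print(encoded)  # name=Jane+Doe&formula=a%3Db%26c%2Bd

# parse_qs reverses the encoding; each name maps to a list of values.
decoded = parse_qs(encoded)
print(decoded)

# quote uses the "%20" form for a space instead of "+".
print(quote("Jane Doe"))  # Jane%20Doe
```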

HTTP Response

HTTP Response Status Codes

1XX	INFORMATIONAL
100	CONTINUE
101	SWITCHING PROTOCOLS

2XX	SUCCESS
200	OK
201	CREATED
202	ACCEPTED
203	NON AUTHORITATIVE INFORMATION
204	NO CONTENT
205	RESET CONTENT
206	PARTIAL CONTENT

3XX	REDIRECTION
300	MULTIPLE CHOICES
301	MOVED PERMANENTLY
302	MOVED TEMPORARILY
303	SEE OTHER
304	NOT MODIFIED
305	USE PROXY

4XX	CLIENT ERROR
400	BAD REQUEST
401	UNAUTHORIZED
402	PAYMENT REQUIRED
403	FORBIDDEN
404	NOT FOUND
405	METHOD NOT ALLOWED
406	NOT ACCEPTABLE
407	PROXY AUTHENTICATION REQUIRED
408	REQUEST TIME OUT
409	CONFLICT
410	GONE
411	LENGTH REQUIRED
412	PRECONDITION FAILED
413	REQUEST ENTITY TOO LARGE
414	REQUEST URI TOO LARGE
415	UNSUPPORTED MEDIA TYPE

5XX	SERVER ERROR
500	INTERNAL SERVER ERROR
501	NOT IMPLEMENTED
502	BAD GATEWAY
503	SERVICE UNAVAILABLE
504	GATEWAY TIME OUT
505	HTTP VERSION NOT SUPPORTED
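
The first digit of a status code gives its class. As a small illustration, the sketch below uses Python's standard http.HTTPStatus enum to look up the reason phrase and derive the class from the code. The describe function and the CLASSES table are names made up for this example, and the phrases Python reports follow current HTTP/1.1 wording, which can differ slightly from the table above.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code: int) -> str:
    """Return 'code phrase (class)' for a known HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"

print(describe(200))
print(describe(404))
```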

Try out HTTP (client side) for yourself

Telnet to your favorite Web server, e.g.:

$ telnet www.eecs.yorku.ca 80

This opens a TCP connection to port 80 (the default HTTP server port) at www.eecs.yorku.ca. Anything typed is sent to port 80 at www.eecs.yorku.ca.

Type in a GET HTTP request:

GET /course_archive/2021-22/F/4413/index.html HTTP/1.1
Host: www.eecs.yorku.ca

Type this into the telnet session, followed by a blank line (press Enter twice), to send this minimal GET request to the HTTP server. HTTP/1.1 requires the Host header.

Look at the response message sent by the HTTP server!
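
The same experiment can be scripted. The sketch below is one possible way to do it with Python's standard socket module; build_get_request and fetch are hypothetical helper names, and the "Connection: close" header is an addition (not part of the telnet example) so that the server closes the connection once the response has been sent.

```python
import socket

def build_get_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",          # mandatory in HTTP/1.1
        "Connection: close",      # assumption: ask the server to close afterwards
        "",                       # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send the request over TCP and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

request = build_get_request("www.eecs.yorku.ca", "/index.html")
print(request.decode("ascii"))
```

Calling fetch("www.eecs.yorku.ca", "/") would perform the actual network round trip and return the status line, headers and body exactly as the server sent them.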

More HTTP

Other Features

Conditional GET
Redirection
Basic Authentication
Open Authorization (OAuth)

Persistence & Cookies
Keep-Alive (with HTTP/1.1)
Web Caching

Conditional GET

If-Modified-Since request header

The client tells the server which version of the data it has and asks whether the server has a fresher version or the client's copy is still up to date.

Goal: Don’t send the object if the client has an up-to-date copy (cached).
Client: Specify the date of the cached copy in the HTTP request: If-Modified-Since: <date>.
Server: Response contains no object if the cached copy is up to date: HTTP/1.1 304 Not Modified.

Request and response between client and server. When the object has not been modified and the request contains the "If-Modified-Since" header, the server responds with "HTTP/1.1 304 Not Modified"; when the object has been modified, the server responds with "HTTP/1.1 200 OK" followed by the data.
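
The server-side decision described above can be sketched in a few lines. This is an illustrative sketch only: conditional_get is a hypothetical function, and it assumes the server knows the last modification time of the object as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(last_modified, if_modified_since, body):
    """Return (status_line, body) for a GET that may carry If-Modified-Since.

    last_modified: timezone-aware datetime of the object's last change.
    if_modified_since: value of the request header, or None if absent.
    """
    if if_modified_since is not None:
        cached = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached:
            # The client's copy is up to date: send no object.
            return "HTTP/1.1 304 Not Modified", b""
    # No cached copy, or the object changed since: send the data.
    return "HTTP/1.1 200 OK", body

modified = datetime(2021, 10, 5, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)  # HTTP date format, in GMT

print(conditional_get(modified, header, b"data"))
print(conditional_get(datetime(2021, 10, 7, tzinfo=timezone.utc), header, b"data"))
```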

Session Management

Cookies

Problem: HTTP is stateless

Server does not maintain status information across client requests
No way to correlate multiple requests from the same user

Solution: Store a Cookie on the client-side

Small amount of information (typically a server-generated user id)
Sent by client with each request
Updated by server with response

Client-Server Interaction: Cookies

Server sends Cookie to client in response message: Set-Cookie: 1678453.
Client presents Cookie in later requests: Cookie: 1678453.
Server matches presented-cookie with server-stored info:
- Authentication
- Remembering user preferences, previous choices

Request and response between client and server using a cookie. The server sets a cookie on the client. Whenever the client returns and makes a request to the server, it sends the cookie back.
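
The exchange above can be imitated with Python's standard http.cookies module. In practice a cookie is a name=value pair, so this sketch gives the value 1678453 the hypothetical name "id"; everything else follows the three steps listed earlier.

```python
from http.cookies import SimpleCookie

# 1. The server builds the header that sets the cookie on the client.
outgoing = SimpleCookie()
outgoing["id"] = "1678453"          # "id" is a made-up cookie name
set_cookie_line = outgoing.output()
print(set_cookie_line)              # Set-Cookie: id=1678453

# 2. The client presents the cookie in a later request.
cookie_line = "Cookie: " + outgoing["id"].OutputString()
print(cookie_line)                  # Cookie: id=1678453

# 3. The server parses the presented cookie and matches it
#    against its stored information.
incoming = SimpleCookie()
incoming.load(cookie_line.split(": ", 1)[1])
user_id = incoming["id"].value
print(user_id)
```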

Session Management

Sessions

HTTP itself is stateless (a property that RESTful services rely on), but state can still be maintained:

Client-Side

State maintained by client and sent as needed to server.

Network Side

State shuffled back and forth with every request/response
Typically through hidden fields, URL Rewriting, or Cookies

Server Side

Server keeps the state in memory or in a database, under a key that is either derived from the client’s credentials (known through authentication) or assigned by the server.
The key (cookie) is carried in an HTTP header (network side).

Session Management

Cookies vs Sessions

Cookies

A cookie is a small piece of data, with a maximum size of about 4KB, that the web server stores on the client computer. Once a cookie has been set, every subsequent request to that site sends the cookie name and value back. A cookie can only be read by the domain that issued it.

Sessions

A session is a set of per-user variables stored on the server. Each session is assigned a unique id, which is used to retrieve the stored values. Whenever a session is created, a cookie containing the unique session id is stored on the user’s computer and returned with every request to the server. If the client browser does not support cookies, the unique session id is carried in the URL instead. Sessions can store relatively large amounts of data compared to cookies.

Key Differences

Cookies	Sessions
Cookies are stored on the client side and contain user information.	Sessions are stored on the server side and contain user information.
Cookies are not secure, as the data is stored in plain text on the client. If an unauthorized user gets access to a client’s browser, they can read or tamper with the data.	Sessions are more secure than cookies, as the data stays on the webserver and only the session id is sent to the client.
A cookie ends depending on the lifetime you set for it.	A session typically ends when the user closes their browser or when the server times it out.
A cookie does not depend on a session.	A session usually depends on a cookie (to carry the session id).
You don’t need to start a cookie, as it is stored on the client’s machine.	You need to start the session.
The server cannot delete a cookie directly, as it resides on the client; it can only ask the browser to expire it.	You can destroy (invalidate) a session from the server.
The official maximum cookie size is 4KB.	Within a session you can store as much data as you like. The only limit is the memory available on the server.
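
A server-side session store can be sketched as a dictionary keyed by session id. This is a minimal in-memory illustration, not how any particular server implements it; SESSIONS, start_session, get_session and destroy_session are hypothetical names, and the id would travel to the client in a cookie.

```python
import secrets

# Session id -> per-user data, kept in server memory for this sketch.
SESSIONS = {}

def start_session() -> str:
    """Create an empty session and return its hard-to-guess id."""
    sid = secrets.token_hex(16)   # 32 hexadecimal characters
    SESSIONS[sid] = {}
    return sid

def get_session(sid: str):
    """Look up the data for a session id, or None if it is unknown."""
    return SESSIONS.get(sid)

def destroy_session(sid: str) -> None:
    """Invalidate a session from the server side."""
    SESSIONS.pop(sid, None)

sid = start_session()
get_session(sid)["cart"] = ["book"]   # store data under the session
print(get_session(sid))
destroy_session(sid)
print(get_session(sid))               # the session no longer exists
```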

Redirection

In HTTP, redirection is triggered by a server sending a special redirect response to a request. Redirect responses have status codes that start with 3, and a Location header holding the URL to redirect to.

When browsers receive a redirect, they immediately load the new URL provided in the Location header. Besides the small performance hit of an additional round-trip, users rarely notice the redirection.

Request and response between client and server when a resource has permanently moved: the server responds with "HTTP/1.1 301 Moved Permanently" and a "Location" header; the client then requests the resource at the new location, and the server responds with "HTTP/1.1 200 OK" followed by the data.
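
The browser's behaviour on a redirect can be sketched against a toy in-memory server. Everything here is hypothetical (the MOVED and PAGES tables, the serve function, the hop limit); the point is only the loop that follows the Location header until a non-redirect response arrives.

```python
# Toy "server" state: one moved resource and one real page.
MOVED = {"/old": "/new"}
PAGES = {"/new": b"hello"}

def serve(path):
    """Return (status, headers, body) the way a server might."""
    if path in MOVED:
        return 301, {"Location": MOVED[path]}, b""
    if path in PAGES:
        return 200, {}, PAGES[path]
    return 404, {}, b""

def fetch_following_redirects(path, max_hops=5):
    """Follow 3xx responses via their Location header, like a browser."""
    for _ in range(max_hops):
        status, headers, body = serve(path)
        if 300 <= status < 400 and "Location" in headers:
            path = headers["Location"]   # one extra round trip
            continue
        return status, path, body
    raise RuntimeError("too many redirects")

print(serve("/old"))
print(fetch_following_redirects("/old"))
```

The hop limit guards against redirect loops, where two locations keep pointing at each other.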

Keep-Alive - Persistent Connections

HTTP 1.0 Problem:

Each request opens new connection
- Starting up is slow
- Takes several packets
Short transfers are hard on TCP
- Stuck in “slow start” phase of TCP connection
- Loss recovery is poor when windows are small
Lots of extra connections
- Increases server state/processing

HTTP 1.1 Solution:

Keeps connection open for a time after server response so that multiple requests can ride on single connection
Reduced connection setup overhead.

Non-Persistent vs Persistent Connections

Non-Persistent

HTTP/1.0
Server parses request, responds, and closes TCP connection
2 RTTs to fetch each object
Each object transfer suffers from slow start
Most 1.0 browsers used parallel TCP connections.

Persistent

Default for HTTP/1.1
On same TCP connection: server parses request, responds, parses new request, …
Client sends requests for all referenced objects as soon as it receives base HTML.
Fewer RTTs and less slow start.
Prefetching
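
A rough count of round trips shows the difference. The sketch below takes the "2 RTTs to fetch each object" figure from above for non-persistent connections. For persistent connections it assumes one RTT to set up the connection and then one RTT per object, with requests sent one after another. Transmission time, parallel connections and pipelining are ignored, so the numbers are only indicative.

```python
def non_persistent_rtts(objects: int) -> int:
    """Each object costs 1 RTT for the TCP handshake + 1 RTT for the request."""
    return 2 * objects

def persistent_rtts(objects: int) -> int:
    """One handshake, then 1 RTT per object on the same connection (assumed)."""
    return 1 + objects

# Hypothetical page: one base HTML file plus 10 referenced objects.
objects = 11
print(non_persistent_rtts(objects))  # 22
print(persistent_rtts(objects))      # 12
```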

Basic Authentication

When challenged, the client sends their user ID and password to the server Base64-encoded but not encrypted, so effectively in the clear.
Not secure enough on its own (snooping is easy) but useful for simple things

Client-Server Interaction: Authentication

Authentication goal: Control access to server documents
Stateless: client must present authorization in each request
Authorization: typically name, password
- Sends Authorization header line in request
- If no authorization presented, server refuses access, sends WWW-Authenticate header line in response.
Browser caches name and password so that user does not have to repeatedly enter it.
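
The Authorization header line for Basic authentication is simply "user:password" encoded in Base64. The sketch below builds and decodes such a header with Python's standard base64 module; the function names are hypothetical and the credentials are the well-known example pair from the Basic authentication RFC, not real ones.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header line a client would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")

def parse_basic_auth(header_line: str):
    """Recover (user, password) on the server side."""
    scheme, token = header_line.split(": ", 1)[1].split(" ", 1)
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

line = basic_auth_header("Aladdin", "open sesame")
print(line)                    # Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(parse_basic_auth(line))  # ('Aladdin', 'open sesame')
```

Since anyone who sees the header can run the same decoding step, Basic authentication only makes sense over HTTPS.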

Open Authorization (OAuth)

Web Caches (Proxy Servers)

Web Caching

Improve performance
- Scalability
- Response time
- Load balancing
- Availability
- Saves network and server resources
Proxy cache
- Done at the client-side

Goal

Fill client request without going to origin server.

User configures the browser to send web accesses via the web cache
Client sends all HTTP requests to web cache
- If object at web cache, web cache immediately returns object in HTTP response
- Else requests object from origin server, then returns HTTP response to client

Two client computers connecting to two origin web servers via a proxy server.
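
The hit-or-miss logic of a web cache can be sketched as follows. WebCache is a hypothetical class, and the origin server is stood in for by any function that maps a URL to a response body. Expiry and freshness checks (such as the conditional GET) are left out to keep the sketch short.

```python
class WebCache:
    """Toy proxy cache: serve from the store, else ask the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> response body
        self._store = {}                  # url -> cached body
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            # Object at web cache: return it immediately.
            self.hits += 1
            return self._store[url]
        # Otherwise request it from the origin server, keep a copy, return it.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body

cache = WebCache(lambda url: b"body of " + url.encode("ascii"))
print(cache.get("http://example.org/"))   # miss: goes to the origin
print(cache.get("http://example.org/"))   # hit: served from the cache
print(cache.hits, cache.misses)
```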

HTTP

Services

HTTP Client

Web browser = TCP client + HTTP + HTML/CSS/JS + DOM

Diagram of a webserver connected to the client's web browser. The webserver is the intermediary between the client and static files on the network file system (NFS), app servers, or CGI scripts.

HTTP Server

Web server = TCP Server + Port 80 + HTTP.

Built-in static file serving
Built-in scalability
Built-in security (HTTPS + auth) via .htaccess
Built-in telemetry (logs and error logs)
Extensibility: PHP (can violate view migration); CGI (good & language agnostic); App Servers (best: Tomcat JSP, WebSphere, WebLogic, NodeJS, ASP.NET, …).

The client makes a request by specifying a URL and additional info.
The webserver:
1. Receives the request (in the URL).
2. Identifies the request as a CGI request.
3. Locates the program corresponding to the request.
4. Starts up the handling program (heavy weight process creation!!)
5. Feeds request parameters to the handler (through stdin or environment variables).
The Handler:
1. Executes with the given request parameters.
2. Sends its output via stdout back to the webserver, which relays it to the requesting web browser.
3. Output is typically a web page. Could be plaintext, JSON, or XML.
4. Terminates.
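
A handler along the lines of the steps above might look like the following sketch. It assumes the webserver passes the query string in the QUERY_STRING environment variable and expects a Content-Type header, a blank line, and then the body on stdout. The handle function is a hypothetical name; it takes the environment as a dictionary so that it can be tried without a webserver.

```python
from urllib.parse import parse_qs

def handle(environ) -> str:
    """Build the full CGI output (header, blank line, body) for a request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # Echo each parameter back as a plain-text name=value line.
    lines = [f"{name}={values[0]}" for name, values in sorted(params.items())]
    body = "\n".join(lines) + "\n"
    return "Content-Type: text/plain\r\n\r\n" + body

# Simulate what the webserver would place in the environment.
output = handle({"QUERY_STRING": "key=value&a=1234"})
print(output)
```

A real CGI script would call handle(os.environ), write the result to sys.stdout, and then exit, which is the "Terminates" step.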

Common Gateway Interface (CGI)

The Common Gateway Interface (CGI) is a simple interface for running external programs, software or gateways under an information server in a platform-independent manner.

Diagram of the CGI workflow. The client submits a completed form, and the webserver passes the request and data to a CGI script, which processes it and replies with an output response that the webserver forwards back to the client.

More on CGI

CGI defines the abstract parameters, known as metavariables, which describe the client’s request. Together with a concrete programmer interface this specifies a platform-independent interface between the script and the HTTP server.

Metavariables

AUTH_TYPE	REMOTE_IDENT	QUERY_STRING
CONTENT_LENGTH	REMOTE_USER	REMOTE_ADDR
CONTENT_TYPE	REQUEST_METHOD	REMOTE_HOST
GATEWAY_INTERFACE	SCRIPT_NAME	SERVER_PROTOCOL
PATH_INFO	SERVER_NAME	SERVER_SOFTWARE
PATH_TRANSLATED	SERVER_PORT

Example

Item	Value
GATEWAY_INTERFACE	CGI/1.1
SERVER_PROTOCOL	HTTP/1.1
SERVER_SOFTWARE	Apache/2.4.48 (Unix) PHP/7.4.20 OpenSSL/1.0.2k mod_wsgi/4.7.1 Python/3.8
SERVER_NAME	www.eecs.yorku.ca
SERVER_PORT	443
REQUEST_METHOD	GET
SCRIPT_NAME	/~vwchu/test.cgi
QUERY_STRING	key=value&a=1234
REMOTE_ADDR	130.63.96.170
REMOTE_PORT	39256
REMOTE_USER	vwchu

Aside

Refactoring our Code

Goals

Code maintainability
Modularity
Encapsulation & abstraction
Separation of concerns
Handling complexity
We shouldn’t have to rebuild the machinery that runs the webserver every time we want to create a web service!

The server code and the application code are all mixed together in the run method.

First, move the application code out of the run method into a separate doRequest method. Note that the doRequest method requires a lot of arguments, too many.

So, we will group them into Request and Response classes.

Move the doRequest method entirely out of the HTTPServer class into a subclass, MainService.

Move the Request class, and all of the methods in the HTTPServer class related to handling a request, out of HTTPServer into a new class called RequestContext. Do the same for the Response class and all of the methods in HTTPServer related to forming a response, resulting in the ResponseContext class.

Split the MainService class into the different services, one service per class. Each class has its own doRequest method.

That leaves our MainService class with the barest of run methods and a map that registers the available service handlers to their respective resource paths. Later, we will see how Tomcat does it via annotations.

If we continue refactoring, we might end up with something like this…

Despite all this added complexity, it’s okay, because to us, as web app developers, the web server’s code is a black box that we access via public APIs.

Complex UML diagram of the refactored webserver and webapp.
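
The end state of the refactoring described in this aside, with one service per class and a map from resource paths to handlers, can be sketched as below. This is a Python rendering for illustration only, not the course's actual code: Request, Response and MainService borrow their names from the text, do_request stands in for doRequest, and HelloService, register and dispatch are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    method: str
    path: str
    params: dict = field(default_factory=dict)

@dataclass
class Response:
    status: int = 200
    body: str = ""

class HelloService:
    """One service per class, each with its own do_request method."""
    def do_request(self, request: Request, response: Response) -> None:
        response.body = "Hello, " + request.params.get("name", "world")

class MainService:
    """Keeps only the map from resource paths to service handlers."""
    def __init__(self):
        self.routes = {}

    def register(self, path: str, service) -> None:
        self.routes[path] = service

    def dispatch(self, request: Request) -> Response:
        response = Response()
        service = self.routes.get(request.path)
        if service is None:
            response.status = 404
            response.body = "Not Found"
        else:
            service.do_request(request, response)
        return response

main = MainService()
main.register("/hello", HelloService())
print(main.dispatch(Request("GET", "/hello", {"name": "EECS"})))
print(main.dispatch(Request("GET", "/nope")))
```

Adding a new web service then means writing one new class and registering it, without touching the server machinery.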
