Server-Side

Web Services


  • HTTP Protocol
  • More HTTP: Other Features
  • HTTP Services

Version


v1.3 07 October 2021
v1.2 05 October 2021
v1.1 30 September 2021
v1.0 28 September 2021

Acknowledgments


Thanks to:
  • Hamzeh Roumani, who has shaped EECS-4413 into a leading hands-on CS course at EECS and who generously shared all of his course materials and, more importantly, his teaching philosophy with me;
  • Parke Godfrey, my long-suffering Master’s supervisor and mentor; and
  • Suprakash Datta for giving me this opportunity to teach this course.

HTTP


The Protocol

A human protocol: Hi... Hello... How much is this?... $50... I'll give you $40.

A network protocol: TCP Connection Request. TCP Connection Reply. Get https://www.eecs.yorku.ca/. Returns the contents of the File.

What is a protocol?

A protocol defines the format and order of messages sent and received among network entities, and the actions taken on message transmission and receipt.

The Protocol Stack


The protocol stack of layers (from bottom to top): physical, datalink, network, transport and application.

Hypertext Transfer Protocol (HTTP)

The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for distributed, collaborative, hypermedia information systems. It is used primarily on the World Wide Web (WWW) in the client-server model, where a web browser is the client communicating with the web server that hosts the website. Since 1990, it has been the foundation of data communication on the Web. HTTP is a standardized, stateless protocol that can also serve other purposes through extensions to its request methods, status codes, and headers.

  • Created by Tim Berners-Lee at CERN (1991)
  • Standardized and much expanded by the IETF.
  • HTTP/1.0 (RFC 1945), HTTP/1.1 (RFC 2068), HTTP/2 (RFC 7540)
  • Other related protocols: HTTPS and WebSocket.
  • Rides on top of the TCP protocol, standard on port 80
  • TCP provides: reliable, bi-directional, in-order byte stream
  • Goal: transfer objects between systems
  • Do not confuse with other WWW concepts:
    • HTTP is not page layout language (that is HTML)
    • HTTP is not object naming scheme (that is URLs)

HTTP In Operation

HTTP Request

HTTP Request Methods

GET
Used to request a particular resource from the Web server, with any parameters specified as a query string (name and value pairs) in the URL part of the request.
  • Bookmark-able, as the parameters appear in the URL.
  • Cache-able.
  • Saved in the browser history if executed using a web browser.
  • URL length is limited in practice (about 2048 characters in some browsers and servers; HTTP itself sets no limit).
  • Not suited to sending binary data.
  • Should only retrieve data and have no other effect.
  • Should not be used when communicating sensitive data such as login credentials.
  • A safe and ideal method to request data only.
HEAD
Identical to the GET method, except that the response has no body. A HEAD request is useful for testing whether a GET request would succeed before actually making it.
DELETE
Used to delete any specific resource.
TRACE
Used for performing a message loop-back, which tests the path for the target resource. It is useful for debugging purposes.
CONNECT
Used for establishing a tunnel to the server identified by a given URI.
POST
Used to send data to the Web server in the request body of HTTP.
  • Not bookmark-able, as the data does not appear in the URL.
  • Not cached.
  • Not saved as history by the web browsers.
  • No restriction on the amount of data to be sent.
  • Can be used to send ASCII as well as binary data.
  • Use when communicating sensitive data, such as when submitting an HTML form.
  • Security depends on the HTTP protocol.
  • By using secure HTTP (HTTPS), information is protected.
PUT
Requests that the target resource create or update its state with the state defined by the representation enclosed in the request. Used to update an existing resource with uploaded content or to create a new resource if the target resource is not found. The difference between POST and PUT is that PUT requests are idempotent, which means sending the same PUT request multiple times leaves the resource in the same state as sending it once.
PATCH
Requests that the target resource modifies its state according to the partial update defined in the representation enclosed in the request.
OPTIONS
Used for describing the communication preferences for any target resource.
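
As a rough illustration of the GET and POST entries above, the sketch below builds (but does not send) one request of each kind with Python's standard urllib. The host www.example.com, the path /search, and the parameter names are placeholders invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form data; names and values are made up for illustration.
params = {"course": "4413", "term": "F"}
encoded = urlencode(params)  # "course=4413&term=F"

# GET: parameters travel in the URL as a query string; there is no body.
get_req = Request("http://www.example.com/search?" + encoded)

# POST: parameters travel in the request body; the URL stays clean.
post_req = Request(
    "http://www.example.com/search",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# Nothing is sent over the network; we only inspect the request objects.
for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```

Note that urllib picks the method from the presence of a body: a request with data defaults to POST, one without defaults to GET.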

Uniform Resource Identifiers (URI)

URIs are commonly known as WWW addresses. They cover both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). These are formatted strings that identify a web service, a resource, a website, etc. The scheme and host are case-insensitive; the path and query are generally case-sensitive.

URI = "http:" "//" host[ ":" port ][ abs_path ["?" query]]
  • A standard way to send many name/value pairs in a single string (QUERY_STRING or Form data)
  • Specified in RFC 2396 ‘Uniform Resource Identifiers (URI): Generic Syntax’.
Rules of URL-Encoding
  1. All submitted query strings or form data are concatenated into a single string of ampersand (&) separated name=value pairs, one pair for each form tag or query parameter. Like this:
    form_tag_name_1=value_1&form_tag_name_2=value_2&...
    
  2. Spaces in a name or value are replaced by a plus (+) sign or “%20” (a percent sign followed by 20). This is because URLs cannot contain spaces and, under METHOD=GET, the form data is supplied in the query string in the URL.
  3. Other reserved characters (e.g., =, &, +) are replaced by a percent sign (%) followed by the two-digit hexadecimal code of the character in the ASCII character set. Otherwise, it would be hard to distinguish these characters inside a query or form variable from those between the variables in the first rule above.
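
The encoding rules above can be tried out with Python's standard urllib.parse module. This is a small sketch; the field names and values are invented for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical form fields containing a space and reserved characters.
form = {"name": "Jane Doe", "expr": "a=b&c+d"}

# urlencode joins name=value pairs with "&", turns spaces into "+",
# and percent-encodes "=", "&" and "+" inside the values.
encoded = urlencode(form)
print(encoded)

# quote() uses the alternative "%20" form for spaces.
space_as_percent = quote("Jane Doe")
print(space_as_percent)

# parse_qs reverses the encoding; each name maps to a list of values.
decoded = parse_qs(encoded)
print(decoded)
```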

HTTP Response

HTTP Response Status Codes

1XX INFORMATIONAL
100 HTTP CONTINUE
101 SWITCHING PROTOCOLS
2XX SUCCESS
200 OK
201 CREATED
202 ACCEPTED
203 NON AUTHORITATIVE INFORMATION
204 NO CONTENT
205 RESET CONTENT
206 PARTIAL CONTENT
3XX REDIRECTION
300 MULTIPLE CHOICES
301 MOVED PERMANENTLY
302 MOVED TEMPORARILY
303 SEE OTHER
304 NOT MODIFIED
305 USE PROXY
4XX CLIENT ERROR
400 BAD REQUEST
401 UNAUTHORIZED
402 PAYMENT REQUIRED
403 FORBIDDEN
404 NOT FOUND
405 METHOD NOT ALLOWED
406 NOT ACCEPTABLE
407 PROXY AUTHENTICATION REQUIRED
408 REQUEST TIME OUT
409 CONFLICT
410 GONE
411 LENGTH REQUIRED
412 PRECONDITION FAILED
413 REQUEST ENTITY TOO LARGE
414 REQUEST URI TOO LARGE
415 UNSUPPORTED MEDIA TYPE
5XX SERVER ERROR
500 INTERNAL SERVER ERROR
501 NOT IMPLEMENTED
502 BAD GATEWAY
503 SERVICE UNAVAILABLE
504 GATEWAY TIME OUT
505 HTTP VERSION NOT SUPPORTED
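
The first digit of a status code gives its class, which is often all a client needs to decide what to do next. A minimal sketch using Python's standard http.HTTPStatus enum; the helper names status_class and describe are made up for this example.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code: int) -> str:
    """Map a status code to its class using integer division by 100."""
    return CLASSES.get(code // 100, "Unknown")

def describe(code: int) -> str:
    """Combine the code, its standard reason phrase, and its class.

    HTTPStatus(code) raises ValueError for codes it does not know.
    """
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"

for code in (200, 304, 404, 503):
    print(describe(code))
```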

Try out HTTP (client side) for yourself


  1. Telnet to your favorite Web server, e.g.:
$ telnet www.eecs.yorku.ca 80

Opens a TCP connection to port 80 (the default HTTP server port) at www.eecs.yorku.ca. Anything typed is sent to port 80 at www.eecs.yorku.ca.


  2. Type in a GET HTTP request:
GET /course_archive/2021-22/F/4413/index.html HTTP/1.1
Host: www.eecs.yorku.ca

Type this in the telnet session, followed by a blank line (press Enter twice), to send this minimal GET request to the HTTP server. HTTP/1.1 requires the Host header.


  3. Look at the response message sent by the HTTP server!
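
The same exchange can be scripted over a plain TCP socket. The sketch below is one possible way to do it in Python; the function names build_get_request and fetch are invented here, and fetch is only defined, not called, because it needs network access.

```python
import socket

def build_get_request(host: str, path: str) -> bytes:
    """Format a minimal HTTP/1.1 GET request.

    Lines end in CRLF and an empty line terminates the header section.
    "Connection: close" asks the server to close the socket when done,
    so the client knows where the response ends.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Open a TCP connection, send the request, and read until EOF."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

print(build_get_request("www.eecs.yorku.ca", "/").decode("ascii"))
```

Calling fetch("www.eecs.yorku.ca", "/") would return the raw response bytes, status line and headers included.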

More HTTP


Other Features

  • Conditional GET
  • Redirection
  • Basic Authentication
  • Open Authentication (OAuth)
  • Persistence & Cookies
  • Keep-Alive (with HTTP/1.1)
  • Web Caching

Conditional Get

If-modified-since request header

The client tells the server it has data and asks the server whether it has a fresher version or the client is up to date.

  • Goal: Don’t send object if the client has an up-to-date copy (cached).
  • Client: Specify the date of cached copy in the HTTP request: If-modified-since: <date>.
  • Server: Response contains no object if the cached copy is up to date: HTTP/1.1 304 Not Modified.

Request and response between client and server. When the object is not modified and the request header contains "If-modified-since", the server responds with "HTTP/1.1 304 Not Modified"; when the object has been modified, the server responds with "HTTP/1.1 200 OK" followed by the data.
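
A sketch of the server-side decision, assuming the server knows when the resource last changed. The function name conditional_get and the page body are hypothetical; the date helpers come from Python's standard email.utils module, which handles the HTTP date format.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(last_modified, if_modified_since=None):
    """Return (status, body) for a resource last changed at last_modified."""
    body = b"<html>hypothetical page</html>"
    if if_modified_since is not None:
        cached_at = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= cached_at:
            return 304, b""  # client copy is up to date: send no object
    return 200, body

# Assumed modification time, chosen only for the demonstration.
modified = datetime(2021, 10, 5, 12, 0, 0, tzinfo=timezone.utc)
fresh_header = format_datetime(modified, usegmt=True)
stale_header = format_datetime(modified - timedelta(days=1), usegmt=True)

print(conditional_get(modified, fresh_header)[0])  # cached copy is current
print(conditional_get(modified, stale_header)[0])  # cached copy is older
```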

Session Management

Cookies

Problem: HTTP is stateless

  • Server does not maintain status information across client requests
  • No way to correlate multiple requests from the same user

Solution: Cookies

  • Small amount of information (typically a server-generated user ID)
  • Sent by client with each request
  • Updated by server with response

Client-Server Interaction: Cookies

  • Server sends Cookie to client in response message: Set-Cookie: 1678453.
  • Client presents Cookie in later requests: Cookie: 1678453.
  • Server matches presented-cookie with server-stored info:
    • Authentication
    • Remembering user preferences, previous choices

Request and response between client and server using a cookie. The server sets a cookie on the client. Whenever the client returns and makes a request to the server, it sends the cookie back.
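
In practice a cookie is a name=value pair rather than a bare number. The sketch below uses Python's standard http.cookies module to produce the value of a Set-Cookie header and to parse a Cookie header; the cookie names uid and theme and both helper names are assumptions made for this example.

```python
from http.cookies import SimpleCookie

def set_cookie_header(user_id: str) -> str:
    """Server side: value for a 'Set-Cookie:' response header."""
    cookie = SimpleCookie()
    cookie["uid"] = user_id
    return cookie["uid"].OutputString()

def read_cookie_header(header: str) -> dict:
    """Server side: parse the 'Cookie:' header the client sends back."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {name: morsel.value for name, morsel in cookie.items()}

print(set_cookie_header("1678453"))
print(read_cookie_header("uid=1678453; theme=dark"))
```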

Session Management

Sessions

HTTP cannot maintain state (RESTful) but we can:

Client-Side

  • State maintained by client and sent as needed to server.

Network Side

  • State shuffled back and forth with every request/response
  • Typically through hidden fields, URL Rewriting, or Cookies

Server Side

  • Server keeps it in memory or a database with a key derived from the client’s credentials (known through authentication or assigned).
  • The key (cookie) is stored in an HTTP header (network side).

Request and response between client and server using a session ID. When a user logs in, the server creates a new session, stores the session ID on the server or in a database, and returns the session ID in a cookie to the client. When the client returns and makes a request to the server, it sends the cookie with the session ID back. The server checks if the session exists, matches a username, and is still valid before responding with username-specific content.
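
The session flow described above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not how any particular server implements it: the class name SessionStore, the 30-minute default lifetime, and the optional now parameter (used to make expiry testable) are all invented here.

```python
import secrets
import time

class SessionStore:
    """Maps random session IDs to (username, expiry time) on the server."""

    def __init__(self, ttl_seconds=1800):
        self.ttl = ttl_seconds
        self._sessions = {}

    def create(self, username, now=None):
        """Called after login; the returned ID goes to the client in a cookie."""
        now = time.time() if now is None else now
        session_id = secrets.token_hex(16)  # 32 hex chars, hard to guess
        self._sessions[session_id] = (username, now + self.ttl)
        return session_id

    def lookup(self, session_id, now=None):
        """Return the username if the session exists and is still valid."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        username, expires_at = entry
        if now >= expires_at:
            del self._sessions[session_id]  # expired: forget it
            return None
        return username

    def invalidate(self, session_id):
        """Server-side logout."""
        self._sessions.pop(session_id, None)

store = SessionStore()
sid = store.create("vwchu")
print(store.lookup(sid))
```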

Session Management

Cookies vs Sessions

Cookies

A cookie is a small file with the maximum size of 4KB that the web server stores on the client computer. Once a cookie has been set, all page requests that follow return the cookie name and value. A cookie can only be read from the domain that it has been issued from.

Sessions

A session is a global variable stored on the server. Each session is assigned a unique ID, which is used to retrieve stored values. Whenever a session is created, a cookie containing the unique session ID is stored on the user’s computer and returned with every request to the server. If the client browser does not support cookies, the unique session ID is passed in the URL. Sessions can store relatively large amounts of data compared to cookies.

Key Differences

Cookies | Sessions
Cookies are client-side files that contain user information. | Sessions are server-side files that contain user information. More secure and tamper-proof.
Cookies are not secure, as data is stored in a text file on the client. If an unauthorized user gets access to a client’s browser, they can tamper with the data. | Sessions are more secure than cookies, as the data is stored on the web server and can be saved in encrypted form.
A cookie ends depending on the lifetime you set for it. | A session typically ends when a user closes their browser.
A cookie is not dependent on a session. | A session is dependent on a cookie.
You don’t need to start a cookie, as it is stored on the client’s machine. | You need to start the session.
A cookie resides on the client, so the server cannot remove it directly; it can only ask the browser to expire it. | You can destroy (invalidate) a session from the server.
The official maximum cookie size is 4KB. | Within a session you can store as much data as you like. The only limit is the maximum memory available at one time.

Redirection

In HTTP, redirection is triggered by a server sending a special redirect response to a request. Redirect responses have status codes that start with 3, and a Location header holding the URL to redirect to.

When browsers receive a redirect, they immediately load the new URL provided in the Location header. Besides the small performance hit of an additional round-trip, users rarely notice the redirection.

Request and response between client and server when a resource has permanently moved. The server responds with "HTTP/1.1 301 Moved Permanently" and the "Location" header; the client then requests the resource at the new "Location", and the server responds with "HTTP/1.1 200 OK" followed by the data.
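
What a browser does on a 3xx response can be sketched with a simulated server. Everything here is hypothetical: ROUTES stands in for the server, mapping each path to a status code plus either a Location value or a body, and the redirect limit of 5 is an arbitrary choice to stop loops.

```python
# Hypothetical in-memory "server": path -> (status, Location or body).
ROUTES = {
    "/old": (301, "/new"),
    "/new": (200, "<html>moved here</html>"),
}

def fetch(url, routes=ROUTES, max_redirects=5):
    """Follow Location headers until a non-3xx response arrives.

    Returns (status, body, list of URLs visited).
    """
    hops = [url]
    for _ in range(max_redirects + 1):
        status, value = routes.get(url, (404, ""))
        if 300 <= status < 400:
            url = value  # the Location header names the next URL
            hops.append(url)
            continue
        return status, value, hops
    raise RuntimeError("too many redirects")

print(fetch("/old"))
```

Each hop costs one extra round trip, which is the small performance hit mentioned above.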

Keep-Alive - Persistent Connections

HTTP 1.0 Problem:

  • Each request opens new connection
    • Starting up is slow
    • Takes several packets
  • Short transfers are hard on TCP
    • Stuck in “slow start” phase of TCP connection
    • Loss recovery is poor when windows are small
  • Lots of extra connections
    • Increases server state/processing

In HTTP 1.0, each request requires its own TCP connection. The client sends a TCP connection request, and the server accepts it and responds with a TCP reply. The client can then send an HTTP request for the given URL. The server responds with a '200 OK' status code and the contents of the requested file. When the client receives the HTML document, it parses it and sends an HTTP request for each referenced resource. Each of these requests requires the client to establish a new TCP connection with the server, so each request costs 2 RTTs.

HTTP 1.1 Solution:

  • Keeps connection open for a time after server response so that multiple requests can ride on single connection
  • Reduced connection setup overhead.

In HTTP 1.1, only the first request requires setting up a TCP connection. The client sends a TCP connection request, and the server accepts it and responds with a TCP reply. The client can then send an HTTP request for the given URL. The server responds with a '200 OK' status code and the contents of the requested file. When the client receives the HTML document, it parses it and sends an HTTP request for each referenced resource. Unlike HTTP 1.0, the client does not need to establish a new TCP connection with the server for these requests. Each request requires only 1 RTT, except the first, which requires 2 RTTs. The TCP connection persists for a certain amount of time, specified by a timeout set by the server.

Non-Persistent vs Persistent Connections

Non-Persistent

  • HTTP/1.0
  • Server parses request, responds, and closes TCP connection
  • 2 RTTs to fetch each object
  • Each object transfer suffers from slow start
  • Most 1.0 browsers used parallel TCP connections.

Persistent

  • Default for HTTP/1.1
  • On same TCP connection: server parses request, responds, parses new request, …
  • Client sends requests for all referenced objects as soon as it receives base HTML.
  • Fewer RTTs and less slow start.
  • Prefetching
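
A back-of-the-envelope comparison of the two modes, using the RTT counts given above: 2 RTTs per object without persistence, and 2 RTTs for the first object plus 1 for each later one with persistence. This simplified model ignores transmission time, parallel connections, and slow start; the function names are made up for this sketch.

```python
def rtts_non_persistent(num_objects: int) -> int:
    """Every object pays for TCP setup (1 RTT) plus request/response (1 RTT)."""
    return 2 * num_objects

def rtts_persistent(num_objects: int) -> int:
    """TCP setup is paid once; each object then costs one request/response RTT."""
    if num_objects == 0:
        return 0
    return 1 + num_objects

# Example: a base HTML page that references 10 other objects (11 in total).
total = 11
print(rtts_non_persistent(total), rtts_persistent(total))
```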

Basic Authentication

  • When challenged, the client sends their user ID and password in the clear to the server.
  • Not secure enough (snooping is easy) but useful for simple things

Client-Server Interaction: Authentication

  • Authentication goal: Control access to server documents
  • Stateless: client must present authorization in each request
  • Authorization: typically name, password
    • Sends Authorization header line in request
    • If no authorization presented, server refuses access, sends WWW-Authenticate header line in response.
  • Browser caches name and password so that user does not have to repeatedly enter it.

Basic authentication. A client requests a protected page. The server responds with a '401 Unauthorized' status code and includes in the message header 'WWW-Authenticate'. The client replies with the 'Authorization' header with the client's username and password encoded in Base64. The server verifies the username and password. If verified, the server responds with the protected page and a status code of '200 OK'. If the server fails to authenticate the user, it responds with the '403 Forbidden' status code.
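
The Base64 step in the exchange above is an encoding, not encryption, which is why Basic authentication is easy to snoop without HTTPS. A short sketch with Python's standard base64 module; the helper names are invented, and the credentials are the well-known example pair from the Basic authentication RFC.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Client side: value for the 'Authorization:' request header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def parse_basic_auth(header: str):
    """Server side: recover (username, password) from the header value."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only; the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(parse_basic_auth(header))
```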

Open Authentication (OAuth)

Open authentication. The client first connects to the app server. The app server generates a random one-time code and redirects the client to the login page on the login server. It also attaches, as a query parameter, one of two keys that it previously exchanged with the login server. The client logs into the login server normally. When authenticated, the login server redirects the client back to the app server with a hash generated from the two exchanged keys and the one-time code. The app server verifies that the two keys and the one-time code are correct; once they check out, the client is accepted as logged in.

Web Caches (Proxy Servers)

Web Caching

  • Improve performance
    • Scalability
    • Response time
    • Load balancing
    • Availability
    • Saves network and server resources
  • Proxy cache
    • Done at the client-side

Goal

Fill client request without going to origin server.

  • User sets browser: Web accesses via web cache
  • Client sends all HTTP requests to web cache
    • If object at web cache, web cache immediately returns object in HTTP response
    • Else requests object from origin server, then returns HTTP response to client
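
The hit-or-miss logic above fits in a few lines. This sketch assumes a callable that fetches from the origin server; the class name WebCache and the fake origin are invented for illustration, and freshness checks such as a conditional GET are left out.

```python
class WebCache:
    """Serves objects from a local store, going to the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> response body
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            self.hits += 1  # object at web cache: return it immediately
        else:
            self.misses += 1  # request object from origin, then keep a copy
            self._store[url] = self._fetch(url)
        return self._store[url]

# Stand-in for the origin server; records how often it is contacted.
origin_requests = []

def fake_origin(url):
    origin_requests.append(url)
    return f"contents of {url}"

cache = WebCache(fake_origin)
cache.get("/index.html")
cache.get("/index.html")
print(cache.hits, cache.misses, len(origin_requests))
```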

Two client computers connecting to two origin web servers via a proxy server.

HTTP


Services

HTTP Client

Web browser = TCP client + HTTP + HTML/CSS/JS + DOM

Diagram of webserver connected to the client's web browser. The webserver is the intermediary between the client and static files on the Network file system (NFS), App Servers, or CGI scripts.

HTTP Server

Web server = TCP Server + Port 80 + HTTP.

  • Built-in static file serving
  • Built-in scalability
  • Built-in security (HTTPS + auth) via .htaccess
  • Built-in telemetry (logs and error logs)
  • Extensibility: PHP (can violate view migration); CGI (good & language agnostic); App Servers (best: Tomcat JSP, WebSphere, WebLogic, NodeJS, ASP.NET, …).
  1. The client makes a request by specifying a URL and additional info.
  2. The webserver:
    1. Receives the request (in the URL).
    2. Identifies the request as a CGI request.
    3. Locates the program corresponding to the request.
    4. Starts up the handling program (heavy weight process creation!!)
    5. Feeds request parameters to the handler (through stdin or environment variables).
  3. The Handler:
    1. Executes with the given request parameters
    2. The Output of the handler is sent via stdout back to the webserver for rerouting back to the requesting web browser.
    3. Output is typically a web page. Could be plaintext, JSON, or XML.
    4. Terminates.

Common Gateway Interface (CGI)

The Common Gateway Interface (CGI) is a simple interface for running external programs, software or gateways under an information server in a platform-independent manner.

Diagram of CGI workflow. The client submits a completed form, and the webserver passes the request and data to a CGI script, which processes it and replies with an output response that is forwarded back to the client by the webserver.

More on CGI

CGI defines the abstract parameters, known as metavariables, which describe the client’s request. Together with a concrete programmer interface this specifies a platform-independent interface between the script and the HTTP server.

Metavariables
AUTH_TYPE REMOTE_IDENT QUERY_STRING
CONTENT_LENGTH REMOTE_USER REMOTE_ADDR
CONTENT_TYPE REQUEST_METHOD REMOTE_HOST
GATEWAY_INTERFACE SCRIPT_NAME SERVER_PROTOCOL
PATH_INFO SERVER_NAME SERVER_SOFTWARE
PATH_TRANSLATED SERVER_PORT  
Example
Item Value
GATEWAY_INTERFACE CGI/1.1
SERVER_PROTOCOL HTTP/1.1
SERVER_SOFTWARE Apache/2.4.48 (Unix) PHP/7.4.20 OpenSSL/1.0.2k mod_wsgi/4.7.1 Python/3.8
SERVER_NAME www.eecs.yorku.ca
SERVER_PORT 443
REQUEST_METHOD GET
SCRIPT_NAME /~vwchu/test.cgi
QUERY_STRING key=value&a=1234
REMOTE_ADDR 130.63.96.170
REMOTE_PORT 39256
REMOTE_USER vwchu
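
A CGI handler sees these metavariables as environment variables and replies on stdout. The sketch below shows one way such a script might look in Python; the function name handle and the plain-text output format are assumptions, and the demonstration feeds it values from the example table instead of a real server environment.

```python
import io
import os
import sys
from urllib.parse import parse_qs

def handle(environ, out):
    """Read CGI metavariables from environ and write a response to out."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # A CGI response starts with headers, then a blank line, then the body.
    out.write("Content-Type: text/plain\r\n\r\n")
    out.write(f"method={environ.get('REQUEST_METHOD', '')}\n")
    for name in sorted(params):
        out.write(f"{name}={params[name][0]}\n")

# Demonstration with values taken from the example table above.
demo = io.StringIO()
handle({"REQUEST_METHOD": "GET", "QUERY_STRING": "key=value&a=1234"}, demo)
print(demo.getvalue())

if __name__ == "__main__":
    # When run by a web server as a CGI script, use the real environment.
    handle(os.environ, sys.stdout)
```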

The server code and the application code are all mixed together in the run method.

First, move the application code out of the run method into a separate doRequest method. Note that the doRequest method requires a lot of arguments, too many.

So, we will group them into Request and Response classes.

Take the doRequest method entirely out of the HTTPServer class into a subclass, MainService.

Move the Request class, along with all of the HTTPServer methods related to handling a request, out of the HTTPServer class into a new class called RequestContext. Do the same for the Response class and all of the HTTPServer methods related to forming a response, resulting in the ResponseContext class.

Split the MainService class into the different services, one service per class. Each class has its own doRequest method.

That leaves our MainService class with the barest of run methods and a map that registers the available service handlers to their respective resource paths. Later, we will see how Tomcat does it via annotations.
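
The end state of this refactoring, a map from resource paths to service handlers, might look roughly like the following. This is a language-neutral sketch written in Python rather than the course's own code: the Request and Response fields, the two services, and the dispatch function are all hypothetical stand-ins, and do_request plays the role of doRequest.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    method: str
    path: str
    params: dict = field(default_factory=dict)

@dataclass
class Response:
    status: int = 200
    body: str = ""

class HelloService:
    """One service per class, each with its own do_request method."""
    def do_request(self, request, response):
        response.body = "Hello, " + request.params.get("name", "world")

class EchoService:
    def do_request(self, request, response):
        response.body = f"{request.method} {request.path}"

# The registry: resource path -> service handler.
SERVICES = {
    "/hello": HelloService(),
    "/echo": EchoService(),
}

def dispatch(request):
    """What remains of the server's run method: look up and delegate."""
    response = Response()
    service = SERVICES.get(request.path)
    if service is None:
        response.status = 404
        response.body = "Not Found"
    else:
        service.do_request(request, response)
    return response

print(dispatch(Request("GET", "/hello", {"name": "EECS"})))
```

Adding a new service then means writing one class and one registry entry, without touching the server machinery.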

Aside

Refactoring our Code

Goals

  • Code maintainability
  • Modularity
  • Encapsulation & abstraction
  • Separation of concerns
  • Handling complexity
  • We shouldn’t have to rebuild the machinery that runs the webserver every time we want to create a web service!

If we continue refactoring…

If we continue refactoring, we might end up with something like this…

Despite all this added complexity, it’s okay. Because to us, web app developers, the web server’s code is a black-box that we access via public APIs.

Complex UML diagram of the refactored webserver and webapp.
