Server-Side
Web Services
- HTTP Protocol
- More HTTP: Other Features
- HTTP Services
Version
v1.3 | 07 October 2021 |
v1.2 | 05 October 2021 |
v1.1 | 30 September 2021 |
v1.0 | 28 September 2021 |
Acknowledgments
Thanks to:
- Hamzeh Roumani, who has shaped EECS-4413 into a leading hands-on CS course at EECS and who generously shared all of his course materials and, more importantly, his teaching philosophy with me;
- Parke Godfrey, my long-suffering Master’s supervisor and mentor; and
- Suprakash Datta for giving me this opportunity to teach this course.
HTTP
The Protocol
What is a protocol?
A protocol defines the format and order of messages exchanged among network entities, and the actions taken when a message is sent or received.
The Protocol Stack
Hypertext Transfer Protocol (HTTP)
The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for distributed, collaborative, hypermedia information systems. It is used primarily on the World Wide Web (WWW) in the client-server model, where a web browser is the client and communicates with the web server that hosts the website. Since the early 1990s it has been the foundation of data communication on the Web. HTTP is a standardized, stateless protocol, and it can be extended with new request methods, status codes, and headers to serve other purposes as well.
- Created by Tim Berners-Lee at CERN (1991)
- Standardized and much expanded by the IETF.
- HTTP/1.0 (RFC 1945), HTTP/1.1 (RFC 2068, since revised by RFC 2616 and RFC 7230-7235), HTTP/2 (RFC 7540)
- Other related protocols: HTTPS and WebSocket.
- Rides on top of the TCP protocol, standard on port 80
- TCP provides: reliable, bi-directional, in-order byte stream
- Goal: transfer objects between systems
- Do not confuse with other WWW concepts:
- HTTP is not a page layout language (that is HTML)
- HTTP is not an object naming scheme (that is URLs)
HTTP In Operation
HTTP Request
HTTP Request Methods
- GET
- Used to request a particular resource from the web server, with any parameters specified as a query string (name and value pairs) in the URL part of the request.
- Bookmark-able, as the parameters appear in the URL.
- Cache-able.
- Saved in the browser history if it is executed using a web browser.
- URL length is limited in practice (HTTP itself sets no maximum, but some browsers and servers cap URLs at around 2048 characters).
- Cannot carry raw binary data in the URL; any data must be URL-encoded.
- Should only retrieve data and have no other effect.
- Should not be used when communicating sensitive data such as login credentials.
- A safe and ideal method for requesting data only.
- HEAD
- Almost identical to GET, except that the response carries no body. A HEAD request is useful for checking how a GET request would respond (status and headers) before actually making it.
- DELETE
- Used to delete any specific resource.
- TRACE
- Used for performing a message loop-back, which tests the path for the target resource. It is useful for debugging purposes.
- CONNECT
- Used for establishing a tunnel to the server recognized by a given URI.
- POST
- Used to send data to the Web server in the request body of HTTP.
- Not bookmark-able, as the data does not appear in the URL.
- Not cached.
- Not saved as history by the web browsers.
- No restriction on the amount of data to be sent.
- Can be used to send ASCII as well as binary data.
- Use when communicating sensitive data, such as when submitting an HTML form.
- Security depends on the HTTP protocol.
- By using secure HTTP (HTTPS), information is protected.
- PUT
- Requests that the target resource creates or updates its state with the state defined by the representation enclosed in the request. Used to update an existing resource with uploaded content or to create a new resource if the target resource is not found. The difference between POST and PUT is that PUT requests are idempotent, which means calling the same PUT method multiple times leaves the server in the same state as calling it once.
- PATCH
- Requests that the target resource modifies its state according to the partial update defined in the representation enclosed in the request.
- OPTIONS
- Used for describing the communication preferences for any target resource.
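The POST/PUT distinction above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the `ResourceStore` class, its `/items/N` paths, and the returned status codes are hypothetical choices, not part of any real server.

```python
import itertools


class ResourceStore:
    """Toy in-memory resource collection (hypothetical, for illustration only)."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, body):
        # POST: the server picks a new URI on every call, so repeated calls add up.
        path = f"/items/{next(self._ids)}"
        self.items[path] = body
        return 201, path

    def put(self, path, body):
        # PUT: the client names the target, so repeating the call leaves the same state.
        status = 200 if path in self.items else 201
        self.items[path] = body
        return status, path
```

Calling `post("a")` twice stores two resources, while calling `put("/items/9", "b")` twice stores one; the second PUT only reports 200 instead of 201.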
Uniform Resource Identifiers (URI)
URIs are commonly known as WWW addresses. Uniform Resource Names (URNs) and Uniform Resource Locators (URLs) are both kinds of URI. A URI is a formatted string that identifies a web service, a resource, a website, etc. The scheme and host are case-insensitive; the path and query are generally case-sensitive.
URI = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
- A standard way to send many name/value pairs in a single string (QUERY_STRING or Form data)
- Specified in RFC 2396 ‘Uniform Resource Identifiers (URI): Generic Syntax’.
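To see the pieces of the grammar above, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `describe_http_url` and the default-port table are assumptions made for this example.

```python
from urllib.parse import urlsplit


def describe_http_url(url):
    """Split an http(s) URL into the parts named in the grammar above."""
    parts = urlsplit(url)
    default_ports = {"http": 80, "https": 443}
    return {
        "scheme": parts.scheme,        # lower-cased by urlsplit
        "host": parts.hostname,        # lower-cased: hosts are case-insensitive
        "port": parts.port or default_ports.get(parts.scheme),
        "abs_path": parts.path or "/",  # path case is preserved
        "query": parts.query,
    }
```

For example, `describe_http_url("http://WWW.Example.com:8080/a/B?x=1")` reports host `www.example.com`, port `8080`, path `/a/B` and query `x=1`.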
Rules of URL-Encoding
- All submitted URLs, query strings or form data should be concatenated into
single strings of ampersand (&) separated name=value pairs, one pair for each
form tag or query parameter. Like this:
form_tag_name_1=value_1&form_tag_name_2=value_2&...
- Spaces in a name or value are replaced by a plus (+) sign or “%20” (a percent sign followed by 20). This is because URLs cannot contain spaces and, under METHOD=GET, the form data is supplied in the query string of the URL.
- Other reserved characters (e.g., =, &, +) are replaced by a percent sign (%) followed by the two-digit hexadecimal code of the character in the ASCII character set.
- Otherwise, it would be hard to distinguish these characters inside a query or form variable from those between the variables in the first rule above.
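The rules above can be tried out with the standard library. This is a small sketch; the helper names `encode_form` and `decode_form` and the sample field values are made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode


def encode_form(fields):
    """Join name=value pairs with '&', encoding spaces as '+' and reserved characters as %XX."""
    return urlencode(fields)


def decode_form(query_string):
    """Reverse the encoding; each name maps to a list because names may repeat."""
    return parse_qs(query_string)


encoded = encode_form({"name": "Ada Lovelace", "q": "a=b&c+d"})
# quote() uses the %20 form for spaces instead of '+'.
percent_form = quote("Ada Lovelace")
```

Here `encoded` is `name=Ada+Lovelace&q=a%3Db%26c%2Bd`, so the `=` and `&` inside the value can no longer be confused with the separators between pairs.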
HTTP Response
HTTP Response Status Codes
1XX | INFORMATIONAL |
---|---|
100 | CONTINUE |
101 | SWITCHING PROTOCOLS |
2XX | SUCCESS |
---|---|
200 | OK |
201 | CREATED |
202 | ACCEPTED |
203 | NON AUTHORITATIVE INFORMATION |
204 | NO CONTENT |
205 | RESET CONTENT |
206 | PARTIAL CONTENT |
3XX | REDIRECTION |
---|---|
300 | MULTIPLE CHOICES |
301 | MOVED PERMANENTLY |
302 | MOVED TEMPORARILY |
303 | SEE OTHER |
304 | NOT MODIFIED |
305 | USE PROXY |
4XX | CLIENT ERROR |
---|---|
400 | BAD REQUEST |
401 | UNAUTHORIZED |
402 | PAYMENT REQUIRED |
403 | FORBIDDEN |
404 | NOT FOUND |
405 | METHOD NOT ALLOWED |
406 | NOT ACCEPTABLE |
407 | PROXY AUTHENTICATION REQUIRED |
408 | REQUEST TIME OUT |
409 | CONFLICT |
410 | GONE |
411 | LENGTH REQUIRED |
412 | PRECONDITION FAILED |
413 | REQUEST ENTITY TOO LARGE |
414 | REQUEST URI TOO LARGE |
415 | UNSUPPORTED MEDIA TYPE |
5XX | SERVER ERROR |
---|---|
500 | INTERNAL SERVER ERROR |
501 | NOT IMPLEMENTED |
502 | BAD GATEWAY |
503 | SERVICE UNAVAILABLE |
504 | GATEWAY TIME OUT |
505 | HTTP VERSION NOT SUPPORTED |
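The first digit of a status code gives its class. A minimal sketch of that lookup, using Python's standard `http.HTTPStatus` for the reason phrases (the `describe_status` helper is an assumption of this example; note that `HTTPStatus` uses the current phrase names, e.g. 302 is "Found"):

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code):
    """Return 'code phrase (class)' for an HTTP status code."""
    category = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not registered in the standard library's table.
        phrase = "Unrecognized"
    return f"{code} {phrase} ({category})"
```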
Try out HTTP (client side) for yourself
- Telnet to your favorite Web server, e.g.:
$ telnet www.eecs.yorku.ca 80
This opens a TCP connection to port 80 (the default HTTP server port) at www.eecs.yorku.ca. Anything typed is sent to port 80 at www.eecs.yorku.ca.
- Type in a GET HTTP request:
GET /course_archive/2021-22/F/4413/index.html HTTP/1.1
Host: www.eecs.yorku.ca
Type this at the prompt (followed by two carriage returns, so that the request ends with a blank line) to send this minimal GET request to the HTTP server. HTTP/1.1 requires the Host header.
- Look at the response message sent by the HTTP server!
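The same experiment can be sketched in code with a raw TCP socket. To keep it self-contained, this sketch talks to a throwaway server on localhost rather than a real site; `HelloHandler`, `raw_get`, and `demo_raw_get` are hypothetical names for this example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local server so the example needs no external network."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def raw_get(host, port, path):
    """Send a hand-written GET request and return the raw response text."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"  # the blank line ends the request headers
    )
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


def demo_raw_get():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return raw_get("127.0.0.1", server.server_address[1], "/")
    finally:
        server.shutdown()
        server.server_close()
```

Printing the result of `demo_raw_get()` shows the status line, the response headers, a blank line, and then the body, which is the same structure telnet displays.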
More HTTP
Other Features
- Conditional GET
- Redirection
- Basic Authentication
- Open Authorization (OAuth)
- Persistence & Cookies
- Keep-Alive (with HTTP/1.1)
- Web Caching
Conditional GET
If-Modified-Since request header
The client tells the server it has data and asks the server whether it has a fresher version or the client is up to date.
- Goal: Don’t send object if the client has an up-to-date copy (cached).
- Client: Specify the date of cached copy in the HTTP request: If-modified-since: <date>.
- Server: Response contains no object if the cached copy is up to date: HTTP/1.1 304 Not Modified.
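The server-side decision above can be sketched as a small function. This is an illustration under assumptions: `conditional_get` and `is_fresh` are hypothetical helpers, request headers are passed as a plain dict with canonical capitalization, and `last_modified` must be a UTC-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_fresh(since_header, last_modified):
    """True if the client's cached copy is at least as new as the resource."""
    try:
        cached_at = parsedate_to_datetime(since_header)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= cached_at
    except (TypeError, ValueError):
        # Unparseable or timezone-less date: ignore the header and send the object.
        return False


def conditional_get(request_headers, last_modified, body):
    """Return (status, headers, body) for a GET that may carry If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None and is_fresh(since, last_modified):
        return 304, headers, b""  # Not Modified: no object in the response
    return 200, headers, body
```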
Session Management
Sessions
HTTP itself cannot maintain state (it is stateless, as REST requires), but we can:
Client-Side
- State maintained by client and sent as needed to server.
Network Side
- State shuffled back and forth with every request/response
- Typically through hidden fields, URL Rewriting, or Cookies
Server Side
- Server keeps it in memory or a database with a key derived from the client’s credentials (known through authentication or assigned).
- The key (cookie) is stored in an HTTP header (network side).
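The server-side approach can be sketched with the standard `http.cookies` module. The names here are assumptions for illustration: the `SESSIONID` cookie name, the in-memory `SESSIONS` dict, and the `handle_request` function are not from any particular framework.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side state, keyed by session id (in memory for this sketch)


def handle_request(cookie_header):
    """Return (session, set_cookie_value_or_None) for one incoming request."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("SESSIONID")
    if morsel is not None and morsel.value in SESSIONS:
        # Returning client: look the state up by the key it sent.
        session = SESSIONS[morsel.value]
        session["visits"] += 1
        return session, None
    # New client: assign a random key and ask the browser to store it.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = session = {"visits": 1}
    reply = SimpleCookie()
    reply["SESSIONID"] = session_id
    reply["SESSIONID"]["httponly"] = True
    return session, reply["SESSIONID"].OutputString()
```

The second return value is what the server would place in a Set-Cookie response header; the browser then sends it back in the Cookie request header on later requests.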
Redirection
In HTTP, redirection is triggered by a server sending a special redirect response to a request. Redirect responses have status codes that start with 3, and a Location header holding the URL to redirect to.
When browsers receive a redirect, they immediately load the new URL provided in the Location header. Besides the small performance hit of an additional round-trip, users rarely notice the redirection.
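What a client does with a redirect can be sketched as a short loop. This is a simplified illustration: `follow_redirects` is a hypothetical helper, and `fetch` is a caller-supplied function (URL in, `(status, headers, body)` out) standing in for a real HTTP request.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(fetch, url, max_hops=5):
    """Follow Location headers until a non-redirect response arrives."""
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES or "Location" not in headers:
            return url, status, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers["Location"])
    raise RuntimeError("too many redirects")
```

The hop limit guards against redirect loops, where two URLs keep pointing at each other.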
Keep-Alive - Persistent Connections
HTTP 1.0 Problem:
- Each request opens new connection
- Starting up is slow
- Takes several packets
- Short transfers are hard on TCP
- Stuck in “slow start” phase of TCP connection
- Loss recovery is poor when windows are small
- Lots of extra connections
- Increases server state/processing
HTTP 1.1 Solution:
- Keeps connection open for a time after server response so that multiple requests can ride on single connection
- Reduced connection setup overhead.
Non-Persistent vs Persistent Connections
Non-Persistent
HTTP/1.0
- Server parses request, responds, and closes TCP connection
- 2 RTTs to fetch each object
- Each object transfer suffers from slow start
- Most 1.0 browsers used parallel TCP connections.
Persistent
- Default for HTTP/1.1
- On the same TCP connection: server parses request, responds, parses new request, …
- Client sends requests for all referenced objects as soon as it receives base HTML.
- Fewer RTTs and less slow start.
- Prefetching
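Connection reuse can be observed with a small local experiment. In this sketch (all names are hypothetical) the server echoes the client's source port, which stays the same for as long as the same TCP connection is reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        # Echo the client's source port: unchanged while the connection is reused.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        ports = []
        for path in ("/index.html", "/logo.png"):
            conn.request("GET", path)  # second request rides on the open connection
            ports.append(conn.getresponse().read().decode("ascii"))
        return ports
    finally:
        conn.close()  # close first so the single-threaded server can stop
        server.shutdown()
        server.server_close()
```

Both entries of the returned list are the same port number, showing that the second request did not open a new TCP connection.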
Basic Authentication
- When challenged, the client sends their user ID and password to the server Base64-encoded but not encrypted, i.e. effectively in the clear.
- Not secure enough (snooping is easy) but useful for simple things
Client-Server Interaction: Authentication
- Authentication goal: Control access to server documents
- Stateless: client must present authorization in each request
- Authorization: typically name, password
- Sends Authorization header line in request
- If no authorization presented, server refuses access, sends WWW-Authenticate header line in response.
- Browser caches name and password so that user does not have to repeatedly enter it.
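The value of the Authorization header for Basic authentication is just the Base64 encoding of `user:password`. A minimal sketch (the two helper names are assumptions of this example) shows how easily it is reversed, which is why Basic authentication is only reasonable over HTTPS:

```python
import base64


def basic_auth_header(user, password):
    """Build the value of the Authorization request header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header_value):
    """Recover (user, password) from the header; anyone on the path can do this too."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```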
Open Authorization (OAuth)
Web Caches (Proxy Servers)
Web Caching
- Improve performance
- Scalability
- Response time
- Load balancing
- Availability
- Saves network and server resources
- Proxy cache
- Done at the client-side
Goal
Fill client request without going to origin server.
- User sets browser: Web accesses via web cache
- Client sends all HTTP requests to web cache
- If object at web cache, web cache immediately returns object in HTTP response
- Else requests object from origin server, then returns HTTP response to client
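The hit/miss logic above can be sketched in a few lines. This is a deliberately simplified illustration: `WebCache` is a hypothetical class, `fetch_from_origin` stands in for a real request to the origin server, and there is no expiry or validation of cached objects.

```python
class WebCache:
    """Toy proxy cache: answer from the local store, else ask the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # caller-supplied: url -> response
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            # Object at web cache: return it immediately.
            self.hits += 1
            return self._store[url]
        # Otherwise request it from the origin server, keep a copy, then return it.
        self.misses += 1
        response = self._fetch(url)
        self._store[url] = response
        return response
```

A real cache would also honour freshness information, for example by revalidating stored objects with a conditional GET.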
HTTP
Services
HTTP Client
Web browser = TCP client + HTTP + HTML/CSS/JS + DOM
HTTP Server
Web server = TCP Server + Port 80 + HTTP.
- Built-in static file serving
- Built-in scalability
- Built-in security (HTTPS + auth) via .htaccess
- Built-in telemetry (logs and error logs)
- Extensibility: PHP (can violate view migration); CGI (good & language agnostic); App Servers (best: Tomcat JSP, WebSphere, WebLogic, NodeJS, ASP.NET, …).
- The client makes a request by specifying a URL and additional info.
- The webserver:
- Receives the request (identified by the URL).
- Identifies the request as a CGI request.
- Locates the program corresponding to the request.
- Starts up the handling program (heavy weight process creation!!)
- Feeds request parameters to the handler (through stdin or environment variables).
- The Handler:
- Executes with the given request parameters
- The Output of the handler is sent via stdout back to the webserver for rerouting back to the requesting web browser.
- Output is typically a web page. Could be plaintext, JSON, or XML.
- Terminates.
Common Gateway Interface (CGI)
The Common Gateway Interface (CGI) is a simple interface for running external programs, software or gateways under an information server in a platform-independent manner.
More on CGI
CGI defines the abstract parameters, known as metavariables, which describe the client’s request. Together with a concrete programmer interface this specifies a platform-independent interface between the script and the HTTP server.
Metavariables
AUTH_TYPE | REMOTE_IDENT | QUERY_STRING |
CONTENT_LENGTH | REMOTE_USER | REMOTE_ADDR |
CONTENT_TYPE | REQUEST_METHOD | REMOTE_HOST |
GATEWAY_INTERFACE | SCRIPT_NAME | SERVER_PROTOCOL |
PATH_INFO | SERVER_NAME | SERVER_SOFTWARE |
PATH_TRANSLATED | SERVER_PORT |
Example
Item | Value |
---|---|
GATEWAY_INTERFACE | CGI/1.1 |
SERVER_PROTOCOL | HTTP/1.1 |
SERVER_SOFTWARE | Apache/2.4.48 (Unix) PHP/7.4.20 OpenSSL/1.0.2k mod_wsgi/4.7.1 Python/3.8 |
SERVER_NAME | www.eecs.yorku.ca |
SERVER_PORT | 443 |
REQUEST_METHOD | GET |
SCRIPT_NAME | /~vwchu/test.cgi |
QUERY_STRING | key=value&a=1234 |
REMOTE_ADDR | 130.63.96.170 |
REMOTE_PORT | 39256 |
REMOTE_USER | vwchu |
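A handler in this style can be sketched as follows. It is a minimal illustration, not the course's test.cgi: the `cgi_response` function and its HTML output are assumptions. The web server passes the metavariables as environment variables, and the handler writes a header block, a blank line, and then the body to stdout.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the CGI output (headers, blank line, body) from the metavariables."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    rows = "".join(
        f"<li>{html.escape(name)} = {html.escape(', '.join(values))}</li>"
        for name, values in params.items()
    )
    body = f"<html><body><ul>{rows}</ul></body></html>"
    return f"Content-Type: text/html\n\n{body}"


if __name__ == "__main__":
    # When started by the web server, the request arrives through os.environ
    # and the output goes back to the server through stdout.
    sys.stdout.write(cgi_response(os.environ))
```

With the QUERY_STRING from the example table, the body lists `key = value` and `a = 1234`. Escaping the values with `html.escape` keeps request parameters from being interpreted as markup.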
Aside
Refactoring our Code
Goals
- Code maintainability
- Modularity
- Encapsulation & abstraction
- Separation of concerns
- Handling complexity
- We shouldn’t have to rebuild the machinery that runs the webserver every time we want to create a web service!
If we continue refactoring…
If we continue refactoring, we might end up with something like this…
Despite all this added complexity, it’s okay. Because to us, web app developers, the web server’s code is a black-box that we access via public APIs.