COSC-6421: Advanced Database Systems
York University
Winter 2007
Assn #2: SQL
Querying the York River Bookseller Database

In this project, you will work with an existing database, the York River Bookseller's Database (YRB DB). Actually, each will install his or her own private copy of the database on which to work. You are to implement some SQL queries over the YRB DB, using our IBM DB2 server.

There are five queries that you are to write. Each is worth four points, for a total of twenty points for Assn #2. I am the manager and I provide you with English specifications for the queries that I want answered from the database. You are providing SQL implementations of these queries for YRB DB to do the job.

Each query is given a name for bookkeeping. Also, the answer table for each is provided! You have successfully implemented the SQL query if you get the same result when run in DB2 on YRB DB. (And, of course, you have not cheated by making a simple query on purpose that manages to print the same results!) Otherwise, your query is unsuccessful. Grading per query is as follows: two points for style and logical correctness; and two points for "compiling" and producing the correct answer set.

 
  The YRB Database

Two scripts are provided:

The script yrb-create will create the YRB DB schema for you. It will also populate the tables with mock data. The script yrb-drop is provided for convenience. It will drop your copy of YRB DB from your DB2 schema space. If you mess things up, you can always drop YRB DB and then re-create it easily.

To create the YRB DB in your DB2 schema space:

  • % db2 -tf yrb-create

Read the schema definition in yrb-create for YRB DB to understand the design and what the YRB DB is about.

York River Booksellers is an online bookstore. Customers belong to various clubs. Everybody belongs to at least the club basic. Books are available via different offers per club. Thus the price of a book is determined by which offer (thus club) it was bought under.

Consider an order to be all the books a customer bought at the same time (when in purchase). These will be shipped together.

A DB2 database named c6421 has been created on the DB2 server on Prism for your use. Log into, or open a window, on any Prism machine (e.g., Red). There, you can connect to the database to set up your YRB DB and execute your queries.

See the resources links on our class homepage for basic information about DB2, connecting, and basic commands.

 
  The Queries
1. uniclub

To how many university clubs (for example, Guelph, UofT, Waterloo, and York) does each customer belong?

Do not allow duplicate rows in the answer table. If a customer belongs to none, list zero for that customer. Also, you are not allowed to use the words Guelph, UofT, Waterloo, or York in your query. We may add new university clubs in the near future.

Order by name + city.

Answer table for uniclub.

Hint: See the SQL operator like in the textbook (pp 140-141, 3rd ed.).

2. pairs

Which pairs of customers have bought at least three books in common?

Print three columns: two with the customers' names and one with the number of books in common. Do not return any duplicates. Furthermore, say 'Mark Dogfurry' and 'Zebulon Zilio' have four books in common, only output ('Mark Dogfurry', 'Zebulon Zilio', 4) and not ('Zebulon Zilio', 'Mark Dogfurry', 4)!

If 'Mark Dogfurry' and 'Zebulon Zilio' have each bought the same book three times, this does not count. It has to be at least three different books.

Order by the names. For each tuple, be certain that name first (column one) is alphabetically before name second (column two).

Answer table for pairs.

3. orders

All the books a customer orders at the same time (when) are considered to be part of the same "order". Those books are shipped together to the customer and the customer is billed for the entire order.

What is the bill for each order?

Show the customer's name and city, the date and time of the order (not the timestamp!), and the bill. (The column function date extracts the date from a timestamp, and the column function time extracts the date from a timestamp.) Order by name + city + when.

Answer table for orders.

Hint: Want to shorten one of your decimal column types so your lines do not wrap on output? You can cast it to be smaller. For instance,

cast(col_a as decimal(5,2))

This will print decimals (numbers) with 5 numerals to the left of the "." and 2 to the right.

4. weights

What is the total weight of each order?

Order by weight, descending, and then by customers' names (in cases of ties on weight).

Answer table for weights.

5. percentage

For each language, what is the city that has the largest percentage of book sales in that language, across cities which have had at least 20 books in total sold?

Measure book sales in number of books.

For example, say 25% of the books sold in Yorktown are French, 22% in Harrisonburg are French, 18% in Richmond, and so forth (and each of this cities qualifies in that at least 20 books have been sold in each), "French Yorktown 25" would be in the output.

Show language, city, percentage, and number of books (sold in that city in that language) for the output rows. Order by language + city.

Answer table for percentage.

 
  Deliverables

Due by 11:59pm Friday 16 March 2007.

For each query, write your query in a file with the corresponding name as above (uniclub, pairs, orders, weights, and percentage).

Do not include the answers generated by each of your queries. I will test your queries myself.

When you are finished, use the submit command to turn in your work. We shall publish soon the submit script that you are to use. The command will look something like:

% submit 6421 two uniclub pairs orders weights percentage

 
  Hints and DB2 Comments

DB2 SQL provides a useful clause called with. It is easiest to show it's use by example.

refund

Who has bought a book via some club that he or she could have gotten less expensively using another club to which he or she belongs? List the customer's name, the book's title and year, the quantity bought, the price for which it was bought, and the best price for which it could have been bought.

    with
        best (cid, title, year, lowest) as
            (select distinct M.cid, O.title, O.year, min(price)
                 from yrb_member M, yrb_purchase P, yrb_offer O
                 where M.club = O.club and
                       M.cid = P.cid and
                       P.title = O.title and P.year = O.year
                 group by M.cid, O.title, O.year)
    select C.name, P.title, P.year, qnty, price, lowest
        from yrb_customer C, yrb_purchase P, Best B, yrb_offer O
        where P.cid = B.cid and P.title = B.title and
              P.year = B.year and P.title = O.title and
              P.year = O.year and P.club = O.club and
              C.cid = P.cid and
              O.price > B.lowest
        order by C.name, P.title, P.year;
    

Answer table for query refund.

One may have more tables in the with clause, if need be:

	with
		first (...) as
			(...),
		second (...) as
			(...),
		third (...) as
			(...)
	select ...
		...;
	

Note that you can use the (temporary) view first inside the definition for second, and the views first and second inside the definition for third, and so forth.

DB2 is picky about nested SELECTS always being named via AS. For example,

	select ...
		from (select ...) as A,
			 (select ...) as B,
			 ...
		...;
	

Even if you do not need the alias names A or B later on, give the sub-queries (nested SELECTS) names anyway. Otherwise, DB2 will complain.