EECS-3421A
Introduction to Database Systems

York University
Fall 2015
Project #3:
SQL Jeopardy:
Querying the York River Bookseller Database
  Project

In this project, you will work with an existing database, the York River Bookseller's Database (YRB DB). Actually, each will install his or her own private copy of the database on which to work. For SQL Jeopardy, you are to implement a number of SQL queries over the YRB DB. The project is 10% of the total grade.

There are ten queries that you are to write. Each is worth one point for a total of ten points for Project #3. I am the manager and I provide you with English specifications for the queries that I want answered from the database. You are providing SQL implementations of these queries for YRB DB to do the job.

Each query is given a name for bookkeeping. Also, the answer table for each is provided! You have successfully implemented the SQL query if you get the same result when run in DB2 on YRB DB. (And, of course, you have not cheated by making a simple query on purpose that manages to print the same results!) Otherwise, your query is unsuccessful. Grading per query is all-or-nothing. Either your query produces the right results, or it does not.

The queries below are arranged roughly in order of increasing difficulty. The ones at the beginning are straightforward, and the ones towards the end are difficult. Do not get too discouraged if you are unable to do the final few.

Honesty: For projects, you are permitted to confer with others, seek advice, and (to a reasonable extent) help. However, remember that copying someone else's queries and claiming them as your own work is plagiarism. You must do your own work.

 
  The YRB Database

Two scripts are provided:

The script yrb-create will create the YRB DB schema for you. It will also populate the tables with mock data. The script yrb-drop is provided for convenience. It will drop your copy of YRB DB from your DB2 schema space. If you mess things up, you can always drop YRB DB and then re-create it easily.

To create the YRB DB in your DB2 schema space:

  • % db2 -tf yrb-create

Read the schema definition in yrb-create for YRB DB to understand the design and what the YRB DB is about.

York River Booksellers is an online bookstore. Customers belong to various clubs. Everybody belongs to at least the club basic. Books are available via different offers per club. Thus the price of a book is determined by which offer (thus club) it was bought under.

Consider an order to be all the books a customer bought at the same time (when in purchase). These will be shipped together.

 
  The Queries
1. ancient

Who are the customers who made a purchase before January 1 2002 ('2002-1-1')?

Show customer's name and city and the date (not timestamp!) of purchase. Eliminate duplicates.

Order by name + city + date

Answer table for ancient.

Hint: The attribute when is of type timestamp. This is different than type date. So to use it say in a WHERE condition to compare it with, say, '2002-1-1'. you will need to cast it to type date: cast(when as date). Likewise, '2002-1-1' is a string! So for DB2 to consider it as a date, you must cast it as well: cast('2002-1-1' as date).

Hint: One can specify in the ORDER BY clause whether the ordering should be descending or ascending for each attribute.

order by col_a asc, col_b desc, col_c desc, ...

The default is ascending. For highest to lowest, one wants descending.

2. clubreport

For each club, what are the total sales, the number of distinct book titles (title + year) bought via that club, and the number of customers who belong to that club?

Order by total sales, from highest to lowest, and then by club name in cases of ties.

Answer table for clubreport.

3. repeat

Which customers have bought more than one copy of the same book over time?

Show the customer's name, the book's title and year, and how many copies were purchased.

Order by name + title + year.

Answer table for repeat.

4. nofrench

Which cities had no one in that city purchase any books in French?

Do not have duplicates in the answer table.

Order by city.

Answer table for nofrench.

5. uniclub

To how many university clubs (for example, Guelph, UofT, Waterloo, and York) does each customer belong?

Do not allow duplicate rows in the answer table. If a customer belongs to none, list zero for that customer. Also, you are not allowed to use the words Guelph, UofT, Waterloo, or York in your query. We may add new university clubs in the near future.

Order by name + city.

Answer table for uniclub.

6. percentage

For each language, what is the city that has the largest percentage of book sales in that language, across cities which have had at least 20 books in total sold?

Measure book sales in number of books.

For example, say 25% of the books sold in Yorktown are French, 22% in Harrisonburg are French, 18% in Richmond, and so forth (and each of this cities qualifies in that at least 20 books have been sold in each), "French Yorktown 25" would be in the output.

Show language, city, percentage, and number of books (sold in that city in that language) for the output rows. Order by language + city.

Answer table for percentage.

7. orders

All the books a customer orders at the same time (when) are considered to be part of the same "order". Those books are shipped together to the customer and the customer is billed for the entire order.

What is the bill for each order?

Show the customer's name and city, the date and time of the order (not the timestamp!), and the bill. Order by name + city + when.

Answer table for orders.

Hint: Want to shorten one of your decimal column types so your lines do not wrap on output? You can cast it to be smaller. For instance,

cast(col_a as decimal(5,2))

This will print decimals (numbers) with 5 numerals to the left of the "." and 2 to the right.

8. weights

What is the total weight of each order?

Order by weight, descending, and then by customers' names (in cases of ties on weight).

Answer table for weights.

9. billing

As in Query orders, all the books a customer orders at the same time (when) are considered to be part of the same "order". Those books are shipped together to the customer and the customer is billed for the entire order.

What is the bill for each order, with the shipping cost added?

The shipping cost is as follows: The weight of the order is looked up in the 'yrb_shipping' table. If the weight is X grams, the entry just higher than (or equal to) X is found in the shipping table and the associated shipping price is added. For instance, if the order's weight is 1447 grams, the entry '1500 5.00' is found, and so the cost is $5.00.

Show the customer's name and city, the date and time of the order (not the timestamp!), the bill without the shipping charge, and the total bill (with shipping). Order by name + city + when.

Answer table for billing.

Hint: This query is an extension of query orders; it added another column that shows the total price, with shipping. Also note that you can reuse your query weights to help figure out the shipping cost for each purchase.

You may find DB2 SQL's with clause useful for this query. (You probably would find with useful for query percentage too and perhaps others.) It lets you declare "temporary tables" for use within a query. So you can store the results of one query for use in another. See below on this page for an example.

10. droppable

A club is droppable if all the purchases that are in the database could still have been made by the customers, just using the remaining clubs instead.

Note that each "reassigned" club per purchase must be a club for which the purchase's customer is a member. If there are no other legitimate clubs for reassignment for some purchase, the club in question is not droppable.

For each droppable club, how much more money (or less!) would YRB have made if that club had never existed?

Assume that a club "reassignment" for each purchase involving the dropped club replaces it with a best offer (across the remaining, legitimate clubs) for that customer.

Order by club.

Answer table for droppable.

 
  Deliverables

Due by 11:59pm Friday 4 December 2015.

For each query, write your query in a file with the corresponding name as above (ancient, clubreport, repeat, nofrench, uniclub, percentage, orders, weights, billing, and droppable).

Do not include the answers generated by each of your queries. We will be testing your queries ourselves for grading your project.

When you are finished, use the submit command to turn in your work. We shall publish soon the submit script that you are to use. The command will look something like:

% submit 3421 yrb ancient clubreport repeat nofrench uniclub percentage orders weights billing droppable

 
  Hints & DB2 Comments

DB2 SQL provides a useful clause called with. It is easiest to show its use by example.

refund

Who has bought a book via some club that he or she could have gotten less expensively using another club to which he or she belongs? List the customer's name, the book's title and year, the quantity bought, the price for which it was bought, and the best price for which it could have been bought.

with
    best (cid, title, year, lowest) as (
        select distinct M.cid, O.title, O.year, min(price)
            from yrb_member M, yrb_purchase P, yrb_offer O
            where M.club = O.club and
                  M.cid = P.cid and
                  P.title = O.title and P.year = O.year
            group by M.cid, O.title, O.year
    )
select C.name, P.title, P.year, qnty, price, lowest
    from yrb_customer C, yrb_purchase P, Best B, yrb_offer O
    where P.cid = B.cid and P.title = B.title and
          P.year = B.year and P.title = O.title and
          P.year = O.year and P.club = O.club and
          C.cid = P.cid and
          O.price > B.lowest
    order by C.name, P.title, P.year;

Answer table for query refund.

with clause

One may have more tables in the with clause, if need be:

with
    first (...) as
        (...),
    second (...) as
        (...),
    third (...) as
        (...)
select ...
    ...;

Note that you can use the (temporary) view first inside the definition for second, and the views first and second inside the definition for third, and so forth.

DB2 is picky about nested SELECTS always being named via AS. For example,

select ...
    from (select ...) as A,
         (select ...) as B,
         ...
    ...;

Even if you do not need the alias names A or B later on, give the sub-queries (nested SELECTS) names anyway. Otherwise, DB2 will complain.