Okay, after populating the data tables, it's time to do some querying against your data.
You'll start with some simple queries on the Product table. You'll continue by seeing how to associate products with their categories using the ProductCategory junction table, and finally you'll see how to associate the products with their departments.
In the following sections, you'll look at queries that involve the Product table alone, not relating it to Category or Department. Let's start with one of the simplest meaningful queries that you can make of your database.
A single SELECT statement combined with the ORDER BY clause does the trick in this case; this query should work the same for any existing relational database software:
SELECT Name, Description FROM Product ORDER BY Name;
As a result of this query, you should see the product names and descriptions you typed earlier when populating the Product table.
This isn't much more complicated than the previous query. This time, you filter the results based on the OnCatalogPromotion field, which needs to have a value of 1. Here's the query:
SELECT Name, Description FROM Product WHERE OnCatalogPromotion=1 ORDER BY Name;
Note that Access will store the OnCatalogPromotion value as -1, so you'll need to alter the WHERE clause appropriately to match the required records.
Based on your sample data, this query should return the following data:
Name Description --------------------- ------------------------------------- Beast Mask Red-eyed and open-mouthed scary mask guaranteed to scare! Devil Horn Boppers These red glitter boppers are guaranteed to attract attention. They will soon be under your spell! Vampire Disguise Vampire Set consisting of wicked wig, fangs, and fake blood. Spinechilling!
For this query, you need to use the ORDER BY clause to order the list of products by Price in descending order:
SELECT Name, Price FROM Product ORDER BY Price DESC;
Note that you can use multiple criteria with the ORDER BY clause. If, say, you want the products with the same price to be sorted by name in ascending order, you'd add a few bits to the query:
SELECT Name, Price FROM Product ORDER BY Price DESC, Name ASC;
The ASC keyword is optional because ascending is the default order anyway.
Now let's see how you get only the top five most expensive products. In practice, you need to limit the results you get from the previous query. This isn't handled in the same way by each RDBMS, so let's see how you can do that with each of them. In any case, you expect to get this list of results:
Name Price ----------------------------------- ---------------- Miss Santa 49.9900 Cleopatra Kit 14.9900 Vampire Disguise 9.9900 Horned Devil Mask 5.9900 Beast Mask 5.9900
You may be wondering what would happen if you had asked for the top four most expensive items because you have two items priced at $5.99. Well, the way we've written the following queries, one of the $5.99 items would simply be omitted from the list. We're not going to worry too much about that here, but the way to get around this problem is described in detail, for each database, in Chapter 4, "Summarizing and Grouping Data."
SQL Server and Access support the TOP clause. To get the top five most expensive products, you need this command:
SELECT TOP 5 Name, Price FROM Product ORDER BY Price DESC, Name ASC;
With SQL Server, you can alternatively use SET ROWCOUNT to limit the number of returned rows. Here's an example:
SET ROWCOUNT 5; SELECT Name, Price FROM Product ORDER BY Price DESC, Name ASC;
Also, you can limit by specifying what percent of the result set to return. The following command returns the top 15 percent of most expensive products:
MySQL has the LIMIT keyword that has similar functionality. The following is the MySQL command that returns the first five most expensive products:
SELECT Name, Price FROM Product ORDER BY Price DESC, Name ASC LIMIT 0,5;
As you can see, the LIMIT clause has two parameters. The first one specifies the index of the first returned record (starting with zero), and the second specifies how many rows to return.
With Oracle, you use the ROWNUM pseudocolumn to limit the number of returned rows:
SELECT Name, Price FROM (SELECT Name, Price FROM Product ORDER BY Price DESC, Name ASC) WHERE ROWNUM<=5;
This technique works only when you restrict the query for a ROWNUM that starts with one (such as ROWNUM <=5). If you try to restrict using another range, such as ROWNUM BETWEEN m and n, where m is greater than one, the query will return no results. You'll see a workaround to this limitation in the next exercise.
ROWNUM is a pseudocolumn that associates a row number to each record returned by a query. What's important to note here is that ROWNUM values are assigned before the ORDER BY clause executes. For this reason, you need to use a subselect instead of simply limiting by ROWNUM in the first place, like this:
SELECT Name, Price FROM Product WHERE ROWNUM<=5 ORDER BY Price DESC, Name ASC
This query doesn't return the expected results because the WHERE filter is applied first and takes the first five rows (in a random order). These rows are then ordered by Price and Name (which isn't what you intended to do).
In real-world scenarios, you frequently need to present the user with lists of products, categories, and so on. When the list is long enough, a typical practice is to present a number of products at a time (five in these examples) and let the user browse the list using Next 5 Products and Previous 5 Products buttons.
In this exercise, you'll see how to retrieve the rows m to n (say, five to 10) in an ordered list of products. This time you'll order the products by their IDs.
At the first sight, it may seem that you can simply get the required products with a simple command such as this:
SELECT Name FROM Product WHERE ProductID BETWEEN 5 AND 10
However, this method isn't reliable because you aren't guaranteed to have the ProductIDs in sequential order, so you aren't guaranteed to get five products back. Additionally, this method is useless if the user wants the list of products sorted by any other column than ProductID.
For the purpose of these examples, you'll ask for the second group of five products (products size to 10), but because currently you have only eight products, this is what you should get:
ProductID Name ------------- ------------------------------------------- 6 Sequinned Devil Horns 7 Devil Horn Boppers 8 Bug Eyed Glasses
Let's see how you can ask your databases to give you a specific portion of an ordered list of products. The solution is different for each RDBMS, so let's analyze each one individually.
With MySQL, this problem turns out to be a piece of cake. With MySQL, you use the LIMIT keyword, just like in the previous example but using different parameters this time:
SELECT ProductID, Name FROM Product ORDER BY ProductID LIMIT 5,5;
This time you specified 5 as the first parameter (instead of 0), mentioning that you want retrieve rows starting with the sixth record in the result set (remember that the row number is zero-based). The second parameter specifies how many rows you want to retrieve.
With Oracle, you use a command similar to the one you saw in the previous example. However, there's a catch—you can't limit the result set based on the ROWNUM pseudocolumn being between m and n if m>1. If you execute the following query, no rows will be returned:
SELECT ProductID, Name FROM (SELECT ProductID, Name FROM Product ORDER BY ProductID) WHERE ROWNUM BETWEEN 6 AND 10;
The workaround to this problem is as follows:
SELECT ProductID, Name FROM (SELECT ProductID, Name, ROWNUM AS rn FROM (SELECT ProductID, Name FROM Product ORDER BY ProductID) ) inner WHERE inner.rn BETWEEN 6 AND 10;
Here you "saved" the ROWNUM values in a new row of the subquery, so you could filter correctly the results in the outer query.
Alternatively, and more elegantly, you can use the RANK analytic function, as described in Chapter 4, "Summarizing and Grouping Data":
SELECT ProductID, Name FROM ( SELECT RANK() OVER (ORDER BY ProductID) As Ranking, ProductID, Name FROM Product ORDER BY PRODUCTID ) WHERE Ranking BETWEEN 6 AND 8;
SQL Server doesn't provide you with a simple way to achieve your goal. The TOP clause works well as long as you're only interested in the top records of a query. There are a number of workarounds for this limitation.
Let's discuss the most obvious solution first. This uses the SQL-99 syntax (so with some tweaking it works with other databases as well) and uses correlated subqueries (correlated subqueries were discussed in Chapter 6, "Combining SQL Queries"):
SELECT ProductID, Name FROM Product AS P1 WHERE (SELECT COUNT(*) FROM Product AS P2 WHERE P1.ProductID >= P2.ProductID) BETWEEN 6 AND 10 ORDER BY ProductID;
Note |
This method only works if the list of products is ordered by a unique column.You can't use it, for example, if the products are ordered by price and if there's more than one product with the same price. |
Remember that a correlated subquery executes once for every record of the outer query. In this example, for each product returned by the outer query, the subquery calculates how many products have a lower ID. You're looking for products for which that number is between six and 10.
The second method uses the TOP function. The idea is simple, but the SQL code will look a bit awkward at first. If you want to retrieve products between six and 10, here's a working strategy:
First, you get the TOP 10 products, ordered by ProductID (ascending). The products you're interested in are the bottom of this list.
You order the resulting list in descending order by ProductID (so the rows you're interested in will be at the TOP).
You get the TOP 5 products from list.
You finally sort the remaining products in ascending order by ProductID.
By using subqueries, you can implement all these steps in a single SQL command:
SELECT ProductID, Name FROM (SELECT TOP 5 ProductID, Name FROM (SELECT TOP 10 ProductID, Name FROM Product ORDER BY ProductID) AS P1 ORDER BY ProductID DESC) AS P2 ORDER BY ProductID
The last technique I'll present for SQL Server consists of using a temporary table with an IDENTITY column. This solution is versatile so that it can also be implemented for other databases as well. For more information about temporary tables and IDENTITY columns, please read Chapter 12. The code for this solution, presented below, is self-explanatory:
/* Create a temporary table named #Product, having an IDENTITY column */ CREATE TABLE #Product (RowNumber SMALLINT NOT NULL IDENTITY(1,1), ProductID INT, Name VARCHAR(50)) /* Populate the temporary table with records from the table we browse through. The RowNumber column will generate consecutive numbers for each product. We're free to order the list of products by any criteria. */ INSERT INTO #Product (ProductID, Name) SELECT ProductID, Name FROM Product ORDER BY Price DESC /* Retrieve the requested group or records from the temporary table */ SELECT ProductID, Name FROM #Product WHERE RowNumber BETWEEN 6 and 10 /* Drop the temporary table */ DROP TABLE #Product
The first query that was presented for SQL Server (using correlated subqueries) works with no modifications with DB2. Here it is again, for reference:
SELECT ProductID, Name FROM Product AS P1 WHERE (SELECT COUNT(*) FROM Product AS P2 WHERE P1.ProductID >= P2.ProductID) BETWEEN 6 AND 10 ORDER BY ProductID;
The second solution is DB2 specific. It uses the ROWNUMBER function in a sub-query to generate a "rownumber" fabricated column, which is then used to filter and retrieve the specified range of products:
WITH temp AS (SELECT ProductID, Name, ROWNUMBER() OVER(ORDER BY ProductID) AS row FROM Product) SELECT ProductID, Name FROM temp WHERE row>5 and row<=10
ROWNUMBER, RANK, and DENSERANK are three analytical functions supported by DB2 that come in handy when you need to perform tasks that imply record numbering or ranking. The functions RANK and DENSE_RANK are also supported by Oracle, along with several others. (Chapter 4, "Summarizing and Grouping Data," demonstrates how to use the RANK function.) You'll also see them in action again, later in this case study. For detailed information about the analytic functions supported and how they work, please refer to your vendor's SQL reference manual.
In the previous exercises, you asked for products without associating them with the existing categories or departments. In reality, most of your queries regarding products will also involve the categories to which they belong.
In real-world scenarios, it's likely that you'll need to gather a list of products that belong to a specific category. In this case, the category of interest will always be identified by its ID (its primary key). You're interested in getting both the Name and Description of the products that belong to a specific category.
If you want only the IDs of the products associated with a specific Category, a simple query like this on ProductCategory solves the problem:
SELECT ProductID FROM ProductCategory WHERE CategoryID=3;
However, to get the Name and Description, you need to perform an INNER JOIN between Product and ProductCategory. You learned how to join tables in Chapter 7, "Querying Multiple Tables." This query returns all the products that belong to the category with an ID of 3:
SELECT Name FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID WHERE ProductCategory.CategoryID = 3;
As an alternative syntax, you could use the WHERE clause to join your two tables:
SELECT Name FROM Product, ProductCategory WHERE Product.ProductID = ProductCategory.ProductID AND ProductCategory.CategoryID = 3;
The results of the query look like this:
Name --------------------- Beast Mask Cleopatra Kit Horned Devil Mask Sequinned Devil Horns
Compared to the previous query, this section covers the opposite scenario. Now you know the product, and you need to find the categories associated with it. Here's the query that lists the categories that contain the product with ProductID of 6:
SELECT Category.CategoryID, Name FROM Category INNER JOIN ProductCategory ON Category.CategoryID = ProductCategory.CategoryID AND ProductCategory.ProductID = 6;
Caution |
Access doesn't like this join syntax because it's a bit more picky about how joins are constructed (see Chapter 7, "Querying Multiple Tables" for more information). Instead of using the JOIN syntax, you should use the alternative subquery method supplied here or alter the join appropriately. |
You can achieve the same result using subqueries. The subquery returns the IDs of categories associated with the product you want, and using these results you get the names and descriptions you're interested in:
SELECT CategoryID, Name FROM Category WHERE CategoryID IN (SELECT CategoryID FROM ProductCategory WHERE ProductCategory.ProductID = 6);
Caution |
Remember that MySQL doesn't support subqueries. None of the examples that include subqueries will run in MySQL. |
Based on the sample data, there are two matching categories for the product for which you were looking:
CategoryID Name -------------- -------------------- 3 Masks 4 Sweet Stuff
This kind of query is useful in situations where you want to find out which products are similar to a specified product. So how do you find out what products belong to the same categories as another product (remember that a product can belong to more than one category)?
After having run the previous two exercises, this should be fairly straightforward. First, remember how you extracted the products that belong to a category:
SELECT Name FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID AND ProductCategory.CategoryID = 3;
Now you need to determine which products belong to one or more categories, so you'll need to do something like this:
SELECT Name FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID WHERE ProductCategory.CategoryID IN (<list of categories>)
All you need to do is substitute <list of categories> for the list of categories in which you're interested.
Here's the query that alphabetically lists all the products that belong to the same categories as the product with the ID of 6:
SELECT DISTINCT Product.ProductID, Product.Name FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID WHERE ProductCategory.CategoryID IN (SELECT Category.CategoryID FROM Category INNER JOIN ProductCategory ON Category.CategoryID = ProductCategory.CategoryID WHERE ProductCategory.ProductID = 6) ORDER BY Name;
Here are the results:
ProductID Name ----------- --------------------- 1 Beast Mask 2 Cleopatra Kit 7 Devil Horn Boppers 3 Horned Devil Mask 6 Sequinned Devil Horns
Note that you use SELECT DISTINCT to guard against duplicate results. You need this because the subquery might return more categories, and the products that belong to more than one of them would be listed more than once (because, remember, a product can belong to more than one category!).
Unfortunately, the previous statement won't execute on MySQL because MySQL doesn't support nested queries. Access also doesn't like this particular syntax, but in just a moment, you'll see an alternative syntax that does work on Access.
The solution presented earlier is probably the most commonly used. However, as anticipated, you can obtain the same results using joins, with a much more elegant query. This solution also works with MySQL, which doesn't support subqueries:
SELECT DISTINCT P1.ProductID, P1.Name FROM Product P1 INNER JOIN ProductCategory PC1 ON P1.ProductID = PC1.ProductID INNER JOIN ProductCategory PC2 ON PC1.CategoryID = PC2.CategoryID WHERE PC2.ProductID = 6 ORDER BY Name;
Tip |
Referring to Figure 14-6 will help you construct join queries. Just follow the path of the table joins in the figure, and everything will become clearer. |
Notice that, in this example, we've used some table aliases to make the SQL code a bit shorter (ProductCategory is, after all, quite a long word to type!).
This is the same query, written with the alternate syntax (which works on Access):
SELECT DISTINCT P1.ProductID, P1.Name FROM Product P1, ProductCategory PC1, ProductCategory PC2 WHERE P1.ProductID = PC1.ProductID AND PC1.CategoryID = PC2.CategoryID AND PC2.ProductID = 6 ORDER BY Name;
Wow, you have just joined three tables! And, to add to the confusion, you used two instances of the ProductCategory table. The action of joining two instances of the same table is called a self-join, and it works just as if they were two separate tables.
Joining the two ProductCategory instances on the CategoryID field generates a list of ProductID/ProductID pairs, containing products associated with the same category.
Now that you have this list of products, which are related by their category, the rest is simple: In the right side of the list, you filtered according to the ProductID that you were looking for (6 in this case). In this manner, in the left side of the list there remained only the products related to the product with an ID of 6. Finally, in the left side of the list you joined with the Product table in order to get the names and descriptions of the resulted products.
Note that you still need the DISTINCT keyword to filter duplicate records from products that belong to more than one of the resulting categories.
Because you're expert in table joins by now, let's look at another simple exercise: Say you want to get a list containing categories and their products, like this:
Category Name Product Name --------------------------------- ----------------------- Exotic Costumes Cleopatra Kit Exotic Costumes Miss Santa Masks Horned Devil Mask Masks Cleopatra Kit Masks Sequinned Devil Horns Scary Costumes Vampire Disguise Scary Stuff Bug Eyed Glasses Sweet Stuff Sequinned Devil Horns Sweet Stuff Devil Horn Boppers
If you need to associate categories with products, the path followed by joins is quite clear. You have the ProductCategory table, which contains the IDs of associated categories and products. You join this table on its left and right sides to get the category and product names associated with the IDs. Here's the query that does this for you:
SELECT C.Name as "Category Name", P.Name as "Product Name" FROM Product P INNER JOIN ProductCategory PC ON P.ProductID = PC.ProductID INNER JOIN Category C ON PC.CategoryID = C.CategoryID ORDER BY C.Name, P.Name;
The same query, rewritten using WHERE statements instead of JOIN statements, looks like this (and works on Access):
SELECT C.Name as "Category Name", P.Name as "Product Name" FROM Product P, ProductCategory PC, Category C WHERE P.ProductID = PC.ProductID AND PC.CategoryID = C.CategoryID ORDER BY C.Name, P.Name;
This method is less obvious, but it's good to know that there are multiple ways to get the same results with SQL. In this particular scenario, the solution using table joins is likely to be faster, but in practice remember that only a good set of tests can show you what's the best solution for your database:
SELECT C.Name as "Category Name", Product.Name as "Product Name" FROM Product, Category C WHERE Product.ProductID IN (SELECT ProductID FROM ProductCategory WHERE ProductCategory.CategoryID = C.CategoryID) ORDER BY C.Name, Product.Name;
This query is a bit more complicated than the previous ones—if you have problems understanding it, please reread the section on correlated subqueries in Chapter 6, "Combining SQL Queries."
What you did was to ask from the start for a category name and product name without specifying anything about the linking table, ProductCategory. In the outer query, you asked only for the columns you were interested in without specifying how they're related. However, for each category, the WHERE clause uses the correlated subquery to filter out only the products that are associated with it.
Remember that correlated subqueries execute for each record of the main query. In this case, for each category returned by the outer query, the correlated subquery executes and returns the products associated with that category. The outer query gets the results back and uses them to compose the list containing category and product names.
The important detail to understand is the way the categories from the outer query are used in the subquery:
WHERE ProductCategory.CategoryID = C.CategoryID)
This line does the whole trick: The category of the outer query (C.CategoryID) is used in a table join in the correlated subquery. The category ID from the outer query is referenced through the C alias defined for Category.
Say you want to get a list containing all existing categories, along with the most expensive two products for each category. Based on the data you have, the list should contain the following products:
Category Product Price --------------------- ----------------------------- --------- Exotic Costumes Miss Santa 49.9900 Exotic Costumes Cleopatra Kit 14.9900 Masks Cleopatra Kit 14.9900 Masks Beast Mask 5.9900 Scary Costumes Vampire Disguise 9.9900 Scary Stuff Bug Eyed Glasses 2.7500 Sweet Stuff Sequinned Devil Horns 3.7500 Sweet Stuff Devil Horn Boppers 2.5000
Let's see how to obtain this list with SQL Server, Oracle, and DB2.
For SQL Server, here's the query that does the trick. You get the top two products for each category using a correlated subquery:
SELECT C.Name AS "Category", Product.Name AS "Product", Product.Price as "Price" FROM Product, Category C WHERE Product.ProductID IN (SELECT TOP 2 ProductID FROM ProductCategory WHERE ProductCategory.CategoryID = C.CategoryID ORDER BY Price DESC) ORDER BY C.Name, Product.Price DESC;
DB2 doesn't allow you to use FETCH FIRST n ROWS ONLY in subqueries, so you can't simply translate the SQL Server version (which uses TOP) to a DB2 equivalent.
The DB2 solution uses the ROWNUMBER function, which you already saw in an earlier example. However, this time you use PARTITION BY in combination with ORDER BY to create a ranking for each product in descending order of its price, for each category in part:
WITH tmp (CategoryName, ProductName, ProductPrice, rank) AS (SELECT C.Name, P.Name, P.Price, ROWNUMBER() OVER (PARTITION BY C.Name ORDER BY P.Price DESC) AS rank FROM Product P JOIN ProductCategory PC ON P.ProductID = PC.ProductID JOIN Category C ON C.CategoryID = PC.CategoryID) SELECT CategoryName, ProductName, ProductPrice FROM tmp WHERE rank <= 2 ORDER BY CategoryName
PARTITION BY C.Name specifies that you want the products ranking created separately for each category. ORDER BY P.Price DESC specifies that, for each category, you want the products sorted in descending order of their price.
ROWNUMBER (or ROW_NUMBER) always returns a unique number for each returned row. DB2 also supports two similar functions, RANK and DENSERANK (or DENSE_RANK). RANK and DENSERANK return the same value for records that are equal in respect to the ORDER BY criterion—in this case, if two or more products of a category have the same price, they'll receive the same ranking. The difference between RANK and DENSERANK is that RANK leaves gaps in rankings when two or more products have the same ranking. Say, for example, if the first three most expensive products have the same price, they'll all receive a ranking of 1. The following product in the list will have a RANK of 4 but a DENSERANK of 2 because DENSERANK leaves no gaps.
With Oracle, you apply a similar method as for DB2, except that the syntax is a little different. Oracle supports RANK and DENSE_RANK just like DB2, but it doesn't support ROW_NUMBER. For this reason the Oracle query will not be guaranteed to retrieve a maximum of two products for each category:
SELECT CategoryName, ProductName, ProductPrice FROM (SELECT C.Name as CategoryName, P.Name as ProductName, P.Price as ProductPrice, RANK() OVER (PARTITION BY C.Name ORDER BY P.Price DESC) AS rank FROM Product P JOIN ProductCategory PC ON P.ProductID = PC.ProductID JOIN Category C ON C.CategoryID = PC.CategoryID) WHERE rank <= 2 ORDER BY CategoryName;
Here you take things one step further and analyze how you can get different product listings based on the departments to which they belong. The queries become a bit more complicated because a fourth table, Department, comes into play.
Say you know the ID of a department, and you want to get all the products that belong to it.
In the previous exercises, you learned how to extract the products in a specific category. Now, instead of looking for the products in a category, you're looking for the products that belong to a list of categories (more specifically, the list of categories in the department for which you're looking).
Using subqueries, you use the IN keyword to filter the results based on a list of category IDs. Here's the SQL query that does the trick for you:
SELECT Product.ProductID, Product.Name FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID WHERE ProductCategory.CategoryID IN (SELECT CategoryID FROM Category WHERE DepartmentID = 1) ORDER BY Product.Name;
This query outputs these records:
ProductID Name ---------------------- --------------------- 2 Cleopatra Kit 4 Miss Santa 5 Vampire Disguise
Using table joins, you need to join the Product, ProductCategory, and Category tables. You don't need to also join the Department table because you don't need anything from it (its name or description). You filter the results based on the DepartmentID field of the Category table:
SELECT Product.ProductID, Product.Name FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID WHERE Category.DepartmentID = 1 ORDER BY Product.Name;
The same query, without using JOIN, looks like this (note that this form will work on Access and pre-9i versions of Oracle):
SELECT Product.ProductID, Product.Name FROM Product, ProductCategory, Category WHERE Product.ProductID = ProductCategory.ProductID AND ProductCategory.CategoryID = Category.CategoryID AND Category.DepartmentID = 1 ORDER BY Product.Name;
Now, what if you want to see a list containing the departments and the products that belong to them: For example:
Department Product -------------------- --------------------- Costume Accessories Beast Mask Costume Accessories Cleopatra Kit Costume Accessories Horned Devil Mask Costume Accessories Sequinned Devil Horns Full Costumes Cleopatra Kit Full Costumes Miss Santa Full Costumes Vampire Disguise Jokes and Tricks Bug Eyed Glasses Jokes and Tricks Devil Horn Boppers Jokes and Tricks Sequinned Devil Horns
This time, apart from the Product, ProductCategory, and Category tables, you'll also need the Department table. You need it to extract the department names.
Using table joins, you can extract this list using the following query. To get a list with products and departments, you make the usual trip starting with the Product table and ending with Department. This trip walks through the tables that have relationships: Product...ProductCategory...Category...Department. If you follow this road, what the query does becomes quite obvious:
SELECT Department.Name AS "Department", Product.Name AS "Product" FROM Department INNER JOIN Category ON Department.DepartmentID = Category.DepartmentID INNER JOIN ProductCategory ON Category.CategoryID = ProductCategory.CategoryID INNER JOIN Product ON ProductCategory.ProductID = Product.ProductID ORDER BY Department.Name, Product.Name;
You ordered departments by name, and for each department the products are also ordered by name.
Here is the same query using the alternate syntax that works on Access, as well as the other platforms:
SELECT Department.Name AS "Department", Product.Name AS "Product" FROM Product, ProductCategory, Category, Department WHERE Product.ProductID = ProductCategory.ProductID AND ProductCategory.CategoryID = Category.CategoryID AND Category.DepartmentID = Department.DepartmentID ORDER BY Department.Name, Product.Name;
If you prefer using correlated subqueries, this is the way to go:
SELECT D.Name AS "Department", P.Name AS "Product" FROM Product P, Department D WHERE ProductID IN (SELECT ProductID FROM ProductCategory INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID WHERE Category.DepartmentID = D.DepartmentID) ORDER BY D.Name, P.Name;
On the outer query, you only specify the fields in which you're interested and enforce the relationships between them using the subquery. The correlated (inner) subquery is called for each department returned by the outer query, and it returns the ProductIDs associated with that department.
Note that you have flexibility to write the subquery in any number of ways, including using another subquery instead of table joins, like this:
SELECT D.Name AS "Department", Product.Name AS "Product" FROM Product, Department D WHERE ProductID IN (SELECT ProductID from ProductCategory WHERE ProductCategory.CategoryID IN (SELECT CategoryID FROM Category WHERE D.DepartmentID = DepartmentID)) ORDER BY D.Name, Product.Name;
You'll rarely want to complicate your queries like this, but it demonstrates how flexible SQL can be.
In the previous queries, when you retrieved a list of departments and their products, you always assumed that each department had at least one product. Just for the fun of it, let's add a new department now:
INSERT INTO Department (Name) VALUES ('Books');
With this new department in place, run the previous SQL queries again. Maybe you'll be surprised to see that the new department doesn't show up, but this behavior is quite correct.
If you want to have all the departments listed, regardless of whether they have products, you'll need to modify the first query to use outer joins instead of inner joins. Chapter 7, "Querying Multiple Tables," presented outer joins. Here's the updated query:
SELECT Department.Name AS "Department", Product.Name AS "Product" FROM Department LEFT JOIN Category ON Department.DepartmentID = Category.DepartmentID LEFT JOIN ProductCategory ON Category.CategoryID = ProductCategory.CategoryID LEFT JOIN Product ON ProductCategory.ProductID = Product.ProductID ORDER BY Department.Name, Product.Name;
The result of this query is as follows:
Department Product ---------------------- ----------------------- Books NULL Costume Accessories Beast Mask Costume Accessories Cleopatra Kit Costume Accessories Horned Devil Mask Costume Accessories Sequinned Devil Horns Full Costumes Cleopatra Kit Full Costumes Miss Santa Full Costumes Vampire Disguise Jokes and Tricks Bug Eyed Glasses Jokes and Tricks Devil Horn Boppers Jokes and Tricks Sequinned Devil Horns
You used left outer joins in this example to make sure all departments are returned. If you used right outer joins instead, you would have to make that all products were listed instead-—including the ones that don't belong to any department.
Remember that an outer join always takes all the rows in one side of the join (left or right) and tries to match with rows from the other side of the join. If no match is made, NULL is assumed instead.
This is different from the inner join (the default type of join), which is exclusive—it only returns rows that have matches in both sides of the join. The opposite is the full join (FULL OUTER JOIN), which includes all the rows in both sides of the join.
If you managed to find out how to get a list of products for each department, you're only one step away from counting how many products exist in each department. You do this using the GROUP BY clause and using the COUNT aggregate function:
SELECT Department.Name AS "Department", COUNT(Product.Name) AS "Products" FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID INNER JOIN Department ON Category.DepartmentID = Department.DepartmentID GROUP BY Department.Name ORDER BY Department.Name;
Recall that we discussed the GROUP BY clause and aggregate functions in Chapter 4, "Summarizing and Grouping Data."
Here's the results list:
Department Products -------------------- ----------- Costume Accessories 4 Full Costumes 3 Jokes and Tricks 3
If you wanted to count only those products that are on promotion for each department, you'd simply need to filter the products:
SELECT Department.Name AS "Department", COUNT(Product.Name) AS "Featured Products" FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID INNER JOIN Department ON Category.DepartmentID = Department.DepartmentID WHERE Product.OnDepartmentPromotion = 1 GROUP BY Department.Name ORDER BY Department.Name;
The results should look like this:
Department Featured Products -------------------- ----------------- Costume Accessories 1 Full Costumes 1 Jokes and Tricks 1
You can obtain the same results using correlated subqueries. There are two main ways you can rewrite the previous queries using correlated subqueries.
In the first example, you first group the products by department using GROUP BY and perform a COUNT for each group:
SELECT D.Name AS "Department", COUNT(P.Name) AS "Products" FROM Department D, Product P WHERE ProductID IN (SELECT ProductID FROM ProductCategory INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID WHERE Category.DepartmentID = D.DepartmentID) GROUP BY D.Name ORDER BY D.Name;
In the second solution, you use a subquery to calculate the number of products for each department instead of using GROUP BY to group by departments. Note that the subquery is still a correlated subquery, and it executes once for each department selected by the outer query:
SELECT D.Name AS "Department", (SELECT COUNT(ProductID) FROM ProductCategory INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID WHERE Category.DepartmentID = D.DepartmentID) AS "Products" FROM Department D;
Suppose you want to see the average price for all the products in each department. You can do this quickly now because it is similar to the query that counted the number of products in each department:
SELECT Department.Name AS "Department", AVG(Product.Price) AS "Average Price" FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID INNER JOIN Department ON Category.DepartmentID = Department.DepartmentID GROUP BY Department.Name;
After executing this query, you see that the Full Costumes department has the most expensive products:
Department Average Price -------------------- --------------------- Costume Accessories 7.6800 Full Costumes 24.9900 Jokes and Tricks 3.0000
You can complicate things a bit more and ask, for example, for the average price in each department but only for those departments having at least three products. Also, you can request the results listed in descending order of the average department price:
SELECT Department.Name AS "Department", COUNT(Product.Name) AS "Products", AVG(Product.Price) AS "Average Price" FROM Product INNER JOIN ProductCategory ON Product.ProductID = ProductCategory.ProductID INNER JOIN Category ON ProductCategory.CategoryID = Category.CategoryID INNER JOIN Department ON Category.DepartmentID = Department.DepartmentID GROUP BY Department.Name HAVING COUNT(Product.Price)>=3 ORDER BY AVG(Product.Price) DESC;
Because all departments have at least three products, you'll receive all of them back; this time, they're listed in descending order of their average price:
Department Products Average Price -------------------- ----------- --------------------- Full Costumes 3 24.9900 Costume Accessories 4 7.6800 Jokes and Tricks 3 3.0000