Querying data from multiple tables using subqueries is a great capability, but it doesn't always give you everything you need, and the syntax can become a bit messy and confusing. Also, as discussed in the previous chapter, when a subquery executes for each match of an outer query, performance can be affected. For this reason, SQL allows you to query multiple tables simultaneously with a simpler syntax. This capability doesn't completely replace some of the techniques you saw in the previous chapter involving subqueries across multiple tables, but it's extremely powerful when you simply want a result set that contains information from more than one table. Because relational databases are designed to hold data in more than one table, where records in one table are usually associated with records in other tables, this functionality is essential.
In this chapter, you'll look at the various ways that multitable queries can be performed using what's known in SQL parlance as joins. You'll start, as usual, with the basics before building up to more complex situations. The simpler material concerns situations where you get a result set consisting of matches between rows in different tables based on certain criteria, either by looking at equality between column values or by using some other comparison. The more complex techniques involve forcing unmatched rows from one or more tables to be included in the results you receive, even if there's no match with a row in another table.
Let's consider a simple example. Imagine that someone at the example university wants to know when rooms are occupied and when they're free. You could find out this information by querying the Class and Room tables from the sample InstantUniversity database:
SELECT Class.ClassID, Class.Time, Room.RoomID FROM Room, Class;
Unfortunately, this won't yield the results you're after. You might think that you'd simply get a set of rows that consists of all the columns for both tables, with each row containing data from a row in the first table and a row from the second. Well, you'd be right. However, what you may not imagine is quite how many rows you'd get (try it out for yourself and see what happens!).
Basically, the result of the join is the Cartesian product of the elements in each set. For example, the Cartesian product of the two sets {a, b, c} and {a, b} is the following set of pairs:
{(a, a), (a, b), (b, a), (b, b), (c, a), (c, b)}
What happens in this example is that each row in Room is combined with each row in Class to give a row in the result set. So, the first row of Room is combined with the first row of Class for the first row in the result set. Next, the first row of Room is combined with the second row of Class, then the third row, the fourth row, and so on. The number of elements in the Cartesian product is the product of the number of elements in each set. In the example, you have nine rooms and 10 classes, so you have 90 results. Taking this further, if you had 100 rooms in your university and 100 classes scheduled, you'd end up with 10,000 rows of results!
In generic terms, imagine the following pseudoquery:
SELECT Table1.Column1, Table2.Column2, Table2.Column3 FROM Table1, Table2
Consider the situation where the previous query is executed, with the column Table1.Column1 having the values T1C1V1, T1C1V2, and so on, with Table1.Column2 having T1C2V1, T1C2V2, and with the same naming scheme for Table2.Column1. The first few rows returned will be as follows:
Table1.Column1 Table1.Column2 Table2.Column1 --------------- --------------- -------------- T1C1V1 T1C2V1 T2C1V1 T1C1V1 T1C2V1 T2C1V2 T1C1V1 T1C2V1 T2C1V3 T1C1V1 T1C2V1 T2C1V4
and so on for the rest of the rows in Table2.Column1. Next, you move on to the second row in Table1:
Table1.Column1 Table1.Column2 Table2.Column1 --------------- --------------- -------------- T1C1V2 T1C2V2 T2C1V1 T1C1V2 T1C2V2 T2C1V2 T1C1V2 T1C2V2 T2C1V3 T1C1V2 T1C2V2 T2C1V4
As before, you'll get one result for each row in Table2.
This type of join is known as a cross join, or Cartesian product. In fact, in all the RDBMSs covered in this book, except DB2, you can write the previous as follows:
SELECT Table1.Column1, Table1.Column2, Table2.Column3 FROM Table1 CROSS JOIN Table2
This CROSS JOIN operator is the SQL-92 standard.
In some circumstances, you might want this to happen but not many. What you need to do is to specify which rows in Table1 should be joined with which rows in Table2. The way you tend to do this is to use table relationships. For example, you might stipulate that a row with a primary key value of x in Table1 is joined to a row with the same foreign key value in Table2.
That way you might end up with 100 rows if there was a one-to-one correlation, or perhaps even less, if for example you were only joining a filtered group of rows in Table1 to a filtered group of rows in Table2.
The most common way of associating rows from one table to another is via an equi-join. This is where you link rows based on an equality (hence "equi") between the values contained in a column of each row:
SELECT Table1.Column1, Table1.Column2, Table2.Column3 FROM Table1, Table2 WHERE Table1.Column1 = Table2.Column2
As with cross joins, you can also express this using a more explicit keyword form, which is the SQL-92 standard and works for all current versions of the RDBMSs covered in this book:
SELECT Table1.Column1, Table1.Column2, Table2.Column3FROM Table1 JOIN Table2 ON Table1.Column1 = Table2.Column2
Here, the difference is that you use the JOIN keyword rather than a comma and ON rather than WHERE. Because this is the standard form, you'll be using it in subsequent example code.
| Note | For this example syntax to work on all database platforms and produce the same results, JOIN may have to be replaced by INNER JOIN. You'll see this in action in the upcoming example. | 
Returning to the rooms and classes example, if you use the following SQL code, you'll see a much smaller result set:
SELECT Class.ClassID, Class.Time, Room.RoomID FROM Room JOIN Class ON Room.RoomID = Class.RoomID;
You now see just 10 rows, one for each class, indicating what time and in which room each class is held.
You should note that versions of Oracle up to 8.1.7 don't support the JOIN keyword. Instead, simply use the following syntax:
SELECT Room.RoomID, Class.Time, Class.ClassID FROM Room, Class WHERE Room.RoomID = Class.RoomID;
As another example, if you had a table called Customers with a CustomerID column and a table called CreditCards that also had a CustomerID column (going back to an earlier example where you had multiple credit cards per customer), you might write the following:
SELECT Customers.CustomerID,
        Customers.CustomerName,
        CreditCards.CardNumber
FROM Customers
   JOIN CreditCards
   ON Customers.CustomerID = CreditCards.CustomerID;
Assuming there's a one-to-many relationship between these tables (one customer can have multiple credit cards), you'd get one row returned for each credit card in CreditCards. Note that some of these rows would have identical customer information because this is how you've asked for the data.
|  | 
Say you want to report the marks that have been achieved for every exam taken by every student. The following query will do the trick. Execute it against the InstantUniversity database:
SELECT StudentExam.ExamID,
       StudentExam.Mark,
       Student.Name AS StudentName
FROM StudentExam
   JOIN Student
   ON StudentExam.StudentID = Student.StudentID
ORDER BY ExamID;
Note that to run this example with MySQL and Access, you need to use INNER JOIN instead of JOIN:
... FROM StudentExam INNER JOIN Student ...
(We'll explain what this extra clause means in just a moment.)
You should see the following results:
ExamID Mark StudentName ------- ----- ------------- 1 55 John Jones 1 26 Anna Wolff 1 71 Julia Picard 2 62 Anna Wolff ... ... ...
In this example, you've joined data from StudentExam directly to data from the Student table and presented it in a single result set. Rather than getting the StudentID value and having to then extract student names, joining the tables with the equi-join specification of ON StudentExam.StudentID = Student.StudentID gives you direct access to this information. This is much more efficient because you get the information you want without having to run a separate query or, as might be the case, several separate queries for each student.
|  | 
Again, instead of using the JOIN clause in this example, you could have constructed the same query using WHERE clauses as follows:
SELECT StudentExam.ExamID,
        StudentExam.Mark,
        Student.Name AS StudentName
FROM StudentExam, Student
WHERE StudentExam.StudentID = Student.StudentID
ORDER BY ExamID;
Another point to note is that this type of join, and in fact any other type of join, can include additional clauses as examined in other chapters. A WHERE clause, for example, is just as valid in a join as in any other query. The following modification to the previous example gives just those students with a mark of 80 or more:
SELECT StudentExam.ExamID,
        StudentExam.Mark,
        Student.Name AS StudentName
FROM StudentExam JOIN Student
ON StudentExam.StudentID = Student.StudentID
WHERE StudentExam.Mark >= 80
ORDER BY ExamID;
It's important to bear this in mind in later examples, where you won't concentrate on anything other than the joining of tables because there's a whole world of flexibility available.
When it comes to equi-joins, there's no need to stop at two tables. You can include as many tables as you like, taking care to match rows to one another to avoid obtaining thousands of rows. To do this, you simply add another JOIN and ON clause after the first one:
SELECT * FROM Table1 JOIN Table2 ON Table1.Column1 = Table2.Column2 JOIN Table3 ON Table1.Column3 = Table3.Column4
To go back to the customers and credit cards example, imagine that you have another table called Addresses, where each customer has one or more addresses stored. You could expand on the query to get columns from this new table with simple additions:
SELECT Customers.CustomerID,
        Customers.CustomerName,
        CreditCards.CardNumber,
        Addresses.Country
FROM Customers
   JOIN CreditCards
   ON Customers.CustomerID = CreditCards.CustomerID
   JOIN Addresses
   ON Customers.CustomerID = Addresses.CustomerID
Let's look at an example of this using the InstantUniversity database.
| Note | Access doesn't seem to support multitable equi-joins like the other database platforms do. Instead, you need to use a nested join to achieve the same results, as you'll see in a moment. | 
|  | 
The following example displays, for every exam taken by every student, what mark they achieved and the date on which the exam was given. This combines data from the Student and Exam tables via the StudentExam junction table (displaying data from both sides of the many-to-many relationship).
Execute the following statement (again, change JOIN to INNER JOIN for MySQL):
SELECT StudentExam.ExamID,
        StudentExam.Mark,
        Exam.SustainedOn,
        Student.Name AS StudentName
FROM StudentExam
   JOIN Student
   ON StudentExam.StudentID = Student.StudentID
   JOIN Exam
   ON StudentExam.ExamID = Exam.ExamID
ORDER BY StudentExam.ExamID;
This should produce the following results:
ExamID Mark SustainedOn StudentName ------- ----- ------------------------ ------------ 1 55 2003-03-12 00:00:00.000 John Jones 1 26 2003-03-12 00:00:00.000 Anna Wolff 1 71 2003-03-12 00:00:00.000 Julia Picard 2 62 2003-03-13 00:00:00.000 Anna Wolff ... ... ... ...
Here you've extended the previous example by getting the SustainedOn field from Exam and using the ExamID field to join the tables. All you had to do to get this information was to include an extra JOIN ON clause for the new table, and then you could extract the data using Exam.SustainedOn. Note that one more change was necessary: The ORDER BY clause changed from ExamID to StudentExam.ExamID, which was needed because now there are two ExamID columns, one in StudentExam and one in Exam. You have to be explicit when referencing columns in such a situation.
|  | 
So, let's see how you can achieve similar results in Microsoft Access, as well as the other platforms by altering how you nest joins.
|  | 
Access seems to prefer to start from one end of the relationship and use nested joins to display the same results that you saw previously, for example:
SELECT StudentExam.ExamID,
        StudentExam.Mark,
        Exam.SustainedOn,
        student.Name
FROM Exam
   INNER JOIN (student
      INNER JOIN StudentExam ON student.StudentID = StudentExam.StudentID)
   ON Exam.ExamID = StudentExam.ExamID
ORDER BY StudentExam.ExamID;
This syntax will work on all platforms. Again, you see the same results:
ExamID Mark SustainedOn StudentName ------- ----- ------------------------ ------------ 1 55 2003-03-12 00:00:00.000 John Jones 1 26 2003-03-12 00:00:00.000 Anna Wolff 1 71 2003-03-12 00:00:00.000 Julia Picard 2 62 2003-03-13 00:00:00.000 Anna Wolff ... ... ... ...
In this example, you first perform a join between the Student and StudentExam tables, then you join the resulting table to the Exam table to produce the results. Access requires that you use INNER JOIN instead of JOIN, and it also requires brackets around the first join.
|  | 
All equi-joins use the equality operator (=) in their WHERE clause. However, this is by no means essential. You can use any comparison operator you like, for example:
SELECT * FROM Table1, Table2 WHERE Table1.Column1 < Table2.Column2
This is likely to result in more results than an equi-join because there are likely to be more matches in the WHERE clause.
Let's look at an example:
|  | 
First, let's see which classes are being held in which rooms at a certain time, as you saw earlier.
Execute the following against the InstantUniversity database:
SELECT Room.RoomID, Class.Time, Class.ClassID FROM Room JOIN Class ON Room.RoomID = Class.RoomID;
You should see the following list:
RoomID Time ClassID ------- ---------------------------------- 6 Mon 09:00-11:00 1 5 Mon 11:00-12:00, Thu 09:00-11:00 2 3 Mon 14:00-16:00 3 2 Tue 10:00-12:00, Thu 14:00-15:00 4 9 Tue 14:00-16:00 5 2 Tue 16:00-17:00 Thu 15:00-17:00 6 3 Wed 09:00-11:00 7 8 Wed 11:00-13:00 Fri 09:00-11:00 8 5 Fri 11:00-13:00 9 9 Fri 14:00-16:00 10
Notice that room 2 is busy on Tuesday morning from 10 A.M. to 12 P.M., and room 3 is busy on Monday afternoon from 2 A.M. to 4 P.M.
Second, what if you wanted to have a list of when rooms are free? Alter the query, dropping the ClassID column, as follows:
SELECT Room.RoomID, Class.Time FROM Room JOIN Class ON Room.RoomID <> Class.RoomID;
You should now see the following:
RoomID Time ------- ---------------------------------- ... ... 2 Mon 11:00-12:00, Thu 09:00-11:00 2 Mon 14:00-16:00 2 Tue 14:00-16:00 2 Wed 09:00-11:00 2 Wed 11:00-13:00 Fri 09:00-11:00 2 Fri 11:00-13:00 2 Fri 14:00-16:00 3 Mon 09:00-11:00 3 Mon 11:00-12:00, Thu 09:00-11:00 3 Tue 10:00-12:00, Thu 14:00-15:00 3 Tue 14:00-16:00 3 Tue 16:00-17:00 Thu 15:00-17:00 ... ...
Notice that room 2 doesn't have an available 10 A.M. to 12 P.M. slot, and room 3 is also booked on Monday afternoons from 2 A.M. to 4 P.M. (these times don't appear in this list). This is a bit of a crude way to identify if a room is actually free (the only times that appear are times when there are classes). However, it does at least demonstrate, in this database, how using an inequality operator instead of an equality operator produces many more results because the search is broader in this case.
|  | 
The example queries you've seen so far have had one thing in common—they all tend to be rather long. However, the aliasing technique you've used previously in this book really comes into its own here. For example, you can rewrite the query used earlier:
SELECT StudentExam.ExamID,
        StudentExam.Mark,
        Student.Name AS StudentName
FROM StudentExam
   JOIN Student
   ON StudentExam.StudentID = Student.StudentID
ORDER BY ExamID
using aliases as so:
SELECT SE.ExamID,
        SE.Mark,
        S.Name AS StudentName
FROM StudentExam AS SE
   JOIN Student AS S
   ON SE.StudentID = S.StudentID
ORDER BY ExamID
The query operates in the same way, but you save a fair bit of space, making the important bits easier to read along the way.