Table of Contents
Previous Section Next Section

Using Subqueries

Any query embedded inside another in any way is termed a subquery. The subquery itself may include a subquery and so on, which is where you really start to see how the Structured part of SQL's name fits in! This is also the first time you start to see multitable queries creeping in because there's no reason why a subquery has to query the same table as any of its "parents" or "children." In the next chapter, you'll see how you can include multiple tables in a single query by using table joins; both techniques have their benefits, so we'll present them separately.

All subqueries can be divided into one of two categories: non-correlated or correlated. The true meaning of each of these terms will become clearer as you work through some examples, but basically the following is true:

It can be important to note this distinction in large databases because correlated queries typically take a lot longer to execute, and the time taken will increase dramatically as the volume of data increases. Let's look at some examples of how you can use subqueries. You'll see examples of each category along the way.

Subqueries as Calculated Columns

One simple form of a subquery is where the inner query returns a single result for each row returned by the outer query, usually as a result inserted into a calculated column:

SELECT ColumnA, (SubQuery) AS ColumnB FROM Table;

This type of subquery is useful when you have relationships between one table and another. For example, let's say you have a Customers table and a CreditCards table, and records in the CreditCards table are linked to individual customers via a foreign key. The following query enables you to see how many credit cards are on record for every customer in the database:

SELECT CustomerID, CustomerName, (
   SELECT COUNT(*) FROM CreditCards
   WHERE CreditCards.CustomerID = Customers.CustomerID)
   AS NumberOfCreditCards
FROM Customers

Here the subquery (highlighted in bold type) is one that wouldn't work on its own because it requires data (the CustomerID values) from the Customers table in the outer query but nestles into the outer query without problems. This is an example of a correlated subquery. The subquery executes once for each row acted on by the outer query. Let's look at a quick example showing a similar query in action.

SIMPLE SUBQUERIES
Start example

You're going to construct a query that will allow you to find out the number of exams taken by each student. Enter and execute the following against the InstantUniversity database:

SELECT StudentID, Name,
   (SELECT COUNT(*) FROM StudentExam
   WHERE StudentExam.StudentID = Student.StudentID)
   AS ExamsTaken
FROM Student
ORDER BY ExamsTaken DESC;

This query works in the same way as the one detailed previously, counting records in one table (StudentExam) with an aggregate function based on a value taken from a row in another table (Student). Note that you need to qualify names used in the subquery so you're unambiguous when referring to records in a different table.

Here the data used in the subquery is the StudentID column in the Student table, referred to using full syntax as Student.StudentID. This is compared with the StudentID column in the StudentExam table, referred to as StudentExam.StudentID, such that the number of records in the StudentExam table having the same StudentID value as that in the current record in the Student table is counted (using COUNT(*)).

The same qualification of names is necessary when the subquery works with the same table as the outer query, but here it's essential to use aliases for clarity.

The output from this query is as follows:

   StudentID Name             ExamsTaken

   --------- ---------------- ----------
   1         John Jones       2
   2         Gary Burton      3
   3         Emily Scarlett   2
   4         Bruce Lee        3
   ...       ...              ...

As always, you can use aliases for the columns and tables in the query. The advantages of table aliases become obvious when you're querying multiple tables. If there are relationships between the tables, then it's quite likely that there will be column names that occur in more than one table. In this case, you need to prefix the table name to the column name to make it clear about which column you're talking. Because this can involve a lot of typing, it's useful to be able to replace the table name with an abridged name (usually with one or two letters).

End example
USING ALIASES WITH SIMPLE SUBQUERIES
Start example

Execute the following query against the InstantUniversity database:

SELECT e1.StudentID, e1.ClassID, (
   SELECT COUNT(*) FROM Enrollment e2
   WHERE e1.ClassID = e2.ClassID)—1
   AS OtherStudentsInClass
FROM Enrollment e1
WHERE StudentID = 6;

Here you use aliases, even though you're only querying one table. Each time the outer query returns a row from Enrollment in this example, the inner query looks for records that have an identical entry in the ClassID column. In order to compare the ClassID values from the inner and outer queries, you must distinguish between them. Because the table name is the same, the only way to do this is using aliases, which is why e1 and e2 are used in the example.

Oracle 9i doesn't support the use of the AS keyword in this type of alias, even though it supports it elsewhere. This query will run on SQL Server, DB2, and Access with or without AS.

Note that the subquery also counts the row that's currently being examined by the outer query, so if you want to get the number of other students enrolled in the same class, you must subtract one from the value returned.

There are three rows returned from this query:

   StudentID   ClassID     OtherStudentsInClass

   ----------- ----------- --------------------
   6           3           2
   6           6           3
   6           10          1
End example

Subqueries in the WHERE Clause

There are a number of ways to use subqueries in your SELECT statements. Another option is to use them in a WHERE clause. In its simplest form, you have a query embedded in the WHERE clause of another query:

SELECT ColumnA FROM TableA
WHERE ColumnB = (SubQuery);

One extremely helpful way to use this is to utilize data from one record in the table as part of the search criteria to find other records, all in a single query. For example, the following query will return all exams sustained on the same date as, or before, the exam with an ExamID of 5:

SELECT ExamID, SustainedOn FROM Exam
WHERE SustainedOn <= (
   SELECT SustainedOn FROM Exam WHERE ExamID = 5)
ORDER BY SustainedOn DESC;

Note that you don't need semicolons at the end of subqueries because these are in fact part of the outer statement.

Here the inner query obtains a single value that's the SustainedOn value of the exam with an ExamID of 5:

SELECT SustainedOn FROM Exam WHERE ExamID = 5

This value is used by the outer query, which finds all exams with an equal or earlier SustainedOn value. The results are ordered according to a descending value for SustainedOn, so you can expect the exam with an ExamID of 5 to be at the top of the results, with subsequent records being earlier exams and the earliest appearing last in the list.

Note that MySQL doesn't support subqueries in this way. Instead, you need to use two queries (getting the date when that exam was taken and then passing this date into a second query). For correlated subqueries, you can generally use the JOIN syntax presented in the next chapter.

In this example, both queries work independently on the same data. The subquery does not rely on data from the outer query in order to execute (an easy way to tell this is that the subquery will execute perfectly well as a stand-alone query). This is an example of a non-correlated subquery. The subquery executes once in total and simply passes the resulting value to the outer query.

In the previous example, you worked on a single table, but there's nothing stopping you from using multiple tables. Say, for example, you know that an exam was sustained on March 10, but you're not sure for which course and you need to know. The following query would do the trick:

SELECT Name FROM Course
WHERE CourseID =
(
SELECT CourseID from EXAM
WHERE SustainedOn='10-MAR-03'
);

In the inner query, you find out the CourseID for the exam sustained on March 10. That value is passed to the outer query, which then returns the name of the course. The output should look like this:

NAME
----------------
Core Mathematics

However, the previous query is only suitable if you know for a fact that the query will return only a single row. If, in fact, it returns multiple rows, then the query will fail:

SELECT Name FROM Course
WHERE CourseID =
(
SELECT CourseID from EXAM
WHERE SustainedOn='10-MAR-03'
);
select courseid from exam
*
ERROR at line 4:
ORA-01427: single-row subquery returns more than one row

Fortunately, it's quite easy to get around this problem.

Subqueries that Return Multiple Results

When you have a subquery that returns multiple rows, you simply use the IN keyword to check for set membership. For example:

SELECT ColumnA, ColumnB FROM Table
WHERE ColumnC IN (SELECT ColumnD FROM Table2);

Thus, you can easily rewrite the previous query so that it works for both single row and multiple row cases:

SELECT Name FROM Course
WHERE CourseID IN
(
SELECT CourseID from EXAM
WHERE SustainedOn='26-MAR-03'
);

Let's see a full example that uses this syntax; to add a bit of excitement, you'll use three nested subqueries.

USING SET MEMBERSHIP WITH SUBQUERIES
Start example

Say you want to obtain a list of the students who are taught by Professor Williams. Let's build the query up in stages. First, you find out the ID of Professor Williams:

SELECT ProfessorID FROM Professor
WHERE Name LIKE '%Williams%';

You use this value (in this case, the ProfessorID is 2) to find out the classes that Professor Williams teaches:

SELECT ClassID FROM Class WHERE ProfessorID IN
  (SELECT ProfessorID FROM Professor
  WHERE Name LIKE '%Williams%');

This query gets the ClassID values from the Class table that match your criteria. The set of values returned is passed to the next query:

SELECT StudentID FROM Enrollment WHERE ClassID IN
  (SELECT ClassID FROM Class WHERE ProfessorID IN
    (SELECT ProfessorID FROM Professor
    WHERE Name LIKE '%Williams%'));

This query uses that set of ClassID values to obtain from the Enrollment table the IDs of the students who take these classes. Finally, these IDs are passed to the outermost query:

SELECT StudentID, Name FROM Student WHERE StudentID IN
  (SELECT StudentID FROM Enrollment WHERE ClassID IN
        (SELECT ClassID FROM Class WHERE ProfessorID IN
          (SELECT ProfessorID FROM Professor
          WHERE Name LIKE '%Williams%')));

If you execute the whole query against InstantUniversity, you should obtain the following results:

   StudentID   Name

   ----------- -----------------
   2           Gary Burton
   4           Bruce Lee
   6           Vic Andrews
   8           Julia Picard
   10          Maria Fernandez

This may seem quite a complicated way to go about things, but it works. In fact, you can achieve the same results using simpler syntax with a multiple table JOIN query. You'll be looking at this subject in the next chapter.

End example

Again, the previous example illustrates the use of non-correlated subqueries. No component query relies on its outer query in order to execute. Each query that makes up your subquery executes only once, and each in turn simply passes a value, or a set of values, to an outer query.

Before moving on, it's worth noting that the use of subqueries isn't restricted to the SELECT statement. You can include subqueries in any other SQL statement, such as an INSERT or DELETE statement, if appropriate.

Using Operators with Subqueries

You saw previously how you can use subqueries in conjunction with the IN operator to find rows where a field value belongs to a certain set of values. However, this isn't the only operator you can use with subqueries. In fact, there are four more: EXISTS, ALL, ANY, and SOME (although SOME is merely a synonym for ANY).

Using the EXISTS Operator

The EXISTS operator allows you to find rows that match a particular criterion. EXISTS is always followed by a subquery and evaluates to true if the subquery returns any rows at all.

USING THE EXISTS OPERATOR
Start example

This query returns the names and IDs of all the students who scored less than 40 in any one of their exams:

SELECT StudentID, Name FROM Student s
WHERE EXISTS (
   SELECT StudentID FROM StudentExam e
   WHERE Mark < 40 AND e.StudentID = s.StudentID);

The basic subquery here returns the IDs of all the students who scored less than 40 in an exam:

SELECT StudentID FROM StudentExam e
WHERE Mark < 40

However, this is a correlated subquery, so you want this query to run once for every row in the outer query. You do this by using table aliases and matching the value of the StudentID in the Student and StudentExam tables:

AND e.StudentID = s.StudentID

This query will execute once for every row in the Student table, so if a particular student scored less than 40 in one or more exam, the EXISTS condition for this query will be true, and the row for that student will be returned for the outer query. In total, you've got three of these underperforming students:

   StudentID   Name

   ----------- ------------
   2           Gary Burton
   5           Anna Wolff
   8           Julia Picard
End example

Using the ALL Operator

You're now quite well equipped if you want to find rows where the value in a particular column matches the values in a certain set. But what if you want to find rows where a column value is greater than or less than any or all of the values in a particular set? This is where the remaining operators come in—ALL and ANY.

The ALL operator is used with a subquery and a comparison operator such as =, >, or >= (and so on) and evaluates to true if the value being checked is greater than (or whatever) all the rows returned by the subquery. For example:

SELECT ColumnA FROM TableA
WHERE ColumnA > ALL (SELECT ColumnB FROM TableB);

This query will return all the rows from TableA where the value in ColumnA is greater than every single value in ColumnB of TableB.

USING THE ALL OPERATOR
Start example

Enter and execute the following query:

SELECT StudentID, Grade FROM Enrollment e
WHERE Grade > ALL (
   SELECT Mark FROM StudentExam s
   WHERE s.StudentID = e.StudentID);

Here you're looking for discrepancies between the overall performance of students and their marks in particular exams. In particular, you're looking to find the students whose overall grade for any class is greater than the top mark they received in all their exams.

Again, you use a correlated subquery with table aliases to achieve this. The subquery returns all the exam marks for each student in the outer query:

SELECT Mark FROM StudentExam s
WHERE s.StudentID = e.StudentID

You use this subquery with the ALL operator to find any rows in the Enrollment table where the Grade is greater than the highest exam mark received by the same student. This finds just one student in the sample data:

   StudentID   Grade

   ----------- -----------
   2           68
End example

Using the ANY Operator

ANY works in the same way as ALL but evaluates to true if the condition is true of any single value returned by the subquery. For example, the query:

SELECT ColumnA FROM TableA
WHERE ColumnA > ANY (SELECT ColumnB FROM TableB);

will return all the rows from TableA where the value in ColumnA is greater than any one of the individual values in ColumnB of TableB.

USING THE ANY OPERATOR
Start example

Type in this query and execute it against the InstantUniversity database:

SELECT StudentID, Grade FROM Enrollment e
WHERE Grade < ANY (
   SELECT Mark/2 FROM StudentExam s
   WHERE s.StudentID = e.StudentID);

Again, you're looking for discrepancies between students' exam marks and their overall grades. In this example, you retrieve the ID and grade of any student where that grade is less than half of the mark they got for any one exam.

The example works in a similar way to the previous example: The subquery returns the set of marks that each student in the outer query scored in their exams, divided by two:

SELECT Mark/2 FROM StudentExam s
WHERE s.StudentID = e.StudentID

You use the ANY operator to find out whether a student's grade is less than any single value in this set and, if so, return the student's ID and the grade:

   StudentID   Grade

   ----------- -----------
   5           33
End example

Table of Contents
Previous Section Next Section