So far, all the queries you've seen have returned data from all the rows in a table. However, there are times when this isn't desirable. Often you'll want to extract only a subset of the rows in a table, or even a single row, based on specific criteria. SQL includes a whole host of tools to enable this kind of query, and in the following sections you'll examine them.
First, you'll look at how you can filter out any duplicate rows in your result set. Next, you'll look at how to extract certain rows based on simple conditions and how to combine these conditions for a powerful search facility. You'll move on to see how more advanced conditions and comparisons can give you even greater control over the rows returned, even in situations where you need to be vague in the conditions you set (that is, finding records based on small hints). Finally, you'll examine a topic that will come up time and time again: dealing with NULL data (where no value has been entered for that field).
You'll often encounter situations where the same data value occurs multiple times in the same column, whether by design or accident. Sometimes, you may want to ensure that you retrieve only unique rows of data from a query prior to processing. For example, from a Customers table during a purchase, you wouldn't want to charge a customer twice for a single order!
To filter out duplicate rows of data, you use the DISTINCT keyword. Let's start with the simplest case, where you're returning rows from a single column. If you only want to see unique row values for this column, the query would look as follows:
SELECT DISTINCT ColumnA FROM Table;
This can often be very useful. For example, single columns in a database may have a restricted set of values allowed. To find out all the different values in a column, you don't really want to have to read through every row in the table just to find the values that exist. To demonstrate this clearly, say that you wanted to query the Exam table in the InstantUniversity database to get a list of all the dates on which exams have been taken. You might try the following:
SELECT SustainedOn from Exam;
The result would look as follows:
SustainedOn --------- 3/12/2003 3/13/2003 3/11/2003 3/18/2003 3/19/2003 3/25/2003 3/26/2003 3/10/2003 3/21/2003 3/26/2003 3/24/2003
This is actually quite a simple case because there are only 11 rows and only one duplicate value, 3/26/2003, but in any event it's much more convenient to issue the following query:
The results are as follows:
SustainedOn ------------ 3/10/2003 3/11/2003 3/12/2003 3/13/2003 3/18/2003 3/19/2003 3/21/2003 3/24/2003 3/25/2003 3/26/2003
This time the duplicate value has been removed and you have a simple list of unique exam dates—and as an added bonus you'll see that the dates have been automatically sorted into ascending order.
Let's move on to the case where you want to retrieve multiple columns. When you do this, you can guarantee that every row you obtain contains unique data for the specified set of columns with the following query:
SELECT DISTINCT ColumnA, ColumnB FROM Table;
For example, the following query will return only unique combinations of customer names and credit card details:
SELECT DISTINCT CustomerName, CreditCard FROM Customers;
Note that this doesn't mean you won't have duplicates in either column, only that each combination is unique. You could, for example, have several customers with the same name but with different credit card numbers. To illustrate this point, suppose you changed the query on InstantUniversity as follows:
You would obtain the full 11 rows from the database. This is because there are 11 different combinations of these two fields, even though there's a duplicate in the SustainedOn column. Always remember that DISTINCT acts on the combination of columns you use in the query.
The WHERE clause allows you to filter queries by demanding that the rows in your result set satisfy a particular condition:
SELECT ColumnA, ColumnB FROM Table WHERE Condition;
Condition is very flexible, allowing you to test for equalities and inequalities in column data, ranges of values to look for, and much more. You achieve all this using a simple syntax that includes various operators and keywords that, when combined, allow you to search for pretty much anything.
Before you get to the more complex usages of this clause, though, you'll look at the simplest: equalities. These are particularly useful because they allow you to constrain the rows you obtain from queries in the tightest way. The syntax is as follows:
SELECT ColumnA, ColumnB FROM Table WHERE ColumnC = Value;
The WHERE clause lets you apply a search condition to the rows of data gathered by your SELECT query. The search condition can take various forms, all of which can be evaluated to a Boolean value. In this case, you're only interested in rows for which the condition ColumnC = Value evaluates to TRUE; in other words, you're interested in those rows in which ColumnC has a value of Value. If this condition is TRUE, then the row is returned. If it evaluates to FALSE or UNKNOWN (because the presence of a NULL value), then it's ignored.
Note that there's no need for a WHERE clause of this type to use a column that's being returned by the query (here only ColumnA and ColumnB are returned for rows). However, in practice, because you know what's contained in ColumnC, there's no need to extract this data from the database.
For example, you might extract a single row from a Customers table based on the ID of the customer (assuming that customer IDs are unique):
SELECT FirstName, LastName, Address FROM Customers WHERE CustomerID = 7;
Alternatively, you could extract all the customers with a certain last name:
SELECT FirstName, LastName, Address FROM Customers WHERE LastName = 'Smith';
Of course, in many countries, searching for people called Smith in this way could land you with an awful lot of results!
You don't have to use literals in these comparisons. You could, for example, use equalities between columns:
SELECT FirstName, LastName, Address FROM Customers WHERE LastName = FirstName;
Here all customers with identical first and last names will be found. For example, Major Major would be there.
Suppose you wanted to query the InstantUniversity database to find the name of the student whose ID is 3:
SELECT Name FROM Student WHERE StudentID = 3;
This will yield you the following result:
Name ---------------- Emily Scarlett
This sort of query is extremely useful when you know the ID of a record and want to get other pertinent data about that record. Rather than extract a lot of potentially useless data in a single query, you can use this technique to home in on the data you actually want.
As mentioned earlier, equalities are just one of the many things you can test for in a WHERE clause. You can use several other operators, which vary by RDBMS, as described in Table 2-1.
Operator |
RDBMSs |
Meaning |
---|---|---|
= |
All |
Used to check if values are equal to one another |
!= |
All except Access |
Used to check if values aren't equal to one another |
^= |
Oracle |
Same meaning as != |
<> |
All |
Same meaning as != |
> |
All |
True if the first operand is greater than the second operand |
>= |
All |
Used to check if the first operand is greater than or equal to the second |
!< |
SQL Server, DB2 |
Same as >= (Means "not less than," which is logically equivalent to "greater than or equal to") |
< |
All |
True if the first operand is less than the second operand |
<= |
All |
Used to check if the first operand is less than or equal to the second |
!> |
SQL Server, DB2 |
Same as <= |
As you can see, there's some redundancy here, with the possibility of a choice of operator to use in some situations. The main reason for this is so you can use the operator that fits in best with your development experience—you're likely to have similar operators available in whatever programming language you're used to using.
![]() |
An example SQL query that you could run on the InstantUniversity database is as follows:
SELECT ExamID, SustainedOn, Comments FROM Exam WHERE SustainedOn > '2003-03-20';
In this example, you extract data from the Exam table, only getting data for exams that occur after March 20, 2003. The results from this query are as follows:
ExamID SustainedOn Comments ------ ----------- ----------------------------- 6 3/25/2003 7 3/26/2003 They'll enjoy this one 9 3/21/2003 Two hours long. They might find it tricky 10 3/26/2003 2hr test 11 3/24/2003 Part two should be an hour and a half long
It's worth noting here that it's the RDBMS that's responsible for interpreting a date or time format string, and the specific string you need to use will depend on factors such as what the RDBMS can interpret and your operating system's locale settings. However, the format YYYY-MM-DD as used in this query works on all platforms. See Chapter 5, "Performing Calculations and Using Functions," for more details on working with date values.
![]() |
Access expects dates to be enclosed, Visual Basic–style, within pound characters (#), so this query won't work with Access. Instead, you need to use the following:
Often a single WHERE clause isn't enough to narrow down the data you require. You might want to get only those rows with specified data in one column and other specified data in another column. Alternatively, you might require rows that have certain data in one column or certain data in another column. Here you need to combine multiple WHERE clauses as follows:
SELECT ColumnA, ColumnB FROM Table WHERE Condition LINK Condition;
Note that LINK in the previous query isn't a keyword. Instead, it refers to one of the keywords shown in Table 2-2.
Link Keyword |
Meaning |
---|---|
AND |
Both conditions must be met for the records found. |
OR |
Either condition must be met for the records found. |
For example, suppose you wanted to specify particular conditions on the Occupation and FiscalStatus fields when querying the Customers table. You could ask that both of your conditions be met as follows:
SELECT CustomerID, CustomerName FROM Customers WHERE Occupation = 'Carpenter' AND FiscalStatus = 'Loaded';
Alternatively, you could select rows where either one of the two conditions is met or both are met:
SELECT CustomerID, CustomerName FROM Customers WHERE Occupation = 'Carpenter' OR FiscalStatus = 'Loaded';
Additionally, the NOT keyword can be applied to a condition, making the condition evaluate to true when it isn't met rather than when it is:
SELECT CustomerID, CustomerName FROM Customers WHERE Occupation = 'Carpenter' AND NOT FiscalStatus = 'Bankrupt';
The following SQL statement queries the InstantUniversity database to select only those records from the StudentExam table where the ExamID field is equal to 1 and the IfPassed field is also 1:
SELECT StudentID, Mark, Comments FROM StudentExam WHERE ExamID = 1 AND IfPassed = 1;
Here are the results:
StudentID Mark Comments ----------- ----------- ------------------------ 1 55 Satisfactory 8 71 Excellent, great work!
It's possible to combine two comparison operators to select values that fall within a within a specified range. For example:
SELECT ExamID, SustainedOn, Comments FROM Exam WHERE SustainedOn >= '2003-03-20' AND SustainedOn <= '2003-03-24';
However, it's also possible to use another of the SQL keywords, BETWEEN, to make such range tests more readable:
SELECT ColumnA, ColumnB FROM Table WHERE ColumnC BETWEEN LowerLimit AND UpperLimit;
Here the value of ColumnC is checked to see if it's greater than or equal to LowerLimit and less than or equal to UpperLimit. For example:
SELECT ExamID, SustainedOn, Comments FROM Exam WHERE SustainedOn BETWEEN '2003-03-20' AND '2003-03-24';
You can also use NOT with BETWEEN to exclude a range:
SELECT ExamID, SustainedOn, Comments FROM Exam WHERE SustainedOn NOT BETWEEN '2003-03-20' AND '2003-03-24';
As well as checking to see if values fall inside (or outside) set ranges of values, it's also possible to check to see if values are one of a set number of possibilities.
Again, you can achieve this using operators and keywords that you've already met:
SELECT ColumnA, ColumnB FROM Table WHERE ColumnC = Value1 OR ColumnC = Value2 OR ColumnC = Value3;
However, you can achieve this much more elegantly using the IN keyword in your WHERE clause, along with a set of possible values:
SELECT ColumnA, ColumnB FROM Table WHERE ColumnC IN (Set);
Here Set (enclosed in parentheses) is the set of values you're looking for in ColumnC. You can define sets in a number of ways. The simplest is probably to provide a set of literal values in the query:
SELECT ColumnA, ColumnB FROM Table WHERE ColumnC IN ('Value1', 'Value2', 'Value3');
Alternatively, you can create your set by including by a completely separate SELECT query:
SELECT ColumnA, ColumnB FROM Table WHERE ColumnC IN (SELECT ColumnD FROM Table2);
Note |
This is an example of a subquery, a topic you'll look at in more detail in Chapter 6, "Combining SQL Queries." |
As with the other keywords you've seen in this chapter, IN can be combined with NOT to search for values that don't occur in a set:
SELECT ColumnA, ColumnB FROM Table WHERE ColumnC NOT IN ('Value1', 'Value2', 'Value3');
![]() |
As a more concrete example, you can try selecting information from the StudentExam table in the InstantUniversity database where the StudentID is one of three values:
SELECT StudentID, ExamID, Mark FROM StudentExam WHERE StudentID IN (2, 7, 3);
This query produces the following results:
StudentID ExamID Mark ----------- ----------- ----------- 2 3 44 2 5 39 2 6 63 3 4 78 3 7 82 7 6 84 7 8 62 7 11 68
![]() |
One of the most powerful ways of filtering rows is to search for text values in column data using patterns. You can, for example, look for rows with text columns that contain a certain word or for more complex patterns of characters. To do this, you use the LIKE keyword (you can also specify NOT LIKE) and supply a pattern to match:
Pattern is made up of literal characters plus certain wildcards that represent one or more characters. The wildcards shown in Table 2-3 are supported by all the database systems covered in this book.
Wildcard |
Meaning |
---|---|
% |
Any string of characters, including strings of zero length |
_ |
Any one character |
Caution |
Patterns are case sensitive in Oracle and DB2 but not in SQL Server, MySQL, or Access. |
For example, to search for a string value that contains the text 'manager', you could use the following:
SELECT CustomerID, ContactName, ContactTitle FROM Customers WHERE ContactTitle LIKE '%manager%';
You need to be aware of several issues when using patterns. Perhaps most important, you need to be very specific when you use them. Simply including one space too many could break the query, for example:
SELECT CustomerID, ContactName, ContactTitle FROM Customers WHERE ContactTitle LIKE '%manager %';
This pattern searches for the string 'manager '—including a trailing space. If any columns contain the string 'manager' at the end of their value, there will be no trailing space, and the record won't be matched.
As another example, this query looks for four characters followed by a space at the start of the Name column in the Student table:
SELECT Name FROM Student WHERE Name LIKE '____ %';
This returns all the students whose first names are four characters long:
Name ----------- John Jones Gary Burton Anna Wolff
Another issue to be aware of is that wildcards are reserved characters and must be escaped if you want to include them in a pattern without their wildcard meaning. To do this, you need to include more information when these characters occur in the data you're searching. Let's say you want to look for percentages. The following search isn't good enough:
SELECT CustomerName, CustomerRating FROM Customers WHERE CustomerRating LIKE '%50%%';
Here the % character is interpreted as a wildcard as usual, and the match will actually be for the string '50' rather than '50%'. To get round this, you can use an escape character to force the first % of the %% part of the pattern to be interpreted as the character % rather than a string:
SELECT CustomerName, CustomerRating FROM Customers WHERE CustomerRating LIKE '%50<Escape Char>%%';
There's no escape character defined in SQL, so instead you need to specify your own using the ESCAPE keyword. You can choose any character you like for this, but you should try to choose one that doesn't occur elsewhere in the string, for example:
SELECT CustomerName, CustomerRating FROM Customers WHERE CustomerRating LIKE '%50p%%' ESCAPE 'p';
Here p is used as an escape character and causes the first % character to be interpreted as a character, not as an escape code.
Access doesn't support this syntax, but you can escape the wildcard character by enclosing it in square brackets:
SELECT CustomerName, CustomerRating FROM Customers WHERE CustomerRating LIKE '%50[%]%';
SQL Server also supports this syntax.
![]() |
Let's look at an example from the InstantUniversity database. Suppose you wanted to search for exam results in the StudentExam table that had been described as "great." You could try something like this:
SELECT StudentID, ExamID, Mark, Comments FROM StudentExam WHERE Comments LIKE '%great%';
This works for SQL Server, MySQL, and Access because pattern matching is case insensitive in these systems. However, it will find 'great' but not 'Great' in DB2 and Oracle. To get around this, you need to convert the column to upper case before applying the match. You can do this in both DB2 and Oracle using the UPPER() function:
SELECT StudentID, ExamID, Mark, Comments FROM StudentExam WHERE UPPER(Comments) LIKE '%GREAT%';
This query produces the following results:
StudentID ExamID Mark Comments ----------- ----------- ----------- ------------------------ 3 7 82 Great result! 8 1 71 Excellent, great work!
However, this will also match records with a comment such as 'So far from great I'm expelling this student.' Be careful when using patterns because it's easy to forget things such as this, and sometimes the logical process you use to create a pattern will have unexpected results!
![]() |
SQL Server, Access, and MySQL allow a more complex form of pattern matching based more or less closely on regular expressions. In the case of SQL Server and Access, this consists merely of extensions to the standard LIKE syntax, but MySQL uses a completely different keyword (REGEXP or RLIKE) for regular expression pattern matching.
As well as the wildcards % and _, SQL Server and Access patterns can contain characters enclosed in square brackets. A match will be made if the entire pattern matches using any one of these characters. For example, '%[abcde]%' will match any expression that contains any of the letters A, B, C, D, or E anywhere (matches aren't case sensitive).
You can also use a hyphen to indicate a range of values, so you could also write the previous expression as '%[a-e%]'. You can therefore use [a-z] to represent any alphabetic character, [0-9] to match any numeric character, and [a-z0-9] to match any alphanumeric character.
In addition, SQL Server allows you to prefix the sign ^ to a character or range of characters in order to match only characters that aren't in that range; for example, [^a-z0-9] will match nonalphanumeric characters.
![]() |
Execute the following query against the InstantUniversity database:
SELECT Name FROM Student WHERE Name LIKE '% [a-dw-z]%';
This query looks for all student names that contain a space, followed by a character in the range A–D or W–Z. The effect (because this sample data contains only students with two names) is to return a list of all the students whose second names start with one of the first four or last four letters of the alphabet:
Name -------------- Gary Burton Anna Wolff Vic Andrews Steve Alaska Mohammed Akbar
![]() |
The basic syntax for MySQL's regular expression matching is similar but provides more options. You can use square brackets to match one of a set of characters or a specific range of characters, as in SQL Server and Access, but there are some important differences. First, MySQL regular expression patterns are case sensitive, so to match all alphabetic characters, you need to use [a-zA-Z] instead of just [a-z]. Second, matches will be found anywhere in the string, regardless of whether anything comes before or after (unless you explicitly say otherwise), so you don't need (and in fact can't use) the % wildcard.
There are also some more character-matching patterns available, as shown in Table 2-4.
Character |
Matches |
---|---|
^ |
The start of the string. For example, '^B' matches the first but not the second B in 'BOB'. |
$ |
The end of the string. For example, 'B$' matches the second but not the first B in 'BOB'. |
. |
Any single character. |
* |
The previous character repeated zero or more times. For example, 'Ke*l' matches 'Kl', 'Kel', 'Keel', and so on. |
{n} |
The previous character repeated n times. For example, 'b{3}' matches 'bbb' but not 'bb'. |
To use MySQL's regular expression matching, you use the REGEXP (or RLIKE) keyword instead of LIKE:
SELECT Name FROM Student WHERE Name REGEXP '^[Aa]';
This is equivalent to the following:
SELECT Name FROM Student WHERE Name LIKE 'A%';
![]() |
Execute the following query against the InstantUniversity database:
SELECT Name FROM Student WHERE Name REGEXP '^[AM].*r$';
This query matches all student names that begin with either 'A' or 'M', followed by any number of characters and ending with the character 'r'. Two names in the sample data match these criteria:
Name -------------- Andrew Foster Mohammed Akbar
![]() |