Using Functions in SQL

As you saw in the past chapter, functions can be extremely useful in processing data stored in a database and can save you a lot of work when you deal with results returned by SQL queries. So far, most of the functions you've seen have been aggregate functions, but there are a lot more than that available to you.

The general syntax for calling a function is simply as follows:

FUNC(Parameters);

Here FUNC is the name of the function, and Parameters are the parameters of the function, which may be column names, literals, expressions, or even other functions. Where more than one parameter is required, they're separated by commas:

FUNC(Parameter1, Parameter2, ...);

Functions can be used pretty much anywhere in a SQL statement because they can evaluate to numeric values, strings, Boolean values, and so on.

Unfortunately, functions tend to be very RDBMS specific, where each of the major RDBMSs has its own suite of functions—far too many to list here. However, there's some overlap, and you can examine some of the more commonly used functions without too much worry concerning compatibility issues.

One area where many functions are shared is with mathematical functions. Whatever RDBMS you're using, you're likely to have access to functions that do the following:

Return the absolute value of a value, typically called ABS or similar
Perform trigonometric calculations, such as SIN, COS, and so on
Calculate logarithms, such as LOG10 or LOG, to use base e
Calculate square roots, SQRT
Raise values to a given power, POWER
Generate random numbers, RAND

RDBMSs usually include functions for dealing with date and time values, working with strings (obtaining substrings, replacing characters, reversing character order, and so on), and much more. Typically, access to all the functionality of a database is achieved via functions, including security functionality, general database statistics, configuration, and most other things you might imagine.

As an example, consider the situation where you want to obtain the square root of the numbers in a column, which means using a function such as SQRT, as noted previously. In SQL Server, Oracle, DB2, and MySQL, you could use simply the following:

SELECT SQRT(ColumnName) AS RootOfColumnName FROM TableName;

However, Access doesn't have a SQRT function, so you have to use SQR instead:

SELECT SQR(ColumnName) AS RootOfColumnName FROM TableName;

The result is the same, but the name of the function is different. Sadly, this is typical and can mean checking the function list of the RDBMS you're using to find the exact syntax you need to use.

Because there are so many functions available, and because they're so database-specific, there's really no point in attempting to cover everything here. Instead you'll look at some of the main areas where you need functions—manipulating numbers, strings, and dates and times and converting between different data types—and you'll deal with other individual functions as you come across them in the remainder of the book. Later, at the end of the chapter, you'll also look at the CREATE FUNCTION statement that lets you define your own functions.

Working with Numbers

Previously, you saw integer values being used simply by inserting the numeric value, without any ' characters or other delimiting character:

This holds for all numeric types, but there are also several other elements of notation you can use. First, you can include decimal points for floating-point numbers:

3.142

You can also include scientific notation to include the exponent of a number, for example:

7.34E3

You can use the - symbol to specify either a negative number or a negative exponent (or both):

-2.43E-15

You can also use the + symbol in a similar way although this isn't really necessary because numbers will be interpreted as positive by default.

SQL Server and Access have a specific money data type, which allows for constants including a currency symbol, with all the other options described previously:

-$300.50

Other RDBMSs include additional, more exotic string representations of numeric values, such as preceding a string literal with X in DB2 and MySQL to indicate a hexadecimal value.

Working with Hexadecimal Numbers

The following is a DB2/MySQL hexadecimal number:

X'5A0F'

SQL Server, however, uses the prefix 0x for hexadecimal values, which aren't enclosed in quotes (this is also supported by the latest versions of MySQL):

0x5A0F

Note

By default, MySQL will treat hexadecimal values as ASCII character codes and return the corresponding character(s), but you can ensure that they're treated as numbers by adding zero to them. For example, X'4A' returns 'J', but X'4A' + 0 returns 74.

DB2, MySQL, and Access have a HEX() function that allows you to convert a numeric value into a hexadecimal string. For example, HEX(74) returns '4A'. You can achieve the same thing with Oracle using the TO_CHAR() function. You'll look at this function in a bit more detail later, but it's used to convert different data types to strings. As well as the data to convert, it can take a string that indicates how the string is to be formatted. An 'X' character in this string represents a hexadecimal digit, so you can convert the number 74 to hexadecimal using TO_CHAR(74, 'XX').

Rounding up and Down

As an example of using mathematical functions in SQL, let's look at the functions used to round numbers up or down to the nearest integer or to a specific precision. There are three basic functions involved:

FLOOR(number): Rounds number down to the highest integer less than number. For example, FLOOR(57.4) and FLOOR(57.8) will both return 57.
CEILING(number) or CEIL(number): Rounds number up to the lowest integer greater than number. For example, FLOOR(57.4) and FLOOR(57.8) will both return 58.
ROUND(number, precision): Rounds number to the nearest integer or floating-point number with the specified precision. For example, ROUND(57.4, 0) returns 57 and ROUND(57.8, 0) returns 58. If the number is negative, the number will be rounded on the left side of the decimal point, so ROUND(57.4, -1) returns 60 and ROUND(57.4, -2) returns 100.

Note

Access doesn't support either FLOOR() or CEILING(), but you can achieve the same effect using INT(), which works in much the same way as FLOOR() for numeric values, and ROUND(number + 0.5, 0) to replace CEILING(). Also, Access doesn't support negative precision with ROUND().

To see this in action, let's revisit the example from Chapter 4, "Summarizing and Grouping Data," where you calculated the average mark that a student scored in his or her exams. You'll show the same data (but for all students), but this time you'll present the average as a floating point, rounded up and rounded down and rounded to the nearest integer.

There are slight differences in the way you implement this in the various RDBMSs, so you'll look at these examples separately.

SQL Server and DB2

The same SQL query will run on both SQL Server and DB2, so you'll look at these together. The main issue here is that both SQL Server and DB2 will store the value of AVG(Mark) as an integer, so you need to convert the Mark to a float before calculating the average:

AVG(CAST(Mark AS FLOAT))

You'll look at data type conversion functions in more detail shortly. Also, notice that SQL Server supports only the CEILING() function, but DB2 has both CEIL() and CEILING().

ROUNDING UP AND DOWN (SQL SERVER AND DB2)

Enter this query and execute it against the InstantUniversity database:

SELECT StudentID, AVG(CAST(Mark AS FLOAT)) AS AverageMark,
        FLOOR(AVG(CAST(Mark AS FLOAT))) AS RoundDown,
        CEILING(AVG(CAST(Mark AS FLOAT))) AS RoundUp,
        ROUND(AVG(CAST(Mark AS FLOAT)), 0) AS ClosestInt
FROM StudentExam
GROUP BY StudentID;

This gives the following output:

   StudentID  AverageMark       RoundDown RoundUp  ClosestInt

   ---------  ----------------  --------- -------  ----------
   1          64                64        64       64
   2          48.6666666666667  48        49       49
   3          80                80        80       80
   4          63.3333333333333  63        64       63
   5          52                52        52       52
   6          68.3333333333333  68        69       68
   7          71.3333333333333  71        72       71
   8          51.6666666666667  51        52       52
   9          68.6666666666667  68        69       69
   10         73.5              73        74       74

Oracle

The query for Oracle is similar, but Oracle supports only CEIL(), not CEILING(). Also, you don't need to use the CAST() function (although it's supported, using the same syntax).

ROUNDING UP AND DOWN (ORACLE)

Enter this query into SQL*Plus:

SELECT StudentID, AVG(CAST(Mark AS FLOAT)) AS AverageMark,
        FLOOR(AVG(CAST(Mark AS FLOAT))) AS RoundDown,
        CEIL(AVG(CAST(Mark AS FLOAT))) AS RoundUp,
        ROUND(AVG(CAST(Mark AS FLOAT)), 0) AS ClosestInt
FROM StudentExam
GROUP BY StudentID;

This gives the same output as for the SQL Server/DB2 example:

   StudentID  AverageMark  RoundDown  RoundUp    ClosestInt

   ---------- -----------  ---------- ---------- ----------
   1          64           64         64         64
   2          48.6666667   48         49         49
   3          80           80         80         80
   4          63.3333333   63         64         63
   ...        ...          ...        ...        ...

MySQL

MySQL doesn't support the CAST() function and uses CEILING() rather than CEIL() . Otherwise, the query is the same as for Oracle.

ROUNDING UP AND DOWN (MYSQL)

Enter this query and execute it against the InstantUniversity database:

SELECT StudentID, AVG(Mark) AS AverageMark,
        FLOOR(AVG(Mark)) AS RoundDown,
        CEILING(AVG(Mark)) AS RoundUp,
        ROUND(AVG(Mark), 0) AS ClosestInt
FROM StudentExam
GROUP BY StudentID;

This gives the following output:

   StudentID  AverageMark RoundDown  RoundUp    ClosestInt

   ---------- ----------- ---------- ---------- ----------
   1           64.0000     64          64         64
   2           48.6667     48          49         49
   3           80.0000     80          80         80
   4           63.3333     63          64         63
   ...         ...         ...         ...        ...

Note
This example won't work against a MySQL server running in ANSI mode. Strict ANSI SQL demands that all columns not included in aggregate functions must be included in the GROUP BY clause. This doesn't make any sense (and isn't permitted) for your rounded columns, so this query will only work against MySQL running in standard mode.

Access

The Access version of this query is significantly different because as you saw previously, Access doesn't support the FLOOR() or CEILING() function.

ROUNDING UP AND DOWN (ACCESS)

The Access version of your query looks like this:

SELECT StudentID, AVG(Mark) AS AverageMark,
        INT(AVG(Mark)) AS RoundDown,
        ROUND(AVG(Mark) + .5, 0) AS RoundUp,
        ROUND(AVG(Mark), 0) AS ClosestInt
FROM StudentExam
GROUP BY StudentID;

Here you use INT() to round down the mark. INT() converts a non-integer type to an integer. For floating-point values, this effectively means that INT() works in the same way as FLOOR(). The ROUND() function has the same syntax as in the other RDBMSs. You also use this function to simulate CEILING(): If you add 0.5 to a value and then round this to the nearest integer, this has the effect of rounding up to the next integer.

Here are the results:

   StudentID  AverageMark      RoundDown   RoundUp   ClosestInt

   ---------- ---------------- ---------   --------- ----------
   1          64               64          64        64
   2          48.6666666666667 48          49        49
   3          80               80          80        80
   4          63.3333333333333 63          64        63
   ...        ...              ...         ...       ...

Manipulating Strings

As you've seen in earlier chapters, string literal values are enclosed in ' characters as follows:

'This is a string. Honest.'

One thing you haven't looked at yet is how you embed ' characters themselves in strings, which is a reasonably common occurrence. All you have to do is to replace any single quotes in the string with '':

'It''s cold outside.'

All other characters can be contained in the string literal value as normal.

It's also possible for a string literal to represent an empty string as follows:

''

This is simply two ' characters with nothing in between.

Although the ' character is the ANSI standard, some RDBMSs such as Access and MySQL (when not run in ANSI mode) allow the use of " characters instead. If you use this syntax, then ' characters can be used in the string without doubling up. However, if you want to use " characters, you must instead use "".

In addition, some RDBMSs use escape characters for certain characters such as tabs and carriage returns. MySQL uses syntax similar to that seen in many C++ (and C++ related) implementations, for example, \n for a new line character and \t for a tab. However, these escape codes are the exception rather than the norm.

Using String Manipulation Functions

Most RDBMSs provide a large number of functions that you can use to manipulate strings in various ways, for example, to extract substrings, determine the length of the string, find a character in the string, or convert the string to upper or lower case. Although the behavior of these functions is more or less standard, their names vary slightly from system to system, and in a few cases (such as SUBSTR), the parameter list can vary a little.

In general, these functions take the forms described in Table 5-2.

Table 5-2: String Manipulation Functions
Function	Description
LEFT(string, n)	Returns the n leftmost characters of string.
RIGHT(string, n)	Returns the n rightmost characters of string.
SUBSTRING(string, x, y)	Returns the substring of string of length y that starts at position x. Oracle, DB2, and MySQL permit you to omit the last argument, in which case the substring from x to the end of the string will be returned.
INSTR(string, substr)	Returns the position of the first instance of the character or substring substr within string.
LENGTH(string)	Returns the length of string (that is, the number of characters in string).
UPPER(string)	Returns string converted to upper case.
LOWER(string)	Returns string converted to lower case.

The names of these functions in the individual RDBMSs are as described in Table 5-3.

Table 5-3: Names of the Functions
SQL Server	DB2	Oracle	MySQL	Access
LEFT	LEFT	(Use SUBSTR)	LEFT	LEFT
RIGHT	RIGHT	(Use SUBSTR)	RIGHT	RIGHT
SUBSTRING	SUBSTR	SUBSTR	SUBSTRING	MID
CHARINDEX	POSSTR	INSTR	INSTR	INSTR
LEN	LENGTH	LENGTH	LENGTH	LEN
UPPER	UCASE/UPPER	UPPER	UCASE/UPPER	UCASE
LOWER	LCASE/LOWER	LOWER	LCASE/LOWER	LCASE

Note that SQL Server's CHARINDEX() function takes its parameters in the opposite order of the INSTR() functions. So, CHARINDEX(' ', 'Apress LP') will return 7, the position of the space in 'Apress LP'.

To see more clearly how these functions are used, let's look at an example. In this example, you'll reverse the first and last names of the students in the InstantUniversity database (for example, you'll format 'John Jones' as 'Jones, John').

To do this, you'll find the first occurrence of the space in their name and retrieve the substring from the next character to the end of the name (this is the student's last name). Next, you'll concatenate a comma and a space to this using the concatenation techniques you learned in Chapter 2, "Retrieving Data with SQL." Finally, you'll add the first name, which you retrieve by extracting the substring from the start of the string to the character before the space.

Again, you'll examine the examples for each RDBMS separately.

SQL Server

Apart from the idiosyncratic CHARINDEX() function noted previously, the SQL Server version of the example is fairly straightforward.

MANIPULATING STRINGS (SQL SERVER)

This is the SQL Server version of your query:

SELECT
   RIGHT(Name, LEN(Name) - CHARINDEX(' ', Name) + 1) + ', ' +
   LEFT(Name, CHARINDEX(' ', Name) - 1) AS StudentName
FROM Student
ORDER BY StudentName;

Here you use the RIGHT() function to retrieve the last name for each student and LEFT() to get the first name (although you could have used SUBSTRING()). You also use the LEN() function to calculate the length of the name, and you use this value to calculate the length of the last name. The first four rows output from this query are as follows:

   StudentName

   ---------------
   Akbar, Mohammed
   Alaska, Steve
   Andrews, Vic
   Burton, Gary
   ...

Oracle

Oracle doesn't support LEFT() or RIGHT(), so you'll use SUBSTR() to extract the first and last names from the student's full name as stored in the database.

MANIPULATING STRINGS (ORACLE)

The query looks like this in Oracle:

SELECT
   SUBSTR(Name, INSTR(Name, ' ') + 1) || ', ' ||
   SUBSTR(Name, 1, INSTR(Name, ' ') - 1) AS StudentName
FROM Student
ORDER BY StudentName;

Oracle allows you to use the version of SUBSTR() with only two parameters, retrieving the whole of the remainder of the string from the specified position onward, so you'll use this to extract the last name (the whole string after the space). You retrieve the first name by retrieving the substring from position one (the first character) to the character before the space. Here are the results:

   StudentName

   ---------------
   Akbar, Mohammed
   Alaska, Steve
   Andrews, Vic
   Burton, Gary
   ...

DB2

DB2 does support LEFT() and RIGHT(), but because its version of SUBSTR() can take only two arguments, the query is actually no longer if you don't use them.

MANIPULATING STRINGS (DB2)

This is how the query looks in DB2:

SELECT
   SUBSTR(Name, POSSTR(Name, ' ') + 1) || ', ' ||
   SUBSTR(Name, 1, POSSTR(Name, ' ') - 1) AS StudentName
FROM Student
ORDER BY StudentName;

Apart from the fact that DB2 uses POSSTR() for the more usual INSTR(), this is the same as the Oracle version. The output from this query is the same as on SQL Server and Oracle:

   StudentName

   ---------------
   Akbar, Mohammed
   Alaska, Steve
   Andrews, Vic
   Burton, Gary
   ...

One interesting side note is what happens if you try to perform the query using the LEFT() and RIGHT() functions:

SELECT
   RIGHT(Name, LENGTH(Name) - POSSTR(Name, ' ') + 1) ||
   ', ' || LEFT(Name, POSSTR(Name, ' ') - 1) AS StudentName
FROM Student
ORDER BY StudentName;

What you get is this error:

   An expression resulting in a string data type with a maximum
   length greater than 255 bytes is not permitted in:

   A SELECT DISTINCT statement
   A GROUP BY clause
   An ORDER BY clause
   A column function with DISTINCT
   A SELECT or VALUES statement of a set operator other than
       UNION ALL.

This is because the LEFT() and RIGHT() functions return results as varchar(4000), so although your formatted StudentName will never approach a length of 255 bytes, it's treated by DB2 as potentially greater than that. To correct that, you can set the length of the column using the VARCHAR() function, which you'll look at shortly when you learn about data type conversions.

MySQL

Like DB2, MySQL supports both LEFT() and RIGHT() and the version of SUBSTRING() with just two parameters.

MANIPULATING STRINGS (MYSQL)

The query looks like this in MySQL:

SELECT
   CONCAT(RIGHT(Name, LENGTH(Name) - INSTR(Name, ' ') + 1),
           ', ', LEFT(Name, INSTR(Name, ' ') - 1))
   AS StudentName
FROM Student
ORDER BY StudentName;

This query, using the CONCAT() operator, will work in both standard and ANSI modes. In ANSI mode, you can also use the || operator to perform concatenation, but that won't work in standard mode. Here are the results:

   StudentName

   ---------------
   Akbar, Mohammed
   Alaska, Steve
   Andrews, Vic
   Burton, Gary
   ...

Access

Access supports LEFT() and RIGHT() but uses MID() instead of SUBSTRING().

MANIPULATING STRINGS (ACCESS)

This is how the query looks in Access:

SELECT
   RIGHT(Name, LEN(Name) - INSTR(Name, ' ') + 1) & ', ' &
   LEFT(Name, INSTR(Name, ' ') - 1) AS StudentName
FROM Student
ORDER BY
   RIGHT(Name, LEN(Name) - INSTR(Name, ' ') + 1);

Remember that you can't use the alias column names of calculated columns in Access ORDER BY clauses. This means you have to repeat the formula used to calculate the column. To save space, here you've just sorted by the last name part; because no two of the sample students have the same last name, this won't affect the ordering:

   StudentName

   ---------------
   Akbar, Mohammed
   Alaska, Steve
   Andrews, Vic
   Burton, Gary
   ...

Working with Dates and Times

In general, datetime literal values are a special case of string values where the specific string is formatted in a certain way. Most RDBMSs will accept a wide variety of datetime specifications, but you do have to be careful to write datetime literal values in a form that will work internationally. For example, the following literal value might be considered unacceptable:

'05/02/2003'

This is because the locale settings on a server might mean that this is interpreted either as 'mm/dd/yyyy' or 'dd/mm/yyyy'.

However, reversing the values to 'yyyy/mm/dd' works in every locale. Note that the separator here, /, can be replaced with - or in some formats can be omitted entirely or replaced with a dot (.), with the date expressed as an eight-digit number. The standard form is to use dashes; the following literal, then, is unambiguous:

'2003-05-02'

Note

Note that Access requires datetime literals to be enclosed in pound characters, so you'd write this as #2003-05-02# if you're using Access.

In addition, you can be even clearer by using a more human readable format such as:

'May 2, 2003'

However, formatting a date in this way may take more fiddling around with strings in your code than you'd like.

Time constants are formatted as follows:

'hh:mm:ss.fff'

Here hh is hours, mm is minutes, ss is seconds, and fff is used to get even more precise with fractions of seconds, for example:

'14:59:02.129'

However, you needn't specify as much information as this. You can omit second data, for example:

'14:59'

You can also use a 12-hour clock:

'2:59 PM'

Finally, date and time values can be combined into a single string:

'2003-05-02 14:59:02.129'

Here a space separates the date and time values, and you can use any of the formats described previously for the date and time.

Note that there are certain restrictions placed on SQL date and time formats enforced by the various RDBMSs. For example, when using Oracle it may be necessary to include a DATE, TIME,or TIMESTAMP keyword before the string used to represent the date and time in order to specify how the string should be interpreted.

For example, you could represent a date using the following Oracle literal value:

DATE '2003-05-02'

Oracle also allows time intervals to be represented, using a similar syntax as the previous example with the addition of a requirement to specify what the first number in the interval string is (such as YEAR or DAY, with an optional single-digit precision specification saying how many digits are used for the value, for example, YEAR(4) for a four-digit amount of years) and, optionally, what the last number is (using similar nomenclature) using a suffix. There are two types of Oracle interval, known as YEAR TO MONTH and DAY TO SECOND, where the former has a precision of months or less and the latter seconds or less.

The following shows a DAY TO SECOND interval of five days and four hours:

INTERVAL '5:4' DAY TO HOUR

The following is a YEAR TO MONTH interval of 5,000 years:

INTERVAL '5000' YEAR(4)

For more details on this, consult your Oracle documentation.

Using Datetime Functions

The syntax and functionality of the date functions vary significantly from system to system, so we won't list them here. Instead, we'll present an example for each RDBMS where you calculate how long each student has been enrolled by working out the maximum difference between the current date and the date when they first enrolled.

You group the rows in the Enrollment table by StudentID and use the MAX aggregate function to get the amount of days since the first enrollment (because students may have enrolled in multiple classes, the MAX function finds the earliest). Finally, you order the resultant data by this calculated average to rank students.

In most cases, you can use some form of subtraction to work out the number of days since each student's first enrollment, but the exact syntax varies considerably.

SQL Server

To calculate the difference between two dates in SQL Server, you use the DATEDIFF() function. Subtracting one date from another is permitted in SQL Server, but the return value is itself a date and not very meaningful.

DIFFERENCES BETWEEN DATES (SQL SERVER)

Enter and execute this query against the InstantUniversity database:

SELECT StudentID, MAX(DATEDIFF(dd, EnrolledOn, GETDATE()))
                                               AS DaysEnrolled
FROM Enrollment
GROUP BY StudentID
ORDER BY DaysEnrolled DESC;

DATEDIFF() calculates the time difference between two dates; the first parameter of this function selects the units to work with, in this case, days. You look at how many days have elapsed between the enrollment date stored in the EnrolledOn column and the current date, obtained using the GETDATE() function. The output is as follows:

   StudentID    DaysEnrolled

   -----------  -----------
   4            152
   7            152
   10           152
   9            149
   8            149
   5            149
   6            149
   1            149
   3            142
   2            -216

Note that the last value is negative because some of the enrollment dates are in the future (at the time of writing). This wouldn't happen if you were using real data, of course, but it does demonstrate that DATEDIFF() returns a signed rather than an absolute value.

Oracle

With Oracle, you can perform a straightforward subtraction of two dates.

DIFFERENCES BETWEEN DATES (ORACLE)

The Oracle version of the query looks like this:

SELECT StudentID,
        FLOOR(MAX(CURRENT_DATE - EnrolledOn)) AS DaysEnrolled
FROM Enrollment
GROUP BY StudentID
ORDER BY DaysEnrolled DESC;

You retrieve the current date and time using CURRENT_DATE and subtract the value of the EnrolledOn column to find out how long each student has been enrolled for each class. This operation returns an INTERVAL value; because this can contain a time portion (expressed using decimals), you round this down to the nearest day using FLOOR() (you use FLOOR() rather than CEILING() to include only complete days). The output is the same as for the SQL Server version:

   StudentID    DaysEnrolled

   -----------  ------------
   4            152
   7            152
   10           152
   ...          ...

DB2

You again use subtraction to calculate the number of days each student has been enrolled with DB2, but you need first to convert the dates into a simple count of days by calling the DAYS() function.

DIFFERENCES BETWEEN DATES (DB2)

In DB2, the query looks like this:

SELECT StudentID,
        MAX(DAYS(CURRENT_DATE) - DAYS(EnrolledOn))
                                     AS DaysEnrolled
FROM Enrollment
GROUP BY StudentID
ORDER BY DaysEnrolled DESC;

Again, you retrieve the present date using CURRENT_DATE. The DAYS() function allows you to represent a date as a simple integer by returning the number of days since December 31, 1 BC, so you just need to calculate the difference between the DAYS() values for the current date and for the EnrolledOn date:

   StudentID    DaysEnrolled

   -----------  ------------
   4            152
   7            152
   10           152
   ...          ...

MySQL

The MySQL version of the query is similar to the DB2 version except that you use the TO_DAYS() function.

DIFFERENCES BETWEEN DATES (MYSQL)

This is the MySQL version of the query:

SELECT StudentID,
       MAX(TO_DAYS(CURRENT_DATE) - TO_DAYS(EnrolledOn))
                                           AS DaysEnrolled
FROM Enrollment
GROUP BY StudentID
ORDER BY DaysEnrolled DESC;

As in Oracle and DB2, you use CURRENT_DATE to retrieve the present date. TO_DAYS() works in a similar way to DB2's DAYS() function, but it returns the number of days since the start of a mythical "Year Zero" (this figure is 365 days greater than the figure returned by DAYS()):

   StudentID    DaysEnrolled

   -----------  ------------
   4            152
   7            152
   10           152
   ...          ...

Access

The Access version of the query is the simplest; all you need to do is perform a straightforward subtraction.

DIFFERENCES BETWEEN DATES (ACCESS)

The query looks like this in Access:

SELECT StudentID, MAX(DATE() - EnrolledOn) AS DaysEnrolled
FROM Enrollment
GROUP BY StudentID
ORDER BY MAX(DATE() - EnrolledOn) DESC;

Here you use the DATE() function to get the current date. Simply subtracting one date from another returns the difference in days between the two dates. As in other Access examples, you can't use the alias of the calculated column in the ORDER BY clause, so you need to repeat the formula. Here are the results:

   StudentID    DaysEnrolled

   -----------  ------------
   4            152
   7            152
   10           152
   ...          ...

Performing Data Type Conversions

You've already seen a few examples where you need to convert a value into a different data type. It may be necessary, for example, when you want to perform string concatenation including a numeric value or when you want to round a floating-point value to an integer for display. Most RDBMSs provide both generic functions that you can use to cast to any type and functions for casting to specific types such as strings or integers.

Casting to Strings

Let's look first at the functions that exist specifically for converting non-string types to strings. In most cases, you can also use these functions to alter the length of a string field.

SQL Server

In general, casts on SQL Server are performed using the CAST() or CONVERT() functions, which you'll look at shortly. However, you can use the STR() function to convert floating-point values into strings. This takes the following form:

STR(float, [length], [precision])

where float is the number you want to convert to a string, length is the length of the string, and precision is the number of decimal places to retain. Both the length and precision parameters are optional. If they aren't specified, length has a default value of 10, and precision a default of zero.

As an example, STR(43.2461, 5, 2) returns '43.25'. If the data is corrupted because the length isn't long enough to hold all the digits before the decimal point, the returned string will be filled with asterisks. For example, STR(4326.12, 3) returns '***'. If the length isn't sufficient to hold all the decimal places, these will be discarded and the value rounded according the space available, so STR(43.184, 4, 2) returns '43.2'.

Oracle

Oracle allows you to convert character, numeric, and date types to strings using the TO_CHAR() and TO_NCHAR() functions. With dates and numeric types, you can also specify a format string. For example:

TO_CHAR(333, '0000.0')                        -- '0333.0'
TO_CHAR(333.15, '0000.0')                     -- '0333.2'
TO_CHAR(DATE '2003-05-23', 'DD MONTH YYYY') -- '23 MAY 2003'

DB2

DB2 is relatively strongly typed and has specific conversion functions for all its supported data types. These include the CHAR() and VARCHAR() functions, which you use for converting data to the CHAR and VARCHAR types, respectively. CHAR() can be used with most data types, but VARCHAR() is limited to character, date, and graphic types. If used with a character type, VARCHAR() can take a second parameter indicating the length of the string returned; otherwise, it takes only one parameter—the value to convert.

Depending on the type of data involved, CHAR() can also have a second parameter. If you're converting a date, you can specify one of the values ISO, USA, EUR, JIS, or LOCAL to indicate how to format the string. For example, CHAR(DATE('2003-02-24'), USA) returns '02/24/2003', but CHAR(DATE('2003-02-24'), EUR) returns '24.02.2003'.

You can also specify the length for character types or the decimal point character to use for floating-point types. For example, CHAR(32.47, ',') returns '32,47'.

MySQL

MySQL is weakly typed and will generally try to convert values as necessary. If you need to convert a value explicitly to a string, you can simply concatenate the value with an empty string:

CONCAT(23.52, '') -- Returns '23.52'

Access

Access supports the STR() function for converting dates and numeric types to strings:

STR(#2003-01-01#) ' Returns '01/01/2003', depending on locale
STR(23.52)         ' Returns '23.52'

Unlike the SQL Server version of this function, STR() in Access doesn't allow you to specify the length of the returned string.

Casting to Numbers

Some RDBMSs also provide specific functions for casting non-numeric types to numeric types, but in many cases more flexibility is available with the generic conversion functions.

SQL Server

SQL Server doesn't have specific functions for converting to numeric types—use CAST()/CONVERT() instead.

Oracle

Oracle supports the TO_NUMBER() function, which allows you to convert character types into numbers. As well as the string to convert, you can specify a format string that the function can use to work out how to interpret the string. This allows the string to contain characters such as hexadecimal and currency symbols that might not otherwise be permitted in a numeric value:

TO_NUMBER('FFFF', 'XXXX')            -- Returns 655535
TO_NUMBER('$5,102.25', '$9,999.99') -- Returns 5102.25

DB2

DB2 supports a range of functions for converting numeric and character values to different numeric types. Some of the more common are the following:

INTEGER()/INT(): Converts numeric or character data to an integer.
DOUBLE()/FLOAT(): Converts numeric or character data to a floating-point value.
DECIMAL()/DEC(): Converts numeric or character data to a decimal. As well as the data to convert, this function lets you specify the total number of digits and the number of digits after the decimal place; for example, DECIMAL(37.54, 3, 1) returns 37.5.

MySQL

Because of its weak typing, MySQL allows you to convert strings or dates to numbers just by adding zero to them. For example, '23.5' + 0 returns 23.5 as a numeric type. If the string begins with a number but contains non-numeric characters, only that section that can be interpreted as a number will be returned; if the string begins with characters that can't be converted to a numeric value, zero will be returned. For example, '4.5 strings' returns 4.5, but 'strings 4 u' returns 0.

Access

Access supports the INT() function, which allows you to convert non-numeric data types into integers (with rounding down). For example, INT('4.8') returns 4, and INT('5.23e3') returns 5230.

Using Generic Casting

Two generic casting functions have fairly widespread support: CAST(), which is the ANSI standard and is supported by SQL Server, Oracle, DB2, and MySQL, and CONVERT(), which is the Open Database Connectivity (ODBC) form and is supported by SQL Server and MySQL.

Note

Support for CAST() and CONVERT() was added in MySQL 4.0.2; in previous versions, you'll need to use the implicit casting methods mentioned previously.

The syntax for CAST() is as follows:

CAST(expression AS data_type)

In SQL Server, Oracle, and DB2, the data_type can optionally include details of the size of the target data type and (for decimal values) the number of places after the decimal point:

CAST('3.521' AS DECIMAL(3,2)) -- Returns 3.52

Note

Note that MySQL doesn't allow this syntax and only permits you to specify the data type itself—not size or precision details. Also, MySQL will only allow casting to one of the following types: BINARY, DATE, DATETIME, TIME, SIGNED [INTEGER],or UNSIGNED [INTEGER].

The syntax for CONVERT() in MySQL is as follows:

CONVERT(expression, data_type)

Again, data_type must be one of the types listed previously, and you aren't permitted to indicate the length of the type.

The SQL Server syntax for CONVERT() is as follows:

CONVERT(data_type [(length)], expression [, style])

The style parameter is a number that indicates how to format a datetime value if you're casting to a datetime or how a datetime is formatted if you're converting one to another type:

CONVERT(varchar, GETDATE(), 106) -- Returns current date in
                                    -- the format '27 Feb 2003'

See the Microsoft MSDN Web site at http://msdn.microsoft.com/library/en-us/tsqlref/ts_ca-co_2f3o.asp?frame=true for full details of the possible values of this number.

Creating Functions

Now that you've seen how to use some of the most common built-in functions available in most RDBMSs, let's look at how you can define your own. ANSI SQL-99 defines a CREATE FUNCTION statement, which is implemented (with the inevitable differences) by most vendors and which allows you to create your own User-Defined Functions (UDFs). The basic syntax, common to most implementations, boils down to no more than this:

CREATE FUNCTION function_name [(parameter_list)]
RETURNS data_type
<SQL statement(s)>

To learn the actual syntax in the different RDBMSs, you'll look at an example. Notice that Access doesn't support UDFs, so there isn't an example for that system. For the sake of both clarity and conciseness, we won't present every option for each RDBMS, but we'll explain the main options. This example is based on the string manipulation example earlier in this chapter, where you reversed the order of the first and last names of each student.

Calling user-defined functions is the same as calling the built-in functions. Once you've created it, you can call your FormatName() function in the query:

SELECT FormatName(Name) AS StudentName
FROM Student
ORDER BY StudentName;

However, you may need to specify the schema where the function is stored in SQL Server (for example, dbo.FormatName(Name)).

This will return a list of the students, sorted according to their last names:

   StudentName

   -----------------
   Akbar, Mohammed
   Alaska, Steve
   Andrews, Vic
   Burton, Gary
   Fernandez, Maria
   Foster, Andrew
   Jones, John
   Jonsson, Isabelle
   Lee, Bruce
   Picard, Julia
   Scarlett, Emily
   Wolff, Anna

SQL Server

The basic syntax for creating a function in SQL Server is as follows:

CREATE FUNCTION function_name [(parameter_list)]
RETURNS data_type
AS
BEGIN
   [<SQL statements>]
   RETURN expression
END

The function can contain multiple SQL statements but mustn't perform any permanent changes to the data in the database.

CREATING A FUNCTION (SQL SERVER)

Here's the SQL Server version of the FormatName() UDF:

CREATE FUNCTION FormatName (@FullName varchar(50))
RETURNS varchar(50)
AS
BEGIN
   RETURN RIGHT(@FullName, LEN(@FullName) -
                CHARINDEX(' ', @FullName) + 1) + ', ' +
           LEFT(@FullName, CHARINDEX(' ', @FullName) - 1)
END

The parameter list defined after the function name takes the following form:

(@FullName varchar(50))

As with the parameters and variables of stored procedures (which you'll look at in Chapter 9, "Using Stored Procedures"), the parameters and variables in UDFs in SQL Server are prefixed with an @ character, both when they're defined in the parameter list and when they're referenced in the body of the function.

Although you could have multiple SQL statements in the function, in this case you just return the formatted name; this expression returned from the function is exactly the same expression as you used in the early example.

Oracle

In general, the basic syntax for defining Oracle functions is similar to that for stored procedures, which you'll look at in Chapter 9, "Using Stored Procedures":

CREATE [OR REPLACE] FUNCTION function_name
(parameter_list)
RETURN data_type
IS
variable_list
BEGIN
   [<SQL statements>]
   RETURN expression;
END;
/

There are a couple of differences to the standard syntax. First, as with many Oracle objects, you can indicate that you want to replace any existing object of the same name using the OR REPLACE option. Note that the return type is declared using the RETURN keyword rather than the standard RETURNS. Parameters can be marked as input (IN) or output (OUT) parameters, just as they can for stored procedures.

Second, you need to declare any variables used in the function before the BEGIN...END block. The final / is simply a device to let SQL*Plus know that you've reached the end of a block that includes a number of SQL statements (as the semicolon that usually marks the end of a statement is used within CREATE FUNCTION).

CREATING A FUNCTION (ORACLE)

The Oracle version of the FormatName() function is as follows:

CREATE OR REPLACE FUNCTION FormatName(FullName IN varchar)
RETURN varchar
IS
FormattedName varchar(50);
BEGIN
   FormattedName :=
      SUBSTR(FullName, INSTR(FullName, ' ') + 1) || ', ' ||
      SUBSTR(FullName, 1, INSTR(FullName, ' ') - 1);
   RETURN(FormattedName);
END;
/

Here you define just one input parameter—the name that you want to format—and no output parameters. You also declare a variable, FormattedName, just to show the syntax although you could actually manage without it for this function.

Within the body of the function, you set your FormattedName variable to the expression you want to return (using the special PL/SQL := operator) and then return this variable from the function using the RETURN keyword. You'll look at using variables in SQL in more detail in Chapter 9, "Using Stored Procedures."

DB2

The basic syntax for creating a function in DB2 follows the ANSI standard quite closely and looks like this:

CREATE FUNCTION function_name(parameter_list)
   RETURNS data_type
   [LANGUAGE SQL]
   [DETERMINISTIC | NON DETERMINISTIC]
   [CONTAINS SQL | READS SQL DATA]
[BEGIN ATOMIC]
   [<SQL statements>]
   RETURN expression;
[END]

There are a number of options you can specify for the function. First, you can tell DB2 that your function is in SQL (rather than a programming language such as C++ or Java) using the LANGUAGE SQL option. Second, you can use the DETERMINISTIC keyword to indicate that the function will always return the same result for a given parameter (or you can specify NOT DETERMINISTIC if this isn't the case); knowing this allows DB2 to optimize the function. Finally, you can specify READS SQL DATA to indicate that your function reads data from a database or CONTAINS SQL to indicate that your function uses SQL keywords and functions to manipulate the data passed in, without retrieving further data from the database.

The function body can consist merely of a RETURN statement followed by an expression, or it can contain a block of SQL statements surrounded by the BEGIN ATOMIC...END keywords.

CREATING A FUNCTION (DB2)

In DB2, the FormatName() UDF looks like this:

CREATE FUNCTION FormatName(FullName varchar(50))
   RETURNS varchar(50)
   LANGUAGE SQL
   DETERMINISTIC
   CONTAINS SQL
BEGIN ATOMIC
   DECLARE FormattedName VARCHAR(50);
   SET FormattedName =
       SUBSTR(FullName, POSSTR(FullName, ' ') + 1) || ', ' ||
       SUBSTR(FullName, 1, POSSTR(FullName, ' ') - 1);
   RETURN FormattedName;
END

This function will always return the same value for a given parameter, so you declare it as DETERMINISTIC. You don't read any data from the database—you just work with the string passed into the function—so you also include the CONTAINS SQL clause (rather than READS SQL DATA).

For this example, you could write the body of the function as a single RETURN statement, but you want to show the syntax using a block of SQL statements, so you haven't. Instead, as in the Oracle example, you've defined a variable to store the formatted name and then returned that. Before you use the variable, you must explicitly declare it:

DECLARE FormattedName VARCHAR(50);

You'll look at using DB2 variables in more detail in Chapter 9, "Using Stored Procedures."

MySQL

This last example is by far the most complicated because MySQL doesn't support UDFs written in SQL. Instead, you need to write the function in C or C++ and then register it with MySQL using the CREATE FUNCTION statement. The CREATE FUNCTION syntax for MySQL is as follows:

CREATE [AGGREGATE] FUNCTION function_name
RETURNS {STRING | REAL | INTEGER}
SONAME shared_library_name;

The AGGREGATE keyword must be specified if your function aggregates rows like SQL aggregate functions such as COUNT. This has an impact on the way the function is coded. The SONAME clause indicates the shared library (SO or DLL) where the function is defined.

Because this isn't a SQL example and because the example will differ depending on what platform you're using, we'll go through it quickly. For full details, check the MySQL documentation and the udf_example example functions, which are available as part of the source distribution of MySQL.

CREATING A FUNCTION (MYSQL)

First, then, you need to write the C/C++ code for the function. This code must be compiled into a shared object (Linux) or dynamic-link library (Windows). The following is the code for the FormatName() example:

#include <string>

#include <my_global.h>
#include <my_sys.h>
#include <mysql.h>

using namespace std;

char* FormatName(UDF_INIT *initid, UDF_ARGS *args,
                   char *result, unsigned long *length,
                   char *is_null, char *error)
{
   // Retrieve the Name parameter from the args parameter
   char* fullName = new char[args->lengths[0]];
   fullName = args->args[0];

   // Convert from char* to string
   string strName = fullName;

   // Trim whitespace from end of string
   int i = strName.length();
   while (strName[i] == ' ' || strName[i] == '\0') i--;
   string trimName = strName.substr(0, i + 1);
   // Arrange in 'LastName, FirstName' format
   int spaceIndex = trimName.find_first_of(" ");
   string firstName = trimName.substr(0, spaceIndex);
   string lastName = trimName.substr(spaceIndex + 1);
   string formattedName = lastName + ", " + firstName;

   // Convert back to char* and set the length argument
   char* fmtName = new char[];
   formattedName.copy(result, string::npos);
   *length = static_cast<unsigned long>(i + 2);

   return result;
}

Note that MySQL prescribes the signature for the function. There are three versions, depending on whether the function returns a string (as previously), a floating-point value, or an integer.

Before compiling the DLL/SO, you need to indicate to the linker that your function is to be exported so that MySQL can see it within the library. To do this, you can use a linker definition file (in this case called MySQLFunction.def):

LIBRARY "MySQLFunction"
EXPORTS
   FormatName

Once you've compiled the library, you need to register the function with MySQL. To do this, you just need to execute this CREATE FUNCTION statement:

CREATE FUNCTION FormatName
RETURNS STRING
SONAME 'C:\\MySQL\\lib\\MySQLFunction.dll';

In this case, you've specified the exact path to the DLL on a Windows machine. On a Linux machine, you would instead copy the SO to a location where the Id program can find it or configure Id to look in the directory where the SO is stored. How to do this varies depending on the exact platform.