![]() |
Table of Contents |
![]() |
4.2 ANSI SQL Aggregate FunctionsAggregate functions return a single value based upon a set of other values. If used among other expressions in the item list of a SELECT statement, the SELECT must have a GROUP BY or HAVING clause. No GROUP BY or HAVING clause is required if the aggregate function is the only value retrieved by the SELECT statement. The supported aggregate functions and their syntax are listed in Table 4-1. Technically speaking, ANY, EVERY, and SOME are considered aggregate functions. However, they have been discussed as range search criteria since they are most often used that way. Refer to ALL/ANY/SOME Operators for more information on these functions. The number of values processed by an aggregate function varies depending on the number of rows queried from the table. This behavior makes aggregate functions different from scalar functions, which can only operate on the values of a single row per invocation. The general syntax of an aggregate function is: aggregate_function_name ( [ALL | DISTINCT] expression ) The aggregate function name may be AVG, COUNT, MAX, MIN, or SUM, as listed in Table 4-1. The ALL keyword, which specifies the default behavior, evaluates all rows when aggregating the value of the function. The DISTINCT keyword uses only distinct values when evaluating the function.
The AVG function computes the average of values in a column or an expression. SUM computes the sum. Both functions work with numeric values and ignore NULL values. Use the DISTINCT keyword to compute the average or sum of all distinct values of a column or expression. SQL Standard SyntaxAVG ([ALL | DISTINCT] expression ) SUM ([ALL | DISTINCT] expression ) MySQL, PostgreSQL, and SQL ServerAll these platforms support the SQL2003 syntax of AVG and SUM. DB2 and OracleDB2 and Oracle support the ANSI syntax and the following analytic syntax: AVG ([ALL | DISTINCT] expression ) OVER (window_clause) SUM ([ALL | DISTINCT] expression ) OVER (window_clause) For an explanation of the window_clause, see the Section 4.3 later in this chapter. ExamplesThe following query computes average year-to-date sales for each type of book: SELECT type, AVG( ytd_sales ) AS "average_ytd_sales"FROM titles GROUP BY type; This query returns the sum of year-to-date sales for each type of book: SELECT type, SUM( ytd_sales ) FROM titles GROUP BY type;
The CORR function returns the correlation coefficient between a set of dependent and independent variables. SQL2003 SyntaxCalls the function with two variables, one dependent and the other independent: CORR( dependent, independent ) Any pair in which either the dependent variable, independent variable, or both are NULL is ignored. The result of the function is NULL when none of the input pairs consist of two non-NULL values. OracleOracle supports the SQL2003 syntax, and the following analytic syntax: CORR (dependent, independent) OVER (window_ clause) For an explanation of the window_clause, see the Section 4.3 later in this chapter. DB2, MySQL, PostgreSQL, and SQL ServerThese platforms do not support any form of the CORR function. ExampleThe following CORR example uses the data shown by the first SELECT: SELECT * FROM test2; Y X ---------- ---------- 1 3 2 2 3 1 SELECT CORR(y,x) FROM test2; CORR(Y,X) ---------- -1
The COUNT function is used to compute the number of rows in an expression. SQL2003 SyntaxCOUNT(*) COUNT( [ALL|DISTINCT] expression )
MySQL, PostgreSQL, and SQL ServerAll of these platforms support the SQL2003 syntax of COUNT. DB2 and OracleDB2 and Oracle support the ANSI syntax and the following analytic syntax: COUNT ({*|[DISTINCT] expression}) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. ExamplesThis query counts all rows in a table: SELECT COUNT(*) FROM publishers; The following query finds the number of different countries where publishers are located: SELECT COUNT(DISTINCT country) "Count of Countries" FROM publishers
The COVAR_POP function returns the population covariance of a set of dependent and independent variables. SQL2003 SyntaxCall the function with two variables, one dependent and the other independent: COVAR_POP( dependent, independent) The function disregards any pair in which either the dependent variable, independent variable, or both are NULL. If no rows remain in the group after NULL elimination, then the result of the function is NULL. OracleOracle supports the SQL2003 syntax and implements the following analytic syntax: COVAR_POP ( dependent, independent ) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. DB2In DB2, the function is named CORRELATION. MySQL, PostgreSQL, and SQL ServerThese platforms do not support any form of the COVAR_POP function. ExampleThe following COVAR_POP example uses the data shown by the first SELECT: SELECT * FROM test2; Y X ---------- ---------- 1 3 2 2 3 1 SELECT COVAR_POP(y,x) FROM test2; COVAR_POP(Y,X) -------------- -.66666667
The COVAR_SAMP function returns the sample covariance of a set of dependent and independent variables. SQL2003 SyntaxCall the function with two variables, one dependent and the other independent: COVAR_SAMP( dependent, independent ) The function disregards any pair in which either the dependent variable, independent variable, or both are NULL. The result of the function is NULL when none of the input pairs consist of two non-NULL values. OracleOracle supports the SQL2003 syntax and implements the following analytic syntax: COVAR_SAMP ( dependent, independent ) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. DB2, MySQL, PostgreSQL, and SQL ServerThese platforms do not support any form of the COVAR_SAMP function. ExampleThe following COVAR_SAMP example uses the data shown by the first SELECT: SELECT * FROM test2; Y X ---------- ---------- 1 3 2 2 3 1 SQL> SELECT COVAR_SAMP(y,x) FROM test2; COVAR_SAMP(Y,X) --------------- -1
Computes the relative rank of a hypothetical row within a group of rows, where that relative rank is computed as follows: (rows_preceding_hypothetical + rows_peered_with_hypothetical) / rows_in_group Bear in mind that the rows_in_group value includes the hypothetical row that you are proposing when you call the function. SQL2003 SyntaxIn the following syntax, items in the value_list correspond by position to items in the sort_list. Therefore, both lists must have the same number of expressions. CUME_DIST(value_list) WITHIN GROUP (ORDER BY sort_list) value_list ::= expression [,expression...] sort_list ::= sort_item [,sort_item...] sort_item ::= expression [ASC|DESC] [NULLS FIRST|NULLS LAST] OracleOracle follows the SQL2003 syntax and implements the following analytic syntax: CUME_DIST OVER ([partioning] ordering ) For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3. DB2, MySQL, PostgreSQl, and SQL ServerThese platforms do not implement the CUME_DIST aggregate function. ExampleThe following example determines the relative rank of the hypothetical new row (num=4, odd=1) within each group of rows from test4, where groups are distinguished by the values in the odd column: SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, CUME_DIST(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD CUME_DIST(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- ----------------------------------------- 0 1 1 .8 In group odd=0, the new row comes after the three rows: (0,0), (2,0), and (4,0). It will peer with itself. The total number of rows in the group will be four, which includes the hypothetical row. The relative rank, therefore, is computed as follows: (3 rows preceding + 1 peering) / (3 in group + 1 hypothetical) = 4 / 4 = 1 In group odd=1, the new row follows the three rows (1,1), (3,1), and a duplicate (3,1). Again, there is one peer, the hypothetical row itself. The number of rows in the group is five, which includes the hypothetical row. The relative rank is then: (3 rows preceding + 1 peering) / (4 in group + 1 hypothetical) = 4 / 5 = .8
Computes a rank in a group for a hypothetical row that you supply. This is a dense rank. Rankings are never skipped, even when a group contains rows that rank identically. SQL2003 SyntaxIn the following syntax, items in the value_list correspond by position to items in the sort_list. Therefore, both lists must have the same number of expressions. DENSE_RANK(value_list) WITHIN GROUP (ORDER BY sort_list) value_list ::= expression [,expression...] sort_list ::= sort_item [,sort_item...] sort_item ::= expression [ASC|DESC] [NULLS FIRST|NULLS LAST] OracleOracle follows the SQL2003 syntax and implements the following analytic syntax: DENSE_RANK( ) OVER ([partioning] ordering ) For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3. DB2, MySQL, PostgreSQl, and SQL ServerThese platforms do not implement the DENSE_RANK aggregate function. However, DB2 does support DENSE_RANK as an analytic function. See the section later in this chapter titled Section 4.3. ExampleThe following example determines the dense rank of the hypothetical new row (num=4, odd=1) within each group of rows from test4, where groups are distinguished by the values in the odd column: SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, DENSE_RANK(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD DENSE_RANK(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- ------------------------------------------ 0 4 1 3 In group odd=0, the new row comes after (0,0), (2,0), and (4,0), and thus it is position 4. In group odd=1, the new row follows (1,1), (3,1), and a duplicate (3,1). In that case, the duplicate occurrences of (3,1) both rank #2, so the new row is ranked #3. Compare this behavior with RANK, which gives a different result.
MIN(expression) and MAX(expression) find the minimum and maximum value of expression (string, datetime, or numeric) in a set of rows. DISTINCT or ALL may be used with these functions, but do not affect the result. SQL2003 SyntaxMIN( [ALL | DISTINCT] expression ) MAX( [ALL | DISTINCT] expression ) PostgreSQL and SQL ServerThese platforms support the SQL2003 syntax of MIN and MAX. DB2 and OracleDB2 and Oracle support the ANSI syntax and implements the following analytic syntax: MIN ({ALL|[DISTINCT] expression}) OVER (window_clause) MAX ({ALL|[DISTINCT] expression}) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. MySQLMySQL supports the SQL2003 syntax of MIN and MAX. MySQL also supports the functions LEAST( ) and GREATEST( ), providing the same capabilities. ExamplesThe following query finds the best and worst sales for any title on record: SELECT MIN(ytd_sales), MAX(ytd_sales) FROM titles; Aggregate functions are used often in the HAVING clause of queries with GROUP BY. The following query selects all categories (types) of books that have an average price for all books in the category higher than $15.00: SELECT type 'Category', AVG( price ) 'Average Price' FROM titles GROUP BY type HAVING AVG(price) > 15
Generates a relative rank for a hypothetical row by dividing that row's rank less 1 by the number of rows in the group. SQL2003 SyntaxIn the following syntax, items in the value_list correspond by position to items in the sort_list. Therefore, both lists must have the same number of expressions. PERCENT_RANK(value_list) WITHIN GROUP (ORDER BY sort_list) value_list ::= expression [,expression...] sort_list ::= sort_item [,sort_item...] sort_item ::= expression [ASC|DESC] [NULLS FIRST|NULLS LAST] OracleOracle follows the SQL2003 syntax and implements the following syntax: PERCENT_RANK( ) OVER ([partioning] ordering) For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3. DB2, MySQL, PostgreSQl, and SQL ServerThese platforms do not implement the PERCENT_RANK aggregate function. ExampleThe following example determines the percentage rank of the hypothetical new row (num=4, odd=1) within each group of rows from test4, where groups are distinguished by the values in the odd column: SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, PERCENT_RANK(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD PERCENT_RANK(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- -------------------------------------------- 0 1 1 .75 In group odd=0, the new row comes after (0,0), (2,0), and (4,0), and thus it is position 4. The rank computation is: (4th rank - 1)/3 rows = 100%. In group odd=1, the new row follows (1,1), (3,1), and a duplicate (3,1), and is again ranked at #4. The rank computation for odd=1 is: (4th rank - 1)/4 rows = 3/4 = 75%.
Generates an interpolated value corresponding to a percentile that you specify. SQL2003 SyntaxIn the following syntax, percentile is a number between zero and one: PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY sort_list) sort_list ::= sort_item [,sort_item...] sort_item ::= expression [ASC|DESC] [NULLS FIRST|NULLS LAST] OracleOracle allows only one expression in the ORDER BY clause: PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY expression) Oracle also allows some use of windowing syntax: PERCENTILE_CONT (percentile) WITHIN GROUP (ORDER BY sort_list) OVER (partitioning) See Section 4.3 later in this chapter for a description of partitioning. DB2, MySQL, PostgreSQL, and SQL ServerThese platforms do not implement PERCENTILE_CONT. ExampleThe following example groups the data in test4 by the column named odd, and invokes PERCENTILE_CONT to return a 50th percentile value for each group: SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY NUM) FROM test4 GROUP BY odd; ODD PERCENTILE_CONT(0.50)WITHINGROUP(ORDERBYNUM) ---------- -------------------------------------------- 0 2 1 3
Determines the value in a group with the smallest cumulative distribution greater than or equal to a percentile that you specify. SQL2003 SyntaxIn the following syntax, percentile is a number between zero and one: PERCENTILE_DISC(percentile) WITHIN GROUP (ORDER BY sort_list) sort_list ::= sort_item [,sort_item...] sort_item ::= expression [ASC|DESC] [NULLS FIRST|NULLS LAST] OracleOracle allows only one expression in the ORDER BY clause: PERCENTILE_DISC(percentile) WITHIN GROUP (ORDER BY expression) Oracle also allows some use of windowing syntax: PERCENTILE_DISC (percentile) WITHIN GROUP (ORDER BY sort_list) OVER (partitioning) See Section 4.3 later in this chapter for a description of partitioning. DB2, MySQL, PostgreSQL, and SQL ServerThese platforms do not implement PERCENTILE_DISC. ExampleThe following example is similar to that for PERCENTILE_CONT, except that it returns, for each group, the value closest, but not exceeding, the 60th percentile: SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, PERCENTILE_DISC(0.60) WITHIN GROUP (ORDER BY NUM) FROM test4 GROUP BY odd; PERCENTILE_CONT(0.50)WITHINGROUP(ORDERBYNUM) -------------------------------------------- 2 3
Computes a rank in a group for a hypothetical row that you supply. This is not a dense rank. If the group contains rows that rank identically, then it's possible for ranks to be skipped. If you want a dense rank, use the DENSE_RANK function. SQL2003 SyntaxIn the following syntax, items in the value_list correspond by position to items in the sort_list. Therefore, both lists must have the same number of expressions. RANK(value_list) WITHIN GROUP (ORDER BY sort_list) value_list ::= expression [,expression...] sort_list ::= sort_item [,sort_item...] sort_item ::= expression [ASC|DESC] [NULLS FIRST|NULLS LAST] OracleOracle follows the SQL2003 syntax and implements the following analytic syntax: RANK( ) OVER ([partitioning] ordering) For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3. DB2, MySQL, PostgreSQl, and SQL ServerThese platforms do not implement the RANK aggregate function. ExampleThe following example determines the rank of the hypothetical new row (num=4, odd=1) within each group of rows from test4, where groups are distinguished by the values in the odd column: SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, RANK(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD RANK(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- ------------------------------------ 0 4 1 4 In both cases, the rank of the hypothetical new row is 4. In group odd=0, the new row comes after: (0,0), (2,0), and (4,0), and thus it is position 4. In group odd=1, the new row follows (1,1), (3,1), and a duplicate (3,1). In that case, the new row is preceding by three rows, so it is ranked #4. Compare this behavior with DENSE_RANK.
SQL2003 defines a family of functions, having names beginning with REGR_, that relate to different aspects of linear regression. The functions work in the context of a least-squares regression line. SQL2003 SyntaxFollowing is the syntax and a brief description of each REGR_ function:
The REGR_ functions only work on number pairs containing two non-NULL values. Any number pair with one or more NULL values will be ignored. DB2 and OracleDB2 and Oracle support the SQL2003 syntax for all REGR_ functions. In addition, DB2 allows the shortened name REGR_ICPT in place of REGR_INTERCEPT. Oracle supports the following analytic syntax: REGR_function ( dependent, independent ) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. MySQL, PostgreSQL, and SQL ServerThese platforms do not implement the REGR family of functions. ExampleThe following REGEXP_COUNT example demonstrates that any pair with one or more NULL values is ignored. The table test3 contains three non-NULL number pairs, and three other pairs having at least one NULL: SQL> SELECT * FROM test3; Y X ---------- ---------- 1 3 2 2 3 1 4 NULL NULL 4 NULL NULL The REGR_COUNT function ignores the pairs having NULLs, counting only those pairs with non-NULL values: SELECT REGR_COUNT(y,x) FROM test3; REGR_COUNT(Y,X) --------------- 3 Likewise, all other REGR_ functions filter out any pairs having NULL values before performing their respective computations.
Use STDDEV_POP to find the population standard deviation within a group of numeric values. SQL2003 SyntaxSTDDEV_POP( numeric_expression ) DB2 and MySQLUse the STDDEV function. In DB2 and MySQL, STDDEV returns the population standard deviation. PostgreSQLThis platform does not provide a function to compute population standard deviation. OracleOracle supports the standard syntax and the following analytic syntax: STDDEV_POP (numeric_expression) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. SQL ServerUse the STDEVP function. ExampleThe following example computes the population standard deviation for the values 1, 2, and 3: SELECT * FROM test; X ---------- 1 2 3 SELECT STDDEV_POP(x) FROM test; STDDEV_POP(X) ------------- .816496581
Use STDDEV_SAMP to find the sample standard deviation within a group of numeric values. SQL2003 SyntaxSTDDEV_SAMP( numeric_expression ) OracleOracle supports the standard syntax. Oracle also provides the STDDEV function, which operates similar to STDDEV_SAMP except that it returns zero as not NULL when there is only one value in the set. Oracle also supports analytic syntax: STDDEV_SAMP (numeric_expression) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. DB2This platform does not provide a function to compute sample standard deviation. MySQLMySQL does not provide a function to compute sample standard deviation. MySQL does provide a function named STDDEV, but it returns the population standard deviation. PostgreSQLUse STDDEV. SQL ServerUse STDEV (with only one D!). ExampleThe following example computes the sample standard deviation for the values 1, 2, and 3: SELECT * FROM test; X ---------- 1 2 3 SELECT STDDEV_SAMP(x) FROM test; STDDEV_SAMP(X) -------------- 1
Use VAR_POP to compute the population variance of a set of values. SQL2003 SyntaxVAR_POP( numeric_expression ) DB2 and PostgreSQLThese platforms do not provide a function to compute population variance. MySQLUse the VARIANCE function, which in MySQL returns the population variance. OracleOracle supports the standard syntax and the following analytic syntax: VAR_POP (numeric_expression) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. SQL ServerUse the VARP function. ExampleThe following example computes the population variance for the values 1, 2, and 3: SELECT * FROM test; X ---------- 1 2 3 SELECT VAR_POP(x) FROM test; VAR_POP(X) ---------- .666666667
Use VAR_SAMP to compute the sample variance of a set of values. SQL2003 SyntaxVAR_SAMP( numeric_expression ) DB2 and PostgreSQLUse VARIANCE( numeric_expression ) to compute sample variance. MySQLMySQL provides no function for computing sample variance. There is the VARIANCE function, but in MySQL that function returns the population variance. OracleOracle supports the standard syntax. You may also use the VARIANCE function, which differs from VAR_SAMP by returning zero (and not NULL) for sets having only a single value. Oracle also supports analytic syntax: VAR_SAMP (numeric_expression) OVER (window_clause) For an explanation of the window_clause, see the section later in this chapter titled Section 4.3. SQL ServerUse the VAR function. ExampleThe following example computes the sample variance for the values 1, 2, and 3: SELECT * FROM test; X ---------- 1 2 3 SELECT VAR_SAMP(x) FROM test; VAR_SAMP(X) ----------- 1 |
![]() |
Table of Contents |
![]() |