This tip may sound obvious
to most of you, but I have seen professional developers, in two major SQL
Server-based applications used worldwide, not follow it. And that is to
always include a WHERE clause in your SELECT statement to narrow
the number of rows returned. If you don't use a WHERE clause, then SQL
Server will perform a table scan of your table and return all of the rows.
In some case you may want to return all rows, and not using a WHERE clause
is appropriate in this case. But if you don't need all the rows returned,
use a WHERE clause to limit the number of rows returned.
By returning data you don't
need, you are causing SQL Server to perform I/O it doesn't need to
perform, wasting SQL Server resources. In addition, it increases network
traffic, which can also lead to reduced performance. And if the table is
very large, a table scan will lock the table during the time-consuming
scan, preventing other users from accessing it, hurting
concurrency.
Another negative aspect of a
table scan is that it will tend to flush out data pages from the cache
with useless data, which reduces SQL Server's ability to reuse useful data
in the cache, which increases disk I/O and hurts performance. [6.5, 7.0,
2000] Updated 4-17-2003
*****
To help identify long
running queries, use the SQL Server Profiler Create Trace Wizard to
run the "TSQL By Duration" trace. You can specify the length of the long
running queries you are trying to identify (such as over 1000
milliseconds), and then have these recorded in a log for you to
investigate later. [7.0] More info from
Microsoft
*****
When using the UNION
statement,
keep in mind that, by default, it performs the equivalent of a SELECT
DISTINCT on the final result set. In other words, UNION takes the results of two
like recordsets, combines them, and then performs a SELECT DISTINCT in
order to eliminate any duplicate rows. This process occurs even if there
are no duplicate records in the final recordset. If you know that there
are duplicate records, and this presents a problem for your application,
then by all means use the UNION statement to eliminate the duplicate
rows.
On the other hand, if you
know that there will never be any duplicate rows, or if there are, and
this presents no problem to your application, then you should use the
UNION ALL statement instead of the UNION statement. The advantage of the
UNION ALL is that is does not
perform the SELECT DISTINCT function, which saves a lot of unnecessary SQL
Server resources from being using. [6.5, 7.0, 2000] Updated
10-30-2003
*****
Sometimes you might want to
merge two or more sets of data resulting from two or more
queries
using UNION. For example:
SELECT
column_name1, column_name2 FROM table_name1 WHERE column_name1 =
some_value UNION SELECT column_name1, column_name2 FROM
table_name1 WHERE column_name2 = some_value
This same query can be
rewritten, like the following example, and when doing so, performance will
be boosted:
SELECT
DISTINCT column_name1, column_name2 FROM table_name1 WHERE
column_name1 = some_value OR column_name2 = some_value
And if you can assume that
neither criteria will return duplicate rows, you can even further boost
the performance of this query by removing the DISTINCT clause. [6.5, 7.0,
2000] Added 6-5-2003
*****
Carefully evaluate whether
your SELECT query needs the DISTINCT clause or not. Some developers
automatically add this clause to every one of their SELECT statements,
even when it is not necessary. This is a bad habit that should be
stopped.
The DISTINCT clause should
only be used in SELECT statements if you know that duplicate returned rows
are a possibility, and that having duplicate rows in the result set would
cause problems with your application.
The DISTINCT clause creates
a lot of extra work for SQL Server, and reduces the physical resources
that other SQL statements have at their disposal. Because of this, only
use the DISTINCT clause if it is necessary. [6.5, 7.0, 2000] Updated
10-30-2003
*****
In your queries, don't
return column data you don't need. For example, you should not use
SELECT * to return all the columns from a table if you don't need all the
data from each column. In addition, using SELECT * prevents the use of
covered indexes, further potentially hurting query performance. [6.5, 7.0,
2000] Updated 2-8-2002
*****
If your application allows
users to run queries, but you are unable in your application to easily
prevent users from returning hundreds, even thousands of unnecessary rows
of data they don't need, consider using the TOP operator within the
SELECT statement. This way, you can limit how may rows are returned, even
if the user doesn't enter any criteria to help reduce the number or rows
returned to the client. For example, the statement:
SELECT
TOP 100 fname, lname FROM customers WHERE state = 'mo'
limits the results to the
first 100 rows returned, even if 10,000 rows actually meet the criteria of
the WHERE clause. When the specified number of rows is reached, all
processing on the query stops, potentially saving SQL Server overhead, and
boosting performance.
The TOP operator works by
allowing you to specify a specific number of rows to be returned, like the
example above, or by specifying a percentage value, like this:
SELECT
TOP 10 PERCENT fname, lname FROM customers WHERE state = 'mo'
In the above example, only
10 percent of the available rows would be returned.
Keep in mind that using this
option may prevent the user from getting the data they need. For example,
the data the are looking for may be in record 101, but they only get to
see the first 100 records. Because of this, use this option with
discretion. [7.0, 2000] Updated 10-30-2003
*****
You may have heard of a SET
command called SET ROWCOUNT. Like the TOP operator, it is designed
to limit how many rows are returned from a SELECT statement. In effect,
the SET ROWCOUNT and the TOP operator perform the same
function.
While is most cases, using
either option works equally efficiently, there are some instances (such as
rows returned from an unsorted heap) where the TOP operator is more
efficient than using SET ROWCOUNT. Because of this, using the TOP operator
is preferable to using SET ROWCOUNT to limit the number of rows returned
by a query. [6.5, 7.0, 2000] Updated 10-30-2003
*****
In a WHERE clause, the
various operators used directly affect how fast a query is
run. This
is because some operators lend themselves to speed over other operators.
Of course, you may not have any choice of which operator you use in your
WHERE clauses, but sometimes you do.
Here are the key operators
used in the WHERE clause, ordered by their performance. Those operators at
the top will produce results faster than those listed at the
bottom.
This lesson here is to use =
as much as possible, and <> as least as possible. [6.5, 7.0, 2000]
Added 5-30-2003
*****
In a WHERE clause, the
various operands used directly affect how fast a query is
run. This
is because some operands lend themselves to speed over other operands. Of
course, you may not have any choice of which operand you use in your WHERE
clauses, but sometimes you do.
Here are the key operands
used in the WHERE clause, ordered by their performance. Those operands at
the top will produce results faster than those listed at the
bottom.
- A single literal used by
itself on one side of an operator
- A single column name used
by itself on one side of an operator, a single parameter used by itself
on one side of an operator
- A multi-operand expression
on one side of an operator
- A single exact number on
one side of an operator
- Other numeric number
(other than exact), date and time
- Character data,
NULLs
The simpler the operand, and
using exact numbers, provides the best overall performance. [6.5, 7.0,
2000] Added 5-30-2003
*****
If a WHERE clause includes
multiple expressions, there is generally no performance benefit gained by
ordering the various expressions in any particular order. This is because the SQL
Server Query Optimizer does this for you, saving you the effort. There are
a few exceptions to this, which are discussed on this web site[7.0, 2000]
Added 5-30-2003
*****
Don't include code that
doesn't do anything. This may sound obvious,
but I have seen this in some off-the-shelf SQL Server-based applications.
For example, you may see code like this:
SELECT
column_name FROM table_name WHERE 1 = 0
When this query is run, no
rows will be returned. Obviously, this is a simple example (and most of
the cases where I have seen this done have been very long queries), a
query like this (or part of a larger query) like this doesn't perform
anything useful, and shouldn't be run. It is just wasting SQL Server
resources. In addition, I have seen more than one case where such dead
code actually causes SQL Server to through errors, preventing the code
from even running. [6.5, 7.0, 2000] Added 5-30-2003
*****
By default, some developers,
especially those who have not worked with SQL Server before, routinely
include code similar to this in their WHERE clauses when they make string
comparisons:
SELECT
column_name FROM table_name WHERE LOWER(column_name) =
'name'
In other words, these
developers are making the assuming that the data in SQL Server is
case-sensitive, which it generally is not. If your SQL Server database
is not configured to be case sensitive, you don't need to use LOWER or
UPPER to force the case of text to be equal for a comparison to be
performed. Just leave these functions out of your code. This will
speed up the performance of your query, as any use of text functions in a
WHERE clause hurts performance.
But what if your database
has been configured to be case-sensitive? Should you then use the LOWER
and UPPER functions to ensure that comparisons are properly compared? No.
The above example is still poor coding. If you have to deal with ensuring
case is consistent for proper comparisons, use the technique described
below, along with appropriate indexes on the column in
question:
SELECT
column_name FROM table_name WHERE column_name = 'NAME' or column_name =
'name'
This code will run much
faster than the first example. [6.5, 7.0, 2000] Added
5-30-2003
*****
Try to avoid WHERE
clauses that are non-sargable. The term "sargable" (which is in effect
a made-up word) comes from the pseudo-acronym "SARG", which stands for
"Search ARGument," which refers to a WHERE clause that
compares a column to a constant value. If a WHERE clause is sargable, this
means that it can take advantage of an index (assuming one is available)
to speed completion of the query. If a WHERE clause is non-sargable, this
means that the WHERE clause (or at least part of it) cannot take advantage
of an index, instead performing a table/index scan, which may cause the
query's performance to suffer.
Non-sargable search
arguments in the WHERE clause, such as "IS NULL", "<>", "!=",
"!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and
"LIKE '%500'" generally prevents (but not always) the query
optimizer from using an index to perform a search. In addition,
expressions that include a function on a column, expressions that have the
same column on both sides of the operator, or comparisons against a column
(not a constant), are not sargable.
But not every WHERE clause
that has a non-sargable expression in it is doomed to a table/index scan.
If the WHERE clause includes both sargable and non-sargable clauses, then
at least the sargable clauses can use an index (if one exists) to help
access the data quickly.
In many cases, if there is a
covering
index on the table, which includes all of the columns in the
SELECT, JOIN, and WHERE clauses in a query, then the covering index can be
used instead of a table/index scan to return a query's data, even if it
has a non-sargable WHERE clause. But keep in mind that covering indexes
have their own drawbacks, such as producing very wide indexes that
increase disk I/O when they are read.
In some cases, it may be
possible to rewrite a non-sargable WHERE clause into one that is sargable.
For example, the clause:
WHERE
SUBSTRING(firstname,1,1) = 'm'
can be rewritten like
this:
WHERE
firstname like 'm%'
Both of these WHERE clauses
produce the same result, but the first one is non-sargable (it uses a
function) and will run slow, while the second one is sargable, and will
run much faster.
WHERE clauses that perform
some function on a column are non-sargable. On the other hand, if you can
rewrite the WHERE clause so that the column and function are separate,
then the query can use an available index, greatly boosting performance.
for example:
Function Acts Directly on
Column, and Index Cannot Be Used:
SELECT
member_number, first_name, last_name FROM members WHERE
DATEDIFF(yy,datofbirth,GETDATE()) > 21
Function Has Been Separated
From Column, and an Index Can Be Used:
SELECT
member_number, first_name, last_name FROM members WHERE dateofbirth
< DATEADD(yy,-21,GETDATE())
Each of the above queries
produces the same results, but the second query will use an index because
the function is not performed directly on the column, as it is in the
first example. The moral of this story is to try to rewrite WHERE clauses
that have functions so that the function does not act directly on the
column.
WHERE clauses that use NOT
are not sargable, but can often be rewritten to remove the NOT from the
WHERE clause, for example:
WHERE NOT
column_name > 5
to
WERE
column_name <= 5
Each of the above clauses
produce the same results, but the second one is sargable.
If you don't know if a
particular WHERE clause is sargable or non-sargable, check out the query's
execution plan in Query Analyzer. Doing this, you can very quickly see if
the query will be using index lookups or table/index scans to return your
results.
With some careful analysis,
and some clever thought, many non-sargable queries can be written so that
they are sargable. Your goal for best performance (assuming it is
possible) is to get the left side of a search condition to be a single
column name, and the right side an easy to look up value. [6.5, 7.0, 2000]
Updated 6-2-2003
*****
If you run into a situation
where a WHERE clause is not sargable because of the use of a
function on
the right side of an equality sign (and there is no other way to rewrite
the WHERE clause), consider creating an index on a computed column
instead. This way, you avoid the non-sargable WHERE clause altogether,
using the results of the function in your WHERE clause instead. Because of
the additional overhead required for indexes on computed columns, you will
only want to do this if you need to run this same query over and over in
your application, thereby justifying the overhead of the indexed computed
column. [2000] Added 8-13-2002
*****
If you currently have a
query that uses NOT IN, which offers poor
performance because the SQL Server optimizer has to use a nested table
scan to perform this activity, instead try to use one of the following
options instead, all of which offer better performance:
- Use EXISTS or NOT
EXISTS
- Use
IN
- Perform a LEFT OUTER JOIN
and check for a NULL condition
[6.5, 7.0, 2000] Updated
10-30-2003
*****
When you have a choice of
using the IN or the EXISTS clause in your Transact-SQL, you
will generally want to use the EXISTS clause, as it is usually more
efficient and performs faster. [6.5, 7.0, 2000] Updated
10-30-2003
*****
If you find that SQL Server
uses a TABLE SCAN instead of an INDEX SEEK when you use an IN or OR clause
as part of your WHERE clause, even when those columns
are covered by an index, consider using an index hint to force the Query
Optimizer to use the index.
For example:
SELECT *
FROM tblTaskProcesses WHERE nextprocess = 1 AND processid IN (8,32,45)
takes about 3 seconds,
while:
SELECT *
FROM tblTaskProcesses (INDEX = IX_ProcessID) WHERE nextprocess = 1 AND
processid IN (8,32,45)
returns in under a second.
[7.0, 2000] Added 3-18-2002 Contributed by David
Ames
*****
If you use LIKE in your
WHERE clause, try to use one or more
leading character in the clause, if at all possible. For example,
use:
LIKE
'm%'
not:
LIKE
'%m'
If you use a leading
character in your LIKE clause, then the Query Optimizer has the ability to
potentially use an index to perform the query, speeding performance and
reducing the load on SQL Server.
But if the leading character
in a LIKE clause is a wildcard, the Query Optimizer will not be able to
use an index, and a table scan must be run, reducing performance and
taking more time.
The more leading characters
you can use in the LIKE clause, the more likely the Query Optimizer will
find and use a suitable index. [6.5, 7.0, 2000] Updated
10-30-2003
*****
If your application needs to
retrieve summary data often, but you don't want to have
the overhead of calculating it on the fly every time it is needed,
consider using a trigger that updates summary values after each
transaction into a summary table.
While the trigger has some
overhead, overall, it may be less that having to calculate the data every
time the summary data is needed. You may have to experiment to see which
methods is fastest for your environment. [6.5, 7.0, 2000] Updated
10-30-2003
*****
If your application needs to
insert a large binary value into an image data column, perform this
task using a stored procedure, not using an INSERT statement embedded in
your application.
The reason for this is
because the application must first convert the binary value into a
character string (which doubles its size, thus increasing network traffic
and taking more time) before it can be sent to the server. And when the
server receives the character string, it then has to convert it back to
the binary format (taking even more time).
Using a stored procedure
avoids all this because all the activity occurs on the SQL Server, and
little data is transmitted over the network. [6.5, 7.0, 2000] Updated
10-30-2003
*****
When you have a choice of
using the IN or the BETWEEN clauses in your Transact-SQL, you
will generally want to use the BETWEEN clause, as it is much more
efficient. For example:
SELECT
customer_number, customer_name FROM customer WHERE customer_number
in (1000, 1001, 1002, 1003, 1004)
is much less efficient than
this:
SELECT
customer_number, customer_name FROM customer WHERE customer_number
BETWEEN 1000 and 1004
Assuming there is a useful
index on customer_number, the Query Optimizer can locate a range of
numbers much faster (using BETWEEN) than it can find a series of numbers
using the IN clause (which is really just another form of the OR clause).
[6.5, 7.0, 2000] Updated 10-30-2003
*****
If possible, try to avoid
using the SUBSTRING function in your WHERE clauses. Depending on how it is
constructed, using the SUBSTRING function can force a table scan instead
of allowing the optimizer to use an index (assuming there is one). If the
substring you are searching for does not include the first character of
the column you are searching for, then a table scan is
performed.
If possible, you should
avoid using the SUBSTRING function and use the LIKE condition instead, for
better performance.
Instead of doing
this:
WHERE
SUBSTRING(column_name,1,1) = 'b'
Try using this
instead:
WHERE
column_name LIKE 'b%'
If you decide to make this
choice, keep in mind that you will want your LIKE condition to be
sargable, which means that you cannot place a wildcard in the first
position. [6.5, 7.0, 2000] Updated 6-4-2003
*****
Where possible, avoid
string concatenation in your Transact-SQL code, as it is not a fast
process, contributing to overall slower performance of your application.
[6.5, 7.0, 2000] Updated 10-30-2003
*****
Generally, avoid using
optimizer hints in your queries. This is because it is generally very
hard to outguess the Query Optimizer. Optimizer hints are special keywords
that you include with your query to force how the Query Optimizer runs. If
you decide to include a hint in a query, this forces the Query Optimizer
to become static, preventing the Query Optimizer from dynamically adapting
to the current environment for the given query. More often than not, this
hurts, not helps performance.
If you think that a hint
might be necessary to optimize your query, be sure you first do all of the
following first:
- Update the statistics on
the relevant tables.
- If the problem query is
inside a stored procedure, recompile it.
- Review the search
arguments to see if they are sargable, and if not, try to rewrite them
so that they are sargable.
- Review the current
indexes, and make changes if necessary.
If you have done all of the
above, and the query is not running as you expect, then you may want to
consider using an appropriate optimizer hint.
If you haven't heeded my
advice and have decided to use some hints, keep in mind that as your data
changes, and as the Query Optimizer changes (through service packs and new
releases of SQL Server), your hard-coded hints may no longer offer the
benefits they once did. So if you use hints, you need to periodically
review them to see if they are still performing as expected. [6.5, 7.0,
2000] Updated 3-6-2001
*****
If you have a WHERE clause
that includes expressions connected by two or more AND operators, SQL
Server will evaluate them from left to right in the order they are
written.
This assumes that no parenthesis have been used to change the order of
execution. Because of this, you may want to consider one of the following
when using AND:
- Locate the least likely
true AND expression first. This way, if the AND expression is false, the
clause will end immediately, saving time.
- If both parts of an AND
expression are equally likely being false, put the least complex AND
expression first. This way, if it is false, less work will have to be
done to evaluate the expression.
You may want to consider
using Query Analyzer to look at the execution plans of your queries to see
which is best for your situation. [6.5, 7.0, 2000] Updated
3-6-2001
*****
If you want to boost the
performance of a query that includes an AND operator in the WHERE
clause, consider the following:
- Of the search criterions
in the WHERE clause, at least one of them should be based on a highly
selective column that has an index.
- If at least one of the
search criterions in the WHERE clause is not highly selective, consider
adding indexes to all of the columns referenced in the WHERE
clause.
- If none of the column in
the WHERE clause are selective enough to use an index on their own,
consider creating a covering index for this query.
[7.0, 2000] Updated
2-8-2002
*****
The Query Optimizer will
perform a table scan or a clustered index scan on a table if the WHERE
clause in the query contains an OR operator and if any of the referenced
columns in the OR clause are not indexed (or does not have a useful
index). Because of this, if you use many queries with OR clauses, you will
want to ensure that each referenced column in the WHERE clause has a
useful index. [7.0, 2000] Added 10-17-2000
*****
A query with one or more OR
clauses can sometimes be rewritten as a series of queries that are
combined with a UNION ALL statement, in order to boost the
performance of the query. For example, let's take a look at the following
query:
SELECT
employeeID, firstname, lastname FROM names WHERE dept = 'prod' or
city = 'Orlando' or division = 'food'
This query has three
separate conditions in the WHERE clause. In order for this query to use an
index, then there must be an index on all three columns found in the WHERE
clause.
This same query can be written using UNION ALL instead of
OR, like this example:
SELECT
employeeID, firstname, lastname FROM names WHERE dept = 'prod' UNION
ALL SELECT employeeID, firstname, lastname FROM names WHERE city =
'Orlando' UNION ALL SELECT employeeID, firstname, lastname FROM
names WHERE division = 'food'
Each of these
queries will produce the same results. If there is only an index on dept,
but not the other columns in the WHERE clause, then the first version will
not use any index and a table scan must be performed. But in the second
version of the query will use the index for part of the query, but not for
all of the query.
Admittedly, this is a very simple example, but
even so, it does demonstrate how rewriting a query can affect whether or
not an index is used or not. If this query was much more complex, then the
approach of using UNION ALL might be must more efficient, as it allows you
to tune each part of the index individually, something that cannot be done
if you use only ORs in your query.
Note, that I am using UNION
ALL instead of UNION. The reason for this is to
prevent the UNION statement from trying to sort the data and remove
duplicates, which hurts performance. Of course, if there is the
possibility of duplicates, and you want to remove them, then of course you
can use just UNION.
If you have a query that uses ORs and it not
making the best use of indexes, consider rewriting it as a UNION ALL, and
then testing performance. Only through testing can you be sure that one
version of your query will be faster than another. [7.0, 2000] Added
2-8-2002
*****
Don't use ORDER BY in your
SELECT statements unless you really need to, as it adds a lot of extra
overhead. For example, perhaps it may be more efficient to sort the data
at the client than at the server. In other cases, perhaps the client
doesn't even need sorted data to achieve its goal. The key here is to
remember that you shouldn't automatically sort data, unless you know it is
necessary. [6.5, 7.0, 2000] Updated
6-4-2003
*****
Whenever SQL Server has to
perform a sorting operation, additional resources have to be used to
perform this task. Sorting often occurs when
any of the following Transact-SQL statements are executed:
- ORDER BY
- GROUP BY
- SELECT
DISTINCT
- UNION
- CREATE INDEX (generally
not as critical as happens much less often)
In many cases, these
commands cannot be avoided. On the other hand, there are few ways that
sorting overhead can be reduced. These include:
- Keep the number of rows to
be sorted to a minimum. Do this by only returning those rows that
absolutely need to be sorted.
- Keep the number of columns
to be sorted to the minimum. In other words, don't sort more columns
that required.
- Keep the width (physical
size) of the columns to be sorted to a
minimum.
- Sort column with number
datatypes instead of character datatypes.
When using any of the above
Transact-SQL commands, try to keep the above performance-boosting
suggestions in mind. [6.5, 7.0, 2000] Added
6-5-2003
*****
If you have to sort by a
particular column often, consider making that
column a clustered index. This is because the data is already presorted
for you and SQL Server is smart enough not to resort the data. [6.5, 7.0,
2000] Added 6-5-2003
*****
If your SELECT statement
includes an IN operator along with a list of values
to be tested in the query, order the list of values so that the most
frequently found values are placed at the first of the list, and the less
frequently found values are placed at the end of the list. This can speed
performance because the IN option returns true as soon as any of the
values in the list produce a match. The sooner the match is made, the
faster the query completes. [6.5, 7.0, 2000] Added
11-27-2000
*****
If you need to use the
SELECT INTO option, keep in mind that it can lock system tables,
preventing others users from accessing the data they need. If you do need
to use SELECT INTO, try to schedule it when your SQL Server is less busy,
and try to keep the amount of data inserted to a minimum. [6.5, 7.0, 2000]
Added 11-28-2000
*****
If your SELECT statement
contains a HAVING clause, write your query so that
the WHERE clause does most of the work (removing undesired rows) instead
of the HAVING clause do the work of removing undesired rows. Using the
WHERE clause appropriately can eliminate unnecessary rows before they get
to the GROUP BY and HAVING clause, saving some unnecessary work, and
boosting performance.
For example, in a SELECT
statement with WHERE, GROUP BY, and HAVING clauses, here's what happens.
First, the WHERE clause is used to select the appropriate rows that need
to be grouped. Next, the GROUP BY clause divides the rows into sets of
grouped rows, and then aggregates their values. And last, the HAVING
clause then eliminates undesired aggregated groups. If the WHERE clause is
used to eliminate as many of the undesired rows as possible, this means
the GROUP BY and the HAVING clauses will have less work to do, boosting
the overall performance of the query. [6.5, 7.0, 2000]
Added 12-11-2000
*****
If your application performs
many wildcard (LIKE %) text searches on CHAR or VARCHAR columns,
consider using SQL Server's full-text search option. The Search Service
can significantly speed up wildcard searches of text stored in a database.
[7.0, 2000] Updated 1-12-2001
*****
The GROUP BY clause can be
used with or without an aggregate function. But if you want optimum
performance, don't use the GROUP BY clause without an aggregate
function. This is because you can accomplish the same end result by
using the DISTINCT option instead, and it is faster.
For example, you could write
your query two different ways:
USE
Northwind SELECT OrderID FROM [Order Details] WHERE UnitPrice
> 10 GROUP BY OrderID
or
USE
Northwind SELECT DISTINCT OrderID FROM [Order Details] WHERE
UnitPrice > 10
Both of the above queries
produce the same results, but the second one will use less resources and
perform faster. [6.5, 7.0, 2000] Added 1-12-2001
*****
The GROUP BY clause can be
sped up if you follow these suggestion:
- Keep the number of rows
returned by the query as small as possible.
- Keep the number of
groupings as few as possible.
- Don't group redundant
columns.
- If there is a JOIN in the
same SELECT statement that has a GROUP BY, try to rewrite the query to
use a subquery instead of using a JOIN. If this is possible, performance
will be faster. If you have to use a JOIN, try to make the GROUP BY
column from the same table as the column or columns on which the set
function is used.
- Consider adding an ORDER
BY clause to the SELECT statement that orders by the same column as the
GROUP BY. This may cause the GROUP BY to perform faster. Test this to
see if is true in your particular situation.
[7.0, 2000] Added
6-6-2003
*****
Sometimes perception is more
important that reality. For example, which of the
following two queries is the fastest:
- A query that takes 30
seconds to run, and then displays all of the required
results.
- A query that takes 60
seconds to run, but displays the first screen full of records in less
than 1 second.
Most DBAs would choose the
first option as it takes less server resources and performs faster. But
from many user's point-of-view, the second one may be more palatable. By
getting immediate feedback, the user gets the impression that the
application is fast, even though in the background, it is not.
If you run into situations
where perception is more important than raw performance, consider using
the FAST query hint. The FAST query hint is used with the SELECT statement
using this form:
OPTION(FAST
number_of_rows)
where number_of_rows
is the number of rows that are to be displayed as fast as
possible.
When this hint is added to a
SELECT statement, it tells the Query Optimizer to return the specified
number of rows as fast as possible, without regard to how long it will
take to perform the overall query. Before rolling out an application using
this hint, I would suggest you test it thoroughly to see that it performs
as you expect. You may find out that the query may take about the same
amount of time whether the hint is used or not. If this the case, then
don't use the hint. [7.0, 2000] Added 3-6-2001
*****
Instead of using temporary
tables, consider using a derived table instead. A derived table is the
result of using a SELECT statement in the FROM clause of an existing
SELECT statement. By using derived tables instead of temporary tables, we
can reduce I/O and boost our application's performance. [7.0, 2000]
Added 3-9-2001 More info on derived
tables.
*****
SQL Server 2000 offers a new
data type called "table." Its main purpose is for the temporary storage of
a set of rows. A variable, of type "table," behaves as if it is a local
variable. And like local variables, it has a limited scope, which is
within the batch, function, or stored procedure in which it was declared.
In most cases, a table variable can be used like a normal table. SELECTs,
INSERTs, UPDATEs, and DELETEs can all be made against a table variable.
For best performance, if you
need a temporary table in your Transact-SQL code, try to use a table
variable instead of creating a conventional temporary table
instead.
Table variables are created and manipulated in memory instead of the
tempdb database, making them much faster. In addition, table variables
found in stored procedures result in fewer compilations (than when using
temporary tables), and transactions using table variables only last as
long as the duration of an update on the table variable, requiring less
locking and logging resources. [2000] Added 8-7-2001
*****
It is fairly common request
to write a Transact-SQL query to to compare a parent table and a child
table and find out if there are any parent records that don't have a
match in the child table. Generally, there are three ways this can be
done:
Using a
NOT EXISTS
SELECT
a.hdr_key FROM hdr_tbl a WHERE NOT EXISTS (SELECT * FROM dtl_tbl b
WHERE a.hdr_key = b.hdr_key)
Using a
Left Join
SELECT
a.hdr_key FROM hdr_tbl a LEFT JOIN dtl_tbl b ON a.hdr_key =
b.hdr_key WHERE b.hdr_key IS NULL
Using a
NOT IN
SELECT
hdr_key FROM hdr_tbl WHERE hdr_key NOT IN (SELECT hdr_key FROM
dtl_tbl)
In each case, the above
query will return identical results. But, which of these three variations
of the same query produces the best performance? Assuming everything else
is equal, the best performing version through the worst performing version
will be from top to bottom, as displayed above. In other words, the NOT
EXISTS variation of this query is generally the most efficient.
I say generally, because the
indexes found on the tables, along with the number of rows in each table,
can influence the results. If you are not sure which variation to try
yourself, you can try them all and see which produces the best results in
your particular circumstances. [7.0, 2000] Added 3-29-2002
*****
Be careful when using OR in
your WHERE clause, it is fairly simple to accidentally retrieve much more
data than you need, which hurts performance. For example, take a look
at the query below:
SELECT
companyid, plantid, formulaid FROM batchrecords WHERE companyid =
'0001' and plantid = '0202' and formulaid = '39988773' OR companyid
= '0001' and plantid = '0202'
As you
can see from this query, the WHERE clause is redundant, as:
companyid
= '0001' and plantid = '0202' and formulaid = '39988773'
is a subset of:
companyid
= '0001' and plantid = '0202'
In other words, this query
is redundant. Unfortuantely, the SQL Server Query Optimizer isn't smart
enough to know this, and will do exactly what you tell it to. What will
happen is that SQL Server will have to retrieve all the data you have
requested, then in effect do a SELECT DISTINCT to remove redundant rows it
unnecessarily finds.
In this case, if you drop
this code from the query:
OR companyid
= '0001' and plantid = '0202'
then run the query, you will
receive the same results, but with much faster performance. [6.5, 7.0,
2000] Added 5-22-2002
*****
If you need to verify the
existence of a record in a table, don't use SELECT COUNT(*)
in your Transact-SQL code to identify it, which is very inefficient and
wastes server resources. Instead, use the Transact-SQL IF EXITS to
determine if the record in question exits, which is much more efficient.
For example:
Here's how you might use
COUNT(*):
IF (SELECT
COUNT(*) FROM table_name WHERE column_name = 'xxx')
Here's a faster way, using
IF EXISTS:
IF EXISTS
(SELECT * FROM table_name WHERE column_name = 'xxx')
The reason IF EXISTS is
faster than COUNT(*) is because the query can end immediately when the
text is proven true, while COUNT(*) must count go through every record,
whether there is only one, or thousands, before it can be found to be
true. [7.0, 2000] Updated 4-1-2003
*****
Let's say that you often
need to INSERT the same value into a column. For example, perhaps you
have to perform 100,000 INSERTs a day into a particular table, and that
90% of the time the data INSERTed into one of the columns of the table is
the same value.
If this the case, you can
reduce network traffic (along with some SQL Server overhead) by creating
this particular column with a default value of the most common value. This
way, when you INSERT your data, and the data is the default value, you
don't INSERT any data into this column, instead allowing the default value
to automatically be filled in for you. But when the value needs to be
different, you will of course INSERT that value into the column. [6.5,
7.0, 2000] Added 7-2-2003
*****
Performing UPDATES takes
extra resources for SQL Server to perform. When performing an UPDATE,
try to do as many of the following recommendations as you can in order to
reduce the amount of resources required to perform an UPDATE. The more of
the following suggestions you can do, the faster the UPDATE will
perform.
- If you are UPDATing a
column of a row that has an unique index, try to only update one row at
a time.
- Try not to change the
value of a column that is also the primary
key.
- When updating VARCHAR
columns, try to replace the contents with contents of the same
length.
- Try to minimize the
UPDATing of tables that have UPDATE triggers.
- Try to avoid UPDATing
columns that will be replicated to other
databases.
- Try to avoid UPDATing
heavily indexed columns.
- Try to avoid UPDATing a
column that has a reference in the WHERE clause to the column being
updated.
Of course, you may have very
little choice when UPDATing your data, but at least give the above
suggestions a thought. [6.5, 7.0, 2000] Added 7-2-2003
*****
If you have created a
complex transaction that includes several parts, one part of which has a
higher probability of rolling back the transaction than the others, better
performance will be provided if you locate the most likely to fail part of
the transaction at the front of the greater transaction. This way, if this
more-likely-to-fail transaction has to roll back because of a failure,
there has been no resources wasted on the other less-likely-to-fail
transactions. [6.5, 7.0, 2000] Added 7-2-2003
|