In my previous article I talked about a few spool operators
(Eager Spool, Lazy Spool and Rowcount Spool), now its time to talk about the
remaining spool operators that the query optimizer uses to make our queries run
faster. In this article I will talk about the Non Clustered Index Spool. Then I
will talk about Window Spools which are used by a number of window functions
being introduced in Denali. In the end, I will talk about the common scenarios
SQL Server query optimizer employs spool
operators and how can we avoid them by simply keeping in mind a few things.
Let’s get the things rolling then.
Non Clustered Index
Spool
Every DBA who has had the opportunity to do performance
tuning knows the importance of indexes in SQL Server. Still we will see tables
without indexes (or without proper indexes). The Non Clustered Index Spool is
used to better the read performance of queries which access such tables which
are not properly indexed (or indexed at all). When SQL Server encounters such
queries, where it needs to read data from tables that are not indexed it might
create a temporary index on the spooled data in the TempDB. The query optimizer
then uses this index as any other non clustered index to boost the performance
of the query. The Index Spool scans the input rows and places a copy of the
rows in a hidden file stored in TempDB (just like any other spool) and builds a
non clustered index on the rows. Now this hidden index can be used in the query
for seek operations just like any other index is used. To better understand
index spool operators I will show just one query and the execution plan that
SQL Server query optimizer uses for the query,
USE AdventureWorks
GO
SELECT *
FROM sales.SalesOrderDetail S1
WHERE S1.LineTotal > (
SELECT AVG(S2.linetotal)
FROM sales.SalesOrderDetail AS S2
WHERE S2.ModifiedDate < S1.ModifiedDate)
What I am doing with this query is pulling all the sales
orders which have the order value (LineTotal) more than the average order
values for all the orders that were placed before them. To figure out what the
query optimizer thinks about this query, we will see the execution plan
So as you can see, the query optimizer thinks that it should
create an index on the table to fetch all the data in the TempDB so it is going
for creating an index spool(and that is taking about half of the time the query
is using). So according to the query plan, first a clustered index scan is used
to read all the data from the table and that data is being pulled by the index
spool which then creates an index on that data in the tempdb. As the spool is
an Eager spool( I talked about Eager and Lazy spools in my previous article) it
will pull all the data at one go from the table scan. Let’s see if the query optimizer is smart
enough to take his own decisions and to do that we need to check the existing
indexes on the SalesOrderDetail,
So, the query optimizer is once again spot on as there is no
index covering for either of the two columns (LineTotal and ModifiedDate) used
in the query. The best thing about index spools are that they tell you exactly
what you can do to remove them. So if we create a non-clustered index on the
table covering the columns that are used in the query, we should see something
interesting in the query plan,
create nonclustered index IX_Test on sales.SalesOrderDetail(ModifiedDate,linetotal)
Now if we run the same query again and see the execution
plan we can see the index spool is gone. Now instead of going for a clustered
index scan, the query optimizer goes for an index seek on the index we just
created. Also, if you want to check the total query cost, it has also decreased
by a substantial amount and the improvement will only get better once the table
size grows as the query will never go for a complete table scan now.
Window Spool
This will be the last spool
operator that I will discuss. Window spools are new to SQL Server just like a
lot of window functions that are coming bundled with SQL Server 2012 aka
Denali. Window functions are set of functions that allow us to work on a set of
data that is known as the Window. Different database systems are offering a
number of Window functions for quite some time now but SQL Server has generally
missed out on these functions. There were some advances with SQL Server 2005
but after that we had to wait till Denali to see further substantial
improvements in Window functions. But do we need them anyways? The answer would
be yes as Window functions allow us to have a critical functionality of dealing
with details within an aggregation. If we group by some field in a select query
we lose access to the details within the group but sometimes we need at least
some information about the groups that we lose because of aggregations. One
such example would be to get the rank of salespersons per geographical region
based on their targets achieved. Window functions let us do exactly that. They
partition the data into different windows (in the example above, windows will
be different geographical regions) and let us work on different windows of
data. We will see just one example and then we will move to our spool operator,
USE TempDB
GO
IF OBJECT_ID('Window_tbl') IS NOT NULL
DROP TABLE Window_tbl
GO
CREATE TABLE
Window_tbl (Value
INT)
GO
INSERT INTO
Window_tbl(value)
values
(4),
(4),
(5),
(3),
(6)
GO
SELECT Value,
LEAD(Value, 2) OVER(ORDER BY Value DESC) AS
"Lead"
FROM Window_tbl
And the result is this,
So the Lead function tells us the value in the row that falls after the offset defined by the second
parameter. So in the example above, as the offset is two it gives us the values
that are separated by the current value by two rows. NULL values will be
returned if there are no rows defined by the offset parameter. So for 4 and 3
there are NULL values as there are no rows which are separated by two rows from
the rows having 4 and 3. Let’s see the execution plan for the code that we just
ran,
So the query optimizer is using a Window spool. So SQL Server stores the
frame data( or the Window data) in a temporary ‘worktable’ in TempDB. SQL
Server actually uses two different ways
to store the frame data. It can either go for a disk-based worktable or it can
store all the frame data into an in-memory worktable. The difference being the
in-memory worktable is much faster than the disk based worktable for obvious
reasons. It goes for fewer scans/ logical/ physical reads than the disk based
worktable. The question is can we control SQL Server to go for one particular
type of worktable for the spools? Yes, we can. The in-memory worktable is used
when we define the frame as rows and the frame contains less than 10000 rows.
The on-disk worktable is used when the window functions are used with the
default frame (i.e range) or the frame has more than 10000 rows.
OVER (
[ <PARTITION BY clause> ]
[ <ORDER BY clause> ]
[ <ROW or RANGE clause> ]
)
[ <PARTITION BY clause> ]
[ <ORDER BY clause> ]
[ <ROW or RANGE clause> ]
)
We can easily check
which kind of worktable is being used by SQL Server. We can either use the
xEvent ‘window_spool_ondisk_warning’ in SQL Server 2012 or we can set
the statistics IO on and look at the output.
SET STATISTICS
IO ON
SELECT Value,
LEAD(Value, 2) OVER(ORDER BY Value DESC) AS "Lead"
FROM Window_tbl
(5 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.
Table 'Window_tbl'. Scan count 1, logical reads 1,
physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.
The count values for
the ‘Worktable’ tell us that it has used an in-memory worktable. For a
disk-based worktable you will see these counts to be much higher. Because of
the performance differences, it is always good to deine the window as ROWS
instead of RANGE.
Avoiding Spools
As with all performance tuning methods, there is no sureshot method to
avoid spools but definitely there are some best practices that can help to a
good extent. SQL Server might go for spools for queries using ‘Select into’ if
the isolation level is read isolation. It can be avoided by lowering the
isolation level to read committed. Also, use of NOLOCK hints should also help
in such cases. (NOTE: Be careful before you change the isolation level of your
systems). SQL Server might go for spools in user defined functions to ensure
that no DDL change will break the ongoing operation of the function. This
scenario can be avoided by using schema binding in functions. If the UDF is
non-schema bound then the query optimizer will generate the spool operator. In
the case of a UDF where it is not accessing any base table or view, such
spooling will never be utilized. So in the case of the schema bound UDF this
spool operation does not occur and performance is increased. Window spools can
be avoided easily by defining the frame as ROWS instead of RANGE (provided the
frame size is less than 10000 rows). Similarly, its pretty simple to avoid non
clustered index spools. Mostly, you would be creating indexes covering the
columns used in the query( other wise the optimizer will create the same index
in tempdb known as spools). Reworking a query that used an OR in the where
statement might help sometimes.
So this was an attempt to make the spool operators a little more
understandable. I hope you enjoyed learning about different types of spools and
ways to avoid them. Your comments and feedback are welcome as always.
Thanks for a good concluding article. I would only recomment creating scenarios from the self created tables. This should be a good value add for your blogs.
ReplyDeleteThanks for the feedback Adi, I will keep that in mind.
ReplyDelete