Post moved. Please visit: https://allbigdatathings.blogspot.com/2019/07/connecting-aws-redshift-to-ssas-cubes.html
Rambling about database technologies
Spewing things about SQL Server, Redshift, AWS and more
Sunday, 27 September 2015
Thursday, 12 March 2015
Installing Boto framework on Windows
I recently started a new task which involves connecting to S3 using Python. It looked a little tough in the beginning but turned out to be pretty straightforward in the end. First of all, I needed to install a library. Most of the tutorials showed the steps for Linux only. Anyway, I found a way, and it is quite straightforward; it's just that a beginner can get intimidated by a lot of things. Let's roll then.
Step 1: First you need to download the Boto library onto your system. A quick Google search gave me the GitHub link, https://github.com/boto/boto. Go to this link and "Download ZIP".
Step 2: Extract the contents of the zip and put it in some folder. I put it in C:\Python27\boto-develop.
Step 3: Open a Windows cmd prompt and move to the location used in step 2.
>cd C:\Python27\boto-develop
Step 4: Now you need to install it. To do that run the following command,
python setup.py install
You should see a lot of output as the files are copied, and then control returns to the cmd prompt. Your installation is done!
Just to confirm that the installation worked, go to the Lib\site-packages folder of your Python installation. You should see an EGG-INFO entry for boto.
See, that wasn't tough, right?
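To quickly confirm that boto can actually talk to S3, here is a minimal sketch. It assumes Python 2.7 (matching the C:\Python27 install above), and the access keys are placeholders you need to replace with your own credentials:
import boto

# Connect to S3 with the legacy boto (v2) library installed above.
# The keys below are placeholders -- substitute your own credentials.
conn = boto.connect_s3(
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY'
)

# List every bucket the account can see.
for bucket in conn.get_all_buckets():
    print bucket.name  # Python 2 print statement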
Wednesday, 22 August 2012
Columnar Databases and SQL Server Columnstore Indexes
Lately I have been working on Vertica, a column-based database from HP, and I have to admit that I am amazed by its capabilities and the functionality it offers, especially when we are talking about relatively big data. The basic difference between SQL Server and Vertica is the way they store data: SQL Server is a row store database and Vertica is a column store database. With Denali, SQL Server has also come up with something along similar lines in the form of ColumnStore indexes (codenamed Apollo). In this article I will talk about columnar databases, how they are different from the row store databases that we use every day and why we need them. I will also try to figure out what SQL Server has to offer with Apollo and how it stands up against other column store databases.
Column store databases
A column store database is a database which stores table data as sections of columns rather than as rows. That means all the values for one particular column are stored together, and these sets of data may be stored in different places for different columns. Column values belonging to the same row are linked internally by pointers. When a query is run against a row store database, it fetches data from all the columns for the rows that match the query conditions. In contrast, when the same query is run against a column store database, only those columns are read which are required by the query. So if you consider a table with 30 columns and your query needs only five of those fields (queries typically touch less than 15% of the columns in a typical fact table, according to a Microsoft whitepaper), the column store database can give you roughly 6x the I/O performance of its row store counterpart (provided the data in each column is of similar size). Considering the fact that disk access is still the slowest operation for our computers, this becomes significant because far fewer pages have to be read from disk. Also, the columns which are not used in the query are not loaded into memory at all, which helps avoid frequent PLE (Page Life Expectancy) drops when dealing with large amounts of data.
Also, most column store databases achieve very high levels of compression on the data they store. These extremely high compression ratios are a result of the redundancy we usually see within columns. Furthermore, the values in the same column are invariably of the same type, unlike the data you will see in a row, which facilitates even more compression. Most column store database systems use more than one compression scheme, chosen based on the type of values stored in the column. For instance, Vertica uses Run Length Encoding to store values in a column that are highly repetitive. These encoding schemes allow the database system to compress each column separately based on the underlying data values, which can't be done in a row store database because each row contains values of different types for each column.
So if column store databases are so good, why do we use row store databases at all? Moreover, why are row store databases ruling the market if they are so inept compared with column store databases? The point is that row store databases have their own advantages. You can't use a column store database for a small table with hundreds of OLTP transactions per minute. A row store database will be much faster than a column store database if you want to see all the columns of the table in your resultset (because in a column store the different columns are stored at different places on the disk and need multiple disk seeks). In a nutshell, there are specific use cases where you should use column store databases. The rise of column store databases has a lot to do with the rise of BI and big data. Generally, column store databases should be used where traditional row-based databases fail to deliver satisfactory performance. Usually, such cases arise when you are trying to run analytic queries on significantly big data, or when you need only a few columns of a wide table (a fact table in a star schema perfectly fits the description). Row store databases fail here because they are just not designed for the purpose.
ColumnStore indexes in SQL Server
Now that you have a basic understanding of what a column store is, let's talk about this new feature of Denali known as the ColumnStore Index (codenamed Apollo). A columnstore index stores each column in a separate set of disk pages, as opposed to a heap/B-tree, which stores multiple rows per page. It brings the obvious advantages of a column store, easily compressible data and fetching only the data the query actually needs, to a normal SQL Server table. The columnstore index in SQL Server employs Microsoft's proprietary VertiPaq technology. SQL Server columnstore indexes are pure column stores: the data is stored and compressed column-wise, and individual columns can be accessed separately from other columns. The other type is the hybrid column store, which stores data as sets of rows, but within each set of rows the data is organized and compressed column-wise.
Once you create a columnstore index on a table, the query optimizer will choose to use either the heap/B-tree or the index created by you, based on the query and its estimated cost. If the optimizer chooses the columnstore index but you feel that the underlying B-tree can perform better, you can always use query hints to force the B-tree instead of the columnstore index. Remember that seeks are not supported by columnstore indexes. That means that if you use the table hint FORCESEEK, the query optimizer will not consider the columnstore index and will instead go for the underlying B-tree or heap. Also, SQL Server columnstore indexes can use batch mode processing to further improve query performance.
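To make this concrete, here is a rough sketch of creating a nonclustered columnstore index and of steering the optimizer back to the row store with a query hint. The dbo.FactSales table and its columns are hypothetical, used only for illustration:
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_ColumnStore
ON dbo.FactSales (DateKey, ProductKey, SalesAmount, Quantity)
GO
-- A typical analytic query that can benefit from the columnstore index
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey
GO
-- Fall back to the underlying B-tree/heap if it performs better
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)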
While the performance benefits of a columnstore index are quite nice, one can argue for using alternative methods to achieve the same functionality, namely pre-computed summary tables or indexed views. These alternatives will surely work, but they don't offer the flexibility that columnstore indexes do. Pre-computed summary tables will approximate the effect of a columnstore, but they stop working the moment you have to change your query. Even if you only have to add one aggregate function to the query, you will need to change the structure of the summary table. Having a columnstore index on the relevant columns gives you the flexibility to run any query with any aggregate function without changing anything in the base table. Also, if you don't have a fixed set of queries to be run, it can become very difficult to maintain pre-computed tables. Another alternative is to use covering indexes. In most cases the columnstore index will be much more compressed than the covering index. If the query is very selective, the covering index might work faster than the columnstore index (keep in mind that column stores are almost always better for big data analytics and almost always poorer for OLTP queries). Also, a columnstore index can use batch mode processing to substantially speed up the query.
Limitations on using a columnstore index
Now that you know the advantages of the columnstore index in SQL Server, it's high time you knew the flip side of the coin. The SQL Server columnstore index comes with its own set of limitations, and unfortunately there are a lot of them. Among the numerous limitations, the most glaring one is the inability to update a table that has a columnstore index. Microsoft proposes some workarounds for this in its whitepaper on columnstore indexes, but it can definitely turn out to be the single biggest factor that stops you from using a columnstore index. A columnstore index can't be clustered, though Microsoft says this will change in future releases of SQL Server. Also, you cannot modify a columnstore index using the ALTER INDEX statement; if you want to change it, you must drop and recreate the columnstore index. You cannot define a custom sort order with ASC/DESC, because columnstore indexes are ordered according to the compression algorithms. You cannot combine columnstore indexes with either replication or change data capture. And as columnstore indexes already use compression, you cannot combine them with the page or row level compression offered by SQL Server. There are some more limitations that you can refer to in the whitepaper or on the MSDN page for columnstore indexes.
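One of the commonly cited workarounds for the no-update limitation is to disable the columnstore index before the data load or update and rebuild it afterwards; a rough sketch against the hypothetical table from earlier:
-- DML on the table is blocked while the columnstore index is enabled
ALTER INDEX IX_FactSales_ColumnStore ON dbo.FactSales DISABLE
GO
-- ...load or update the data...
UPDATE dbo.FactSales SET SalesAmount = SalesAmount * 1.05 WHERE DateKey = 20120801
GO
-- Rebuilding the index makes it available to the optimizer again
ALTER INDEX IX_FactSales_ColumnStore ON dbo.FactSales REBUILD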
I agree that the excitement created by the use of column stores in a mainstream row store database is somewhat dampened by the numerous limitations that come bundled with it, but it is nonetheless a great initiative by Microsoft to try and come up with something meant for big data. The future looks destined for huge growth in the amount of data we deal with, and that will only prompt the growth of columnar databases. Right now SQL Server columnstore indexes stand nowhere when compared to a full columnar database in terms of functionality and performance, but at least the guys at Redmond know what they are doing, and we can definitely hope for a plethora of improvements to columnstore indexes in future SQL Server releases. I am also looking forward to a more direct and tighter integration of columnstore indexes with SQL Server Analysis Services. I hope you enjoyed reading this little article. Your comments and feedback are welcome as always.
Sunday, 15 July 2012
Spool Operators: Index spool and Window spool
In my previous article I talked about a few spool operators (Eager Spool, Lazy Spool and Row Count Spool); now it's time to talk about the remaining spool operators that the query optimizer uses to make our queries run faster. In this article I will talk about the Non-Clustered Index Spool and then about Window Spools, which are used by a number of the window functions being introduced in Denali. In the end, I will cover the common scenarios in which the SQL Server query optimizer employs spool operators and how we can avoid them by keeping a few things in mind. Let's get things rolling then.
Non-Clustered Index Spool
Every DBA who has had the opportunity to do performance tuning knows the importance of indexes in SQL Server. Still, we regularly see tables without indexes (or without the proper indexes). The Non-Clustered Index Spool is used to improve the read performance of queries that access tables which are not properly indexed (or not indexed at all). When SQL Server encounters such a query, where it needs to read data from tables that are not indexed, it might create a temporary index on the spooled data in TempDB. The query optimizer then uses this index like any other non-clustered index to boost the performance of the query. The Index Spool scans the input rows and places a copy of the rows in a hidden file stored in TempDB (just like any other spool), and builds a non-clustered index on those rows. This hidden index can then be used in the query for seek operations just like any other index. To better understand index spool operators, I will show just one query and the execution plan that the SQL Server query optimizer uses for it:
USE AdventureWorks
GO
SELECT *
FROM sales.SalesOrderDetail S1
WHERE S1.LineTotal > (
SELECT AVG(S2.linetotal)
FROM sales.SalesOrderDetail AS S2
WHERE S2.ModifiedDate < S1.ModifiedDate)
What I am doing with this query is pulling all the sales orders whose order value (LineTotal) is more than the average order value of all the orders placed before them. To figure out what the query optimizer thinks about this query, let's look at the execution plan.
So as you can see, the query optimizer thinks that it should create an index on this data in TempDB, so it goes for an index spool (and that is taking about half of the cost of the query). According to the query plan, a clustered index scan is first used to read all the data from the table, and that data is pulled by the index spool, which then creates an index on it in TempDB. As the spool is an Eager Spool (I talked about Eager and Lazy Spools in my previous article), it will pull all the data in one go from the table scan. Let's see if the query optimizer is smart enough to make its own decisions; to do that we need to check the existing indexes on SalesOrderDetail.
So the query optimizer is once again spot on, as there is no index covering either of the two columns (LineTotal and ModifiedDate) used in the query. The best thing about index spools is that they tell you exactly what you can do to remove them. So if we create a non-clustered index on the table covering the columns that are used in the query, we should see something interesting in the query plan:
CREATE NONCLUSTERED INDEX IX_Test ON Sales.SalesOrderDetail (ModifiedDate, LineTotal)
Now if we run the same query again and look at the execution plan, we can see the index spool is gone. Instead of going for a clustered index scan, the query optimizer goes for an index seek on the index we just created. If you check the total query cost, it has also decreased by a substantial amount, and the improvement will only get better as the table grows, because the query will never go for a complete table scan now.
Window Spool
This will be the last spool operator that I discuss. Window Spools are new to SQL Server, just like the many window functions that come bundled with SQL Server 2012, aka Denali. Window functions are a set of functions that allow us to work on a set of rows known as a window. Other database systems have been offering a number of window functions for quite some time now, but SQL Server has generally missed out on them. There were some advances with SQL Server 2005, but after that we had to wait till Denali to see further substantial improvements. But do we need them anyway? The answer is yes, as window functions give us the critical ability to deal with details within an aggregation. If we group by some field in a select query, we lose access to the details within each group, but sometimes we need at least some information about the rows that we lose because of the aggregation. One example would be getting the rank of salespersons per geographical region based on the targets they achieved. Window functions let us do exactly that: they partition the data into different windows (in the example above, the windows are the different geographical regions) and let us work on each window of data. We will see just one example and then we will move on to our spool operator.
USE TempDB
GO
IF OBJECT_ID('Window_tbl') IS NOT NULL
    DROP TABLE Window_tbl
GO
CREATE TABLE Window_tbl (Value INT)
GO
INSERT INTO Window_tbl (Value)
VALUES (4), (4), (5), (3), (6)
GO
SELECT Value,
       LEAD(Value, 2) OVER (ORDER BY Value DESC) AS "Lead"
FROM Window_tbl
And the result is this,
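With the five values inserted above, ordered by Value descending, the output looks like this:
Value    Lead
-----    ----
6        4
5        4
4        3
4        NULL
3        NULL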
So the LEAD function returns the value from the row that is the given offset (the second parameter) ahead of the current row. In the example above, as the offset is two, it gives us the value two rows ahead of the current row. NULL is returned when there is no row at that offset, which is why one of the 4s and the 3 return NULL: there are no rows two positions after them. Let's see the execution plan for the code that we just ran.
The query optimizer is using a Window Spool: SQL Server stores the frame data (or window data) in a temporary 'worktable' in TempDB. SQL Server actually has two different ways to store the frame data. It can either go for a disk-based worktable or it can store all the frame data in an in-memory worktable. The difference is that the in-memory worktable is much faster than the disk-based worktable for obvious reasons; it incurs fewer scans and fewer logical and physical reads. The question is, can we make SQL Server go for one particular type of worktable for these spools? Yes, we can. The in-memory worktable is used when we define the frame as ROWS and the frame contains fewer than 10000 rows. The on-disk worktable is used when the window function uses the default frame (i.e. RANGE) or the frame has more than 10000 rows.
OVER (
    [ <PARTITION BY clause> ]
    [ <ORDER BY clause> ]
    [ <ROW or RANGE clause> ]
)
We can easily check which kind of worktable SQL Server is using: either use the extended event 'window_spool_ondisk_warning' in SQL Server 2012, or set STATISTICS IO on and look at the output.
SET STATISTICS IO ON
SELECT Value,
       LEAD(Value, 2) OVER (ORDER BY Value DESC) AS "Lead"
FROM Window_tbl
(5 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Window_tbl'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The zero counts for the 'Worktable' tell us that an in-memory worktable was used; for a disk-based worktable you would see much higher counts. Because of the performance difference, it is always good to define the window frame as ROWS instead of RANGE.
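As a small illustration using the Window_tbl table from above, the first query below relies on the default RANGE frame while the second spells out a ROWS frame; only the latter should qualify for the in-memory worktable:
-- Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
SELECT Value,
       SUM(Value) OVER (ORDER BY Value DESC) AS RunningTotal_Range
FROM Window_tbl
GO
-- Explicit ROWS frame: eligible for the in-memory worktable (frames under 10000 rows)
SELECT Value,
       SUM(Value) OVER (ORDER BY Value DESC
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal_Rows
FROM Window_tbl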
Avoiding Spools
As with all performance tuning, there is no sure-shot method to avoid spools, but there are definitely some best practices that help to a good extent. SQL Server might go for spools in queries using SELECT INTO if the isolation level is higher than read committed; this can be avoided by lowering the isolation level to read committed, and using NOLOCK hints should also help in such cases. (NOTE: be careful before you change the isolation level of your systems.) SQL Server might also add spools around user-defined functions to ensure that no DDL change breaks the ongoing operation of the function. This scenario can be avoided by using schema binding in the function: if the UDF is not schema-bound, the query optimizer generates the spool operator even though, for a UDF that does not access any base table or view, the spooled data is never actually used. With a schema-bound UDF this spool operation does not occur and performance improves; a small sketch of a schema-bound function follows below. Window spools can be kept in memory by defining the frame as ROWS instead of RANGE (provided the frame has fewer than 10000 rows). Similarly, it's pretty simple to avoid non-clustered index spools: create indexes covering the columns used in the query (otherwise the optimizer will create essentially the same index in TempDB as a spool). Reworking a query that uses an OR in the WHERE clause can also help sometimes.
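As an illustration of the schema binding point above, here is a minimal sketch of a schema-bound scalar UDF over AdventureWorks; the function name and logic are made up for the example:
CREATE FUNCTION dbo.OrderCountForProduct (@ProductID INT)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
    -- Schema binding requires two-part names for every referenced object
    DECLARE @cnt INT
    SELECT @cnt = COUNT(*)
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID
    RETURN @cnt
END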
So this was an attempt to make the spool operators a little more
understandable. I hope you enjoyed learning about different types of spools and
ways to avoid them. Your comments and feedback are welcome as always.
Wednesday, 20 June 2012
Spool Operators: Eager spool, Lazy spool and Rowcount Spool
There are a lot of operators in the query execution plan that we generally overlook (I do this for operators I have no clue about). Sometimes these operators prove to be pivotal to the performance of the query and we are forced to take good care of them. Spool operators are one such category of operators that we generally don't pay much attention to. But the very existence of spools in your query plan hints (most of the time) that the query is not optimized and SQL Server was forced to use spools to make the best of it. In this post, I will talk about what spools are and when they are used. I will go into the details of the Eager Spool, Lazy Spool and Row Count Spool, and we will see an interesting problem known as the Halloween Problem. In the next post I will talk about the remaining spool operators (Table Spool, Non-Clustered Index Spool and Window Spool) and a few guidelines to avoid spools. Sounds interesting? Let's dive in then.
The query optimizer uses many techniques to optimize the queries we write and to avoid problems, so that the query performs better. The spool operators are a good example of how the optimizer works on behalf of the developer to make sure the query performs well. Though there are several spool operators (five to be precise in SQL Server 2008, six in Denali), they all share the fact that they save their intermediate query results in the TempDB database and use this temporary area to look up values. This is not an explicit #temp table, but a worktable for operations within the processing necessary for a given query's behavior. The optimizer uses this process whenever it knows that the intermediate result is expensive to calculate, so the computation is done once and the results stored in the temporary space are reused for further operations. Also, the optimizer might go for spooling if it finds that the density of a column is very high (the denser a column is, the more rows it returns for a given value).
Before getting started with the different spool operators, I will explain a few terms, starting with rewind and rebind. A rewind is an execution that uses the same value that was already calculated in the immediately preceding execution. A rebind is an execution that needs a value that is still to be calculated. For a spool it is very important to distinguish between rewinds and rebinds. Showplan operators can also be categorized into blocking and non-blocking operators. A blocking operator is one which halts query execution until all rows are read from its input; only then are its actions performed and results returned. The SORT operator is a blocking operator: it reads all the rows, sorts the data and then returns the ordered rows. Non-blocking operators read one row from their input and return the output for each row.
Eager Spool
An Eager Spool stores all the rows it receives from the preceding operator in a hidden temp table in TempDB. "Eager" means that the spool reads all the rows returned by the previous operator at one time, which makes the Eager Spool a blocking operator: it fetches all the rows in one go and stores them in the temp table. If the operator is rewound and no rebinding is necessary, the spooled data is used instead of rescanning the input. If rebinding is required, all the spooled data is discarded and the spool is rebuilt by rescanning the input.
To make it clear I will resort to storytelling, literally! There was a bunch of DBAs, hardworking and dedicated as ever, sacrificing their Halloween night and burning some midnight oil for the greater good. They were trying to give a 10% hike to every employee in the organisation who earned less than $25000. So what's the big deal, you might say; it's just a matter of one update. And update they did. But then something happened that nobody could believe. While for the employees it may have proved to be a Halloween gift, it was nothing short of horror for those poor DBAs: they found that all the employees were earning more than $25000 after the update. This is what the kids know as the Halloween Problem.
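The update in the story might have looked something like this (the table, index and column names here are purely hypothetical):
-- A nonclustered index on Salary is what makes the Halloween Problem possible here
CREATE TABLE dbo.Employees (EmployeeID INT PRIMARY KEY, Salary MONEY)
CREATE NONCLUSTERED INDEX IX_Employees_Salary ON dbo.Employees (Salary)
GO
-- The 10% hike for everyone earning less than $25000
UPDATE dbo.Employees
SET Salary = Salary * 1.10
WHERE Salary < 25000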
Let's try to figure out what actually happened with that update. Wikipedia defines this problem as follows: "the Halloween Problem refers to a phenomenon in databases in which an update operation causes a change in the physical location of a row, potentially allowing the row to be visited more than once during the operation. This could even cause an infinite loop in some cases where updates continually place the updated record ahead of the scan performing the update operation." This is exactly what happened here: the update kept on updating records until each record no longer satisfied the WHERE clause.
Let's dive a little deeper. When we update a field which is a key in an index, the index is first used to fetch the rows, and when the update is done the index is also updated with the new value. Now consider the salary field as the column being updated, and assume it is the sorted key of the index. Once the first (lowest) salary value is updated, it may become greater than the second value, so it moves further along in the sorted index. When the update asks the index for the next row, the index returns the row now sitting in the second position. But that row was already updated!
SQL Server deals with the Halloween Problem using an Eager Spool. When the update query is run, it still reads from the index, but instead of fetching one row at a time, the spool reads the entire input up front. For the second and subsequent rows, records are then read not from the index but from the temp table that the spool has created, so no row is updated more than once and DBAs can rest easy on Halloween nights. The query optimizer may also create Eager Spools in other scenarios, whenever it decides that data should be cached for further operations. So much for the Eager Spool; let's talk about the Lazy Spool now.
Lazy Spool
Being a member of the great Spool family, the Lazy Spool is also destined to get data from some input and store it in a temporary table in TempDB. But Lazy Spools work a little differently from Eager Spools: a Lazy Spool reads data only when individual rows are required, that is, it reads rows and stores them in the temporary table only when the parent operator asks for a row. The difference from the Eager Spool is clear, as an Eager Spool reads all rows into the temporary table at once. In other words, the Lazy Spool is a non-blocking operator. The query optimizer uses the Lazy Spool operator to optimize some queries by avoiding reading the same values multiple times: effectively, SQL Server works with one chunk of data across operations (until the parent operator needs different data), instead of constantly fetching new data with each iteration of the parent operator.
Row Count Spool
Quoting from MSDN: "The Row Count Spool operator scans the input, counting how many rows are present and returning the same number of rows without any data in them. This operator is used when it is important to check for the existence of rows, rather than the data contained in the rows. For example, if a Nested Loops operator performs a left semi join operation and the join predicate applies to inner input, a row count spool may be placed at the top of the inner input of the Nested Loops operator. Then the Nested Loops operator can determine how many rows are output by the row count spool (because the actual data from the inner side is not needed) to determine whether to return the outer row." That's all that I could have said to explain it, but I have a sample query to show when a Row Count Spool might be used, and then we will try to figure out why the query optimizer does that and how it helps our query.
USE AdventureWorks
GO
SELECT *
FROM Sales.SalesOrderDetail SOD1
WHERE NOT EXISTS (SELECT 1
                  FROM Sales.SalesOrderDetail SOD2
                  WHERE SOD2.ModifiedDate = '2001-07-01 00:00:00.000'
                    AND SOD2.OrderQty > 10)
When we look at the query plan for this query, we will see this.
So you can see that the plan is using the Row Count Spool operator to handle the EXISTS subquery. The clustered index scan reads all rows in the table and passes them to the Nested Loops operator to be joined with the subquery result. The Left Anti Semi Join here means that only one side of the join is actually returned to the select, so only the rows coming from SOD1 are returned. For each row in SOD1, the loop probes the Row Count Spool to check whether a matching value exists. Had SQL Server not used a Row Count Spool for this query, it would have had to execute the subquery for each row returned by the parent query. So effectively, SQL Server uses the Row Count Spool to avoid reading SOD2 once for every row returned by the parent query. If you want to see the difference in query execution had SQL Server not chosen to create a Row Count Spool, you might want to refer to Fabiano's articles on spool operators (links shared below).
I hope you enjoyed this tour of spool operators as much as I enjoyed learning about them. Though spools are one of the ways the SQL Server query optimizer tries to tune our ill-written queries, they should also be seen as an opportunity to tune the query ourselves. I will talk about the remaining spool operators and a few guidelines to avoid them in your query plans in the next article. Your comments and feedback are welcome as always.
You might want to go through the fabulously written articles
by Fabiano,
Thursday, 7 June 2012
Difference between sys.objects and sysobjects
According to Books Online (BOL), many of the system tables from earlier releases of SQL Server are now implemented as a set of views in SQL Server 2005. These views are known as compatibility views, and they are meant for backward compatibility only. The compatibility views expose the same metadata that was available in SQL Server 2000. However, the compatibility views do not expose any of the metadata related to features that are introduced in SQL Server 2005. Therefore, when you use new features, such as Service Broker or partitioning, you must switch to using the catalog views.
For example, sysobjects was a real system table in SQL Server 2000. In SQL Server 2005, however, there is no system table named "sysobjects"; it is implemented as a system view, provided for backward compatibility only. Therefore, when we use new features, we need to switch to the corresponding catalog view, which in this case is sys.objects. Since many new features in SQL Server 2005 store their metadata in the system databases, the results returned by these two views (sysobjects and sys.objects) can certainly differ.
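A quick way to see the difference is to query both views side by side; the catalog view exposes columns that simply do not exist in the compatibility view (a small sketch):
-- Compatibility view: only the columns that existed in SQL Server 2000
SELECT name, xtype, crdate
FROM sysobjects
WHERE xtype = 'U'
GO
-- Catalog view: also exposes metadata introduced in SQL Server 2005
SELECT name, type_desc, create_date, modify_date, is_ms_shipped
FROM sys.objects
WHERE type = 'U'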
For more information, please visit the following links:
Compatibility Views (Transact-SQL): http://msdn.microsoft.com/en-us/library/ms187376(v=SQL.90).aspx
Mapping SQL Server 2000 System Tables to SQL Server 2005 System Views: http://msdn.microsoft.com/en-us/library/ms187997(v=SQL.90).aspx
User-Schema Separation: http://msdn.microsoft.com/en-us/library/ms190387(v=SQL.90).aspx
Friday, 4 May 2012
Transaction Log full on TempDB
This post aims to give readers a simple step-by-step methodology for dealing with 'transaction log full' situations. The general perception is that we should not have to care about transaction logs if the recovery model is set to Simple, but that's not always the case. A very common scenario is the transaction log on TempDB blowing up because of some spid stuck in KILLED/ROLLBACK state. This happens because the transaction log is also used while rolling a transaction back. So let's get started with how to deal with this issue.
1. Run DBCC OPENTRAN to find out which transaction is the actual culprit.
2. Check the spid you got from step 1 using sp_who2. It will give you an idea of what is actually behind the issue and whether you can kill it without affecting anything important.
3. If you can kill the spid, kill it; this will resolve the transaction log issue. If not, see if you can increase the size of the log file using:
USE master
GO
ALTER DATABASE tempdb MODIFY FILE (NAME = 'templog', SIZE = 8500MB)
Or add another file for the log,
ALTER DATABASE tempdb
ADD LOG FILE
( NAME = tempdblog2,
  FILENAME = 'E:\MSSQL.1\MSSQL\DATA\tempdblog2.ldf',
  SIZE = 10MB,
  MAXSIZE = 100MB,
  FILEGROWTH = 10MB)
4. The above step will solve the problem in most cases. But there is a possibility that the disk holding the log file is already full. In that case you can either get more space added to the drive by calling your admin, or shrink some database files of the other databases lying on that drive. Once you have more space on the disk, go back to step 3 to fix the issue.
I hope this simple article helps a few DBAs in dealing with this issue. As always, your comments are welcome.