Wednesday, 25 April 2012

SQL Server Service Broker Demystified


Introduction

Service Broker was introduced in SQL Server 2005 and can easily be hailed as one of the best new features added to SQL Server. With Service Broker a database developer can create powerful asynchronous applications within a server or between different servers. The range of such applications may vary from simple workload queuing to complex cross-server auditing. Moreover, being part of the database engine, Service Broker enjoys qualities like transactional consistency, reliability, and security. In a nutshell, it is a powerful tool for a DBA and not that difficult to learn. 
I will start with the basic terminology and components that make up a Service Broker application (and along the way we will look at the code snippets to create those components).

Synchronous vs. Asynchronous Messaging

What makes Service Broker so powerful is its asynchronous messaging system, which allows developers to build loosely coupled components that work independently to perform some task. In synchronous messaging, the interacting components wait on each other until the message is received (that is, the sender will not continue until the receiver has received the message); in asynchronous messaging, the sender and receiver keep doing their work independently. The advantage of asynchronous communication (and thus of Service Broker) is that the sender and receiver can overlap their computation because they do not wait for each other. 

Transactional consistency

In the Service Broker infrastructure, the delivery of messages between two parties (called endpoints, as you will see later) is transactional. That is, if a transaction rolls back, all Service Broker operations within that transaction will also roll back.
The heart of the Service Broker architecture is the dialog, which, according to TechNet, is a reliable, persistent, bi-directional, ordered exchange of messages between two endpoints. An endpoint is the sender or receiver of messages. So, let's start by going through the components that make up Service Broker.

Enabling Service Broker

First of all, we need to enable Service Broker on our database. To check whether Service Broker is enabled on your database, query the sys.databases catalog view.
SELECT is_broker_enabled 
FROM sys.databases 
WHERE name = 'Your_Database' 
GO 
To enable Service Broker on your database, we use the ALTER DATABASE command:
ALTER DATABASE Your_Database 
SET ENABLE_BROKER 
WITH ROLLBACK IMMEDIATE 
GO 
Be careful while running this command, as WITH ROLLBACK IMMEDIATE will disconnect all the users.

Message Types

A message type is the definition of the format of the messages that will be part of the dialog between the two endpoints. The message type performs any validation required on the contents of the message before sending it. If the message is to be sent to another database (on the same or a different server), the message type should exist on both databases.
To create a message type you can use the CREATE MESSAGE TYPE command.
CREATE MESSAGE TYPE message_type_name
    [ AUTHORIZATION owner_name ]
    [ VALIDATION = {  NONE
                    | EMPTY
                    | WELL_FORMED_XML
                    | VALID_XML WITH SCHEMA COLLECTION
                      schema_collection_name
                   } ]
[ ; ]
Permission for creating a message type defaults to members of the ddl_admin or db_owner database roles and the sysadmin server role. The arguments and their definitions are detailed below:
message_type_name: The name of the message type to be created. It can be up to 128 characters. By convention, the name is a URL of the form //<hostname>/<pathname>/<name>, but using a URL format is not required.
AUTHORIZATION owner_name: Defines who will be the owner of the message type. When the current user is sa or dbo, it can be the name of any valid user. In all other cases, it must be the current user, a user whom the current user has IMPERSONATE permission on, or a role the current user belongs to. When this clause is omitted, the owner is the user who executed the statement.
VALIDATION: Specifies how the message body for messages of this type is validated by Service Broker. When this clause is omitted, validation defaults to NONE. The possible values are:
  • NONE: No validation is performed. The message body may contain data or it may be NULL.
  • EMPTY: The message body must be NULL.
  • WELL_FORMED_XML: The message body must contain well-formed XML. When this clause is specified, the receiver loads the message into an XML parser to ensure that it can be parsed. If the message fails this validation, it is discarded and an error message is sent to the sender.
  • VALID_XML WITH SCHEMA COLLECTION schema_collection_name: The message body must contain XML that complies with a schema in the specified collection. schema_collection_name must be the name of an existing XML schema collection.    
Let’s have an example of creating a message type:
CREATE MESSAGE TYPE My_First_Msg 
VALIDATION = WELL_FORMED_XML 
GO
You can run the following query to view the message types in the database where you created them:
SELECT * FROM sys.service_message_types
Although Service Broker does not require both sides of the conversation to use the same validation, it is better to keep them the same to help troubleshooting. 
If you have a high message volume, each message passing through the XML parser can adversely affect performance, so you may want to avoid XML validation in such cases.
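If you do skip XML validation for throughput, the message type is created with VALIDATION = NONE and the application becomes responsible for checking the payload itself. A minimal sketch (the type name here is illustrative, not part of the running example):

```sql
-- No parsing on send or receive: the body may hold any bytes, or be NULL.
-- Validating the contents is now the application's job.
CREATE MESSAGE TYPE My_Unvalidated_Msg
VALIDATION = NONE
GO
```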

Contracts

Service Broker contracts are database objects which define which message type or types will be used in any conversation. Service Broker contracts define two different service roles: the initiator and the target. The initiator of the conversation starts the conversation by sending a message to the target. The contract that the conversation uses defines which service role can send messages of what message type.
To create a contract we use the CREATE CONTRACT command. The syntax is:
CREATE CONTRACT contract_name
   [ AUTHORIZATION owner_name ]
      (  {   { message_type_name | [ DEFAULT ] }
          SENT BY { INITIATOR | TARGET | ANY } 
       } [ ,...n] ) 
[ ; ]
                                                             
The arguments and their definitions are detailed below:
AUTHORIZATION: Has exactly the same meaning as in creating a message type.
message_type_name: The name of the message type to be included as part of the contract.
SENT BY: Specifies which endpoint can send the particular type of message.
INITIATOR: Specifies that only the initiator can send messages of the specified message type.
TARGET: Specifies that only the target can send messages of the specified message type.
ANY: Specifies that messages of the specified type can be sent by either the initiator or the target.
[DEFAULT]: Specifies that this contract supports messages of the default message type. By default, all databases contain a message type named DEFAULT.
Let’s continue creating these objects to be used in our messaging application,
--Either the initiator or the target may send the same message type

CREATE CONTRACT My_First_Contract 
(My_First_Msg SENT BY ANY) 
GO 
A contract does not specify the ordering of messages. Service Broker requires the first message of a conversation to be sent by the initiator. After the first message, there are no ordering requirements in the conversation dialog.
More than one contract can use the same message type(s).
The message type and direction cannot be changed once a contract is defined. If you have to change the message type, you must drop and recreate the contract.

Queue

As I mentioned earlier, Service Broker is an asynchronous messaging infrastructure; let's dive a bit deeper into this now. In asynchronous messaging, you send a message and start doing something else, without caring a bit about the receiver. The processing of the sent message depends solely on the receiver, which might process it now or later without any effect on what you are doing at your end. But the message needs to be stored somewhere between the time you send it and the time it is processed. This magical place where your messages vanish just as you send them is called the queue.
You can imagine the queue as a table (SQL Server actually implements a queue via a hidden table in your database). You cannot directly manipulate (insert, update or delete) this table, though. To see the messages in the queue, SQL Server provides a read-only view on this table which you can query. When a sender sends a message, it is placed in the sender's queue. The transport layer moves that message reliably to the destination queue, and the receiver can pull it from there whenever it likes. The message is deleted from the sender's queue only after it has been successfully moved to the receiver's queue, so there is no question of any message being lost in transit.
Every database also has a temporary queue called the transmission queue. When a message is sent over a network, Service Broker puts the message in this queue on the initiator server. Service Broker then sends the message over the network and marks it in the transmission queue as waiting for acknowledgement. When the message is received in the target queue, Service Broker sends an acknowledgement back to the initiator, and on receiving this acknowledgement the message is deleted from the transmission queue. Note that you will only see messages in the transmission queue while a message is in transit. You can query it using the following SELECT statement:
SELECT * FROM sys.transmission_queue      
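When messages appear stuck, a few columns of this view are particularly useful; transmission_status holds the error text explaining why delivery is failing. A quick diagnostic query might look like this:

```sql
-- A non-empty transmission_status usually explains a delivery failure
-- (for example, a missing route or a security problem)
SELECT to_service_name, 
       message_type_name, 
       enqueue_time, 
       transmission_status 
FROM sys.transmission_queue 
GO
```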
To create a queue we use the Create Queue command. The syntax is:
CREATE QUEUE queue_name
   [ WITH
     [ STATUS = { ON | OFF }  [ , ] ]
     [ RETENTION = { ON | OFF } [ , ] ] 
     [ ACTIVATION (
         [ STATUS = { ON | OFF } , ] 
           PROCEDURE_NAME = <procedure> ,
           MAX_QUEUE_READERS = max_readers , 
           EXECUTE AS { SELF | 'user_name' | OWNER } 
            ) [ , ] ]
     [ POISON_MESSAGE_HANDLING (
         STATUS = { ON | OFF } ) ]
     [ ON { filegroup | [ DEFAULT ] } ]
[ ; ]
<object> ::=
{
    [ database_name. [ schema_name ] . | schema_name. ]
        queue_name
} 
<procedure> ::=
{
    [ database_name. [ schema_name ] . | schema_name. ]
        stored_procedure_name
} 
 
The arguments and their definitions are detailed below:
STATUS: Specifies whether the queue is available or unavailable for use after creation. It is a good practice to create the queue with STATUS = OFF so the application cannot use it; once the setup is complete you can turn it on using the ALTER QUEUE statement.
RETENTION: When OFF (the default), a message is deleted from the queue as soon as the transaction that receives it commits. When ON, messages remain in the queue until the conversation ends. While it may seem harmless to keep it ON, doing so increases the size of the queue and may affect the performance of the system as a whole. As a guideline, set it ON only when you have a business need to keep messages around after processing.
ACTIVATION: I will deal with this in the next section, as it deserves a detailed discussion.
POISON_MESSAGE_HANDLING: Specifies whether poison message handling is enabled for the queue. The default is ON. If set to ON, the queue will be disabled after five consecutive transaction rollbacks. Setting it to OFF allows you to have a custom poison-message-handling system in place.
ON { filegroup | [ DEFAULT ] }: Specifies the SQL Server filegroup where the queue will be created. If the volume of messages in your application is going to be high, it might be better to keep the queue on a separate filegroup. 
--We will be creating a queue each for the initiator and the target

CREATE QUEUE MY_First_Queue_Init 
CREATE QUEUE My_First_Queue_Target 
GO 

Activation

Service Broker activation is the mechanism that enables applications to scale dynamically based on message traffic. Activation uses Service Broker to start an application when there is work for the program to do.
There are two types of activation: internal and external. Internal activation works with SQL Server stored procedures: the procedure is activated directly. External activation works with a SQL Server event that indicates to an external program that it should start another queue reader.
Service Broker activation works in two steps. First, Service Broker determines whether activation is necessary (there are messages waiting to be processed). Then it determines whether activation should occur (whether enough queue readers are already running).
Let’s get back to the create queue syntax and have a look at the activation clause and related arguments:
ACTIVATION (
         [ STATUS = { ON | OFF } , ] 
           PROCEDURE_NAME = <procedure> ,
           MAX_QUEUE_READERS = max_readers , 
           EXECUTE AS { SELF | 'user_name' | OWNER } 

            ) [ , ]
STATUS: If this is ON, Service Broker activates the stored procedure specified in the PROCEDURE_NAME argument. You can stop activation of the procedure by setting this argument to OFF. The default is ON.
PROCEDURE_NAME = <procedure>: Specifies the name of the procedure to be executed when a message arrives in the queue. This value must be a SQL Server identifier.
MAX_QUEUE_READERS: Specifies the maximum number of instances of the procedure that Service Broker starts at the same time. It can be any value between 0 and 32,767. While a high value will make your Service Broker application very quick to respond, it will also add a lot of stress to your servers, so give some time and thought to balancing your application's requirements against your server's health.
EXECUTE AS: Specifies the account under which the procedure will run when it is activated.
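Putting these arguments together, here is a sketch of a queue with internal activation. The procedure and queue names are illustrative (they are not part of the running example), and the procedure must exist before the queue references it:

```sql
-- Activation procedure: receives one message and processes it.
-- Service Broker may start up to MAX_QUEUE_READERS copies of it.
CREATE PROCEDURE Process_My_Queue 
AS 
BEGIN 
    DECLARE @handle UNIQUEIDENTIFIER, 
            @type   sysname, 
            @body   varbinary(MAX) 

    -- Wait up to 5 seconds for a message, then let the procedure exit
    WAITFOR ( 
        RECEIVE TOP(1) 
            @handle = conversation_handle, 
            @type   = message_type_name, 
            @body   = message_body 
        FROM My_Activated_Queue 
    ), TIMEOUT 5000 

    IF @type = 'My_First_Msg' 
        SELECT CONVERT(XML, @body) -- do the real processing here 
END 
GO 

CREATE QUEUE My_Activated_Queue 
WITH STATUS = ON, 
ACTIVATION ( 
    STATUS = ON, 
    PROCEDURE_NAME = Process_My_Queue, 
    MAX_QUEUE_READERS = 5, 
    EXECUTE AS SELF ) 
GO
```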

Service

Let’s think of our system as a postal service. To send a letter you would write an address on the envelope. In Service Broker world, this address is the service. A target service represents an address that accepts requests for the tasks identified by the contracts that the service specifies. An initiating service represents a return address for a conversation with a target service. Each service uses a queue to store messages. Messages sent to the service are delivered to the queue.
To create a service we use the Create Service command. The syntax is:
CREATE SERVICE service_name
   [ AUTHORIZATION owner_name ]
   ON QUEUE [ schema_name. ]queue_name
   [ ( contract_name | [DEFAULT] [ ,...n ] ) ]
[ ; ]
The arguments and their definitions are detailed below:
AUTHORIZATION: this is the same as described for messages and queues.
queue_name: this is the name of the queue that holds the messages received from this service.
contract_name: this specifies the contract for which the service may be the target.
Let’s create the initiator and target services now,
CREATE SERVICE My_First_Service_Init 
ON QUEUE My_First_Queue_Init 
(My_First_Contract) 

CREATE SERVICE My_First_Service_Target 
ON QUEUE My_First_Queue_Target 
(My_First_Contract) 
GO 
Dialogs (Conversations)
Now that we have the basic infrastructure in place for our SQL Server postal service, let's get ready to send and receive a couple of messages. But first, a brief description of what conversations are in Service Broker. A conversation is the SQL Server primitive over which messages are sent. A dialog is a special type of conversation: a two-way conversation between two parties (future versions of SQL Server may also support monologs).
To send messages you need to start a dialog. A dialog defines the exchange of messages between two services and specifies the contract that will be used for the messages in the dialog. Once the dialog is created, you get a "dialog handle", a unique GUID identifying the initiator's side of the conversation. This handle is used whenever the initiator wants to send a message to, or receive one from, the target.
In order to start conversation between the services, we need to start a dialog.
DECLARE @Con_Id UNIQUEIDENTIFIER 

BEGIN DIALOG CONVERSATION @Con_Id
   FROM SERVICE My_First_Service_Init 
   TO SERVICE 'My_First_Service_Target' 
   ON CONTRACT My_First_Contract 
   WITH ENCRYPTION=OFF 

--Keep this guid

SELECT @Con_Id 
GO
Sending Messages
So we are all set to start talking with our friend sitting at the target service. To send a message we need a few things, though: a message type to validate what we are sending, our side of the conversation handle and, of course, a message. As we have all the ingredients of our recipe with us, we had better start talking!
DECLARE @con_Id UNIQUEIDENTIFIER 
--Insert the GUID from the last section 
SET @con_Id = '00ABCBC9-D049-DC11-BE14-001641E42AD0' 

-- Send a message
;SEND ON CONVERSATION @con_Id  
MESSAGE TYPE My_First_Msg 
('<Hello_Zombies />') 
GO 
Now that we have sent a message, let's see if this actually works. To check whether the target received our message, we need to check the target queue.
SELECT * 

FROM My_First_Queue_Target 
You should now be looking at a single row of data. The various columns summarize the properties of the conversation the two queues are having.
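Since message_body is stored as varbinary(MAX), a plain SELECT * shows it as a hex string. Casting it back to XML makes the payload readable:

```sql
-- Peek at the queue without consuming anything: unlike RECEIVE,
-- a SELECT does not remove messages from the queue
SELECT message_type_name, 
       CAST(message_body AS XML) AS body 
FROM My_First_Queue_Target 
GO
```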

Receiving a message

Now that we have sent a message to the target queue and the message has safely reached its destination, we need to receive it too (think of it as post delivered to your mailbox by the postal service: you finally have to pick it up from the mailbox). Note that once you pick the message up from the target queue with RECEIVE, it is removed from there. We will extract the message type, the conversation handle and the message body in the following code:
DECLARE  
    @handle UNIQUEIDENTIFIER, 
    @type sysname, 
    @body varbinary(MAX) 

;RECEIVE TOP(1)  
    @handle = conversation_handle, 
    @type = message_type_name, 
    @body = message_body 
FROM My_First_Queue_Target 

-- Don’t lose the handle as it will be used to reply
IF @type = 'My_First_Msg' 
BEGIN 
    SELECT  
        CONVERT(XML, @body), 
        @handle 
END 
ELSE 
BEGIN 
    RAISERROR( 
        'Unknown message type',  
        16, 1) 
END 
GO 
Sending a message back
Sending a message back to the initiator is similar to sending a message from the initiator. Note that we are using the conversation handle to send messages within the same conversation.
DECLARE @con_Id UNIQUEIDENTIFIER 
--Insert the handle from the previous section 
SET @con_Id = 'E1A1BFCF-D049-DC11-BE14-001641E42AD0' 

;SEND ON CONVERSATION @con_Id  
MESSAGE TYPE My_First_Msg 
('<Hello_Earth/>') 
GO
Ending the conversation
When we are done with sending and receiving messages, we should always end the conversation; otherwise the dialog will remain active and keep listening for new messages. Idle conversations use space in the database and can create problems on a busy system.
DECLARE @con_Id UNIQUEIDENTIFIER 
--Insert the handle from the previous section 
SET @con_Id = 'E1A1BFCF-D049-DC11-BE14-001641E42AD0' 

END CONVERSATION @con_Id

Conclusion

So, this was my attempt to demystify Service Broker. In my next article I will use Service Broker to develop a more complex application with cross-database communication and much more.

Indexes in SQL Server

Introduction

When we talk of enhancing performance of our database management systems (DBMS), the first thing that comes into focus are the indexes. It would not be an exaggeration to say that indexes are the single biggest behind-the-scene player responsible for a DBMS performing well. In this article, I will explain the basics of indexing in SQL Server 2008.

Indexes

An index in SQL Server can be visualized as the index of a book. A book's index contains a list of topics and the page numbers where they appear, arranged alphabetically for quick reference. A database index, likewise, holds an ordered list of values with pointers to the rows where those values and their corresponding data reside. For a table without indexes, the query engine has to scan the whole table for any query, which is like reading a whole book from the start to find one topic. In short, an index can tremendously reduce the query time for your tables.

In SQL Server, there are two types of indexes: clustered and non-clustered. Non-clustered indexes can come in many flavors, however, let’s start with defining the term "clustered indexes."

Clustered Indexes

A clustered index stores the actual data rows at the leaf level. This means that the clustered index for a table and the data of that table are stored in the same place. Clustered indexes are organized into ranges of data, so if you need to search for ranges (e.g. between two date values), clustered indexes are the way to go. Also, for fast access, the rows are sorted (either ascending or descending) by the index key value. Because the data is physically sorted in order of the index key, we cannot have more than one clustered index on a table; otherwise we would have the data arranged in two different orders. SQL Server creates a clustered index by default to match the primary key, but you can also define the clustered index on any column and then define the primary key on some different column(s). Also, keep in mind that clustered indexes should be defined on columns that are “ever-increasing”, a point I elaborate on later.
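As a sketch, a clustered index on an ever-increasing date column of a hypothetical audit table (the table and index names are made up for illustration) could look like this:

```sql
-- New rows always sort to the end, so inserts do not cause page splits,
-- and date-range queries become cheap range scans
CREATE TABLE AuditLog ( 
    LogId    int IDENTITY, 
    LoggedAt datetime NOT NULL, 
    Detail   nvarchar(400) 
) 

CREATE CLUSTERED INDEX IX_AuditLog_LoggedAt 
ON AuditLog (LoggedAt) 
GO
```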

Non-clustered Indexes


The leaf levels of a non-clustered index contain the index key value and the pointer to the actual data row. Effectively, the query takes one more step to gather the data. The structure of the pointer depends on whether it is pointing to a heap or to a clustered table. If pointing to a clustered index, it uses the value from the clustered index to navigate to the correct data row. If referencing a heap (table without a clustered index), it points to the actual data row. This is called key lookup. That is, once SQL Server identifies the rows that are needed, it has to retrieve the column information for those rows from the data pages of the table. So, SQL Server has to navigate through the clustered index to get the data for all the rows that are required.

Key lookups increase as the number of rows in the result set increases; more importantly, the cost associated with the lookups increases. At some point this cost might overshoot the benefits of the non-clustered index, in which case the query optimizer might ignore the non-clustered index and go for a full table scan instead. You can force the optimizer to use the non-clustered index anyway, but it is usually better not to, as the optimizer is much better at creating plans than most people.

You can create up to 999 non-clustered indexes on a table in SQL Server 2008 (the limit was 249 in earlier versions). Also, you can add other columns to your non-clustered index, which I will explain in the "Covering Indexes" section.

Covering Indexes

To make the most of indexes and overcome the cost of lookups, indexes should be designed so that they cover most of the queries expected to hit the table, i.e. they should contain the columns that are most likely to appear in those queries. 

Non-clustered indexes have a few limitations when it comes to covering queries:

  •     A 900-byte size limit on the index key
  •     A maximum of 16 key columns per index
  •     Columns with data types like nvarchar(max), text, and ntext cannot be used as index key columns
Fortunately, Microsoft addressed most of these problems in SQL Server 2005 with included columns. Using the INCLUDE clause, we can add non-key columns to a non-clustered index; they are stored at the leaf level of the index. If you expect the search queries on your table to return some columns more often than others, you can include those columns in your non-clustered index. SQL Server then does not need to look up the actual data rows at all; it gets all the information in one step. An index that satisfies a query entirely on its own this way is said to cover the query. 

With a covering index, you can have:

  • A maximum of 1023 non-key columns in the index
  • Columns with data types like nvarchar(max) as non-key columns of the index
You should also be careful when using covering indexes. While it may be tempting to include a lot of columns in your index, keep in mind that included columns affect I/O, cache efficiency and disk space just like the key columns of the index.
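A sketch of an index with included columns, on a hypothetical Orders table (all names here are illustrative):

```sql
-- OrderDate is the searchable key; CustomerName rides along in the
-- leaf pages so queries selecting it need no key lookup
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate 
ON Orders (OrderDate) 
INCLUDE (CustomerName) 
GO
```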

Filtered Indexes

SQL Server 2008 came up with a number of new and useful features, filtered indexes being one of them. A filtered index is an optimized non-clustered index designed to cover queries on a subset of the data in the underlying table. Simply put, it is a non-clustered index with a WHERE clause that filters data based on some condition(s). A well-designed filtered index has many advantages. It can substantially improve query performance because it is smaller than a full-table non-clustered index. It also has filtered statistics, which are more accurate than full-table statistics, so a filtered index enhances plan quality as well. Because a filtered index is smaller than a full-table non-clustered index, the maintenance cost associated with it is lower (maintenance is required only when the data covered by the filtered index changes). And when we do not need a full-table non-clustered index, a filtered index also reduces the disk space required for the index.

To effectively use these advantages you need to be aware of what data is in your table and what queries are expected on it. You need to know the subsets within your data and how they map up to the queries fired against the table.
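For instance, if most queries against a hypothetical Orders table only ever touch unshipped orders, a filtered index can cover just that subset (the names are illustrative):

```sql
-- Only rows where ShippedDate is still NULL are in the index, keeping it
-- small and its statistics focused on the hot subset of the data
CREATE NONCLUSTERED INDEX IX_Orders_Unshipped 
ON Orders (OrderDate) 
WHERE ShippedDate IS NULL 
GO
```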

How Data is Stored

An index consists of pages arranged in a B-tree. The root node sits at the top with leaf nodes at the bottom.

When a query is fired against any column in the index, the query engine starts from the root node. It keeps going down from node to node until it reaches the correct leaf node where the required data is. Now if the index is clustered, the engine has reached where the actual data is stored. If the index is non-clustered, the engine gets the pointer to the actual data page, which is where the data is stored, from the leaf node.

Fragmentation

Before moving on to what fragmentation is, you should know about page splits and fill factor. Page splits occur when data is inserted into a data page that is full. SQL Server splits the page in approximately equal halves and creates the space for the new data. Page splits affect the performance of indexes as creating a new page results in fragmentation. Also, the process is very resource-intensive.

MSDN defines fill factor as “a percentage from 0 to 100 that specifies how much to fill the data pages after the index is created.” Setting the fill factor to 100 means no space is left in the pages; this should only be done for read-only tables, where there will be no insertions and thus no page splits. Any lower value leaves some space in each page for values arriving in the future. It might be tempting to make this value quite low, say 50, but that hurts performance: with each page only half full, the server has to read twice as many pages to fetch the same amount of data. So, be careful about this aspect while creating your indexes.
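The fill factor is applied when an index is built or rebuilt. A sketch of rebuilding with 20% free space per page (the index and table names are illustrative):

```sql
-- Leave 20% of each leaf page empty to absorb future inserts
ALTER INDEX IX_Orders_OrderDate ON Orders 
REBUILD WITH (FILLFACTOR = 80) 
GO
```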

Basically, fragmentation is inefficient use of pages within the index. It may result from an improper logical order of pages or from incomplete usage of the space in data pages. In both cases it can cause a drastic drop in the performance of your indexes. Index fragmentation can be either external or internal.

External Fragmentation

When an index is created, the pages are logically arranged in order. When new rows are inserted in the table, new keys might be inserted in between these keys. For this to happen, SQL Server might need to create new data pages. These new pages are generally not adjacent to the existing pages. So, we might have the pages with a different physical order than the logical order. This is called external fragmentation. 

For example, suppose we initially have two data pages holding the keys in order. If the value 4 is now inserted and its page is full, a new page has to be created to accommodate it, and that page will generally not be adjacent to the others.

A query returning values from 3 to 11 then has to fetch an extra, out-of-order data page. For a few pages, this may not matter much to the performance of the query. However, imagine the same scenario with millions of pages and thousands of insertions every day. In a nutshell, external fragmentation can prove to be one of the biggest culprits in spoiling the performance of your database system.

Internal Fragmentation

Internal fragmentation occurs when the index pages are not used to their full capacity; it is the direct consequence of a fill factor below 100. It may or may not be bad for system performance. For a table with a large number of inserts, the free space will actually enhance performance. But a lot of internal fragmentation hurts read performance, as the number of reads increases to retrieve the same amount of data.
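Both kinds of fragmentation can be measured with the sys.dm_db_index_physical_stats DMV (the Orders table name is illustrative; SAMPLED or DETAILED mode is needed for the page-space figure):

```sql
-- avg_fragmentation_in_percent   -> external (logical) fragmentation
-- avg_page_space_used_in_percent -> how full the pages are (internal)
SELECT i.name, 
       s.avg_fragmentation_in_percent, 
       s.avg_page_space_used_in_percent 
FROM sys.dm_db_index_physical_stats( 
         DB_ID(), OBJECT_ID('Orders'), NULL, NULL, 'SAMPLED') AS s 
JOIN sys.indexes AS i 
  ON i.object_id = s.object_id AND i.index_id = s.index_id 
GO
```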

Design Considerations for Clustered Indexes

By default, a clustered index is matched with the primary key on the table. But, you can define the clustered index on one column and then create the primary key on some other column(s). In this case, the primary key would be created as a non-clustered index.

Clustered indexes store the data in ranges: say, 100-200 in one node and 200-300 in another. So, if you have a query searching for a range of data, it is better to define the clustered index on the column used to filter by range. For example, if you have to search an audit log by date range, it is better to create the clustered index on the date column. On the other hand, non-clustered indexes perform better for specific-value searches.

Clustered indexes should be defined on “ever-increasing” columns. A date column into which older dates are never inserted can be an ever-increasing column; an identity column always is. This is important because if the column is not ever-increasing, SQL Server needs to fit future records between existing ranges rather than placing them at the end in new ranges. When a range fills up and a new value belonging in the middle arrives, SQL Server has to do a page split to allocate the new data. So, effectively we have two pages now, increasing the search time (and the resources it takes). You could take a slightly different approach by decreasing the fill factor (say to 65%), leaving 35% free space for incoming values. The problem with this approach is the need for regular re-indexing to maintain the free space, which incurs a heavy processing cost on the server as it moves the data and rebuilds the non-clustered indexes.