Thursday 7 February 2013

B-tree structure of a SQL Server index


One of the most important routes to high performance in a SQL Server database is the index. Indexes speed up the querying process by providing swift access to rows in the data tables, similarly to the way a book's index helps you find information quickly within that book. In this article, I provide an overview of SQL Server indexes and explain how they're defined within a database and how they can make the querying process faster. Most of this information applies to indexes in both SQL Server 2005 and 2008; the basic structure has changed little from one version to the next, and much of it also applies to SQL Server 2000. That doesn't mean there haven't been changes: new functionality has been added with each successive version, but the underlying structures have remained largely the same. So for the sake of brevity, I stick with 2005 and 2008 and point out where those two versions differ.

Index Structures


Indexes are created on columns in tables or views. The index provides a fast way to look up data based on the values within those columns. For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server first finds that value in the index, and then uses the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed in order to locate the row, which can have a significant effect on performance.

You can create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar(max). You can also create indexes on XML columns, but those indexes are slightly different from the basic index and are beyond the scope of this article. Instead, I'll focus on those indexes that are implemented most commonly in a SQL Server database.

An index is made up of a set of pages (index nodes) that are organized in a B-tree structure. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom, as shown in Figure 1.


Figure 1: B-tree structure of a SQL Server index

When a query is issued against an indexed column, the query engine starts at the root node and navigates down through the intermediate nodes, with each layer of the intermediate level more granular than the one above. The query engine continues down through the index nodes until it reaches the leaf node. For example, if you're searching for the value 123 in an indexed column, the query engine would first look at the root level to determine which page to reference in the top intermediate level. In this example, the first page points to the values 1-100 and the second page to the values 101-200, so the query engine would go to the second page on that level. The query engine would then determine that it must go to the third page at the next intermediate level. From there, the query engine would navigate to the leaf node for value 123. The leaf node will contain either the entire row of data or a pointer to that row, depending on whether the index is clustered or nonclustered.

Clustered Indexes


A clustered index stores the actual data rows at the leaf level of the index. Returning to the example above, that would mean that the entire row of data associated with the primary key value of 123 would be stored in that leaf node. An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view. In addition, data in a table is sorted only if a clustered index has been defined on a table.

Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.
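
As a minimal sketch of the above, a clustered index is most often created implicitly by a primary key constraint, or explicitly with CREATE CLUSTERED INDEX. The table and index names below are hypothetical, chosen just for illustration:

```sql
-- Hypothetical table: the PRIMARY KEY CLUSTERED constraint creates
-- a unique clustered index, so the data rows are stored in ContactID order.
CREATE TABLE dbo.Contacts
(
    ContactID INT NOT NULL,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL,
    CONSTRAINT PK_Contacts PRIMARY KEY CLUSTERED (ContactID)
);

-- Equivalent explicit form, if the table were created as a heap first:
-- CREATE UNIQUE CLUSTERED INDEX IX_Contacts_ContactID
--     ON dbo.Contacts (ContactID);
```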

Nonclustered Indexes


Unlike a clustered index, a nonclustered index stores at its leaf nodes only the values from the indexed columns, along with row locators that point to the actual data rows, rather than the data rows themselves. This means that the query engine must take an additional step to locate the actual data.

A row locator’s structure depends on whether it points to a clustered table or to a heap. If referencing a clustered table, the row locator points to the clustered index, using the value from the clustered index to navigate to the correct data row. If referencing a heap, the row locator points to the actual data row.

A nonclustered index does not sort the table's data rows the way a clustered index does; however, you can create more than one nonclustered index per table or view. SQL Server 2005 supports up to 249 nonclustered indexes per table, and SQL Server 2008 supports up to 999. This certainly doesn't mean you should create that many indexes. Indexes can both help and hinder performance, as I explain later in the article.

In addition to being able to create multiple nonclustered indexes on a table or view, you can also add included columns to your index. This means that you can store at the leaf level not only the values from the indexed column, but also the values from non-indexed columns. This strategy allows you to get around some of the limitations on indexes. For example, you can include non-indexed columns in order to exceed the size limit of indexed columns (900 bytes in most cases).
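
A quick sketch of an included column, using the hypothetical dbo.Contacts table; the INCLUDE clause stores the column at the leaf level without counting it toward the index key (or its 900-byte limit):

```sql
-- LastName is the index key; FirstName is stored only at the leaf level.
-- Included columns don't count toward the 900-byte key-size limit.
CREATE NONCLUSTERED INDEX IX_Contacts_LastName
    ON dbo.Contacts (LastName)
    INCLUDE (FirstName);
```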

Index Types


In addition to an index being clustered or nonclustered, it can be configured in other ways:

  • Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes.
  • Unique Index: An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.

A unique index is automatically created when you define a primary key or unique constraint:

    • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
    • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.

  • Covering index: A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.
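
The index types above can be sketched together. Table and index names are hypothetical, continuing the FirstName/LastName/ContactID example from the text:

```sql
-- Composite, unique index: uniqueness is enforced across the
-- (LastName, FirstName) pair as a whole, not per column.
CREATE UNIQUE NONCLUSTERED INDEX IX_Contacts_Name
    ON dbo.Contacts (LastName, FirstName);

-- Covering index for the query described above: the key column plus
-- the included columns satisfy the query entirely at the leaf level.
CREATE NONCLUSTERED INDEX IX_Contacts_ContactID_Cover
    ON dbo.Contacts (ContactID)
    INCLUDE (FirstName, LastName);

-- This query can be answered from the covering index alone,
-- with no lookup into the base table:
SELECT FirstName, LastName
FROM dbo.Contacts
WHERE ContactID = 123;
```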

Index Design


As beneficial as indexes can be, they must be designed carefully. Because they can take up significant disk space, you don’t want to implement more indexes than necessary. In addition, indexes are automatically updated when the data rows themselves are updated, which can lead to additional overhead and can affect performance. As a result, index design should take into account a number of considerations.

Database


As mentioned above, indexes can enhance performance because they can provide a quick way for the query engine to find data. However, you must also take into account whether and how much you’re going to be inserting, updating, and deleting data. When you modify data, the indexes must also be modified to reflect the changed data, which can significantly affect performance. You should consider the following guidelines when planning your indexing strategy:

  • For tables that are heavily updated, use as few columns as possible in the index, and don’t over-index the tables.
  • If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance. However, use indexes judiciously on small tables because the query engine might take longer to navigate the index than to perform a table scan.
  • For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, try to implement your clustered indexes on unique columns that do not permit null values. This is why the primary key is often used for the table’s clustered index, although query considerations should also be taken into account when determining which columns should participate in the clustered index.
  • The uniqueness of values in a column affects index performance. In general, the more duplicate values you have in a column, the more poorly the index performs. On the other hand, the more unique each value, the better the performance. When possible, implement unique indexes.
  • For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first. Subsequent columns should be listed based on the uniqueness of their values, with the most unique listed first.
  • You can also index computed columns if they meet certain requirements. For example, the expression used to generate the values must be deterministic (which means it always returns the same result for a specified set of inputs). For more details about indexing computed columns, see the topic “Creating Indexes on Computed Columns” in SQL Server Books Online.
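
As a small sketch of indexing a computed column (names are hypothetical): the expression below is deterministic, so the column can be indexed.

```sql
-- Deterministic computed column: simple string concatenation always
-- returns the same result for the same inputs, so it can be indexed.
ALTER TABLE dbo.Contacts
    ADD FullName AS (LastName + N', ' + FirstName);

CREATE NONCLUSTERED INDEX IX_Contacts_FullName
    ON dbo.Contacts (FullName);
```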

Queries


Another consideration when setting up indexes is how the database will be queried. As mentioned above, you must take into account the frequency of data modifications. In addition, you should consider the following guidelines:

  • Try to insert or modify as many rows as possible in a single statement, rather than using multiple queries.
  • Create nonclustered indexes on columns used frequently in your statement’s predicates and join conditions.
  • Consider indexing columns used in exact-match queries.

Index Basics


In this article, I’ve tried to give you a basic overview of indexing in SQL Server and provide some of the guidelines that should be considered when implementing indexes. This by no means is a complete picture of SQL Server indexing. The design and implementation of indexes are an important component of any SQL Server database design, not only in terms of what should be indexed, but where those indexes should be stored, how they should be partitioned, how data will be queried, and other important considerations. In addition, there are index types that I have not discussed, such as XML indexes as well as the filtered and spatial indexes supported in SQL Server 2008. This article, then, should be seen as a starting point, a way to familiarize yourself with the fundamental concepts of indexing. In the meantime, be sure to check out SQL Server Books Online for more information about the indexes described here as well as the other types of indexes.

Wednesday 6 February 2013

Performance Counters

Counters:
Current Queue Length: A sustained high queue length over a period of time means that the IO subsystem has a problem.
Average Queue Length: If the average queue length for the disk is consistently higher than 2, this indicates a potential IO subsystem problem. It would also push Disk Sec/Read and Disk Sec/Write high.
Disk Sec/Read: This counter should measure no more than 15 ms under normal conditions. Continuous values of more than 20 ms indicate that the disks have a problem.

Under these conditions, consider the following

    Move the files to additional drives.
    Check for proper indexing on the tables.
    Check for the CPU and memory counters to find any other potential bottlenecks.

Disk Sec/Write: This counter should measure no more than 15 ms under normal conditions. Continuous values of more than 20 ms indicate that the disks have a problem.

Under these conditions, consider the following

    Move the files to additional drives.
    Check for proper indexing on the tables.
    Check for the CPU and memory counters to find any other potential bottlenecks.
    Place transaction log files on separate drives to remove additional pressure on the data drives.

Memory/Cache:

Object Name: Memory:

Counters:
Page Faults/Sec: Page faults occur when a requested page is not found in memory. There are two types of page faults: hard page faults and soft page faults. Hard page faults require a disk access, whereas soft page faults are ones where the page is found elsewhere in memory. A high number of page faults per second indicates that there is a problem with the memory used by SQL Server. Use this counter in correlation with the SQL Server:Memory Manager counters to check whether there is memory pressure.
Pages/Sec: This counter is the actual rate of hard faults and should be used in correlation with Page Faults/Sec and the SQL Server:Memory Manager counters to find memory pressure.

CPU:

Object Name: Processor

Counters:
%User Time: SQL Server runs in user mode; privileged mode is used by the operating system to access hardware and memory components. This counter should have a value of more than 70%. If it is lower, check the %Privileged Time and %Processor Time counters to see if there is a problem with the processor.
%Privileged Time: The operating system moves a thread to privileged mode to access operating system services. This counter should be less than 20%.
%Processor Time: This counter gives the percentage of time the CPU spends executing over a period. Ideally it should be less than 70%.
Interrupts/Sec: The average rate, in incidents per second, at which the processor received and serviced hardware interrupts.

System: Object Name: System

Counters:
Processor Queue Length: The number of threads that are waiting to be processed by the processor. This counter is calculated as the actual value divided by the number of processors. Ideally, the value should be less than 2.

SQL Server:

Object Name: Access Methods

Counters:
Forwarded Records/Sec: The number of records fetched through forwarded-record pointers. Forwarded records occur only in heaps (tables with no clustered index). If you start out with a short row and an update makes the row wider, the row may no longer fit on the data page. A pointer is put in its place and the row is forwarded to another page.

Can be avoided by:

    Using default values
    Using Char instead of Varchar.

Full Scans/Sec: The rate at which tables or indexes are fully scanned. High values can be caused by missing or unhelpful indexes.
Page Splits/Sec: The number of page splits occurring as the result of index pages overflowing. Normally this is associated with the leaf pages of clustered and nonclustered indexes. Page splits can be reduced by setting a proper fill factor.
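
As a small sketch of the fill-factor remedy mentioned above (the index and table names are hypothetical): rebuilding an index with a lower fill factor leaves free space on each leaf page, so inserts are less likely to trigger page splits.

```sql
-- Rebuild with 80% fill factor: each leaf page is left 20% empty,
-- reducing page splits at the cost of a somewhat larger index.
ALTER INDEX IX_Contacts_Name ON dbo.Contacts
    REBUILD WITH (FILLFACTOR = 80);
```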

Object Name: Memory Manager

Counters:
Memory Grants Pending: Memory is needed to process each user request. If enough memory is not available, the user request waits for memory, which causes a performance hit in executing the query.

This can be avoided by the following

    Adding more memory to the server
    Adding memory to SQL Server
    Creating proper Indexes

Object Name: Buffer Manager
Counters:
Buffer Cache Hit Ratio: The percentage of time the pages requested are already in memory. Ideally it should be more than 99%; a value of less than 95% indicates that SQL Server does not have enough memory and that adding more would be beneficial.
Checkpoint Pages/Sec: Pages written to disk by the checkpoint process, freeing up SQL cache. Sustained high values can indicate memory pressure.
Lazy Writes/Sec: Pages written to disk by the lazy writer process, freeing up SQL cache. Values consistently above zero can indicate memory pressure.
Page Life Expectancy: This is one counter that I would love to monitor to check for memory pressure. It gives the time, in seconds, that data pages reside in the SQL Server cache. A low value indicates one of the following problems:

    The SQL cache is cold (discussed under Page Faults)
    Memory problems
    Missing indexes

If Page Life Expectancy stays below 300 seconds while Checkpoint Pages/Sec and Lazy Writes/Sec are high, this indicates memory pressure and your server needs more memory.
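
These SQL Server counters can also be read from inside the engine via the sys.dm_os_performance_counters DMV (available in SQL Server 2005 and later), rather than through Performance Monitor. For example, to check Page Life Expectancy:

```sql
-- Page Life Expectancy in seconds, read from the Buffer Manager object.
-- LIKE is used because counter_name values are padded with spaces.
SELECT [object_name], counter_name, cntr_value AS seconds_in_cache
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Page life expectancy%';
```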

Object Name: Databases

Counters:
Transactions/Sec: Indicates the number of transactions that occur per second on the server.

Object Name: General Statistics
Counters:
User Connections: The number of connections made to the SQL Server instance.

Object Name: Latches

Counters:
Average Latch Wait Time (ms): Latches are short-term, lightweight synchronization objects. Latches are not held for the duration of a transaction; typical latching occurs during row transfers to memory, while controlling modifications to the row offset table, and so on. A high value here indicates potential problems with the IO subsystem or memory.

Object Name: Locks

Counters:
Average Wait Time (ms), Lock Wait Time (ms), and Lock Waits/Sec: All these counters relate to the locks held by SQL Server. Transactions should be as short as possible and should hold locks only briefly, without blocking other users' queries. A high value here would indicate one or more of the following:

    Memory Pressure
    IO Subsystem problem
    Improper indexes
    Improper SQL structures
    Improper placement of files in the disk subsystem
    Incorrect usage of SQL Isolation levels

These are the major counters that I would want to check. Apart from these, you can also check the following counters.
Object Name     Counter Name
SQL Statistics     Compilations/sec
SQL Statistics     Recompilations/sec
SQL Statistics     Batch Requests/sec
SQL Server: Buffer Manager     Readahead pages/sec


Check point and Write-Ahead Transaction Log

A page modified in the cache, but not yet written to disk, is called a dirty page.
 
SQL Server uses a write-ahead log (WAL), which guarantees that no data modifications are written to disk before the associated log record is written to disk. This maintains the ACID properties for a transaction. 


To understand how the write-ahead log works, it is important for you to know how modified data is written to disk. SQL Server maintains a buffer cache into which it reads data pages when data must be retrieved. Data modifications are not made directly to disk, but are made to the copy of the page in the buffer cache. The modification is not written to disk until a checkpoint occurs in the database, or the modification must be written to disk so the buffer can be used to hold a new page. Writing a modified data page from the buffer cache to disk is called flushing the page. A page modified in the cache, but not yet written to disk, is called a dirty page.


At the time a modification is made to a page in the buffer, a log record is built in the log cache that records the modification. This log record must be written to disk before the associated dirty page is flushed from the buffer cache to disk. If the dirty page is flushed before the log record is written, the dirty page creates a modification on the disk that cannot be rolled back if the server fails before the log record is written to disk. SQL Server has logic that prevents a dirty page from being flushed before the associated log record is written. Log records are written to disk when the transactions are committed.

Tuesday 5 February 2013

Turning off the ghost cleanup task for a performance gain

The ghost cleanup process wakes up every 5 seconds (every 10 seconds on SQL Server 2008) and starts removing ghost records, potentially causing performance issues by keeping pages in the buffer pool, generating log records, and causing physical IOs. The ghost cleanup task is also one of the background processes that can cause IOs to occur on what looks like a totally quiescent system.

There is a way to turn off the ghost cleanup task, using trace flag 661, as documented in KB 920093. Be careful though!!! If you disable the ghost cleanup task, the space taken up by deleted records will *NOT* be released for reuse by SQL Server until you do something else to remove it, like rebuilding an index.

 
One method people sometimes consider is to force ghost cleanup to clean everything by performing a table or index scan (thus queuing all the deleted records up for the ghost cleanup task). Although this is an alternative, it still uses the ghost cleanup task to do the work, and on a very busy system with a very large number of deletes (warning: generalization! :-) it can be much more efficient to remove the deleted-but-not-yet-reclaimed records using index reorganize or index rebuild.
Turning on this trace flag can be useful as a performance gain on systems with very heavy delete workloads, but only as long as you're careful about it. It's not something that's generally recommended but it may be useful to you.
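
A minimal sketch of enabling the trace flag globally, per KB 920093. Remember the caveat above: with ghost cleanup off, space from deleted records is reclaimed only by an index rebuild or reorganize.

```sql
-- Disable the ghost cleanup task server-wide (trace flag 661, KB 920093).
-- The -1 argument makes the flag global rather than session-scoped.
DBCC TRACEON (661, -1);

-- To turn ghost cleanup back on later:
DBCC TRACEOFF (661, -1);
```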

Time out occurred while waiting for buffer latch -- type 3, bp 0000000081FEE180, page 1:824976, stat 0xc0000b, database id: 8, allocation unit id: 6488064



Problem:

We found the following error message in the SQL Server error log (SQL Server 2008 R2).

Error message 1:
 
Time out occurred while waiting for buffer latch -- type 3, bp 0000000081FEE180, page 1:824976, stat 0xc0000b, database id: 8, allocation unit id: 6488064, task 0x00000000043B8988 : 0, waittime 1892400, flags 0x100000003a, owning task 0x00000000043B8988. Continuing to wait.

Problem 2:

We were also unable to run a full backup of one particular database.


Resolution:

The problem was resolved after applying the Service Pack 2 patch for SQL Server 2008 R2.



http://support.microsoft.com/kb/2545989





Resource Database

As everyone knows, through SQL Server 2000 there were four system databases: master, model, msdb, and tempdb. In SQL Server 2005, Microsoft introduced a new system database called the Resource database (its actual name is mssqlsystemresource), which is hidden from users.

What is the use of this database? The Resource database is a read-only, hidden database that physically contains all the system objects that ship with SQL Server. All the objects under the sys schema exist logically in each database. The Resource database does not contain user data or user metadata. The ID of the Resource database is always 32767. The main purpose of the Resource database is to make server updates fast and easy. Since all the system objects reside in the Resource database, during an upgrade the Resource database files can simply be overwritten with the new files, which updates all the system objects. In older versions of SQL Server, all the system objects had to be dropped and recreated at upgrade time, which was time consuming, and setup would fail if any one of the objects caused a problem. By default, the physical files for the Resource database exist under <drive>:\Program Files\Microsoft SQL Server\MSSQL10.<instance_name>\Binn. You can find two files named mssqlsystemresource.mdf and mssqlsystemresource.ldf in that directory.

Can I back up the database? It is not possible to back up the database using the BACKUP T-SQL statement, since the database is hidden. Since backup is not possible, you obviously can't restore the database either. The only way to take a backup of this database is to copy its physical files: stop SQL Server, copy the physical files to another location, and start the SQL Server service again.
Now that you have a backup of the Resource database's physical files, you can simply put those files back from the backup if ever needed.

How do I access the Resource database? I don't think users have any real need to access this database, but since people are curious about what's inside it, I'm writing this. This is for testing purposes only; please don't modify anything or overwrite the Resource database's physical files. Accessing the Resource database is not possible in normal mode, since the database is hidden. There are two ways to access it:
  • Attaching Physical files
  • Running SQL Server in single user mode
Let's discuss them one by one.

Attaching physical files
  • Stop the SQL Server service.
  • Copy the physical files of this database (mssqlsystemresource.mdf and mssqlsystemresource.ldf) to another location. In this test I'm going to copy them to the C:\Temp folder.
  • Start the SQL Server service.
  • Attach the database as a user database (you can give it any name) from the copied physical files.
EXEC sp_attach_db 'Resource',
    'C:\Temp\mssqlsystemresource.mdf',
    'C:\Temp\mssqlsystemresource.ldf'
Now you have attached the Resource database to SQL Server as a user database, and you can start exploring it.

Running SQL Server in single-user mode

In this method you can access the Resource database directly; however, to do so you need to start the SQL Server service in single-user mode:
  • Start the SQL Server service in single-user mode.
  • Connect to the server and access the database.

That's it; start exploring the database. You will see that the database is in read-only mode. Please note that this is for testing purposes only; don't modify anything in the database unless you know what you are doing or are directed to by Microsoft customer support services. In normal mode we can get only two values for the Resource database: the version number and the date of the last update to the Resource database.
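
Those two values mentioned above can be retrieved in normal mode with SERVERPROPERTY, without touching the hidden database at all:

```sql
-- Version of the Resource database and when it was last updated.
SELECT SERVERPROPERTY('ResourceVersion')            AS ResourceVersion,
       SERVERPROPERTY('ResourceLastUpdateDateTime') AS ResourceLastUpdate;
```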