Counters:
Current Queue Length: Sustained high queue for a period of time means that the IO subsystem has a problem. Average Queue Length: If the Average Queue Length for the disk is high more than 2. Then this indicates a potential IO subsystem problem. This would also make the disk sec/read and disk sec/write high. Disk Sec/Read: This counter should measure not more than 15ms in normal situation. Continuous values of more than 20ms indicate that the disks have a problem.
Under these conditions, consider the following
Move the files to additional drives.
Check for proper indexing on the tables.
Check for the CPU and memory counters to find any other potential bottlenecks.
Disk Sec/Writes: This counter should measure not more than 15ms in normal situation. Continuous values of more than 20ms indicate that the disks have a problem.
Under these conditions, consider the following
Move the files to additional drives.
Check for proper indexing on the tables.
Check for the CPU and memory counters to find any other potential bottlenecks.
Placing transactional log files in separate drives to remove additional pressure on the drives
Memory/Cache:
Object Name: Memory:
Counters:
Page Faults/Sec: Page faults occur when the page that is not requested is not found in the memory. There are 2 types of page faults. Hard page faults and Soft page faults. Hard page faults require a disk access and where as Soft page faults are ones where the page is found elsewhere in the memory. High number of page faults/sec indicates that there is a problem with memory used by SQL Server. Use this counter in correlation with SQL Server:Memory Manager counters to check if there is a memory pressure. Pages/Sec: This counter is the actual value of the Hard faults and should be used in correlation with Page Faults/Sec and SQL Server memory Manager counters to find the memory pressure.
CPU:
Object Name: Processor
Counters:
%User Time: SQL Server runs in User mode. Privilege mode is used by Operating system to access hardware and memory components. This counter should have a value more than 70%. If it has a value less than that check the %privileged time and %processor time counters to see if there is a problem with the processor. %Privileged Time: Operating system moves thread to Privileged mode to access services. This counter should value less than 20%. %Processor Time: This counter tells the percentage of the time that CPU is executing over a period. Ideally it should value less than 70%. Interrupts/Sec: Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts.
System: Object Name: System
Counters:
Processor Queue Length: Is the number threads that are waiting to be processed by the processor. This counter is calculated by the actual value divided by the number of processors. Ideally value should be less than 2.
SQL Server:
Object Name: Access Methods
Counters:
Forwarded Records/Sec: Number of records fetched through forwarded record pointers. Tables with NO clustered index can. If you start out with a short row, and update the row creating a wider row, the row may no longer fit on the data page. A pointer will be put in its place and the row will be forwarded to another page.
Can be avoided by:
Using default values
Using Char instead of Varchar.
Full Scans/Sec: Either table or Index in fully scanned. This could be caused because of wrong or unhelpful indexes. Page Splits/Sec: Number of page splits occurring as the result of index pages overflowing. Normally, this is associated with leaf pages of clustered indexes and non-clustered indexes. This can be avoided by having proper fill factor.
Object Name: Memory Manager
Counters:
Memory Grants Pending: Memory is needed to process each user request. If enough memory is not available then the user request waits for memory which cause performance hitch in executing the query.
This can be avoided by the following
Adding more memory to the server
Adding memory to SQL Server
Creating proper Indexes
Object Name: Buffer Manager
Counters:
Buffer Cache Hit Ratio: Percentage of time the pages requested are already in memory. It should be ideally more than 99% and if less than 95% indicates that SQL server has no enough memory and adding more memory would be beneficial. Checkpoints/Sec: Pages written to the disk during the Checkpoint process freeing up SQL cache. Memory pressure is indicated if this counter is more than 300 Secs. Lazy Writes/Sec: Pages written to the disk during the LazyWriter process freeing up SQL cache. Memory pressure is indicated if this counter is more than 300 Secs. Page Life Expectancy: This is one counter that I would love to monitor to check the memory pressure. This would give you the time in seconds the data pages reside in the SQL Server cache. If the value is low indicates the following problems.
SQL Cache is cold(Discussed in Page Faults)
Memory Problems
Missing indexes.
If Checkpoints/Sec, Lazywriter/Sec and Page life expectancy together is less than 300 seconds then it indicates the cause of memory pressure and your server needs more memory.
Object Name: Databases
Counters:
Transactions/Sec: Indicates the number of transaction that occur per second in the server.
Object Name: General Statistics
Counters: User Connections: Number of connections made to the SLQ Server.
Object Name: Latches
Counters:
Average Latch wait Time: Latches are short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, etc. A high value here indicates potential problems with IO subsystem or Memory.
Object Name: Locks
Counters
Average Wait Time(ms), Lock Wait time(ms) and Lock waits/Sec: All these counters are related to the locks held by the SQL Server. Transaction should be as short as possible and should hold locks only for a short time without blocking other user queries, A high value here would indicate the following.
Memory Pressure
IO Subsystem problem
Improper indexes
Improper SQL structures
Improper placement of files in the disk subsystem
Incorrect usage of SQL Isolation levels
These are the major counters that I would wish to check. Apart from this you can also check the following counters.
Object Name Counter Name
SQL Statistics Compilations/sec
SQL Statistics Recompilations/sec
SQL Statistics Batch Requests/sec
SQL Server: Buffer Manager Readahead pages/sec
Current Queue Length: Sustained high queue for a period of time means that the IO subsystem has a problem. Average Queue Length: If the Average Queue Length for the disk is high more than 2. Then this indicates a potential IO subsystem problem. This would also make the disk sec/read and disk sec/write high. Disk Sec/Read: This counter should measure not more than 15ms in normal situation. Continuous values of more than 20ms indicate that the disks have a problem.
Under these conditions, consider the following
Move the files to additional drives.
Check for proper indexing on the tables.
Check for the CPU and memory counters to find any other potential bottlenecks.
Disk Sec/Writes: This counter should measure not more than 15ms in normal situation. Continuous values of more than 20ms indicate that the disks have a problem.
Under these conditions, consider the following
Move the files to additional drives.
Check for proper indexing on the tables.
Check for the CPU and memory counters to find any other potential bottlenecks.
Placing transactional log files in separate drives to remove additional pressure on the drives
Memory/Cache:
Object Name: Memory:
Counters:
Page Faults/Sec: Page faults occur when the page that is not requested is not found in the memory. There are 2 types of page faults. Hard page faults and Soft page faults. Hard page faults require a disk access and where as Soft page faults are ones where the page is found elsewhere in the memory. High number of page faults/sec indicates that there is a problem with memory used by SQL Server. Use this counter in correlation with SQL Server:Memory Manager counters to check if there is a memory pressure. Pages/Sec: This counter is the actual value of the Hard faults and should be used in correlation with Page Faults/Sec and SQL Server memory Manager counters to find the memory pressure.
CPU:
Object Name: Processor
Counters:
%User Time: SQL Server runs in User mode. Privilege mode is used by Operating system to access hardware and memory components. This counter should have a value more than 70%. If it has a value less than that check the %privileged time and %processor time counters to see if there is a problem with the processor. %Privileged Time: Operating system moves thread to Privileged mode to access services. This counter should value less than 20%. %Processor Time: This counter tells the percentage of the time that CPU is executing over a period. Ideally it should value less than 70%. Interrupts/Sec: Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts.
System: Object Name: System
Counters:
Processor Queue Length: Is the number threads that are waiting to be processed by the processor. This counter is calculated by the actual value divided by the number of processors. Ideally value should be less than 2.
SQL Server:
Object Name: Access Methods
Counters:
Forwarded Records/Sec: Number of records fetched through forwarded record pointers. Tables with NO clustered index can. If you start out with a short row, and update the row creating a wider row, the row may no longer fit on the data page. A pointer will be put in its place and the row will be forwarded to another page.
Can be avoided by:
Using default values
Using Char instead of Varchar.
Full Scans/Sec: Either table or Index in fully scanned. This could be caused because of wrong or unhelpful indexes. Page Splits/Sec: Number of page splits occurring as the result of index pages overflowing. Normally, this is associated with leaf pages of clustered indexes and non-clustered indexes. This can be avoided by having proper fill factor.
Object Name: Memory Manager
Counters:
Memory Grants Pending: Memory is needed to process each user request. If enough memory is not available then the user request waits for memory which cause performance hitch in executing the query.
This can be avoided by the following
Adding more memory to the server
Adding memory to SQL Server
Creating proper Indexes
Object Name: Buffer Manager
Counters:
Buffer Cache Hit Ratio: Percentage of time the pages requested are already in memory. It should be ideally more than 99% and if less than 95% indicates that SQL server has no enough memory and adding more memory would be beneficial. Checkpoints/Sec: Pages written to the disk during the Checkpoint process freeing up SQL cache. Memory pressure is indicated if this counter is more than 300 Secs. Lazy Writes/Sec: Pages written to the disk during the LazyWriter process freeing up SQL cache. Memory pressure is indicated if this counter is more than 300 Secs. Page Life Expectancy: This is one counter that I would love to monitor to check the memory pressure. This would give you the time in seconds the data pages reside in the SQL Server cache. If the value is low indicates the following problems.
SQL Cache is cold(Discussed in Page Faults)
Memory Problems
Missing indexes.
If Checkpoints/Sec, Lazywriter/Sec and Page life expectancy together is less than 300 seconds then it indicates the cause of memory pressure and your server needs more memory.
Object Name: Databases
Counters:
Transactions/Sec: Indicates the number of transaction that occur per second in the server.
Object Name: General Statistics
Counters: User Connections: Number of connections made to the SLQ Server.
Object Name: Latches
Counters:
Average Latch wait Time: Latches are short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, etc. A high value here indicates potential problems with IO subsystem or Memory.
Object Name: Locks
Counters
Average Wait Time(ms), Lock Wait time(ms) and Lock waits/Sec: All these counters are related to the locks held by the SQL Server. Transaction should be as short as possible and should hold locks only for a short time without blocking other user queries, A high value here would indicate the following.
Memory Pressure
IO Subsystem problem
Improper indexes
Improper SQL structures
Improper placement of files in the disk subsystem
Incorrect usage of SQL Isolation levels
These are the major counters that I would wish to check. Apart from this you can also check the following counters.
Object Name Counter Name
SQL Statistics Compilations/sec
SQL Statistics Recompilations/sec
SQL Statistics Batch Requests/sec
SQL Server: Buffer Manager Readahead pages/sec
No comments:
Post a Comment