Know thy indexes

Posted on 5/3/2013 @ 3:13 AM in #Bak2Basics by | Feedback | 1455 views

Okay so here’s another bak2basics kinda post – Know thy indexes. (If u got more stuff to add, please do :-))

What are indexes? Well indexes are exactly what you’d think they are – like at the back of a book? They are a logical arrangement of your data, so you can find the data easily. So SQL Server has two kinds of indexes, Clustered and Non-Clustered.

1. Both clustered and non-clustered indexes are implemented using a variant of B-Tree, but a clustered index reorganizes the data pages of the indexed table, whereas a non-clustered index is created in seperate data pages.

2. Clustered indexes are usually to support primary keys and should be used to index only narrow columns – thus making them a bad candidate for GUIDs. Thus, if you have GUIDs as your primary key, make sure it is non-clustered (by default your PK is clustered). Usually a clustered index key should column not exceed 16 bytes per row because the key column’s data will be repeated in the leaf nodes of every non clustered index – thereby hitting your performance.

3. A table can have only one clustered index, but it can have up to 249 non-clustered indexes.

4. It is extremely important to consider that non-clustered indexes will speed up your selects (if used wisely), but at the same time they will hurt your insert/update/deletes. This is because data is copied over to the non clustered indexes.

5. A column is considered a good candidate for an index when a typical where clause will restrict the subset of the data to less than 10% of the total data.

6. Indexing large or wide columns is mostly always a bad idea. :-). The exception is XML indexes in which the column can be large but the index is quite controllable. Another option is to use SQLCLR UDTs where indexing behavior is controllable.

Creating Indexes can be expensive, especially in production systems. SQL Server 2005 has introduced a few neat things to reduce contentions on your data when you are creating indexes in your production systems.

1. Create Index ix_monkey on Table(column) with (ONLINE = ON) – this specifies that the table being indexed is online, and even though the indexing process will take longer, it will keep your system more available.

2. Controlling locks using ALLOW_ROW_LOCKS, and ALLOW_PAGE_LOCKS – you can turn either one of these two OFF, which signifies that you donot want the optimizer to be a smart alec, and you want to ensure that locking isn’t done – thus ensuring high availability at the cost of a longer time to build your index.

Indexes, just like your car, need maintainence. Just like your car needs regular maintainence – Oil Changes, and regular expensive maintainence – Servicing, Indexes need to be defragmented (cheap), and rebuilt (expensive) every now and then.

Defragmenting indexes only defragments within data pages, but doesn’t move data pages between extents. In other words – it ain’t a complete job, but it’ll do for the meanwhile :-). Defragmenting an index can be done in SQL2k using DBCC INDEXDEFRAG, but in SQL Server 2005, you should instead use
ALTER INDEX ix_Monkey On Table(Column)
REORGANIZE

Rebuiliding indexes on the other hand is more expensive, but dude ju gotta do it. Well less frequently than Defragging but still, at times you must do it. In SQL2k, you used to rebuild indexes using DBCC REINDEX, but in SQL2k5, you should instead use

   1:  ALTER INDEX ix_Monkey On Table(Column)
   2:  REBUILD

So the question is, how the heck can you judge, whether your index needs defragging or rebuilding? Okay, thats really complex, take a deep breath and read on

1. If index fragmentation is below 10%, you’re okay – don’t bother.

2. If index fragmentation is between 10% and 30% – Defrag.

3. If index fragmentation is more than 30% – Rebuild.

Whew, that was easy !! :-). Which leaves the question, how the heck do I know the fragmentation percentage of my index?

Well SQL2k had a function called DBCC SHOWCONTIG, running DBCC SHOWCONTIG shows output as below -

   1:  Table: 'StoreContact' (30623152); index ID: 1, database ID: 7
   2:  TABLE level scan performed.
   3:  - Pages Scanned................................: 5
   4:  - Extents Scanned..............................: 2
   5:  - Extent Switches..............................: 1
   6:  - Avg. Pages per Extent........................: 2.5
   7:  - Scan Density [Best Count:Actual Count].......: 50.00% [1:2]
   8:  - Logical Scan Fragmentation ..................: 20.00%
   9:  - Extent Scan Fragmentation ...................: 50.00%
  10:  - Avg. Bytes Free per Page.....................: 1319.0
  11:  - Avg. Page Density (full).....................: 83.70%

SQL2k5 added something called dynamic management functions (sys.dm_*), which help you monitor changing information such as locks, and .. violla .. index health. So you should instead use sys.dm_db_index_physical_stats . You can easily join between the sys.indexes (not sysindexes ;-)), and sys.dm_db_index_physical_stats as like this – 

   1:  SELECT
   2:  OBJECT_NAME(i.object_id) AS TableName,
   3:  i.name AS IndexName,
   4:  ips.avg_fragmentation_in_percent
   5:  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') ips
   6:  JOIN sys.indexes i ON
   7:  i.object_id = ips.object_id
   8:  AND i.index_id = ips.index_id
   9:  WHERE ips.avg_fragmentation_in_percent > 10 

To get an output as below

   1:  TableName      IndexName                             avg_fragmentation_in_percent
   2:  -----------    ------------------------------------  ----------------------------
   3:  StoreContact   PK_StoreContact_CustomerID_ContactID  20
   4:  StoreContact   AK_StoreContact_rowguid               66.6666666666667
   5:  StoreContact   IX_StoreContact_ContactID             50
   6:  StoreContact   IX_StoreContact_ContactTypeID         50

As you can see, because the output is tabular, you can now easily set an automated off-hours job, or simply specify ONLINE=ON, and rebuild or defrag your indexes as necessary. This will alleviate the overhead of the ever constant bickering with the irresponsible DBA who doesn’t maintain your indexes. Also, this gives you an easy to read average fragmentation, rather than logical and extent fragmentation. Truthfully, given that I can easily now convert index_id to index name, this gives me something I can easily relate with.

And speakin’ of which, my indexes above need rebuilding real bad, so I’ll go rebuild my indexes.

Sound off but keep it civil:

Older comments..