Late last week, a coworker sent me a blog post on Clustered Indexes and Heaps; titled: Unreasonable Defaults: Primary Key as the Clustering Key. It’s a blog post that, well.. go ahead and read it, and come back. I’ll wait.
What did you take away from that blog post? What really jumped out at you?
For the first three reads (yes, I read it more than three times), here’s what jumped out:
- Primary Keys are bad at being clustering keys because the data that we need is hardly ever sequential by ID (I assume doubly so if the primary key is a GUID).
- It’s better to make the clustering key something that actually puts the data that would most likely be accessed together on the same page.
- Heaps are better than having a Clustered Index if you don’t have a covering (Non Clustered) index.
Since I went to the crash course on SQL Server Performance Troubleshooting last May (H/T to the Brent Ozar Unlimited folks), I know a lot more than I did before about SQL Server (I didn’t know very much before, and the 3 day intensive class traded money for time — really worth it), so I’ve got an understanding of what is this blog post is really about — but it clashed with what I know about SQL Server.
The post didn’t really spell out those things that it probably should have, and that worried me:
- A covering Non Clustered Index is usually the best approach to get data quickly. The smaller this index (e.g., less columns in index, or smaller data types in the index), the better performance you’ll see. If you’re retrieving lots of columns with large or unknown sizes (
nvarchar(max)) and you’re filtering by your Clustering Key alone, a Clustered Index is going to be fast.
- Heaps are bad, MMkay? Well, that’s not entirely accurate. They’re great when you just need data any old place and don’t really care about retrieval in an orderly (or fast) fashion.
SELECT *is bad because it doesn’t allow you to use your beautiful non-clustered index alone; it forces use of the Clustered Index (if one exists) and does not allow for the fastest possible data retrieval (Unless you really need everything, in which case, does your App look like this?)
It mentioned these things, to be sure (well, the first and third bullet), but it didn’t make them the ‘bold face type’ it should have; and didn’t explain it in a way that a non DBA would get the same understanding from it as someone who spends their days tuning SQL.
The takeaway is this: Whenever you’re reading a blog post on performance, context matters. Authors will assume knowledge you may or may not have. They will assume context that may not fit your business needs. The most important takeaway you can have is never move on the basis of a demo alone.
Note: the Brent Ozar Unlimited team is giving the SQL Server Performance Troubleshooting class again this year. I highly recommend it. It doesn’t matter if you’re a fifth year DBA or a Developer trying to level up your SQL Server game — it’s more than worth your time. If you have a conference budget this year, forget a conference — pick a training class and do it.
Full Disclosure: Brent Ozar took a look over this post to make sure I wasn’t talking out of my ass. I thank him for that. I endorsed their class before he agreed to fact-check this post for me.