To understand why and when to add more datafiles, it is important to know:
What is allocation contention
What is the purpose of adding datafiles
How this issue differs in SQL 2022 vs SQL2019?
What is Allocation Contention
When any insert, update or delete operations occur, they need to acquire a latch on PFS, GAM and SGAM pages to gain access and get information and modify the pages. Latch means a light weight lock. Under normal circumstances, there will be certain amount of insert, update or delete operations running on a database, then these operations can easily acquire a latch on these 3 pages.
But on TempDB, when there is a high level of concurrent activity, which means there are too many concurrent operations, then all these operations have to wait in line to acquire a latch onto these 3 special pages, causing contention. In short, to allocate an object, the operations wait to get a latch, causing contention and that is why it is called allocation contention.
In this article, I’m focusing on TempDB, since it is very common for allocation contention to occur on TempDB and rarely on user databases.
How to identify allocation contention
To identify allocation contention, you can monitor PAGELATCH waits on PFS, GAM, and SGAM pages in TempDB. Let’s do a demo run to see how it looks.
I’m going to use ostress utility to generate a workload against SQL Server, to cause contention and then troubleshoot issues that arise out of it.
I’m running the following command on RML cmd prompt to generate workload:
ostress.exe -S"servername" -Q"exec dbatest.dbo.PopulateTempTable" -n50 -r1000 -q
Now, run the following query to check if contention exists from SSMS:
USE master
GO
SELECT
er.session_id, er.wait_type, er.wait_resource,
OBJECT_NAME(page_info.[object_id],page_info.database_id) as [object_name],
page_info.page_type_desc,
er.blocking_session_id,er.command,
SUBSTRING(st.text, (er.statement_start_offset/2)+1,
((CASE er.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE er.statement_end_offset
END - er.statement_start_offset)/2) + 1) AS statement_text,
page_info.database_id,page_info.[file_id], page_info.page_id, page_info.[object_id],
page_info.index_id,
CASE
WHEN page_info.page_type_desc IN('SGAM_PAGE','GAM_PAGE', 'PFS_PAGE')
AND WAIT_TYPE IN ('PAGELATCH_SH','PAGELATCH_UP','PAGELATCH_EX')THEN 'ALLOCATION CONTENTION'
WHEN page_info.page_type_desc IN ('DATA_PAGE', 'INDEX_PAGE')
AND WAIT_TYPE IN ('PAGELATCH_SH','PAGELATCH_UP','PAGELATCH_EX')THEN 'METADATA CONTENTION'
END AS allocation_type
FROM sys.dm_exec_requests AS er
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) AS st
CROSS APPLY sys.fn_PageResCracker (er.page_resource) AS r /*database ID, file ID, page ID SQL2019*/
CROSS APPLY sys.dm_db_page_info(r.[db_id], r.[file_id], r.page_id, 'DETAILED') AS page_info /*replace dbcc page SQL2019*/
WHERE er.wait_type like 'PAGELATCH%'
GO
Let’s interpret the results:
wait_type = PAGELATCH
wait_resource a= 2:4:2, this is database id: datafile id: pagenumber.
Database ID 2 is always TempDB
Datafile ID is 4, this can be any number between 1 and total number of datafiles.
Page 2 is always GAM page.page_type_desc gives you what page type the page number is. Here it is GAM page.
In above case, there is contention(PAGELATCH) on GAM page, and this is a good example of allocation contention. As we just saw in the beginning, when there is contention(PAGELATCH) on PFS,GAM or SGAM pages, then it is allocation contention.
How to Resolve Allocation Contention
Option 1: Upgrade to SQL 2022
Best way to get rid of allocation contention is to upgrade to SQL 2022. On SQL 2022, transactions need not acquire exclusive pagelatches on PFS,GAM or SGAM pages anymore, instead they could share the pagelatches, which resolves this contention type. You don’t need to add any datafiles on SQL 2022 version and this contention is resolved by default.
Let’s generate the workload on SQL 2022 to test it out:
ostress.exe -S"servername" -Q"exec dbatest.dbo.PopulateTempTable" -n50 -r1000 -q
I brought up the perfmon to check if there are any pagelatches while the workload is running. See the image below, green line is throughput and red line is pagelatches. Notice how red line is a flat zero, which I highlighted in orange box. This indicates there are no pagelatches and there is no need to add any datafiles. It is just sufficient with the number of datafiles(8) that comes with the install. I didn’t have to add any since there was no contention on PFS,GAM or SGAM pages. This is the beauty of SQL 2022 version because you won’t see contention on PFS,GAM or SGAM pages anymore, which means you don’t need to add any TempDB datafiles.
Well, there could be some edge cases but I’m yet to come across. If there were still latches waiting on PFS,GAM or SGAM pages, adding another 4 or 8 datafiles should mostly resolve it.
Option 2: Adding Datafiles in SQL Server 2019 and Older
On versions SQL 2019 and older, add multiple datafiles in numbers of 4 or 8 as needed with equal size. Then, check if contention exists and if it does, again add 4 or 8 datafiles of equal size and repeat. You will find a sweet spot where you find contention is stabilized and you can stop adding files at that point. This will help you mitigate contention to some extent but not completely get rid of it.
Though on SQL 2019, you will not see contention on PFS page since latches can be shared to get access to PFS page. But transactions still need exclusive latches on GAM and SGAM page.
Let’s run the same workload on SQL 2019:
ostress.exe -S"servername" -Q"exec dbatest.dbo.PopulateTempTable" -n50 -r1000 -q
Right now, there are only 8 tempdb datafiles. In the graph, the maximum number of pagelatches are around 14000.
To reduce them, I’m going to add another 4 datafiles and check how the spike looks like.
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_9', FILENAME = N'T:\DATA\tempdev_9.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_10', FILENAME = N'T:\DATA\tempdev_10.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_11', FILENAME = N'T:\DATA\tempdev_11.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_12', FILENAME = N'T:\DATA\tempdev_12.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
4 datafiles have been added and I’m going to kick off the same workload again. Notice, how the spike went down to a high of ~7000 latches.
Since the number of pagelatches went down significantly from 14000 to 7000, I’m going to try to add another 4 datafiles and check if spike still goes down.
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_13', FILENAME = N'T:\DATA\tempdev_9.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_14', FILENAME = N'T:\DATA\tempdev_10.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_15', FILENAME = N'T:\DATA\tempdev_11.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'tempdev_16', FILENAME = N'T:\DATA\tempdev_12.ndf' , SIZE = 10240MB , FILEGROWTH = 524288KB )
GO
Added another set of 4 files. Now there are total of 16 files.
(8 from the install, then added 4 + 4).
Kicking off the same workload again.
The 3rd spike is the latest one after adding a second set of 4 tempdb datafiles, coming to a total of 16 tempdb datafiles. There is a slight reduction of pagelatches but not a lot from ~7000 to 5500, so I’m going to stop it at this point.
The reason I brought up PerfMon is to
Precisely track number of pagelatches that might be overlooked in SSMS.
See the difference in spike after adding each set of datafiles, helping you make an informed decision if you need to add more datafiles.
Helps you find out how much percentage of contention is reduced after adding datafiles
In our case, before adding any additional datafiles, there were a maximum of 14000 latches, and after adding 4+4 datafiles, there were finally around 5500 latches, which is a 60% reduction of pagelatches aka contention.
To summarize, before adding datafiles, peak pagelatch contention reached 14,000. After adding the two sets of 4 datafiles, contention dropped to approximately 5,500, a 60% reduction in contention in SQL 2019 version. But in SQL 2022, we didn’t have to add any apart from the 8 datafiles that came with the install.
Conclusion
Allocation contention is a critical concern, especially in TempDB, where concurrent operations can create bottlenecks and cause a slowdown in the system. By understanding how SQL Server manages allocation, you can work on reducing the contention. Best way to get rid of allocation contention is to upgrade to SQL 2022 vesion. On versions SQL 2019 and older, add datafiles in sets of 4 or 8 with equal size(same as existing files) and check for contention and repeat.
Thanks for reading,
Haripriya