jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-05 : 15:32:20
I am using a stored procedure with a fast-forward, read-only cursor to extract data from a varchar(max) column in table A, process each record with a CLR function, and insert the resulting records into table B. My issue is that the inserts into B take progressively longer the longer the procedure runs.

The data comes from MiniProfiler (miniprofiler.com), which records the individual SQL calls that occur during a web request in ASP.NET. There is one record per request in table MiniProfilerTimings, with this schema:

CREATE TABLE [dbo].[MiniProfilerTimings](
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Id] [uniqueidentifier] NOT NULL,
    [MiniProfilerId] [uniqueidentifier] NOT NULL,
    [ParentTimingId] [uniqueidentifier] NULL,
    [Name] [nvarchar](200) NOT NULL,
    [DurationMilliseconds] [decimal](9, 3) NOT NULL,
    [StartMilliseconds] [decimal](9, 3) NOT NULL,
    [IsRoot] [bit] NOT NULL,
    [Depth] [smallint] NOT NULL,
    [CustomTimingsJson] [nvarchar](max) NULL,
    CONSTRAINT [PK_MiniProfilerTimings] PRIMARY KEY CLUSTERED ([RowId] ASC))

The destination table looks like this:

CREATE TABLE [dbo].[MiniProfilerSqlTimings](
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Id] [uniqueidentifier] NOT NULL,
    [MiniProfilerId] [uniqueidentifier] NOT NULL,
    [ParentTimingId] [uniqueidentifier] NOT NULL,
    [ExecuteType] [tinyint] NOT NULL,
    [StartMilliseconds] [decimal](7, 1) NOT NULL,
    [DurationMilliseconds] [decimal](7, 1) NULL,
    [FirstFetchDurationMilliseconds] [decimal](7, 1) NULL,
    [StackTraceSnippet] [nvarchar](300) NOT NULL,
    [CommandString] [nvarchar](200) NOT NULL,
    [Parameters] [nvarchar](max) NULL,
    CONSTRAINT [PK_MiniProfilerSqlTimings] PRIMARY KEY CLUSTERED ([RowId] ASC))

I am using the SQL CLR code from http://jqsql.com/ to parse the JSON, which is orders of magnitude faster than using a TVF like this one: https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/

My procedure prUpdateMiniProfilers is essentially this:

DECLARE Json_Cursor CURSOR FAST_FORWARD FOR
    -- Get MiniProfilerTimings rows that are not yet processed
    SELECT t.Id, t.MiniProfilerId, t.ParentTimingId, t.CustomTimingsJson
    FROM dbo.MiniProfilerTimings t
    WHERE t.CustomTimingsJson IS NOT NULL
      AND t.MiniProfilerId NOT IN (SELECT st.MiniProfilerId
                                   FROM MiniProfilerSqlTimings st
                                   WHERE st.MiniProfilerId = t.MiniProfilerId)

OPEN Json_Cursor
FETCH NEXT FROM Json_Cursor INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson

WHILE @@FETCH_STATUS = 0
BEGIN
    -- This table holds the intermediate result from parsing the JSON in
    -- MiniProfilerTimings.CustomTimingsJson.
    -- This part always executes in < 2ms per row.
    DECLARE @jsontable TABLE (
        Id NVARCHAR(max),
        CommandString NVARCHAR(max),
        ExecuteType NVARCHAR(max),
        StackTraceSnippet NVARCHAR(max),
        StartMilliseconds FLOAT,
        DurationMilliseconds FLOAT,
        FirstFetchDurationMilliseconds FLOAT )

    INSERT INTO @jsontable
    EXEC [jqsql].[totable] @CustomTimingsJson, 'sql'

    -- This inserts intermediate results + job numbers into the final table.
    -- As the number of total rows processed in the sproc increases,
    -- the total time to do the insert also increases.
    INSERT INTO MiniProfilerSqlTimings (
        [Id], [MiniProfilerId], [ParentTimingId], [ExecuteType],
        [StartMilliseconds], [DurationMilliseconds], [FirstFetchDurationMilliseconds],
        [StackTraceSnippet], [CommandString], [Parameters])
    SELECT
        CAST(Id AS UNIQUEIDENTIFIER) AS Id,
        @MiniProfilerId AS MiniProfilerId,
        @ParentTimingId AS ParentTimingId,
        CASE ExecuteType
            WHEN 'NonQuery' THEN 1
            WHEN 'Scalar'   THEN 2
            WHEN 'Reader'   THEN 3
            ELSE 0 -- 'None' or anything else
        END AS ExecuteTypeValue,
        CAST(StartMilliseconds AS DECIMAL(7,1)) AS StartMilliseconds,
        CAST(DurationMilliseconds AS DECIMAL(7,1)) AS DurationMilliseconds,
        CAST(FirstFetchDurationMilliseconds AS DECIMAL(7,1)) AS FirstFetchDurationMilliseconds,
        CAST(StackTraceSnippet AS VARCHAR(300)),
        CASE WHEN LEFT(CommandString, 7) = 'DECLARE'
             THEN CAST(SUBSTRING(CommandString,
                                 LEN(CommandString) - CHARINDEX(CHAR(10), REVERSE(CommandString)) + 2,
                                 100) AS VARCHAR(200))
             ELSE CAST(CommandString AS VARCHAR(200))
        END AS CommandText,
        -- take a deep breath. This next one isn't pretty
        CASE WHEN LEFT(CommandString, 7) = 'DECLARE'
             THEN SUBSTRING(CommandString,
                    CHARINDEX('@', CommandString),            -- index of the first argument
                    LEN(CommandString)                        -- take the whole length of the string...
                    - CHARINDEX(';', REVERSE(CommandString))  -- ...less the characters at the end, up to the terminating ;
                    - CHARINDEX('@', CommandString)           -- ...less the characters at the beginning, up to the first @
                    + 2)                                      -- include the stripped ; and @ counted in the CHARINDEXes above
             ELSE NULL
        END AS Parameters
    FROM @jsontable

    FETCH NEXT FROM Json_Cursor INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson
END -- WHILE FETCH

Here are the timings I captured, sampled every 25 records:

Record  TotalTimeMs    Record  TotalTimeMs    Record  TotalTimeMs    Record  TotalTimeMs
    25            3       525           30      1025           59      1525           75
    50            4       550           99      1050          160      1550           88
    75            5       575           47      1075          653      1575           75
   100            7       600           47      1100         1453      1600           77
   125            8       625           35      1125           52      1625           78
   150           14       650           38      1150           52      1650          250
   175           12       675           38      1175           51      1675           83
   200           12       700           91      1200           50      1700           84
   225           12       725          150      1225           53      1725          502
   250           14       750           41      1250           54      1750          200
   275           16       775           65      1275          388      1775           89
   300           16       800           53      1300           59      1800         4432
   325           19       825           57      1325           57      1825          326
   350           48       850           48      1350           60      1850           92
   375           22       875           49      1375           78      1875         1329
   400           23       900           48      1400           94      1900          128
   425           25       925           72      1425           63      1925          103
   450           25       950           51      1450           67      1950          319
   475           27       975          163      1475           71      1975          122
   500           28      1000           54      1500           70      2000          138

There are some significant outliers in this data, but overall the insert time grows roughly linearly with the number of records already processed, which is undesirable.
gbritton
Master Smack Fu Yak Hacker
2780 Posts

Posted - 2014-05-06 : 09:16:10
"the insert performance grows rather linearly which is undesirable"Undesirable? Really? Sounds good to me! Are you looking for O(log n) (or better) insert performance? I could be wrong but I thinking anything less than O(n) is impossible.OTOH, you should get rid of the cursor-base approach. Using set-based logic is always faster (sometimes orders of magnitude better).I can't recall if anyone has a ready-to-go set-based JSON parser, but here's some links to get you started:http://www.sqlservercentral.com/articles/Tally+Table/72993/http://www.sqlservercentral.com/articles/String+Manipulation/94365/ |
|
|
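To give a flavor of the tally-table technique those articles describe, here is a minimal splitter sketch (a stripped-down illustration of the idea, not the articles' DelimitedSplit8K code):

-- a string to split
DECLARE @s VARCHAR(8000) = 'a,bb,ccc,dddd';

WITH E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(n)),
     -- cross-joining four copies of E1 yields up to 10,000 numbers with zero table reads
     Tally(n) AS (SELECT TOP (LEN(@s)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                  FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d),
     -- every element starts at position 1 or just after a delimiter
     Starts(s) AS (SELECT 1
                   UNION ALL
                   SELECT n + 1 FROM Tally WHERE SUBSTRING(@s, n, 1) = ',')
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY s),
       Item       = SUBSTRING(@s, s,
                        ISNULL(NULLIF(CHARINDEX(',', @s, s), 0), LEN(@s) + 1) - s)
FROM Starts;

The same idea, applied with patterns instead of a single delimiter, is what the PatternSplitCM function in the second article does.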
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-09 : 11:31:48
Perhaps it's my inexperience with the product at this scale of workload, but I do not understand the behavior.

Records 1-50 insert in 3-4 ms each, for a total of 150-200 ms. As I keep inserting records, the time for each insert increases: record 1525 took 75 ms, not the 3-4 ms it took near the beginning of the transaction.

I'm looking for consistent insert performance, not linear growth in the time per record.

If you're able to suggest a set-based JSON parser, I'm all ears, but I don't believe one exists.
gbritton
Master Smack Fu Yak Hacker
2780 Posts

Posted - 2014-05-09 : 11:41:06
Aha! So you are not seeing total time that grows linearly with the size of the input (that's what I understood from your original post). You're seeing the time per insert grow as the run progresses, which makes the total run time super-linear (roughly O(n^2)) - and that is of course undesirable! I believe that we could develop a set-based parser using the examples I cited. Can you post some sample data and the desired output?
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-09 : 13:01:33
It would be more correct to say there is growth in the time to insert, without a correlation to the size of the input (the JSON being parsed, and the rows produced from it).

The result set should include the columns from the objects in the array named "sql" (there is only ever one array):

Id UNIQUEIDENTIFIER NOT NULL,
CommandString VARCHAR(MAX) NOT NULL,
ExecuteType VARCHAR(50) NOT NULL,
StackTraceSnippet VARCHAR(2000) NOT NULL,
StartMilliseconds DECIMAL(7,1) NULL,
DurationMilliseconds DECIMAL(7,1) NULL,
FirstFetchDurationMilliseconds DECIMAL(7,1) NULL

Here's an example input:

{"sql":[
  {"Id":"a312b4a9-461c-45b2-bc6b-9c61fb13ab18",
   "CommandString":"DECLARE @SessionId nvarchar(24) = N\u0027p0utee4f4wa04m14ue1p4s13\u0027;\r\n\r\nprGetUserIdFromClientLogin",
   "ExecuteType":"Scalar",
   "StackTraceSnippet":"GetSPField GetUserIdForSession InitializeUserAuth Page_Load ProcessRequest",
   "StartMilliseconds":1303.5,"DurationMilliseconds":115.1,"FirstFetchDurationMilliseconds":0},
  {"Id":"25099cfd-45df-42da-95df-52c0c74f2e66",
   "CommandString":"DECLARE @UserID bigint = 69512;\r\n\r\nprSessionState",
   "ExecuteType":"Reader",
   "StackTraceSnippet":"GetSPDataSet GetSPDataTable GetSession GetUserSessionState InitializeUserAuth Page_Load ProcessRequest",
   "StartMilliseconds":1462.2,"DurationMilliseconds":145.6,"FirstFetchDurationMilliseconds":7.9},
  {"Id":"ae5beb87-2ca4-4ece-9548-7335819c5060",
   "CommandString":"DECLARE @ClientID bigint = 69512;\r\n\r\nprClientPermissions",
   "ExecuteType":"Reader",
   "StackTraceSnippet":"GetSPDataSet GetSPDataTable getClientPermissions InitializeUserAuth Page_Load ProcessRequest",
   "StartMilliseconds":1618.3,"DurationMilliseconds":1.1,"FirstFetchDurationMilliseconds":0.8}
]}
TG
Master Smack Fu Yak Hacker
6065 Posts

Posted - 2014-05-09 : 14:43:39
It's not clear to me that the insert to the SQL Server table is where the time-per-row increase is happening.

Part of my confusion: are the result times you posted for every 25 rows related to the start and duration milliseconds that you are inserting?

Is it possible to comment out the actual insert to your final table and still measure the times you are reporting in those results? Perhaps the CLR execution time is responsible for the growth, or maybe even the re-use of the table variable.

Be One with the Optimizer

TG
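The isolation TG describes could be instrumented roughly like this inside the posted loop (a sketch; the timing variables are additions for measurement only, not part of the original proc):

DECLARE @t0 DATETIME2(7), @t1 DATETIME2(7), @t2 DATETIME2(7);
DECLARE @parseMs INT, @insertMs INT;

-- inside the WHILE @@FETCH_STATUS = 0 loop:
SET @t0 = SYSDATETIME();

INSERT INTO @jsontable
EXEC [jqsql].[totable] @CustomTimingsJson, 'sql';

SET @t1 = SYSDATETIME();

-- the INSERT INTO MiniProfilerSqlTimings ... SELECT ... FROM @jsontable goes here;
-- comment it out on one run to see whether the per-row growth disappears

SET @t2 = SYSDATETIME();

SET @parseMs  = DATEDIFF(MILLISECOND, @t0, @t1);
SET @insertMs = DATEDIFF(MILLISECOND, @t1, @t2);
-- RAISERROR ... WITH NOWAIT flushes the message immediately, unlike PRINT
RAISERROR('parse: %d ms  insert: %d ms', 0, 1, @parseMs, @insertMs) WITH NOWAIT;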
gbritton
Master Smack Fu Yak Hacker
2780 Posts

Posted - 2014-05-09 : 14:45:13
I got something to work using CROSS APPLY and those functions I mentioned to parse the input, then PIVOT to produce the output. Have a look:

declare @json varchar(4000) = '{"sql":[{"Id":"a312b4a9-461c-45b2-bc6b-9c61fb13ab18","CommandString":"DECLARE @SessionId nvarchar(24) = N\u0027p0utee4f4wa04m14ue1p4s13\u0027;\r\n\r\nprGetUserIdFromClientLogin","ExecuteType":"Scalar","StackTraceSnippet":"GetSPField GetUserIdForSession InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1303.5,"DurationMilliseconds":115.1,"FirstFetchDurationMilliseconds":0},{"Id":"25099cfd-45df-42da-95df-52c0c74f2e66","CommandString":"DECLARE @UserID bigint = 69512;\r\n\r\nprSessionState","ExecuteType":"Reader","StackTraceSnippet":"GetSPDataSet GetSPDataTable GetSession GetUserSessionState InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1462.2,"DurationMilliseconds":145.6,"FirstFetchDurationMilliseconds":7.9},{"Id":"ae5beb87-2ca4-4ece-9548-7335819c5060","CommandString":"DECLARE @ClientID bigint = 69512;\r\n\r\nprClientPermissions","ExecuteType":"Reader","StackTraceSnippet":"GetSPDataSet GetSPDataTable getClientPermissions InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1618.3,"DurationMilliseconds":1.1,"FirstFetchDurationMilliseconds":0.8}]}';

with parsed as (
    select e.*
    from (values(@json)) v(j)
    -- Find the start of the 'sql' array
    cross apply ( select substring(v.j, charindex('"sql":[', v.j) + 7, len(v.j)) ) a(sql)
    -- Extract the array, discarding the rest
    cross apply ( select substring(a.sql, 1, charindex('}]', a.sql)) ) b(sql)
    -- Split the result up by starting braces
    cross apply ( select *, rn = row_number() over(order by d.itemnumber)
                  from dbo.DelimitedSplitN4K(b.sql, '{') d
                  where d.item <> '' ) c
    -- Further split by ending braces (could have used substring/charindex here)
    cross apply ( select c.rn, d.*
                  from dbo.DelimitedSplitN4K(c.Item, '}') d
                  where d.item not in ('',',') ) d
    -- Select just the bits we want, discarding delimiters
    cross apply ( select d.rn, p.ItemNumber, p.Item
                  from dbo.PatternSplitCM(d.Item, '%[^":]%') p
                  where p.Matched = 1
                    and p.Item not in ('',',') ) e
)
-- Build the output
select rn
     , Id
     , ExecuteType
     , CommandString
     , StackTraceSnippet
     , StartMilliseconds
     , DurationMilliseconds
     , FirstFetchDurationMilliseconds
-- Source is the parsed output from above, with a self-join based on row and item number
from (select p1.rn, p1.item p1item, p2.item p2item
      from parsed p1
      join parsed p2
        on p2.rn = p1.rn
       and p2.ItemNumber = p1.ItemNumber + 2) src
-- pivot on the items we want
pivot (max(p2item) for p1item in (
       Id
     , CommandString
     , ExecuteType
     , StackTraceSnippet
     , StartMilliseconds
     , DurationMilliseconds
     , FirstFetchDurationMilliseconds)) pvt
gbritton
Master Smack Fu Yak Hacker
2780 Posts

Posted - 2014-05-09 : 15:32:01
Simpler version (using the same @json variable declared in my previous post):

;with parsed as (
    select p1.rn, p2.ItemNumber, p2.item
    -- Split into array members
    from ( select p1.*, rn = row_number() over(order by (select 1))
           from dbo.PatternSplitCM(@json, '[^{}]') p1 ) p1
    -- Split array members into key/value pairs
    cross apply ( select *
                  from dbo.PatternSplitCM(p1.item, '[^{}:",[]')
                  where Matched = 1
                    and item not in ('',',') ) p2
)
-- Build the output
select rn
     , Id
     , ExecuteType
     , CommandString
     , StackTraceSnippet
     , StartMilliseconds
     , DurationMilliseconds
     , FirstFetchDurationMilliseconds
-- Source is the parsed output from above, with a self-join based on row and item number
from (select p1.rn, p1.item p1item, p2.item p2item
      from parsed p1
      join parsed p2
        on p2.rn = p1.rn
       and p2.ItemNumber = p1.ItemNumber + 2) src
-- pivot on the items we want
pivot (max(p2item) for p1item in (
       Id
     , CommandString
     , ExecuteType
     , StackTraceSnippet
     , StartMilliseconds
     , DurationMilliseconds
     , FirstFetchDurationMilliseconds)) pvt
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-12 : 12:51:38
@TG - The timing I was quoting was wrapped around the INSERT INTO MiniProfilerSqlTimings ... SELECT ... FROM @JsonTimings.

I've changed some things and have the performance back to a profile I'm happy with: essentially constant time per insert, so O(n) overall.

1. The cursor was using WHERE ID NOT IN (SELECT ID FROM FinalTable). I changed this to WHERE ID NOT IN (SELECT ID FROM FinalTable WITH (NOLOCK)).

2. Changed the use of table variables: I declare the table variable outside the cursor. On each loop through the cursor, I insert into the final table and then clear the rows out of the table variable.

3. The insert into the final table is done using the WITH (TABLOCK) hint.
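Pieced together from that description, the revised loop would look roughly like this (a sketch, since the updated procedure itself wasn't posted; the ExecuteType and Parameters mapping from the original post is abbreviated to constants here):

DECLARE @TimingId UNIQUEIDENTIFIER, @MiniProfilerId UNIQUEIDENTIFIER,
        @ParentTimingId UNIQUEIDENTIFIER, @CustomTimingsJson NVARCHAR(MAX);

-- (2) declare the table variable ONCE, outside the cursor loop
DECLARE @jsontable TABLE (
    Id NVARCHAR(MAX), CommandString NVARCHAR(MAX), ExecuteType NVARCHAR(MAX),
    StackTraceSnippet NVARCHAR(MAX), StartMilliseconds FLOAT,
    DurationMilliseconds FLOAT, FirstFetchDurationMilliseconds FLOAT);

DECLARE Json_Cursor CURSOR FAST_FORWARD FOR
    SELECT t.Id, t.MiniProfilerId, t.ParentTimingId, t.CustomTimingsJson
    FROM dbo.MiniProfilerTimings t
    WHERE t.CustomTimingsJson IS NOT NULL
      -- (1) read the already-processed list without taking shared locks
      AND t.MiniProfilerId NOT IN (SELECT st.MiniProfilerId
                                   FROM dbo.MiniProfilerSqlTimings st WITH (NOLOCK));

OPEN Json_Cursor;
FETCH NEXT FROM Json_Cursor
  INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson;

WHILE @@FETCH_STATUS = 0
BEGIN
    DELETE FROM @jsontable;   -- (2) clear between iterations instead of re-declaring

    INSERT INTO @jsontable
    EXEC [jqsql].[totable] @CustomTimingsJson, 'sql';

    -- (3) take a table-level lock once for the whole statement
    INSERT INTO dbo.MiniProfilerSqlTimings WITH (TABLOCK)
        (Id, MiniProfilerId, ParentTimingId, ExecuteType, StartMilliseconds,
         DurationMilliseconds, FirstFetchDurationMilliseconds,
         StackTraceSnippet, CommandString, Parameters)
    SELECT CAST(Id AS UNIQUEIDENTIFIER), @MiniProfilerId, @ParentTimingId,
           0,  -- ExecuteType mapping abbreviated; see the CASE in the first post
           CAST(StartMilliseconds AS DECIMAL(7,1)),
           CAST(DurationMilliseconds AS DECIMAL(7,1)),
           CAST(FirstFetchDurationMilliseconds AS DECIMAL(7,1)),
           CAST(StackTraceSnippet AS NVARCHAR(300)),
           CAST(CommandString AS NVARCHAR(200)),
           NULL  -- Parameters extraction abbreviated; see the first post
    FROM @jsontable;

    FETCH NEXT FROM Json_Cursor
      INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson;
END
CLOSE Json_Cursor;
DEALLOCATE Json_Cursor;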
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-12 : 13:06:54
@gbritton - Cool function & technique. I don't quite understand how I can apply this to the entire table at once. It seems to me you're still using a variable @json, so I would have to use this in the cursor? Is there a way to use this technique in the context of the source and destination tables?

Also, I tried it on a larger case, and it only returned 20 out of the 127 rows.

Thanks.
gbritton
Master Smack Fu Yak Hacker
2780 Posts

Posted - 2014-05-12 : 13:19:27
Get rid of that cursor! I don't think you need it. Is [jqsql].[totable] in the CLR you're using? If not, can you post the source? If you can set it up to call as a function instead of a proc, it should make life easier.
TG
Master Smack Fu Yak Hacker
6065 Posts

Posted - 2014-05-12 : 13:42:12
I'm glad your changes resolved the execution-time growth problem. If you can't get the non-cursor version working (and I agree with gbritton that any time you can do a set-based approach, you should), then you may consider this change as well:

Switch to a #temp table, don't delete between the iterations, and don't insert to the final table in the iterations. Collect all the data, then do one insert to your final table after the looping is done. Play with NOT IN vs. LEFT OUTER JOIN ... WHERE ID IS NULL.

Be One with the Optimizer

TG
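In outline, that restructuring might look like this (a sketch against the thread's tables; the ExecuteType mapping and Parameters parsing from the original proc are stubbed out):

CREATE TABLE #SqlTimings (
    Id NVARCHAR(MAX), MiniProfilerId UNIQUEIDENTIFIER, ParentTimingId UNIQUEIDENTIFIER,
    CommandString NVARCHAR(MAX), ExecuteType NVARCHAR(MAX), StackTraceSnippet NVARCHAR(MAX),
    StartMilliseconds FLOAT, DurationMilliseconds FLOAT, FirstFetchDurationMilliseconds FLOAT);

-- inside the loop: accumulate into the #temp table, never touching the live table
INSERT INTO #SqlTimings (Id, MiniProfilerId, ParentTimingId, CommandString, ExecuteType,
                         StackTraceSnippet, StartMilliseconds, DurationMilliseconds,
                         FirstFetchDurationMilliseconds)
SELECT Id, @MiniProfilerId, @ParentTimingId, CommandString, ExecuteType,
       StackTraceSnippet, StartMilliseconds, DurationMilliseconds,
       FirstFetchDurationMilliseconds
FROM @jsontable;

-- after the loop: one insert to the live table, filtered with a left join
INSERT INTO dbo.MiniProfilerSqlTimings
    (Id, MiniProfilerId, ParentTimingId, ExecuteType, StartMilliseconds,
     DurationMilliseconds, FirstFetchDurationMilliseconds,
     StackTraceSnippet, CommandString, Parameters)
SELECT CAST(s.Id AS UNIQUEIDENTIFIER), s.MiniProfilerId, s.ParentTimingId,
       0,  -- ExecuteType mapping stubbed out; see the CASE in the original proc
       CAST(s.StartMilliseconds AS DECIMAL(7,1)),
       CAST(s.DurationMilliseconds AS DECIMAL(7,1)),
       CAST(s.FirstFetchDurationMilliseconds AS DECIMAL(7,1)),
       CAST(s.StackTraceSnippet AS NVARCHAR(300)),
       CAST(s.CommandString AS NVARCHAR(200)),
       NULL  -- Parameters parsing stubbed out; see the original proc
FROM #SqlTimings s
LEFT OUTER JOIN dbo.MiniProfilerSqlTimings t
       ON t.MiniProfilerId = s.MiniProfilerId
WHERE t.MiniProfilerId IS NULL;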
gbritton
Master Smack Fu Yak Hacker
2780 Posts

Posted - 2014-05-13 : 07:52:18
quote: Originally posted by jeremyjsimmons

@gbritton - Cool function & technique. I don't quite understand how I can apply this to the entire table at once. It seems to me you're still using a variable @json, so I would have to use this in the cursor? Is there a way to use this technique in the context of the source and destination tables?

Also, I tried it on a larger case, and it only returned 20 out of the 127 rows.

Thanks.

Can you post the sample data that failed? My code only addressed the specific example you posted. It probably needs some extras to handle a full data set. Also, can you post the code to the proc you call in the middle?
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-13 : 16:41:59
@gbritton - I don't see how I can avoid the use of the cursor; [dbo].[jqsql.totable] is a CLR stored procedure. I'm using a modified version of the jqsql.com code. I can post it to GitHub if you're interested.

I was able to make your simplified version work using CROSS APPLY. I believe there are still some edge cases to work out, but otherwise it's an interesting option. I appreciate your offer to look at the full data, but I'm going to decline to post it.
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-13 : 18:05:09
@TG - thank you for the additional suggestions. Why is the single insert from the #temp table desirable over the insert from the @table every iteration through the loop?

NOT IN vs. LEFT OUTER JOIN ... WHERE ID IS NULL vs. NOT EXISTS (SELECT 1 FROM Destination WHERE A.ID = Destination.ID) all produce very similar query plans. Why would I choose one over the other?
gbritton
Master Smack Fu Yak Hacker
2780 Posts

Posted - 2014-05-14 : 08:47:22
quote: Originally posted by jeremyjsimmons

@gbritton - I don't see how I can avoid the use of the cursor; [dbo].[jqsql.totable] is a CLR stored procedure. I'm using a modified version of the jqsql.com code. I can post it to GitHub if you're interested.

I was able to make your simplified version work using CROSS APPLY. I believe there are still some edge cases to work out, but otherwise it's an interesting option. I appreciate your offer to look at the full data, but I'm going to decline to post it.
Since you modified the jqsql.com code to make your CLR, can you make it so it can be called as a table-valued function instead of a stored procedure? If so, you can eliminate the cursor, since the function will just return another table.
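To make the payoff concrete: if the CLR were exposed as a table-valued function, the whole cursor could collapse into one statement. The function below is hypothetical (jqsql ships totable as a procedure, and the EXTERNAL NAME binding is a placeholder), and the Parameters extraction is stubbed out:

-- Hypothetical CLR TVF binding (assembly/class/method names are placeholders):
-- CREATE FUNCTION jqsql.totable_fn (@json NVARCHAR(MAX), @path NVARCHAR(400))
-- RETURNS TABLE (
--     Id NVARCHAR(MAX), CommandString NVARCHAR(MAX), ExecuteType NVARCHAR(MAX),
--     StackTraceSnippet NVARCHAR(MAX), StartMilliseconds FLOAT,
--     DurationMilliseconds FLOAT, FirstFetchDurationMilliseconds FLOAT)
-- AS EXTERNAL NAME JqSqlAssembly.[JqSql.UserDefinedFunctions].ToTable;

-- With that in place, the cursor becomes a single INSERT ... SELECT:
INSERT INTO dbo.MiniProfilerSqlTimings
    (Id, MiniProfilerId, ParentTimingId, ExecuteType, StartMilliseconds,
     DurationMilliseconds, FirstFetchDurationMilliseconds,
     StackTraceSnippet, CommandString, Parameters)
SELECT CAST(j.Id AS UNIQUEIDENTIFIER), t.MiniProfilerId, t.ParentTimingId,
       CASE j.ExecuteType WHEN 'NonQuery' THEN 1 WHEN 'Scalar' THEN 2
                          WHEN 'Reader'   THEN 3 ELSE 0 END,
       CAST(j.StartMilliseconds AS DECIMAL(7,1)),
       CAST(j.DurationMilliseconds AS DECIMAL(7,1)),
       CAST(j.FirstFetchDurationMilliseconds AS DECIMAL(7,1)),
       CAST(j.StackTraceSnippet AS NVARCHAR(300)),
       CAST(j.CommandString AS NVARCHAR(200)),
       NULL  -- Parameters extraction stubbed out; see the original proc
FROM dbo.MiniProfilerTimings t
CROSS APPLY jqsql.totable_fn(t.CustomTimingsJson, 'sql') j
WHERE t.CustomTimingsJson IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM dbo.MiniProfilerSqlTimings st
                  WHERE st.MiniProfilerId = t.MiniProfilerId);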
TG
Master Smack Fu Yak Hacker
6065 Posts

Posted - 2014-05-14 : 13:50:18
quote: Originally posted by jeremyjsimmons

@TG - thank you for the additional suggestions. Why is the single insert from the #temp table desirable over the insert from the @table every iteration through the loop?

NOT IN vs. LEFT OUTER JOIN ... WHERE ID IS NULL vs. NOT EXISTS (SELECT 1 FROM Destination WHERE A.ID = Destination.ID) all produce very similar query plans. Why would I choose one over the other?
Well, those were somewhat intuitive for me - I don't know if they would be improvements or not. But here was my thinking:

The single insert allows you to avoid deleting/truncating your staging table during each iteration - that could help somewhat. It also limits your access to the live, public table to a single point in time rather than continuously throughout the process. That may be less contentious with other processes, and should require less lock management for SQL - i.e., acquire and release locks once rather than a bunch of times.

The @table vs. #table suggestion sort of goes along with the other one: when dealing with larger volumes of data, I try not to use table variables.

LEFT OUTER JOIN vs. NOT IN: you're right, ideally SQL Server will resolve both options to a very similar if not identical plan. But my habit has been to use the LEFT OUTER JOIN ... WHERE ... IS NULL form as a default. That technique is easier for me personally to read and, anecdotally speaking, seems to get a bad plan less often.

Be One with the Optimizer

TG
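For reference, the three anti-join forms under discussion look like this against the thread's tables (all three typically compile to a similar anti-semi-join plan when the join column is NOT NULL, as MiniProfilerId is here):

-- 1) NOT IN: fine here because MiniProfilerId is declared NOT NULL;
--    with a nullable subquery column, NOT IN can silently return no rows
SELECT t.Id
FROM dbo.MiniProfilerTimings t
WHERE t.MiniProfilerId NOT IN (SELECT st.MiniProfilerId
                               FROM dbo.MiniProfilerSqlTimings st);

-- 2) LEFT OUTER JOIN ... WHERE ... IS NULL (TG's default); note that, unlike
--    the other two, the join form can duplicate rows from t if st has duplicates
SELECT t.Id
FROM dbo.MiniProfilerTimings t
LEFT OUTER JOIN dbo.MiniProfilerSqlTimings st
       ON st.MiniProfilerId = t.MiniProfilerId
WHERE st.MiniProfilerId IS NULL;

-- 3) NOT EXISTS: NULL-safe and usually the most predictable of the three
SELECT t.Id
FROM dbo.MiniProfilerTimings t
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.MiniProfilerSqlTimings st
                  WHERE st.MiniProfilerId = t.MiniProfilerId);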
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-15 : 13:06:26
@gbritton - I will attempt to change the jqsql procedure so that it can be called as a table-valued function instead of a stored procedure, and I'll let you know the outcome.
jeremyjsimmons
Starting Member
9 Posts

Posted - 2014-05-15 : 13:09:20
@TG - thank you for spelling out the specifics. I agree on all points:
1. eliminating the writes to the staging table
2. limiting access to the public table to avoid repeated lock acquisition

I did compare the query plans for the three options, and I could not find a significant difference between them. Thank you for suggesting the additional option of LEFT JOIN ... WHERE ... IS NULL.

I'm trying to further my SQL knowledge this year. Can you recommend any great books?