
 Incrementally slower inserts


jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-05 : 15:32:20
I am using a stored procedure that uses a fast-forward, read-only cursor to extract data from a varchar(max) column in table A and process each record with a CLR function that creates new records, which are inserted into table B. My issue is that the inserts into B take progressively longer the longer the procedure runs.

The data comes from MiniProfiler (miniprofiler.com), which records the individual SQL calls that occur during a web request in ASP.NET. There is one record per request in the table MiniProfilerTimings, with this schema:

CREATE TABLE [dbo].[MiniProfilerTimings](
[RowId] [int] IDENTITY(1,1) NOT NULL,
[Id] [uniqueidentifier] NOT NULL,
[MiniProfilerId] [uniqueidentifier] NOT NULL,
[ParentTimingId] [uniqueidentifier] NULL,
[Name] [nvarchar](200) NOT NULL,
[DurationMilliseconds] [decimal](9, 3) NOT NULL,
[StartMilliseconds] [decimal](9, 3) NOT NULL,
[IsRoot] [bit] NOT NULL,
[Depth] [smallint] NOT NULL,
[CustomTimingsJson] [nvarchar](max) NULL,
CONSTRAINT [PK_MiniProfilerTimings] PRIMARY KEY CLUSTERED
(
[RowId] ASC
)
)

The destination table looks like this:

CREATE TABLE [dbo].[MiniProfilerSqlTimings](
[RowId] [int] IDENTITY(1,1) NOT NULL,
[Id] [uniqueidentifier] NOT NULL,
[MiniProfilerId] [uniqueidentifier] NOT NULL,
[ParentTimingId] [uniqueidentifier] NOT NULL,
[ExecuteType] [tinyint] NOT NULL,
[StartMilliseconds] [decimal](7, 1) NOT NULL,
[DurationMilliseconds] [decimal](7, 1) NULL,
[FirstFetchDurationMilliseconds] [decimal](7, 1) NULL,
[StackTraceSnippet] [nvarchar](300) NOT NULL,
[CommandString] [nvarchar](200) NOT NULL,
[Parameters] [nvarchar](max) NULL,
CONSTRAINT [PK_MiniProfilerSqlTimings] PRIMARY KEY CLUSTERED
(
[RowId] ASC
)
)

I am using the SQL CLR code from http://jqsql.com/ to parse the JSON, which is orders of magnitude faster than using a TVF like the one at https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/

My procedure prUpdateMiniProfilers is essentially this:

DECLARE @TimingId UNIQUEIDENTIFIER, @MiniProfilerId UNIQUEIDENTIFIER,
@ParentTimingId UNIQUEIDENTIFIER, @CustomTimingsJson NVARCHAR(MAX)

DECLARE Json_Cursor CURSOR FAST_FORWARD FOR

-- Get MiniProfilerTimings rows that are not yet processed
SELECT t.Id, t.MiniProfilerId, t.ParentTimingId, t.CustomTimingsJson FROM dbo.MiniProfilerTimings t
WHERE t.CustomTimingsJson IS NOT NULL
AND t.MiniProfilerId NOT IN (Select st.MiniProfilerId from MiniProfilerSqlTimings st WHERE st.MiniProfilerId = t.MiniProfilerId)

OPEN Json_Cursor

FETCH NEXT FROM Json_Cursor INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson
WHILE @@FETCH_STATUS = 0 BEGIN

-- This table holds the intermediate result from parsing the JSON in MiniProfilerTimings.CustomTimingsJson
-- This part always executes in < 2ms per row
DECLARE @jsontable TABLE (
Id NVARCHAR(max),
CommandString NVARCHAR(max),
ExecuteType NVARCHAR(max),
StackTraceSnippet NVARCHAR(max),
StartMilliseconds FLOAT,
DurationMilliseconds FLOAT,
FirstFetchDurationMilliseconds FLOAT
)
INSERT INTO @jsontable
EXEC [jqsql].[totable] @CustomTimingsJson, 'sql'

-- This inserts intermediate results + job numbers into the final table.
-- as the number of total rows processed in the sproc increases, the total time to do the insert also increases.
INSERT INTO MiniProfilerSqlTimings (
[Id],
[MiniProfilerId],
[ParentTimingId],
[ExecuteType],
[StartMilliseconds],
[DurationMilliseconds],
[FirstFetchDurationMilliseconds],
[StackTraceSnippet],
[CommandString],
[Parameters])
SELECT
CAST(Id AS UNIQUEIDENTIFIER) AS Id,
@MiniProfilerId AS MiniProfilerId,
@ParentTimingId AS ParentTimingId,
CASE ExecuteType
WHEN 'NonQuery' THEN 1
WHEN 'Scalar' THEN 2
WHEN 'Reader' THEN 3
ELSE 0 -- 'None' or anything else
END AS ExecuteTypeValue,
CAST(StartMilliseconds AS DECIMAL(7,1)) AS StartMilliseconds,
CAST(DurationMilliseconds AS DECIMAL(7,1)) AS DurationMilliseconds,
CAST(FirstFetchDurationMilliseconds AS DECIMAL(7,1)) AS FirstFetchDurationMilliseconds,
CAST(StackTraceSnippet AS VARCHAR(300)),
CASE WHEN LEFT(CommandString, 7) = 'DECLARE'
THEN CAST(SUBSTRING(CommandString, LEN(CommandString) - CHARINDEX(CHAR(10), REVERSE(CommandString))+2, 100) AS VARCHAR(200))
ELSE CAST(CommandString AS VARCHAR(200))
END AS CommandText,
-- take a deep breath. This next one isn't pretty
CASE WHEN LEFT(CommandString, 7) = 'DECLARE'
THEN SUBSTRING(CommandString,
CHARINDEX('@', CommandString), -- Index of first argument
LEN(CommandString) -- take the whole length of the string...
- CHARINDEX(';', REVERSE(CommandString)) -- Less the number of characters at the end of the string until the terminating character ; ..
-- - CHARINDEX('@', CommandString) -- Less the number of characters at the beginning of the string until the first character @ ..
+ 2 -- include the number of stripped characters ; and @ used in the CHARINDEX above
)
ELSE NULL
END AS Parameters
FROM
@jsontable

FETCH NEXT FROM Json_Cursor INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson

END -- WHILE FETCH

CLOSE Json_Cursor
DEALLOCATE Json_Cursor

These are the times I measured for the insert, sampled every 25 records:

Record  TotalTimeMs
25 3
50 4
75 5
100 7
125 8
150 14
175 12
200 12
225 12
250 14
275 16
300 16
325 19
350 48
375 22
400 23
425 25
450 25
475 27
500 28
525 30
550 99
575 47
600 47
625 35
650 38
675 38
700 91
725 150
750 41
775 65
800 53
825 57
850 48
875 49
900 48
925 72
950 51
975 163
1000 54
1025 59
1050 160
1075 653
1100 1453
1125 52
1150 52
1175 51
1200 50
1225 53
1250 54
1275 388
1300 59
1325 57
1350 60
1375 78
1400 94
1425 63
1450 67
1475 71
1500 70
1525 75
1550 88
1575 75
1600 77
1625 78
1650 250
1675 83
1700 84
1725 502
1750 200
1775 89
1800 4432
1825 326
1850 92
1875 1329
1900 128
1925 103
1950 319
1975 122
2000 138

There are some significant outliers in this data, but overall the time per insert grows roughly linearly, which is undesirable.

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-06 : 09:16:10
"the insert performance grows rather linearly which is undesirable"

Undesirable? Really? Sounds good to me! Are you looking for O(log n) (or better) insert performance? I could be wrong, but I'm thinking anything less than O(n) is impossible.

OTOH, you should get rid of the cursor-based approach. Set-based logic is almost always faster (sometimes orders of magnitude faster).

I can't recall if anyone has a ready-to-go set-based JSON parser, but here are some links to get you started:

http://www.sqlservercentral.com/articles/Tally+Table/72993/

http://www.sqlservercentral.com/articles/String+Manipulation/94365/


jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-09 : 11:31:48
Perhaps it's my inexperience with the product at this scale of workload, but I do not understand the behavior.

Records 1-50 insert in 3-4 ms each, for a total of 150-200 ms.
As I keep inserting records, the time for each insert increases.
Record 1525 took 75 ms, not the 3-4 ms it took near the beginning of the transaction.

I'm looking for consistent insert performance, not a linear growth in the amount of time per record.

If you're able to suggest a set-based JSON parser, I'm all ears, but I don't believe one exists.

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-09 : 11:41:06
Aha! So you are not seeing linear growth of the total time with the size of the input (that's what I understood from your original post). You're actually seeing the time per insert itself grow, which makes the total time superlinear - roughly O(n^2) if the per-row cost grows linearly - and that is of course undesirable!

I believe that we could develop a set-based parser using the examples I cited. Can you post some sample data and desired output?

jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-09 : 13:01:33
It would be more correct to say that the time to insert grows, without any correlation to the size of the input (the JSON being parsed and the rows produced from it).

The result set should include these columns, taken from the objects in the array named "sql" (there is only ever one array):
Id UNIQUEIDENTIFIER NOT NULL,
CommandString VARCHAR(MAX) NOT NULL,
ExecuteType VARCHAR(50) NOT NULL,
StackTraceSnippet VARCHAR(2000) NOT NULL,
StartMilliseconds DECIMAL(7,1) NULL,
DurationMilliseconds DECIMAL(7,1) NULL,
FirstFetchDurationMilliseconds DECIMAL(7,1) NULL

Here's an example input:

{"sql":[{"Id":"a312b4a9-461c-45b2-bc6b-9c61fb13ab18","CommandString":"DECLARE @SessionId nvarchar(24) = N\u0027p0utee4f4wa04m14ue1p4s13\u0027;\r\n\r\nprGetUserIdFromClientLogin","ExecuteType":"Scalar","StackTraceSnippet":"GetSPField GetUserIdForSession InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1303.5,"DurationMilliseconds":115.1,"FirstFetchDurationMilliseconds":0},{"Id":"25099cfd-45df-42da-95df-52c0c74f2e66","CommandString":"DECLARE @UserID bigint = 69512;\r\n\r\nprSessionState","ExecuteType":"Reader","StackTraceSnippet":"GetSPDataSet GetSPDataTable GetSession GetUserSessionState InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1462.2,"DurationMilliseconds":145.6,"FirstFetchDurationMilliseconds":7.9},{"Id":"ae5beb87-2ca4-4ece-9548-7335819c5060","CommandString":"DECLARE @ClientID bigint = 69512;\r\n\r\nprClientPermissions","ExecuteType":"Reader","StackTraceSnippet":"GetSPDataSet GetSPDataTable getClientPermissions InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1618.3,"DurationMilliseconds":1.1,"FirstFetchDurationMilliseconds":0.8}]}

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2014-05-09 : 14:43:39
It's not clear to me that the insert to the SQL Server table is where the time-per-row increase is happening.
Part of my confusion: are the result times you posted for every 25 rows related to the start and duration milliseconds that you are inserting?
Is it possible to comment out the actual insert to your final table and still measure the times you are reporting in those results? Perhaps the CLR execution time is responsible for the growth, or maybe even the re-use of the table variable.
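
For instance, a minimal sketch along these lines (using the variable names from your proc) would separate the CLR parse cost from the insert cost on each pass:

-- Minimal timing sketch for one iteration of the loop
DECLARE @t0 DATETIME2(7), @t1 DATETIME2(7), @t2 DATETIME2(7)

SET @t0 = SYSDATETIME()

INSERT INTO @jsontable
EXEC [jqsql].[totable] @CustomTimingsJson, 'sql'

SET @t1 = SYSDATETIME()

-- the INSERT INTO MiniProfilerSqlTimings ... SELECT ... FROM @jsontable goes here

SET @t2 = SYSDATETIME()

PRINT 'CLR parse ms: ' + CONVERT(VARCHAR(20), DATEDIFF(MICROSECOND, @t0, @t1) / 1000.0)
PRINT 'Insert ms: ' + CONVERT(VARCHAR(20), DATEDIFF(MICROSECOND, @t1, @t2) / 1000.0)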

Be One with the Optimizer
TG

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-09 : 14:45:13
I got something to work using CROSS APPLY and those functions I mentioned to parse the input, then PIVOT to produce the output. Have a look:


declare @json varchar(4000) = '
{"sql":[{"Id":"a312b4a9-461c-45b2-bc6b-9c61fb13ab18","CommandString":"DECLARE @SessionId nvarchar(24) = N\u0027p0utee4f4wa04m14ue1p4s13\u0027;\r\n\r\nprGetUserIdFromClientLogin","ExecuteType":"Scalar","StackTraceSnippet":"GetSPField GetUserIdForSession InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1303.5,"DurationMilliseconds":115.1,"FirstFetchDurationMilliseconds":0},{"Id":"25099cfd-45df-42da-95df-52c0c74f2e66","CommandString":"DECLARE @UserID bigint = 69512;\r\n\r\nprSessionState","ExecuteType":"Reader","StackTraceSnippet":"GetSPDataSet GetSPDataTable GetSession GetUserSessionState InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1462.2,"DurationMilliseconds":145.6,"FirstFetchDurationMilliseconds":7.9},{"Id":"ae5beb87-2ca4-4ece-9548-7335819c5060","CommandString":"DECLARE @ClientID bigint = 69512;\r\n\r\nprClientPermissions","ExecuteType":"Reader","StackTraceSnippet":"GetSPDataSet GetSPDataTable getClientPermissions InitializeUserAuth Page_Load ProcessRequest","StartMilliseconds":1618.3,"DurationMilliseconds":1.1,"FirstFetchDurationMilliseconds":0.8}]}
'

;with parsed as (
select e.*
from (values(@json)) v(j)

-- Find the start of the 'sql' array
cross apply (
select substring(v.j, charindex('"sql":[', v.j) + 7,len(v.j))
) a(sql)

-- Extract the array, discarding the rest
cross apply (
select substring(a.sql, 1, charindex('}]',a.sql))
) b(sql)

-- Split the result up by starting braces
cross apply (
select *, rn= row_number() over(order by d.itemnumber)
from dbo.DelimitedSplitN4K(b.sql, '{') d
where d.item <> ''
) c
-- Further split by ending braces (could have used substring/charindex here)
cross apply (
select c.rn, d.*
from dbo.DelimitedSplitN4K(c.Item, '}') d
where d.item not in ('',',')
) d

-- Select just the bits we want, discarding delimiters
cross apply (
select d.rn, p.ItemNumber, p.Item
from dbo.PatternSplitCM(d.Item, '%[^":]%') p
where p.Matched = 1 and p.Item not in ('',',')
) e
)

-- Build the output
select rn
, Id
, ExecuteType
, CommandString
, StackTraceSnippet
, StartMilliseconds
, DurationMilliseconds
, FirstFetchDurationMilliseconds

-- Source is the parsed output from above with a self-join based on row and item number
from (select p1.rn, p1.item p1item, p2.item p2item
from parsed p1
join parsed p2
on p2.rn = p1.rn
and p2.ItemNumber = p1.ItemNumber+2) src

-- pivot on the items we want
pivot (max(p2item)
for p1item in ( Id
, CommandString
, ExecuteType
, StackTraceSnippet
, StartMilliseconds
, DurationMilliseconds
, FirstFetchDurationMilliseconds)) pvt


gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-09 : 15:32:01
Simpler version:


;with
parsed as (
select p1.rn, p2.ItemNumber, p2.item

-- Split into array members
from (
select p1.*, rn=row_number() over(order by (select 1))
from dbo.PatternSplitCM(@json, '[^{}]') p1
) p1

-- split array members into key/value pairs
cross apply (
select * from dbo.PatternSplitCM(p1.item, '[^{}:",[]')
WHERE Matched = 1 and item not in ('',',')
) p2
)

-- Build the output
select rn
, Id
, ExecuteType
, CommandString
, StackTraceSnippet
, StartMilliseconds
, DurationMilliseconds
, FirstFetchDurationMilliseconds

-- Source is the parsed output from above with a self-join based on row and item number
from (select p1.rn, p1.item p1item, p2.item p2item
from parsed p1
join parsed p2
on p2.rn = p1.rn
and p2.ItemNumber = p1.ItemNumber+2
) src

-- pivot on the items we want
pivot (max(p2item)
for p1item in ( Id
, CommandString
, ExecuteType
, StackTraceSnippet
, StartMilliseconds
, DurationMilliseconds
, FirstFetchDurationMilliseconds)) pvt

jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-12 : 12:51:38
@TG - The timing I quoted was wrapped around the INSERT INTO MiniProfilerSqlTimings ... SELECT ... FROM @jsontable.

I've changed some things and got the performance back to a profile I'm happy with: essentially constant time per row (O(n) overall).

1. The cursor's source query was using WHERE ID NOT IN (SELECT ID FROM FinalTable);
I changed this to WHERE ID NOT IN (SELECT ID FROM FinalTable WITH (NOLOCK)).

2. Changed how the table variables are used:
I declared the table variables outside the cursor loop, and
on each pass through the cursor I insert into the final table and then clear the rows out of the table variable.

3. The insert into the final table is now done with a WITH (TABLOCK) hint.
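
Put together, the loop now looks roughly like this (a sketch - the scalar variable declarations are as in my first post, and the ExecuteType mapping and CommandString/Parameters parsing are elided to keep it short):

-- 2. table variable declared once, outside the cursor loop
DECLARE @jsontable TABLE (
Id NVARCHAR(MAX),
CommandString NVARCHAR(MAX),
ExecuteType NVARCHAR(MAX),
StackTraceSnippet NVARCHAR(MAX),
StartMilliseconds FLOAT,
DurationMilliseconds FLOAT,
FirstFetchDurationMilliseconds FLOAT
)

-- 1. the cursor's source query now reads the final table WITH (NOLOCK)
DECLARE Json_Cursor CURSOR FAST_FORWARD FOR
SELECT t.Id, t.MiniProfilerId, t.ParentTimingId, t.CustomTimingsJson
FROM dbo.MiniProfilerTimings t
WHERE t.CustomTimingsJson IS NOT NULL
AND t.MiniProfilerId NOT IN (SELECT st.MiniProfilerId FROM dbo.MiniProfilerSqlTimings st WITH (NOLOCK))

OPEN Json_Cursor
FETCH NEXT FROM Json_Cursor INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson

WHILE @@FETCH_STATUS = 0 BEGIN

INSERT INTO @jsontable
EXEC [jqsql].[totable] @CustomTimingsJson, 'sql'

-- 3. the insert into the final table now takes a TABLOCK hint
INSERT INTO dbo.MiniProfilerSqlTimings WITH (TABLOCK) (
[Id], [MiniProfilerId], [ParentTimingId], [ExecuteType],
[StartMilliseconds], [DurationMilliseconds], [FirstFetchDurationMilliseconds],
[StackTraceSnippet], [CommandString], [Parameters])
SELECT
CAST(Id AS UNIQUEIDENTIFIER),
@MiniProfilerId,
@ParentTimingId,
0, -- ExecuteType CASE mapping from the first post goes here
CAST(StartMilliseconds AS DECIMAL(7,1)),
CAST(DurationMilliseconds AS DECIMAL(7,1)),
CAST(FirstFetchDurationMilliseconds AS DECIMAL(7,1)),
LEFT(StackTraceSnippet, 300),
LEFT(CommandString, 200), -- CommandString/Parameters parsing elided
NULL
FROM @jsontable

-- 2. clear the staging rows before the next pass
DELETE FROM @jsontable

FETCH NEXT FROM Json_Cursor INTO @TimingId, @MiniProfilerId, @ParentTimingId, @CustomTimingsJson

END

CLOSE Json_Cursor
DEALLOCATE Json_Cursor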

jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-12 : 13:06:54
@gbritton

Cool function & technique. I don't quite understand how I can apply this to the entire table at once, though.
It seems to me you're still using a variable @json, so I would have to use this inside the cursor?

Is there a way to use this technique directly between the source and destination tables?

Also, I tried it on a larger case, and it only returned 20 of the 127 rows.

Thanks.

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-12 : 13:19:27
Get rid of that cursor! I don't think you need it. Is [jqsql].[totable] in the CLR you're using? If not, can you post the source? If you can set it up to be called as a function instead of a proc, it should make life easier.
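
To show what I mean: if it could be called as a table-valued function (dbo.jqsql_totable below is hypothetical), the whole procedure collapses to a single set-based statement, something like:

-- Sketch only: assumes a hypothetical TVF dbo.jqsql_totable returning the
-- same columns as the @jsontable buffer; CommandString/Parameters parsing elided
INSERT INTO dbo.MiniProfilerSqlTimings (
[Id], [MiniProfilerId], [ParentTimingId], [ExecuteType],
[StartMilliseconds], [DurationMilliseconds], [FirstFetchDurationMilliseconds],
[StackTraceSnippet], [CommandString], [Parameters])
SELECT
CAST(j.Id AS UNIQUEIDENTIFIER),
t.MiniProfilerId,
t.ParentTimingId,
CASE j.ExecuteType WHEN 'NonQuery' THEN 1 WHEN 'Scalar' THEN 2 WHEN 'Reader' THEN 3 ELSE 0 END,
CAST(j.StartMilliseconds AS DECIMAL(7,1)),
CAST(j.DurationMilliseconds AS DECIMAL(7,1)),
CAST(j.FirstFetchDurationMilliseconds AS DECIMAL(7,1)),
LEFT(j.StackTraceSnippet, 300),
LEFT(j.CommandString, 200),
NULL
FROM dbo.MiniProfilerTimings t
CROSS APPLY dbo.jqsql_totable(t.CustomTimingsJson, 'sql') j
WHERE t.CustomTimingsJson IS NOT NULL
AND NOT EXISTS (SELECT 1 FROM dbo.MiniProfilerSqlTimings st WHERE st.MiniProfilerId = t.MiniProfilerId)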

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2014-05-12 : 13:42:12
I'm glad your changes resolved the execution-time growth problem. If you can't get the non-cursor version working - and I agree with gbritton that any time you can use a set-based approach, you should - then you might consider this change as well:
switch to a #temp table, don't delete between iterations, and don't insert to the final table inside the loop. Collect all the data, then do one insert to your final table after the looping is done. Play with NOT IN vs. LEFT OUTER JOIN ... WHERE ID IS NULL.
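
In sketch form (names assumed, column handling abbreviated; the small parse buffer still needs clearing each pass, since INSERT ... EXEC can't carry the key columns):

-- staging #temp table, created once up front
CREATE TABLE #Staged (
Id NVARCHAR(MAX),
MiniProfilerId UNIQUEIDENTIFIER,
ParentTimingId UNIQUEIDENTIFIER,
CommandString NVARCHAR(MAX),
ExecuteType NVARCHAR(MAX),
StackTraceSnippet NVARCHAR(MAX),
StartMilliseconds FLOAT,
DurationMilliseconds FLOAT,
FirstFetchDurationMilliseconds FLOAT
)

-- inside the loop: parse into @jsontable as before, then copy across
-- with the per-request keys; the live table is never touched here
INSERT INTO #Staged (Id, MiniProfilerId, ParentTimingId, CommandString, ExecuteType,
StackTraceSnippet, StartMilliseconds, DurationMilliseconds, FirstFetchDurationMilliseconds)
SELECT Id, @MiniProfilerId, @ParentTimingId, CommandString, ExecuteType,
StackTraceSnippet, StartMilliseconds, DurationMilliseconds, FirstFetchDurationMilliseconds
FROM @jsontable

-- after the loop: one set-based insert, one lock acquisition on the live table
INSERT INTO dbo.MiniProfilerSqlTimings (Id, MiniProfilerId, ParentTimingId, ExecuteType,
StartMilliseconds, DurationMilliseconds, FirstFetchDurationMilliseconds,
StackTraceSnippet, CommandString, Parameters)
SELECT CAST(s.Id AS UNIQUEIDENTIFIER), s.MiniProfilerId, s.ParentTimingId,
0, -- ExecuteType mapping elided
CAST(s.StartMilliseconds AS DECIMAL(7,1)),
CAST(s.DurationMilliseconds AS DECIMAL(7,1)),
CAST(s.FirstFetchDurationMilliseconds AS DECIMAL(7,1)),
LEFT(s.StackTraceSnippet, 300), LEFT(s.CommandString, 200), NULL
FROM #Staged s
LEFT OUTER JOIN dbo.MiniProfilerSqlTimings d -- anti-join instead of NOT IN
ON d.MiniProfilerId = s.MiniProfilerId
WHERE d.MiniProfilerId IS NULL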

Be One with the Optimizer
TG

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-13 : 07:52:18
quote:
Originally posted by jeremyjsimmons

@gbritton

Cool function & technique. I don't quite understand how I can apply this to the entire table at once, though.
It seems to me you're still using a variable @json, so I would have to use this inside the cursor?

Is there a way to use this technique directly between the source and destination tables?

Also, I tried it on a larger case, and it only returned 20 of the 127 rows.

Thanks.



Can you post the sample data that failed? My code only addressed the specific example you posted; it probably needs some extras to handle a full data set. Also, can you post the code for the proc you call in the middle?

jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-13 : 16:41:59
@gbritton. I don't see how I can avoid using the cursor; [jqsql].[totable] is a CLR stored procedure. I'm using a modified version of the code from jqsql.com. I can post it to GitHub if you're interested.

I was able to make your simplified version work using CROSS APPLY.
I believe there are still some edge cases to work out, but otherwise it's an interesting option.
I appreciate your offer to look at the full data, but I'm going to decline to post it.




jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-13 : 18:05:09
@TG - thank you for the additional suggestions. Why is the single insert from the #temp table preferable to the insert from the @table variable on every iteration through the loop?

The NOT IN, LEFT OUTER JOIN ... WHERE ID IS NULL, and NOT EXISTS (SELECT 1 FROM Destination WHERE A.ID = Destination.ID) variants all produce very similar query plans. Why would I choose one over the other?
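
For reference, here are the three shapes I compared, using my actual tables (MiniProfilerId is declared NOT NULL in both, which matters: a NULL coming back from the subquery would make NOT IN match nothing):

-- 1. NOT IN
SELECT t.Id FROM dbo.MiniProfilerTimings t
WHERE t.MiniProfilerId NOT IN (SELECT st.MiniProfilerId FROM dbo.MiniProfilerSqlTimings st)

-- 2. LEFT OUTER JOIN ... WHERE ... IS NULL
SELECT t.Id FROM dbo.MiniProfilerTimings t
LEFT OUTER JOIN dbo.MiniProfilerSqlTimings st ON st.MiniProfilerId = t.MiniProfilerId
WHERE st.MiniProfilerId IS NULL

-- 3. NOT EXISTS
SELECT t.Id FROM dbo.MiniProfilerTimings t
WHERE NOT EXISTS (SELECT 1 FROM dbo.MiniProfilerSqlTimings st WHERE st.MiniProfilerId = t.MiniProfilerId)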

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-05-14 : 08:47:22
quote:
Originally posted by jeremyjsimmons

@gbritton. I don't see how I can avoid using the cursor; [jqsql].[totable] is a CLR stored procedure. I'm using a modified version of the code from jqsql.com. I can post it to GitHub if you're interested.

I was able to make your simplified version work using CROSS APPLY.
I believe there are still some edge cases to work out, but otherwise it's an interesting option.
I appreciate your offer to look at the full data, but I'm going to decline to post it.



Since you modified the jqsql.com code to make your CLR, can you make it callable as a table-valued function instead of a stored procedure? If so, you can eliminate the cursor, since the function will just return another table.
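
For example, the T-SQL half of a CLR TVF registration would look something like this (the assembly, class, and method names are hypothetical; on the CLR side the method has to be static, return IEnumerable, and point at a FillRow method that populates these columns):

CREATE FUNCTION dbo.jqsql_totable (@json NVARCHAR(MAX), @path NVARCHAR(400))
RETURNS TABLE (
Id NVARCHAR(MAX),
CommandString NVARCHAR(MAX),
ExecuteType NVARCHAR(MAX),
StackTraceSnippet NVARCHAR(MAX),
StartMilliseconds FLOAT,
DurationMilliseconds FLOAT,
FirstFetchDurationMilliseconds FLOAT
)
AS EXTERNAL NAME [jqsql].[JqSql.Functions].[ToTable]

With that in place, the cursor goes away: the whole procedure becomes one INSERT ... SELECT ... CROSS APPLY over the source table.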

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2014-05-14 : 13:50:18
quote:
Originally posted by jeremyjsimmons

@TG - thank you for the additional suggestions. Why is the single insert from the #temp table preferable to the insert from the @table variable on every iteration through the loop?

The NOT IN, LEFT OUTER JOIN ... WHERE ID IS NULL, and NOT EXISTS (SELECT 1 FROM Destination WHERE A.ID = Destination.ID) variants all produce very similar query plans. Why would I choose one over the other?



Well, those were somewhat intuitive for me - I don't know if they would be improvements or not. But here was my thinking:

The single insert lets you avoid deleting/truncating your staging table on each iteration - that could help somewhat. It also limits your access to the live, public table to a single point in time rather than continuously throughout the process. That may cause less contention with other processes and should require less lock management from SQL Server - i.e., acquire and release locks once rather than a bunch of times.

The @table vs. #table suggestion goes along with the previous one. When dealing with larger volumes of data I try not to use table variables.

LEFT OUTER JOIN vs. NOT IN:
You're right - ideally SQL Server will resolve both options to a very similar if not identical plan. But my habit has been to use the LEFT OUTER JOIN ... WHERE ... IS NULL form by default. That technique is easier for me personally to read and, anecdotally speaking, seems to get a bad plan less often.

Be One with the Optimizer
TG

jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-15 : 13:06:26
@gbritton - I will attempt to change the calling style of the jqsql procedure to a table-valued function instead of a stored procedure and let you know the outcome.

jeremyjsimmons
Starting Member

9 Posts

Posted - 2014-05-15 : 13:09:20
@TG - thank you for getting into the specifics. I agree on all points: 1. eliminating the repeated writes to the staging table, and 2. limiting access to the public table to avoid repeated lock acquisition.
I did compare the query plans for the three options, and I could not find a significant difference between them.
Thank you for suggesting the additional option of LEFT JOIN ... WHERE ... IS NULL.

I'm trying to further my SQL knowledge this year. Can you recommend any great books?