Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Splitting string of integers

Author  Topic 

sql_er
Constraint Violating Yak Guru

267 Posts

Posted - 2010-07-20 : 12:21:26
Hi,

We have been using a function for splitting strings for a long while already. The function was working fine for our purpose [when the string length was 50 chars at most].

Recently, however, there was a need to be able to pass a lot more items into the function, which introduced 2 questions:

1. How can we pass more than 8000 chars into the function [i.e. what needs to be changed to accommodate for this]?

2. What would be the performance implications?

I did some searches on this forum and found out that there were many discussions in the past on the subject, but was not able to find a conclusive entry which would say which is the best implementation - performance wise, simultaneously allowing an unlimited string input.

Below is the function we are using now:

CREATE Function fn_splitter (@IDs VARCHAR(8000))
Returns @Tbl_IDs Table (ID Int) As
Begin
-- Append comma
Set @IDs = @IDs + '~'
-- Indexes to keep the position of searching
Declare @Pos1 Int
Declare @pos2 Int

-- Start from first character
Set @Pos1=1
Set @Pos2=1

While @Pos1<Len(@IDs)
Begin
Set @Pos1 = CharIndex('~',@IDs,@Pos1)
Insert @Tbl_IDs Select Cast(Substring(@IDs,@Pos2,@Pos1-@Pos2) As Int)
-- Go to next non comma character
Set @Pos2=@Pos1+1
-- Search from the next charcater
Set @Pos1 = @Pos1+1
End
Return
End

I'd appreciate any feedback on improvements or pointing to the script which would solve 2 of my concerns.

Thank you!

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-07-20 : 12:28:12
Here's another technique that may help:

http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows
Go to Top of Page

sql_er
Constraint Violating Yak Guru

267 Posts

Posted - 2010-07-20 : 13:52:30
RobVolk - thank you for the link.

Would you know if this is the best known solution to my problem [i.e. best performance, and allows unlimited string input] or just one of the solutions.

Thanks!
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-07-20 : 14:16:04
Adam Machanic preaches CLR for splitting strings (and other string operations):

http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx

As they say in Spaceballs, PREPARE FOR...LUDICROUS SPEED! If CLR is not an option for you, Aaron Bertrand has a comparison of several techniques:

http://sqlblog.com/blogs/aaron_bertrand/archive/2010/07/07/splitting-a-list-of-integers-another-roundup.aspx
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-07-20 : 14:38:48
I believe OP is running SQL Server 2000.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-07-20 : 14:46:12
quote:
Originally posted by Peso

I believe OP is running SQL Server 2000.
Then the Inline 2 option in Aaron's blog post is probably the best solution. Best thing to do is test as many as possible and see which works best.
Go to Top of Page

sql_er
Constraint Violating Yak Guru

267 Posts

Posted - 2010-07-20 : 14:46:45
Robvolk - thank you for very interesting links. This is the 1st time i've been introduced to CLR concept. Is this only applicable to MSSQL 2005 and beyond and would thus not work for MSSQL 2000 system we have?

Also, for the 2nd link, all the examples take VACHAR input, which I believe would be still limited to 8000, while I'm specifically looking for a solution with no such limit.

Thanks a lot!
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-07-20 : 15:08:48
I just tried modifying that code to use a text variable instead of varchar(max) but it doesn't like it. You can try modifying it to use multiple varchar(8000) variables up to some reasonable amount (seriously, you're not going to parse 2 GB worth of CSV, even 2 MB is excessive). It would probably work better as a stored procedure rather than a user-defined function.
Go to Top of Page

sql_er
Constraint Violating Yak Guru

267 Posts

Posted - 2010-07-20 : 16:09:24
Hmm ... I wonder if there is a more elegant way to handle this in MSSQL 2000.

In at least one of my cases I don't care about performance as much as about the ability to have the string length not be constrained to 8000 chars.

I plan to look into an XML solution, unless there are other advices.


Thank you!
Go to Top of Page

sql_er
Constraint Violating Yak Guru

267 Posts

Posted - 2010-07-22 : 14:43:04
Robvolk,

I actually thought more about your suggestion, and implemented it. I basically have a new function which can take TEXT as an input, and break it up into 8000 or less [depending on delimiter position] chuncks, and pass it to the original function that has 8000 char limit. I will show both functions below in case someone has the same need:


CREATE FUNCTION FN_TextSplitter(@InputText TEXT) RETURNS @Result TABLE (Number INT)
AS
BEGIN
/*
This function is an extension of fnsplitter function. While fnsplitter has an 8000 char limit
this function does not. Instead, it takes a text input, and then breaks it up into ~ 8000 char chunks
passing it along to fnsplitter function and accumulating results. It does the break intelligently
to avoid breaking in the middle of token, breaking instead at the last delimiter it finds in the 8000 char chunk of text

If the initial input is already less than 8000 characters, the function just defaults to using fnsplitter original function
*/
DECLARE @Delimiter CHAR(3)
DECLARE @StartIndex INT
DECLARE @VarcharLimit INT
DECLARE @IndexOfLastDelimiter INT
DECLARE @CurrentTextChunk VARCHAR(8000)

SET @Delimiter = '%~%'
SET @StartIndex = 1
SET @VarcharLimit = 8000
SET @CurrentTextChunk = SUBSTRING(@InputText, @StartIndex, @VarcharLimit) -- Initially set to 8000 chars
SET @IndexOfLastDelimiter = LEN(@CurrentTextChunk) - PATINDEX(@Delimiter, REVERSE(@CurrentTextChunk)) + 1
SET @CurrentTextChunk = SUBSTRING(@InputText, @StartIndex, @IndexOfLastDelimiter - 1) -- Reset to correct chunk by finding last delimiter

-- Keep looping while there is still over 8000 chars left to look at
WHILE ((DATALENGTH(@InputText) - @StartIndex) > @VarcharLimit)
BEGIN
INSERT INTO @Result(Number)
SELECT ID
FROM Fnsplitter(@CurrentTextChunk)

SET @StartIndex = @StartIndex + @IndexOfLastDelimiter
SET @CurrentTextChunk = SUBSTRING(@InputText, @StartIndex, @VarcharLimit)
SET @IndexOfLastDelimiter = LEN(@CurrentTextChunk) - PATINDEX(@Delimiter, REVERSE(@CurrentTextChunk)) + 1
SET @CurrentTextChunk = SUBSTRING(@InputText, @StartIndex, @IndexOfLastDelimiter - 1)

END

-- Original input was less than 8000 chars, so just reset @CurrentTextChunk and use the final case below
IF DATALENGTH(@InputText) < @VarcharLimit
BEGIN
SET @CurrentTextChunk = @InputText
END

-- Take care of last chunk case [or case where original input was already < 8000 chars]
INSERT INTO @Result(Number)
SELECT ID
FROM Fnsplitter(@CurrentTextChunk)

RETURN
END




/****** Object: User Defined Function dbo.fnSplitter Script Date: 3/3/2010 2:49:50 PM ******/




CREATE Function fnSplitter (@IDs Varchar(8000) )
Returns @Tbl_IDs Table (ID Int) As

Begin
-- Append comma
Set @IDs = @IDs + '~'
-- Indexes to keep the position of searching
Declare @Pos1 Int
Declare @pos2 Int

-- Start from first character
Set @Pos1=1
Set @Pos2=1

While @Pos1<Len(@IDs)
Begin
Set @Pos1 = CharIndex('~',@IDs,@Pos1)
Insert @Tbl_IDs Select Cast(Substring(@IDs,@Pos2,@Pos1-@Pos2) As Int)
-- Go to next non comma character
Set @Pos2=@Pos1+1
-- Search from the next charcater
Set @Pos1 = @Pos1+1
End
Return
End


Thanks for the guidance!






Go to Top of Page
   

- Advertisement -