Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Help Replacing Multiple Substrings

Author  Topic 

sacang
Starting Member

2 Posts

Posted - 2012-08-03 : 18:58:58
I am looking for an efficient way to replace multiple substrings in a column. The substrings are stored in a separate table of approximately 100 rows. The table containg the column with the data to replace is approximately 12,000 rows. These sizes will increase in time.

After some research, I have gotten part way with the sample code below. Two immediate problems are:

1. The code only replaces the first occurence of the substring in each column.

2. The order of replacement must be based on the length of the substring in a decending order.

The first problem can be solved with a While loop. Although this seems to be inefficient.

I have not solved the second problem.

Following is the code:

Use tempdb
Go


If OBJECT_ID('OrigData', 'U') Is Not Null Drop Table OrigData;
Create Table OrigData(
OrigStr nvarchar(50)
);

If OBJECT_ID('Repl', 'U') Is Not Null Drop Table Repl;
Create Table Repl(
ReplStr nvarchar(50)
);

Insert Into Repl
Values ('A'), ('AB'), ('A BC')

Insert Into OrigData
Values ('abcd AB A'),
('A BC D');

With C As (
select OrigStr, ReplStr,
Replace(' ' + O.OrigStr + ' ', ' ' + R.ReplStr + ' ', ' * ') as ModName
From OrigData O
Cross
Apply Repl R
where charindex(' ' + R.ReplStr + ' ', ' ' + O.OrigStr + ' ') > 0
)
Update C
Set OrigStr = ModName
Output inserted.OrigStr;


Some notes:
1. The spaces were added to the Replace function to assure only whole words were replaced.

2. The result set looks likke this with my comments:

abcd AB * -- 'AB' Should also be replaced
* BC D -- 'A BC' should be replaced instead of 'A'

Would appreciate any gudidance.
Thank you

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-08-03 : 22:35:54
i think you can apply a methodolgy like below

http://visakhm.blogspot.com/2010/03/using-quirky-updates-to-develop-well.html

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

sacang
Starting Member

2 Posts

Posted - 2012-08-04 : 15:58:28
thank you visakh16. It seems that the behavior in your referenced link is contingent on the way indexes are handled. Not sure if that can be relied on?

I was able to solve the problem with the ordering of replacements with the following code. The solution was to use select top 1 and order by. If I use the select top 100% more than one row is returned and order is not guranteed. By using top 1 the behavior is as expected.

Now need to find a way to handle multiple replacements in the same string without the performance cost of a while loop.

any further ideas?
thanks


Use tempdb
Go


If OBJECT_ID('OrigData', 'U') Is Not Null Drop Table OrigData;
Create Table OrigData(
OrigStr nvarchar(50)
);

If OBJECT_ID('Repl', 'U') Is Not Null Drop Table Repl;
Create Table Repl(
ReplStr nvarchar(50)
);

Insert Into Repl
Values ('A'), ('AB'), ('A BC')

Insert Into OrigData
Values ('abcd AB A'),
('A BC D');

Drop Function SelectReplStr;
go

CREATE FUNCTION SelectReplStr
( @orig nvarchar(50) ) RETURNS Table AS
Return
Select Top 1 ReplStr From Repl
where charindex(' ' + ReplStr + ' ', ' ' + @orig + ' ') > 0
Order by Len(ReplStr) Desc;
go

With C As (
select OrigStr, ReplStr,
Replace(' ' + O.OrigStr + ' ', ' ' + R.ReplStr + ' ', ' * ') as ModName
From OrigData O
Cross Apply SelectReplStr(OrigStr) as R
where charindex(' ' + R.ReplStr + ' ', ' ' + O.OrigStr + ' ') > 0
)
Update C
Set OrigStr = ModName
Output inserted.OrigStr;
Go to Top of Page
   

- Advertisement -