Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
sacang
Starting Member
2 Posts |
Posted - 2012-08-03 : 18:58:58
|
| I am looking for an efficient way to replace multiple substrings in a column. The substrings are stored in a separate table of approximately 100 rows. The table containg the column with the data to replace is approximately 12,000 rows. These sizes will increase in time.After some research, I have gotten part way with the sample code below. Two immediate problems are:1. The code only replaces the first occurence of the substring in each column.2. The order of replacement must be based on the length of the substring in a decending order.The first problem can be solved with a While loop. Although this seems to be inefficient.I have not solved the second problem.Following is the code:Use tempdbGoIf OBJECT_ID('OrigData', 'U') Is Not Null Drop Table OrigData;Create Table OrigData( OrigStr nvarchar(50));If OBJECT_ID('Repl', 'U') Is Not Null Drop Table Repl;Create Table Repl( ReplStr nvarchar(50));Insert Into Repl Values ('A'), ('AB'), ('A BC')Insert Into OrigData Values ('abcd AB A'), ('A BC D');With C As (select OrigStr, ReplStr, Replace(' ' + O.OrigStr + ' ', ' ' + R.ReplStr + ' ', ' * ') as ModName From OrigData O Cross Apply Repl R where charindex(' ' + R.ReplStr + ' ', ' ' + O.OrigStr + ' ') > 0)Update C Set OrigStr = ModName Output inserted.OrigStr;Some notes:1. The spaces were added to the Replace function to assure only whole words were replaced.2. The result set looks likke this with my comments: abcd AB * -- 'AB' Should also be replaced * BC D -- 'A BC' should be replaced instead of 'A'Would appreciate any gudidance.Thank you |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
|
|
sacang
Starting Member
2 Posts |
Posted - 2012-08-04 : 15:58:28
|
| thank you visakh16. It seems that the behavior in your referenced link is contingent on the way indexes are handled. Not sure if that can be relied on?I was able to solve the problem with the ordering of replacements with the following code. The solution was to use select top 1 and order by. If I use the select top 100% more than one row is returned and order is not guranteed. By using top 1 the behavior is as expected.Now need to find a way to handle multiple replacements in the same string without the performance cost of a while loop.any further ideas?thanksUse tempdbGoIf OBJECT_ID('OrigData', 'U') Is Not Null Drop Table OrigData;Create Table OrigData( OrigStr nvarchar(50));If OBJECT_ID('Repl', 'U') Is Not Null Drop Table Repl;Create Table Repl( ReplStr nvarchar(50));Insert Into Repl Values ('A'), ('AB'), ('A BC')Insert Into OrigData Values ('abcd AB A'), ('A BC D');Drop Function SelectReplStr;goCREATE FUNCTION SelectReplStr ( @orig nvarchar(50) ) RETURNS Table AS Return Select Top 1 ReplStr From Repl where charindex(' ' + ReplStr + ' ', ' ' + @orig + ' ') > 0 Order by Len(ReplStr) Desc;goWith C As (select OrigStr, ReplStr, Replace(' ' + O.OrigStr + ' ', ' ' + R.ReplStr + ' ', ' * ') as ModName From OrigData O Cross Apply SelectReplStr(OrigStr) as R where charindex(' ' + R.ReplStr + ' ', ' ' + O.OrigStr + ' ') > 0)Update C Set OrigStr = ModName Output inserted.OrigStr; |
 |
|
|
|
|
|
|
|