Author |
Topic |
NickC
Yak Posting Veteran
68 Posts |
Posted - 2010-08-09 : 07:29:05
|
Anyone ever used pattern matching as part of data cleansing? I have a table with addresses in, and need to clean the data, so far I've removed white spaces and through the help of someone on SQL TEAM was able to capitalise the first letter of every word but now I want to move incorrect characters such as #@%$ etc and someone said a good pattern matching code would do that?Cheers for help, Nick |
|
vaibhavktiwari83
Aged Yak Warrior
843 Posts |
Posted - 2010-08-09 : 08:12:53
|
Please show some sample data and expected output to help you for the same...Vaibhav TTo walk FAST walk ALONE To walk FAR walk TOGETHER |
 |
|
NickC
Yak Posting Veteran
68 Posts |
Posted - 2010-08-10 : 05:43:46
|
Current Data ExampleAddress 1 Address 2 Address 3 Postcode128 A&mbleside Dr1ve Southend-on Sea Essex SS2 4DN44A T%HE R$£IDGEWAY Chalkwell E"SSEX SS1 2BU150 W3STBURY ROAD Southend=on=Sea Essex SS2 4BUAfter Pattern Matching I'd like output to beAddress 1 Address 2 Address 3 Postcode128 Ambleside Drive Southend on Sea Essex SS2 4DN44A THE RIDGEWAY Chalkwell ESSEX SS1 2BU150 WESTBURY ROAD Southend on Sea Essex SS2 4BUNot even sure this is possible, without just doing basic find and replace ... but someone said something about algorithms which would check for a whole range of errors based on "pattern matching"... even a basic for dummies example of pattern matching woudl be nice, even if it can't help |
 |
|
vaibhavktiwari83
Aged Yak Warrior
843 Posts |
Posted - 2010-08-10 : 06:35:32
|
If you want only alfabets and Numbers then its possible is it like that?Vaibhav TTo walk FAST walk ALONE To walk FAR walk TOGETHER |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2010-08-10 : 07:10:20
|
[code]declare @x varchar(max); SET @x = '44A T%HE R$£IDGEWAY Chalkwell E"SSEX SS1 2BU'declare @numbers varchar(max)=''select @numbers=@numbers + '' + number from (select convert(varchar(2),substring(@x,number,1))as number from master.dbo.spt_values where type='p' and number between 1 and LEN(@x))t where number between '0' and '9' or number between 'a' and 'z'select @numbers as numbers[/code]Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless. PBUH |
 |
|
NickC
Yak Posting Veteran
68 Posts |
Posted - 2010-08-10 : 07:37:54
|
Is this for a single record? theres about 600,000 records i need to do this for? sorry idera I'm a bit of a newbie so some guidance would be cool...incidently how did you learn SQL? I'm sort of starting in my role, and knew basic functions but all these t-sql variables etc confuse me, do you have an online resource? Loving the quote btw! Kind RegardsNick |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2010-08-10 : 07:42:06
|
You need to create an user defined function that accepts each row value from the table & returns the desired value in your case without incorrect characters for each row.For a start have a look herehttp://www.sqlteam.com/article/user-defined-functionsLimitations live only in our minds. But if we use our imaginations, our possibilities become limitless. PBUH |
 |
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 2010-08-10 : 08:03:22
|
No need scalar UDF. Just set-based approach. follow me: SELECT D.i FROM table_name CROSS APPLY (SELECT i + '' FROM (SELECT SUBSTRING(i, n, 1), n FROM Numbs /* Tally Tables */--you need to create a number table first. WHERE n <= LEN(i) )D(i, n) WHERE i LIKE '[0-9]' OR i LIKE '[a-z]' ORDER BY n ASC FOR XML PATH(''))D(i); ______________________ |
 |
|
NickC
Yak Posting Veteran
68 Posts |
Posted - 2010-08-10 : 12:27:17
|
number table? sorry I really am dumb! .... In my mind I'm thinking a table with just numbers in it? |
 |
|
ms65g
Constraint Violating Yak Guru
497 Posts |
Posted - 2010-08-10 : 13:28:13
|
Just execute this for creating and publishing the number table:;SELECT TOP (8000) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))INTO numbersFROM sys.columns a, sys.columns, (SELECT 1 UNION SELECT 2 UNION SELECT 3) D(i) ______________________ |
 |
|
|