Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Pattern matching

Author  Topic 

NickC
Yak Posting Veteran

68 Posts

Posted - 2010-08-09 : 07:29:05
Anyone ever used pattern matching as part of data cleansing? I have a table with addresses in, and need to clean the data, so far I've removed white spaces and through the help of someone on SQL TEAM was able to capitalise the first letter of every word but now I want to move incorrect characters such as #@%$ etc and someone said a good pattern matching code would do that?

Cheers for help,

Nick

vaibhavktiwari83
Aged Yak Warrior

843 Posts

Posted - 2010-08-09 : 08:12:53
Please show some sample data and expected output to help you for the same...

Vaibhav T

To walk FAST walk ALONE
To walk FAR walk TOGETHER
Go to Top of Page

NickC
Yak Posting Veteran

68 Posts

Posted - 2010-08-10 : 05:43:46
Current Data Example

Address 1 Address 2 Address 3 Postcode
128 A&mbleside Dr1ve Southend-on Sea Essex SS2 4DN
44A T%HE R$£IDGEWAY Chalkwell E"SSEX SS1 2BU
150 W3STBURY ROAD Southend=on=Sea Essex SS2 4BU


After Pattern Matching I'd like output to be

Address 1 Address 2 Address 3 Postcode
128 Ambleside Drive Southend on Sea Essex SS2 4DN
44A THE RIDGEWAY Chalkwell ESSEX SS1 2BU
150 WESTBURY ROAD Southend on Sea Essex SS2 4BU

Not even sure this is possible, without just doing basic find and replace ... but someone said something about algorithms which would check for a whole range of errors based on "pattern matching"... even a basic for dummies example of pattern matching woudl be nice, even if it can't help
Go to Top of Page

vaibhavktiwari83
Aged Yak Warrior

843 Posts

Posted - 2010-08-10 : 06:35:32
If you want only alfabets and Numbers then its possible is it like that?

Vaibhav T

To walk FAST walk ALONE
To walk FAR walk TOGETHER
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2010-08-10 : 07:10:20
[code]
declare @x varchar(max); SET @x = '44A T%HE R$£IDGEWAY Chalkwell E"SSEX SS1 2BU'
declare @numbers varchar(max)=''

select @numbers=@numbers + '' + number from
(
select
convert(varchar(2),substring(@x,number,1))as number
from master.dbo.spt_values where type='p'
and number between 1 and LEN(@x)
)t where number between '0' and '9' or number between 'a' and 'z'

select @numbers as numbers

[/code]


Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH
Go to Top of Page

NickC
Yak Posting Veteran

68 Posts

Posted - 2010-08-10 : 07:37:54
Is this for a single record? theres about 600,000 records i need to do this for? sorry idera I'm a bit of a newbie so some guidance would be cool...

incidently how did you learn SQL? I'm sort of starting in my role, and knew basic functions but all these t-sql variables etc confuse me, do you have an online resource?

Loving the quote btw!

Kind Regards

Nick
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2010-08-10 : 07:42:06
You need to create an user defined function that accepts each row value from the table & returns the desired value in your case without incorrect characters for each row.For a start have a look here

http://www.sqlteam.com/article/user-defined-functions


Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH
Go to Top of Page

ms65g
Constraint Violating Yak Guru

497 Posts

Posted - 2010-08-10 : 08:03:22
No need scalar UDF. Just set-based approach. follow me:

SELECT D.i
FROM table_name
CROSS APPLY
(SELECT i + ''
FROM (SELECT SUBSTRING(i, n, 1), n
FROM Numbs /* Tally Tables */--you need to create a number table first.
WHERE n <= LEN(i)
)D(i, n)
WHERE i LIKE '[0-9]' OR i LIKE '[a-z]'
ORDER BY n ASC
FOR XML PATH(''))D(i);


______________________
Go to Top of Page

NickC
Yak Posting Veteran

68 Posts

Posted - 2010-08-10 : 12:27:17
number table? sorry I really am dumb! .... In my mind I'm thinking a table with just numbers in it?
Go to Top of Page

ms65g
Constraint Violating Yak Guru

497 Posts

Posted - 2010-08-10 : 13:28:13
Just execute this for creating and publishing the number table:
;SELECT TOP (8000) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
INTO numbers
FROM sys.columns a, sys.columns, (SELECT 1 UNION SELECT 2 UNION SELECT 3) D(i)


______________________
Go to Top of Page
   

- Advertisement -