| Author |
Topic |
|
WJHamel
Aged Yak Warrior
651 Posts |
Posted - 2012-02-17 : 13:20:26
|
| What i'm prone to doing, to bypass my deficient sql skills, in converting data for customers, and in parsing out squashed street address data, is to export the data to excel, parse the house number, street, quadrant, apartmentno, etc. into it's respective columns and then reimport back to sql.I would like to be able to accomplish this all in sql. So in the case of one particular table, there is a column labeled 'addressline1' which can contain anything ranging from a single character to something along the lines of '1224 E. Main Street, apartment B3 Southeast". What would be suggested to use in sql to parse the relevant data out to adjacent columns to populate house number, street, quadrant, apartmentno, etc? I don't see substring working becuase what i really want to do is break these lines up according to spaces between words or characters in the line. If i could start by just doing that, i could then re-concatenate the needed columns back together on a case by case basis.suggestions? |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
|
|
WJHamel
Aged Yak Warrior
651 Posts |
|
|
WJHamel
Aged Yak Warrior
651 Posts |
Posted - 2012-02-17 : 13:39:04
|
| i assume, using your function 'as is', it is using a 'space' as the delimiter? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-02-17 : 13:59:19
|
| If you have to do a lot of address scrubbing/validation, I strongly suggest buying a commercial program like MelissaData or similar. This is based on my experience with scrubbing millions of addresses. I never found a reliable way to handle things like "1224 E. Main Street, apartment B3 Southeast", abbreviations for street/avenue/boulevard/apartment, and so on in SQL Server. Nor could I find a method that identified the 10% that fell out of a general algorithm that did work on 90%. The advantage that Excel has is flexibility in parsing unstructured data, which SQL isn't as good at. However Excel is not good for managing 100K+ rows of data in a predictable manner. If you are going to spend more than 1-2 hours a week scrubbing addresses, MelissaData will pay for itself within 2-3 months.BTW, is the "Southeast" in your example part of the apartment number, or is it part of the street? Things like that are what make it not worth doing in SQL. |
 |
|
|
WJHamel
Aged Yak Warrior
651 Posts |
Posted - 2012-02-17 : 14:05:30
|
| Robvolk,thanks. i typically have to deal with hundreds of thousands or millions of rows of address data. Just so i know i'm comparing apples to apples, is the below link the product you're referring to?:http://www.melissadata.com/data-fielding-parsing/index.htm |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2012-02-17 : 14:07:41
|
quote: Originally posted by WJHamel i assume, using your function 'as is', it is using a 'space' as the delimiter?
its using , as delimiter------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/ |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2012-02-17 : 17:21:00
|
| Yes, that's the product I've used. You may want to ask around on StackExchange if there are other recommendations, I haven't used it in over 5 years. Basically this is a parsing problem, not a SQL problem, and someone may have suggestions on how you can approach it.Ultimately I found the Melissa software good enough for what was needed. My company at the time used another piece of software to scrub addresses in their AS/400 system. I don't know what it was but it only did a marginal job. |
 |
|
|
|