Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Comma delimited results in Column need to split

Author  Topic 

rcp
Starting Member

32 Posts

Posted - 2012-02-20 : 12:55:14
Hi, I have a table that stores the results as XML, example of a result

<classifications totalrecords="62"><classification numrecords="4">Dairy Farmers</classification><classification numrecords="6">Take Away Meal Outlets</classification><classification numrecords="52">Public Houses Activities Of</classification></classifications>

I need to split the results so that I have

Totals 62
Dairy Farmers 4
Take Away Meals 6
Public Houses Activities Of 52

Any ideas would be welcomed with open arms. Even if the results had the Total, Dairy Farmers etc. as headers that would be fine as well. There are about 5 million records each with 3 or less sets of classifications.

Regards,

Roland

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-02-20 : 13:26:30
see an example here

DECLARE @X xml
SET @X='<classifications totalrecords="62"><classification numrecords="4">Dairy Farmers</classification><classification numrecords="6">Take Away Meal Outlets</classification><classification numrecords="52">Public Houses Activities Of</classification></classifications>'

[code]
SELECT t.u.value('.','varchar(100)') as desc,
t.u.value('./@numrecords','int') as count,
t.u.value('../@totalrecords,'int') as totalcount
FROM @X.nodes('/classifications/classification')t(u)


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

rcp
Starting Member

32 Posts

Posted - 2012-02-21 : 10:11:45
I have tidied the results using this query

SELECT
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
classifications
,'classifications','')
,'Classification','')
,'<','')
,'>','')
,'/','')
,'numrecords="',',')
,' totalrecords="','')
,'"',',')
,', ,',',Totals,')
,' ,',',') as fish
FROM [icd_log].[dbo].[icdlog_2012_02]
where classifications not like '<classifications totalrecords="0"></classifications>'

RESULTS LOOK LIKE THIS:

210,Totals,210,Take Away Meal Outlets
1,Totals,1,Beer Wine And Spirit (Retail)
1,Totals,1,Cafes And Snack Bars
6,Totals,6,Fast Food Delivery
210,Totals,210,Take Away Meal Outlets
1,Totals,1,Florists
1,Totals,1,Restaurant - Pizza
4,Totals,2,Electrical Contractors And Electricians,2,Specialised Building Trade Contractors

What I would like to do now is split the results so that they look like this:

Numbers Market Sections
4 Totals
2 Electrical Contractors And Electricians
2 Specialised Building Trade Contractors


I have been told that I can do it with a loop query, not sure how though.

Cheers,

Roland
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-02-21 : 10:34:31
You can use a split function - for example, the one listed in Fig 21 of this page: http://www.sqlservercentral.com/articles/Tally+Table/72993/

However, the approach Visakh suggested using the XML methods would be a MUCH better approach, so I would recommend trying to make that work.
Go to Top of Page

rcp
Starting Member

32 Posts

Posted - 2012-02-21 : 10:53:22
Okay, have used the original XML method, is there a way to select the results from a query to use as SET. There are over 5 million sets of result. Below is a larger sample set.

<classifications totalrecords="24"><classification numrecords="6">Doctors</classification><classification numrecords="6">Hairdressers (Unisex)</classification><classification numrecords="12">Dentists</classification></classifications>
<classifications totalrecords="28"><classification numrecords="8">Solicitors</classification><classification numrecords="10">Kitchen Planners And Installers</classification><classification numrecords="10">Jewellery Retail Sale Of</classification></classifications>
<classifications totalrecords="8"><classification numrecords="2">Engineers (Consulting)</classification><classification numrecords="2">Gardening Services</classification><classification numrecords="4">Estate Agents</classification></classifications>
<classifications totalrecords="8"><classification numrecords="2">Beauty Salons</classification><classification numrecords="2">Take Away Meal Outlets</classification><classification numrecords="4">Doctors</classification></classifications>
<classifications totalrecords="1"><classification numrecords="1">Schools (Local Authority)</classification></classifications>
<classifications totalrecords="2"><classification numrecords="2">Funeral Services</classification></classifications>
<classifications totalrecords="1"><classification numrecords="1">Banks And Financial Institutions</classification></classifications>
<classifications totalrecords="1"><classification numrecords="1">Doctors</classification></classifications>
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-02-21 : 11:35:01
Since your data is in a table, modify Visakh's query slightly as shwon below. Try it on a small sample table and see if you get the results you want.
SELECT t.u.value('.','varchar(100)') as [desc],
t.u.value('./@numrecords','int') as count,
t.u.value('../@totalrecords','int') as totalcount
FROM YourTable YT
cross apply YT.YourXMLCol
.nodes('/classifications/classification')t(u)
If you don't get the results you are looking for, can you post the exact output you are looking for for the sample data you posted in your latest post?


With 5 million records, this may be slow. If you need to do this query frequently, you may want to consider adding an XML primary index (at least). However, the space requirements for XML indexes are quite large, so that may be a consideration.
Go to Top of Page

rcp
Starting Member

32 Posts

Posted - 2012-02-21 : 12:07:34
This is the query I have so far


SELECT
t.u.value('.','varchar(100)') as [Market section],
t.u.value('./@numrecords','int') as Counts,
t.u.value('../@totalrecords','int') as [Total Count]
FROM [icd_log].[dbo].[icdlog_2012_02] YT
cross apply YT.classifications.nodes('/classifications/classification')t(u)

but it gives me an error message

Msg 9506, Level 16, State 1, Line 2
The XMLDT method 'nodes' can only be invoked on columns of type xml.


The query works fine when I set a variable e.g. <classifications totalrecords="24"><classification numrecords="6">Doctors</classification><classification numrecords="6">Hairdressers (Unisex)</classification><classification numrecords="12">Dentists</classification></classifications>

And declare the variable as xml. Any ideas from here?
Go to Top of Page

rcp
Starting Member

32 Posts

Posted - 2012-02-21 : 12:35:33
I have changed the query slightly

SELECT
t.u.value('.','varchar(100)') as [Market section],
t.u.value('./@numrecords','int') as Counts,
t.u.value('../@totalrecords','int') as [Total Count]
FROM (SELECT CAST(classifications as xml) AS classifications FROM [icd_log].[dbo].[icdlog_2012_02]) YT
CROSS APPLY YT.classifications.nodes('/classifications/classification')t(u)

but am now getting the error message:

Msg 9421, Level 16, State 1, Line 1
XML parsing: line 1, character 77, illegal name character

Regards,

Roland
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-02-21 : 12:50:27
This is most likely because the data type of the column is not XML. It probably is varchar or nvarchar. You can cast it to XML and run the same query, but there are couple of concerns:

a) Casting to XML may fail because of data issues - if the string data is not well-formed XML this will happen.

b) Additional overhead of converting to XML and then shredding it. Given that, I am not sure if you will gain any performance advantage over your initial thought of treating it as a string and doing string replacements.

Regardless, worht a try - cast it to XML like this:
....
FROM (SELECT CAST(YourStringColumn AS XML) AS xmlCol FROM YourTable) YT
cross apply xmlcol.nodes('/classifications/classification')t(u)
Go to Top of Page

rcp
Starting Member

32 Posts

Posted - 2012-02-22 : 04:32:24
When I try to CAST the column to XML, I get the error message:

Msg 9421, Level 16, State 1, Line 1
XML parsing: line 1, character 77, illegal name character

I could change the data in the column so that it is comma delimited by using a replace. e.g.

<classifications totalrecords="62"><classification numrecords="4">Dairy Farmers</classification><classification numrecords="6">Take Away Meal Outlets</classification><classification numrecords="52">Public Houses Activities Of</classification></classifications>

52,Totals,4,Dairy Farmers,6,Take Away Meal Outlets,52,Public Houses Activities Of

Can I then split this? What do you think

Regards,

Roland
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-02-22 : 05:34:54
quote:
Originally posted by rcp

When I try to CAST the column to XML, I get the error message:

Msg 9421, Level 16, State 1, Line 1
XML parsing: line 1, character 77, illegal name character

I could change the data in the column so that it is comma delimited by using a replace. e.g.

<classifications totalrecords="62"><classification numrecords="4">Dairy Farmers</classification><classification numrecords="6">Take Away Meal Outlets</classification><classification numrecords="52">Public Houses Activities Of</classification></classifications>

52,Totals,4,Dairy Farmers,6,Take Away Meal Outlets,52,Public Houses Activities Of

Can I then split this? What do you think

Regards,

Roland

That is what I was afraid of. That some of the data may not be weii-formed XML.

To split the comma separated string, you can use the function in the link in my reply on 2/21/2012. Or, search this forum, younwill find many examples.
Go to Top of Page

rcp
Starting Member

32 Posts

Posted - 2012-02-22 : 06:04:00
I have been informed that running a loop query would give me what I want. In the table there is a unique icdlog_guid. I could then use the original suggestion from visakh16 and loop through the icdlog_guid. What do you think?
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-02-22 : 07:52:52
quote:
Originally posted by rcp

I have been informed that running a loop query would give me what I want. In the table there is a unique icdlog_guid. I could then use the original suggestion from visakh16 and loop through the icdlog_guid. What do you think?

That may not be the best solution - for two reasons:

a) More likely than not, looping would result in very poor performance.
b) You would still run into the problem of XML that is not well-formed.

What I was suggesting was that, since you already have a way of getting the data that you need as a comma-separated string even when it is not well-formed XML, to take that comma-separated string and splitting it into rows.
Go to Top of Page

rcp
Starting Member

32 Posts

Posted - 2012-02-22 : 08:14:22
Okay, if I change it into comma-separated string, can I split it so that it is in columns per row?

e.g.

results
210,Totals,210,Take Away Meal Outlets,10,Curry
1,Totals,1,Beer Wine And Spirit (Retail),
10,Totals,1,Cafes And Snack Bars,4,Chocolate

required results

A B C D E F
210 Totals 210 Take Away Meal Outlets 10 Curry
1 Totals 1 Beer Wine And Spirit (Retail) NULL NULL
10 Totals 1 Cafes And Snack Bars 4 Chocolate

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-02-22 : 09:37:21
use a string parsing function to split values based on , delimiter

http://visakhm.blogspot.com/2010/02/generating-calendar-table.html

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-02-22 : 09:43:38
Visakh, I can't see your blog - my place of work very wisely blocks dangerous blogs ;) so I might be repeating what you have in the blog. But, rcp, here is some example code. The example uses the parsing function from Fig. 21 in Jeff Moden's article http://www.sqlservercentral.com/articles/Tally+Table/72993/ So if you want to run this test, you will need to install that function.

The temp table that I have is for generating test data.
---- TEST DATA ---------------------------------------
CREATE TABLE #tmp (id int, col1 VARCHAR(8000));
INSERT INTO #tmp VALUES (1, '210,Totals,210,Take Away Meal Outlets,10,Curry')
INSERT INTO #tmp VALUES (2, '1,Totals,1,Beer Wine And Spirit (Retail),')
INSERT INTO #tmp VALUES (3, '10,Totals,1,Cafes And Snack Bars,4,Chocolate')

-----USING Splitter function ---------------------------
SELECT
*
FROM
#tmp
CROSS APPLY (SELECT * FROM MASTER.dbo.DelimitedSplit8K(col1,',') ) s



------PIVOTING --------------------------------------
;WITH cte AS
(
SELECT
Id,ItemNumber,Item
FROM
#tmp
CROSS APPLY (SELECT * FROM MASTER.dbo.DelimitedSplit8K(col1,',') ) s
)
SELECT
*
FROM
cte
PIVOT
(MAX(Item) FOR ItemNumber IN ([1],[2],[3],[4],[5],[6]))P

---- cleanup
DROP TABLE #tmp;
Go to Top of Page
   

- Advertisement -