Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Count of similar rows in a hierarchy structure.

Author  Topic 

esquel
Starting Member

6 Posts

Posted - 2012-03-26 : 14:08:20
What I'm trying to do is count rows that have a similar, yet not duplicate, place in a hierarchy structure.
I've tried various types of sub queries and read online about recursive CTE queries but I'm not sure if they apply to what I am trying to do.

There's 5 levels to the structure:
Letter
Letter Letter
Letter Letter Number
Letter Letter Number Number
Letter Letter Number Number Number

An example would be:
A
AA
AA.1000
AA.1000-1000
AA.1000-1000.100

Here's a sample table of the data:

Table1_ID	Table2_ID	HierarchyStructure
1001 1 A
1001 2 BH
1001 3 CJ
1002 1 BH.1000
1002 1 BH.1000-1000
1002 2 AL
1002 2 AL.5000
1002 3 CJ.1000-1500
1002 4 BH.1200
1003 1 AB.2700
1003 1 AB.2700-1200
1003 1 AB.2700-1200.300
1003 1 BH.1300
1004 1 A
1004 1 AH
1004 2 AJ
1004 2 BD.1000-500


Here are a couple of the queries that I have tried:

select s.Table1_ID, 
s.Table2_ID,
count(s.HierarchyStructure) as count
from sample as s
where s.HierarchyStructure in
(select s2.HierarchyStructure from sample as s2 where s.Table1_ID = s2.Table1_ID and s.Table2_ID = s2.Table2_ID)
group by s.Table1_ID, s.Table2_ID
order by count desc
;


select s.Table1_ID, 
s.Table2_ID,
count(s.HierarchyStructure) as count
from sample as s
left join sample as s2
on s.Table1_ID = s2.Table1_ID
and s.Table2_ID = s.Table2_ID
and s.HierarchyStructure like s2.HierarchyStructure + '%'
where s2.Table1_ID is not null and s2.Table2_ID is not null
group by s.Table1_ID, s.Table2_ID
order by count desc
;


Ideally these three rows would return as a count of 2.
1002	1	BH.1000
1002 1 BH.1000-1000
1002 2 AL


Since BH.1000 is in BH.1000-1000. At least that is the behavior of the query that I am looking for.

These four rows would return as a count of 3.
1003	1	AB.2700
1003 1 AB.2700-1200
1003 1 AB.2700-1200.300
1003 1 BH.1300


Since three of the hierarchy structures are within each other.

Would this be a good spot to use the CONTAINS function?
Or is there some real simple way of doing this and I'm over complicating it?
I thought I might accomplish it with a LIKE statement and a subquery but I get an error "Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression."

Any thoughts or insight is appreciated!

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-26 : 14:39:42
sorry didnt understand why you're interpreting same scenario in different ways

if BH.1000 and BH.1000-1000 are considered simlar
why not consider AB.2700 and AB.2700-1200 also similar
then you'll end up with count as 2 in second case

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

esquel
Starting Member

6 Posts

Posted - 2012-03-26 : 14:43:12
Sorry for the confusion. They both are similar, it's just that in the second example there's 3 of them and I didn't list them all.
So AB.2700 and AB.2700-1200 and AB.2700-1200.300 would all be counted as 3.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-26 : 14:45:30
hmm... how will it be 3? going by above logic shouldnt it be 2?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

esquel
Starting Member

6 Posts

Posted - 2012-03-26 : 14:58:18
Because they are all part of the same hierarchy, just at different levels.

For example:

BH.1000 (3rd level)
BH.1000-1000 (4th level)

AB.2700 (3rd level)
AB.2700-1200 (4th level)
AB.2700-1200.300 (5th level)

I'm trying to count all the ones that are a part of each others hierarchy level. (within the two ID columns)
There's over 60,000 rows like this too in the actual table.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-26 : 15:00:19
sorry your rule doesnt make much sense to me... if BH.1000 and BH.1000-1000 are similar

i would definitely expect AB.2700 and AB.2700-1200 to be similar too

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

esquel
Starting Member

6 Posts

Posted - 2012-03-26 : 15:09:04
Sorry, I don't think I'm explaining it very well.
You are correct that they are indeed similar too.
But they also are similar to the next level.
AB.2700 is a part of AB.2700-1200 which is a part of AB.2700-1200.300 so I would count that as 3 versus BH.1000 being a part of BH.1000-1000 which counts as 2.
Only because that's how many levels of each set are shown. Some might have 4 or 5 or even 6. Others only 1.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-26 : 15:44:53
what about this?


SELECT Table1_ID,LEFT(HierarchyStructure+'.',CHARINDEX('.',HierarchyStructure+'.')-1),COUNT(*)
FROM table
GROUP BY Table1_ID,LEFT(HierarchyStructure+'.',CHARINDEX('.',HierarchyStructure+'.')-1)


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

esquel
Starting Member

6 Posts

Posted - 2012-03-26 : 16:17:16
Thanks, I think that is going in the direction that I am trying for. I will try some variations of that to capture all possible 6 levels.
There could be a combination of A as a part of AB as a part of AB.2700 as a part of AB.2700-1200 as a part of AB.2700-1200.300 as a part of AB.2700-1200.300-10.

A (level 1)
AB (level 2)
AB.2700 (level 3)
AB.2700-1200 (level 4)
AB.2700-1200.300 (level 5)
AB.2700-1200.300-10 (level 6)

I did not try other string functions since contains would not work because there's no indexing on that table/column. Will give this a shot though, thanks again.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-03-27 : 12:05:33
welcome
let me know how you got on

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

esquel
Starting Member

6 Posts

Posted - 2012-03-29 : 12:52:47
Well, I couldn't get results that I was looking for with SQL Server so I resorted to VBA code and MS Access. I would like to be able to replicate the results in SQL Server though.

VBA Code:

Public Function CheckHierForMultiLevelUse(t1id As Variant, t2id As Variant, HierStruct As Variant) As Boolean
Dim strSql As String
strSql = ""
strSql = strSql & " SELECT s.HierarchyStructure "
strSql = strSql & " FROM sample as s "
strSql = strSql & " WHERE s.Table1_ID=" & t1id & " AND s.Table2_ID=" & t2id
strSql = strSql & " ;"

CheckHierForMultiLevelUse = False
Dim MultiCount As Integer
MultiCount = 0

On Error GoTo Err_Handler

Dim rs As DAO.Recordset

Set rs = DBEngine(0)(0).OpenRecordset(strSql, dbOpenDynaset)

Do While Not rs.EOF
If Not IsNull(rs(0)) Then
If InStr(1, rs(0), HierStruct, vbTextCompare) > 0 Then
MultiCount = MultiCount + 1
End If
End If
rs.MoveNext
Loop
rs.Close

If MultiCount > 1 Then
CheckHierForMultiLevelUse = True
End If

Exit_Handler:
Set rs = Nothing
Exit Function

Err_Handler:
MsgBox "An error has occured!" _
& vbCrLf & vbCrLf _
& "Number: " & Err.Number _
& vbCrLf & vbCrLf _
& "Description: " & Err.Description _
& vbCrLf & vbCrLf _
& "SQL Statement: " & strSql _
, vbExclamation, "Function CheckHierForMultiLevelUse"
Resume Exit_Handler

End Function


And my SQL query in MS Access:

SELECT
s.Table1_ID,
s.Table2_ID,
s.HierarchyStructure
FROM sample as s
WHERE CheckHierForMultiLevelUse(s.Table1_ID, s.Table2_ID, s.HierarchyStructure) = True;


Basically the VBA code uses INSTR to determine if the HierarchyStructure for all pairs of Table1_ID and Table2_ID are within each other, and then tallies the results. Each result is going to equal at least 1, so True is returned only when greater than 1.
I know that SQL Server has CHARINDEX instead of INSTR but I wasn't able to use it with a subquery as one of the expressions.

At least the VBA/MS Access returns the rows that I am looking for and I can use Count() against it. It's just terribly slow against 60k rows, and I'm sure there's got to be a better/faster more efficient way to accomplish the same thing in SQL Server.
Go to Top of Page
   

- Advertisement -