Calculating duplication percentage

dmilam
Posting Yak Master

185 Posts

Posted - 2010-08-13 : 13:14:04

Is there an aggregate function (I can't find one) or other method for calculating the percentage of duplication across tables?

Scenario:

Table 1 is first populated with distinct IDs restricted by one set of criteria.
More distinct IDs, restricted by a second, different set of criteria, are then inserted into Table 1.


select distinct id
into table1
join <tableX>
where <criteria>

insert into table1
select distinct id
join <tableY>
where <different criteria>

I want to return the percentage of the IDs added to table1 in the second pass that had already been added in the first pass. Yes, the table holds duplicates for the duration of the query; it is then 'dumped' into another table, where the IDs are made distinct so there is no duplication.

In other words, I'm trying to calculate the waste involved.
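There is no built-in aggregate for this, but the ratio can be built from COUNT(DISTINCT ...). A minimal sketch, assuming two hypothetical scratch tables: first_pass_ids stands for what table1 holds after the first insert, and second_pass_rows stands for the rows the second insert's SELECT would return (duplicates allowed). It uses a LEFT JOIN, a swapped-in alternative to the CROSS JOIN approach discussed in this thread: unmatched rows get a NULL f.id, which COUNT(DISTINCT ...) ignores.

```sql
-- Hypothetical scratch tables standing in for the two passes
CREATE TABLE first_pass_ids (id INT NOT NULL);    -- contents of table1 after the first insert
CREATE TABLE second_pass_rows (id INT NOT NULL);  -- rows the second insert would add

-- Single-row INSERTs keep this valid on SQL Server 2005 (no multi-row VALUES)
INSERT INTO first_pass_ids (id) VALUES (1);
INSERT INTO first_pass_ids (id) VALUES (2);
INSERT INTO first_pass_ids (id) VALUES (3);
INSERT INTO first_pass_ids (id) VALUES (4);

INSERT INTO second_pass_rows (id) VALUES (3);
INSERT INTO second_pass_rows (id) VALUES (3);
INSERT INTO second_pass_rows (id) VALUES (4);
INSERT INTO second_pass_rows (id) VALUES (5);
INSERT INTO second_pass_rows (id) VALUES (6);

-- Percentage of distinct second-pass IDs that were already added in the first pass.
-- Unmatched rows have f.id = NULL, which COUNT(DISTINCT ...) skips.
-- NULLIF turns an empty second pass into a NULL result instead of a divide-by-zero error.
SELECT COUNT(DISTINCT f.id) * 100.0
       / NULLIF(COUNT(DISTINCT s.id), 0) AS pct_already_present
FROM second_pass_rows AS s
LEFT JOIN first_pass_ids AS f
       ON f.id = s.id;
```

With this sample data, IDs 3 and 4 were already present while 5 and 6 are new, so the query returns a value of 50 (two of the four distinct second-pass IDs).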

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-08-14 : 02:44:00

You need to do that before the second insert, like this:


select count(distinct id)*100.0/cnt
where <different criteria>
join <tableY> y
cross join (select count(distinct id) as cnt
where <different criteria>
join <tableY>
)
where exists(select 1 from table1 where somefield= y.somefield)

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

dmilam

Posted - 2010-08-16 : 12:20:24

Thanks, but I don't understand this syntax. Is it possible to join without an ON condition?
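In T-SQL, a join without an ON clause is valid only as a CROSS JOIN; INNER, LEFT, RIGHT and FULL joins all require ON. A CROSS JOIN pairs every row on the left with every row on the right, so when the right side is a derived table that returns exactly one row, the effect is simply to attach that row's columns (here, the total cnt) to every row. A minimal sketch, using a hypothetical table b_ids:

```sql
-- Hypothetical sample table; the value 20 appears twice
CREATE TABLE b_ids (id INT NOT NULL);
INSERT INTO b_ids (id) VALUES (10);
INSERT INTO b_ids (id) VALUES (20);
INSERT INTO b_ids (id) VALUES (20);
INSERT INTO b_ids (id) VALUES (30);

-- The derived table returns a single row (cnt = 3 distinct IDs),
-- so the CROSS JOIN keeps all 4 rows of b_ids and adds cnt to each one.
-- The derived table must be given an alias (t).
SELECT b.id, t.cnt
FROM b_ids AS b
CROSS JOIN (SELECT COUNT(DISTINCT id) AS cnt
            FROM b_ids) AS t;
```

This returns four rows, each carrying cnt = 3; writing INNER JOIN ... ON 1 = 1 instead would produce the same result.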

visakh16

Posted - 2010-08-16 : 12:32:11

select distinct id
into table1
from <tableY>
join <tableX>
where <criteria>

select count(distinct id)*100.0/cnt
from <tableX>
join <tableY>
cross join (select count(distinct id) as cnt
from <tableX>
join <tableY>
where <different criteria>
)
where <different criteria>

dmilam

Posted - 2010-08-16 : 13:49:36

Thanks, but I am baffled.

Msg 156, Level 15, State 1, Line 26
Incorrect syntax near the keyword 'where'.

This is the last where clause. I'm using SQL Server 2005.

visakh16

Posted - 2010-08-16 : 13:53:57

Give the derived table a name (an alias):


select distinct id
into table1
from <tableY>
join <tableX>
where <criteria>

select count(distinct id)*100.0/cnt
from <tableX>
join <tableY>
cross join (select count(distinct id) as cnt
from <tableX>
join <tableY>
where <different criteria>
)t
where <different criteria>

dmilam

Posted - 2010-08-16 : 15:38:27

Thanks; with the derived table named and a GROUP BY clause added, it works now. Thanks again.
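The GROUP BY is needed because the SELECT list mixes an aggregate, COUNT(DISTINCT id), with the derived table's non-aggregated column cnt; SQL Server rejects that unless cnt appears in GROUP BY (wrapping it as MAX(t.cnt) is another option). Also, if the outer query and the derived table apply identical criteria, the numerator equals the denominator and the ratio comes out at 100, so the outer query still needs the EXISTS check against table1 from the first reply. A minimal sketch of the combined query, assuming hypothetical tables d_table1 (the first-pass IDs) and d_candidates, where the hypothetical column grp = 'B' stands in for <different criteria>:

```sql
-- Hypothetical tables: d_table1 = IDs kept from the first pass,
-- d_candidates = source rows for the second pass; grp = 'B' stands in for <different criteria>
CREATE TABLE d_table1 (id INT NOT NULL);
CREATE TABLE d_candidates (id INT NOT NULL, grp CHAR(1) NOT NULL);

INSERT INTO d_table1 (id) VALUES (1);
INSERT INTO d_table1 (id) VALUES (2);
INSERT INTO d_table1 (id) VALUES (3);
INSERT INTO d_table1 (id) VALUES (4);

INSERT INTO d_candidates (id, grp) VALUES (3, 'B');
INSERT INTO d_candidates (id, grp) VALUES (3, 'B');
INSERT INTO d_candidates (id, grp) VALUES (4, 'B');
INSERT INTO d_candidates (id, grp) VALUES (5, 'B');
INSERT INTO d_candidates (id, grp) VALUES (6, 'B');
INSERT INTO d_candidates (id, grp) VALUES (1, 'A');
INSERT INTO d_candidates (id, grp) VALUES (7, 'A');

-- Numerator: distinct second-pass IDs already in d_table1.
-- Denominator: all distinct second-pass IDs, computed once in the derived table t.
SELECT COUNT(DISTINCT c.id) * 100.0 / t.cnt AS pct_already_present
FROM d_candidates AS c
CROSS JOIN (SELECT COUNT(DISTINCT id) AS cnt
            FROM d_candidates
            WHERE grp = 'B') AS t   -- the missing alias is what caused Msg 156
WHERE c.grp = 'B'
  AND EXISTS (SELECT 1 FROM d_table1 AS d WHERE d.id = c.id)
GROUP BY t.cnt;                     -- cnt is not aggregated, so it must be grouped
```

Here IDs 3 and 4 are already in d_table1 out of four distinct 'B' IDs, so the query returns 50. One side effect of the GROUP BY: if none of the second-pass IDs is already present, the query returns no row at all rather than 0.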

visakh16

Posted - 2010-08-17 : 10:33:27

You're welcome.
