Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Sql Question on Rolling Dates with Conditions

Author  Topic 

NiuNiu
Starting Member

4 Posts

Posted - 2011-06-06 : 10:30:37
The general logic is "There are more than 2 test use different ID within 2 days from the same account and the total amount by these tests is above 4000 dollars. If so, the flag is 1, else the flag is 0". I want to produce the last column flag.

I think the best way to do is sql self-join to obtain the flag.

OBS Account Date Time Test ID Amount Flag
1 387000 6/10/2010 12:00:00 1 13.15 2000 1
2 387000 6/9/2010 12:00:00 1 13.15 3000 0
3 387000 6/8/2010 12:00:00 1 13.14 2000 0
4 387000 6/6/2010 11:00:00 1 13.15 2000 1
5 387000 6/5/2010 10:00:00 1 13.14 2000 0
6 387000 6/4/2010 9:00:00 1 13.13 1000 0
7 387000 6/4/2010 8:00:00 1 13.13 500 0
8 386000 5/9/2010 10:00:00 1 12.12 5000 0
9 386000 5/2/2010 9:00:00 1 12.14 2000 0
10 386000 5/1/2010 8:00:00 1 12.13 2000 0
11 385000 3/3/2010 10:00:00 1 11.13 5000 1
12 385000 3/1/2010 9:00:00 1 11.12 3000 0
13 385000 3/1/2010 8:00:00 1 11.11 3000 0

For example:
In obs 1, account 387000, within 2 days, (06/10 /10 -06/08/10), there are three test adopted different ID (13.15 and 13.14), and the total money is more than 4000 (2000+3000+2000), so the flag is 1.
In obs 2, account 387000, within 2 days, (06/9/10 -06/07/10), there are only two test adopted different ID, even though the money is above $4000. The flag is 0.
In Obs 3, account 387000, within 2 days, (06/8/10 -06/06/10), there are only two test adopted different ID, and the money is not more than $4000. The flag is 0.
In Obs 4, account 387000, within 2 days, (06/6/10 -06/04/10), there are four usage adopted different ID (13.15, 13.14, and13.13), and the money is more than $4000 (it is 5500). The flag is 1

The flag has to satisfy the following points:

1) In the same account
2) within 2 days
3) more than 2 test (each observation is one test)
4) the sum of the amount by these test is above $4000
5) used different ID.

Thank you very much for your help!

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-06-06 : 11:13:15
You can do this using a query like this - I may not have the where conditions and having clause exactly right, but the general idea is that, you put the aggregate conditions in the having clause and non-aggregate conditions in the where clause.

SELECT
a.*, 1 AS Flag
FROM
theTable a
WHERE
EXISTS
(
SELECT *
FROM theTable b
WHERE
b.Account = a.Account AND
DATEDIFF(dd,b.date,a.date) BETWEEN 0 AND 2
HAVING
SUM(b.Amount) >= 4000
AND COUNT(DISTINCT obs) > 2
)
Go to Top of Page

NiuNiu
Starting Member

4 Posts

Posted - 2011-06-06 : 15:09:57
Hi Sunitabeck,

Thank you very much! The code did work.

However, the run time is so long. I have a dataset of 466048 rows. It will run several hours.
Are there anyway the run time can be reduced.

thank you very much!
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-06-06 : 15:22:58
Here's a variation that should be faster:
SELECT
a.*, 1 AS Flag
FROM
theTable a
WHERE
EXISTS
(
SELECT *
FROM theTable b
WHERE
b.Account = a.Account AND
b.date between a.date AND dateadd(day, 2, a.date)
HAVING
SUM(b.Amount) >= 4000
AND COUNT(DISTINCT obs) > 2
)
It's not the exact same query as sunita's because the time portion of the date is relevant in mine and not in hers. You'll have to test to see if that matters. If the date column is not indexed, create one on it.
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-06-06 : 15:46:23
Thanks Rob! Optimizing queries is not my forte, but wouldn't an index on Account column also help even if the selectivity is poor?
Go to Top of Page

NiuNiu
Starting Member

4 Posts

Posted - 2011-06-06 : 15:49:28
Hi Rovbolk,

dateadd would not work in SAS environment. Since I will use Proc SQL to run sql query.

Would join the two table works. I think it will reduce the time, but I do not know how to implement it.

Thank you very much!
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-06-06 : 16:04:13
quote:
Originally posted by sunitabeck

Thanks Rob! Optimizing queries is not my forte, but wouldn't an index on Account column also help even if the selectivity is poor?
Definitely would help, possibly a compound or included column for AccountID and Date.
quote:
dateadd would not work in SAS environment. Since I will use Proc SQL to run sql query.
DATEADD is Transact-SQL, if you're calling a T-SQL stored procedure then it should work fine. I don't know SAS or how it uses SQL Server data, but if it can call stored procedures there shouldn't be a problem.
Go to Top of Page

NiuNiu
Starting Member

4 Posts

Posted - 2011-06-06 : 18:02:40
Hi Robvolk & Sunitabeck,

Thank you both very much!

However, I am still confused by how to resolve the account that have the same dates, but different time.

I am using the following query in the where statement. Since I sorted the earlier time first. However, the code below do not work. Any suggest how to deal with this issue. Thank you.

OR (b.date =a.date and b.time <=a.time)

Have a very pleasant day.

Go to Top of Page
   

- Advertisement -