Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
NiuNiu
Starting Member
4 Posts |
Posted - 2011-06-06 : 10:30:37
|
| The general logic is "There are more than 2 test use different ID within 2 days from the same account and the total amount by these tests is above 4000 dollars. If so, the flag is 1, else the flag is 0". I want to produce the last column flag.I think the best way to do is sql self-join to obtain the flag.OBS Account Date Time Test ID Amount Flag1 387000 6/10/2010 12:00:00 1 13.15 2000 12 387000 6/9/2010 12:00:00 1 13.15 3000 03 387000 6/8/2010 12:00:00 1 13.14 2000 04 387000 6/6/2010 11:00:00 1 13.15 2000 15 387000 6/5/2010 10:00:00 1 13.14 2000 06 387000 6/4/2010 9:00:00 1 13.13 1000 07 387000 6/4/2010 8:00:00 1 13.13 500 08 386000 5/9/2010 10:00:00 1 12.12 5000 09 386000 5/2/2010 9:00:00 1 12.14 2000 0 10 386000 5/1/2010 8:00:00 1 12.13 2000 011 385000 3/3/2010 10:00:00 1 11.13 5000 112 385000 3/1/2010 9:00:00 1 11.12 3000 013 385000 3/1/2010 8:00:00 1 11.11 3000 0For example:In obs 1, account 387000, within 2 days, (06/10 /10 -06/08/10), there are three test adopted different ID (13.15 and 13.14), and the total money is more than 4000 (2000+3000+2000), so the flag is 1. In obs 2, account 387000, within 2 days, (06/9/10 -06/07/10), there are only two test adopted different ID, even though the money is above $4000. The flag is 0.In Obs 3, account 387000, within 2 days, (06/8/10 -06/06/10), there are only two test adopted different ID, and the money is not more than $4000. The flag is 0.In Obs 4, account 387000, within 2 days, (06/6/10 -06/04/10), there are four usage adopted different ID (13.15, 13.14, and13.13), and the money is more than $4000 (it is 5500). The flag is 1The flag has to satisfy the following points:1) In the same account2) within 2 days3) more than 2 test (each observation is one test)4) the sum of the amount by these test is above $40005) used different ID.Thank you very much for your help! |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-06-06 : 11:13:15
|
You can do this using a query like this - I may not have the where conditions and having clause exactly right, but the general idea is that, you put the aggregate conditions in the having clause and non-aggregate conditions in the where clause.SELECT a.*, 1 AS FlagFROM theTable aWHERE EXISTS ( SELECT * FROM theTable b WHERE b.Account = a.Account AND DATEDIFF(dd,b.date,a.date) BETWEEN 0 AND 2 HAVING SUM(b.Amount) >= 4000 AND COUNT(DISTINCT obs) > 2 ) |
 |
|
|
NiuNiu
Starting Member
4 Posts |
Posted - 2011-06-06 : 15:09:57
|
| Hi Sunitabeck,Thank you very much! The code did work. However, the run time is so long. I have a dataset of 466048 rows. It will run several hours. Are there anyway the run time can be reduced.thank you very much! |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-06-06 : 15:22:58
|
Here's a variation that should be faster:SELECT a.*, 1 AS FlagFROM theTable aWHERE EXISTS ( SELECT * FROM theTable b WHERE b.Account = a.Account AND b.date between a.date AND dateadd(day, 2, a.date) HAVING SUM(b.Amount) >= 4000 AND COUNT(DISTINCT obs) > 2 ) It's not the exact same query as sunita's because the time portion of the date is relevant in mine and not in hers. You'll have to test to see if that matters. If the date column is not indexed, create one on it. |
 |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2011-06-06 : 15:46:23
|
| Thanks Rob! Optimizing queries is not my forte, but wouldn't an index on Account column also help even if the selectivity is poor? |
 |
|
|
NiuNiu
Starting Member
4 Posts |
Posted - 2011-06-06 : 15:49:28
|
| Hi Rovbolk,dateadd would not work in SAS environment. Since I will use Proc SQL to run sql query. Would join the two table works. I think it will reduce the time, but I do not know how to implement it. Thank you very much! |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-06-06 : 16:04:13
|
quote: Originally posted by sunitabeck Thanks Rob! Optimizing queries is not my forte, but wouldn't an index on Account column also help even if the selectivity is poor?
Definitely would help, possibly a compound or included column for AccountID and Date.quote: dateadd would not work in SAS environment. Since I will use Proc SQL to run sql query.
DATEADD is Transact-SQL, if you're calling a T-SQL stored procedure then it should work fine. I don't know SAS or how it uses SQL Server data, but if it can call stored procedures there shouldn't be a problem. |
 |
|
|
NiuNiu
Starting Member
4 Posts |
Posted - 2011-06-06 : 18:02:40
|
| Hi Robvolk & Sunitabeck,Thank you both very much! However, I am still confused by how to resolve the account that have the same dates, but different time.I am using the following query in the where statement. Since I sorted the earlier time first. However, the code below do not work. Any suggest how to deal with this issue. Thank you. OR (b.date =a.date and b.time <=a.time)Have a very pleasant day. |
 |
|
|
|
|
|
|
|