Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 How to rewrite the below SQL query

Author  Topic 

raihan26
Starting Member

8 Posts

Posted - 2012-07-23 : 17:23:51
Below is the data in TestingTable1


BUYER_ID | ITEM_ID | CREATED_TIME
-----------+-------------------+------------------------
1345653 110909316904 2012-07-09 21:29:06
1345653 151851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33
1345653 310411560125 2012-07-09 16:09:49
1345653 120945302103 2012-07-09 13:40:23
1345653 261060982989 2012-07-09 09:02:21


Below is the data in TestingTable2


USER_ID | PRODUCT_ID | LAST_TIME
-----------+-------------------+-------------------
1345653 110909316904 2012-07-09 21:30:06
1345653 152851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33


I need to Compare `TestingTable2` with `TestingTable1` on `BUYER_ID` and `USER_ID`. And I need to find all (basically the count) the missing and mismatch entries in `TestingTable2` after comparing from `TestingTable1`. I created SQL fiddle for this-

[url]
http://sqlfiddle.com/#!3/d87b2/1
[/url]

If you run my query in the SQL Fiddle, you will get output as-


BUYER_ID ERROR
1345653 5


which is right as last `three` rows from `TestingTable1` is missing in `TestingTable2` and rest `two` are mismatch after comparison from `TestingTable1` on `BUYER_ID` and `USER_ID`.

Now the complicated thing is starting.

**Problem Statement-**

In my current output, I am getting ERROR count as `5`. So if you see first row in both the tables `ITEM_ID` and `PRODUCT_ID` are same but `CREATED_TIME` and `LAST_TIME` is not same, and difference between those two times is of only `1 minute`. So currently I am reporting that as a mismatch, but what I need is that if the difference between them is within `15 minutes range`, then I don't want to report as an error. So after implementing this feature in my current query, I will be getting error count as `4` because difference is within `15 minutes range` for the first row.

So below is my solution for this problem, and below is the sql query that works fine in SQL server(which will give error count as 4) but not in Hive as `Hive supports only equality JOINS` and I cannot run the below query in Hive. So I need some other way (alternate way) of doing this problem. Is it possible to do the date difference condition in where clause somehow? `Basically how I can rewrite the below SQL query` in some other way such that it would fulfill my all the requirements above.


SELECT TT.BUYER_ID,
COUNT(*)
FROM (
SELECT testingtable1.buyer_id,
testingtable1.item_id,
testingtable1.created_time
FROM testingtable2
RIGHT JOIN testingtable1
ON (
testingtable1.item_id = testingtable2.product_id
AND testingtable1.BUYER_ID = testingtable2.USER_ID
AND ABS(DATEDIFF(mi, testingtable1.created_time, testingtable2.last_time)) <= 15
)
WHERE testingtable2.product_id IS NULL
) TT
GROUP BY TT.BUYER_ID;



Expected Output that I need after implementing the above feature-


BUYER_ID ERROR
1345653 4

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-23 : 20:54:32
you would be better off posting this in some Hive related forums so we dont have proficiency over Hive.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page
   

- Advertisement -