| Author |
Topic |
|
Sny
Starting Member
4 Posts |
Posted - 2011-05-21 : 16:51:09
|
| Hi,I have to find two most similar readers for every zip code but I can not know how to do it. my tables;book(bookid, title, author, category )reader(readerid, name, address, zip)action(actionid,date, type, bookid, readerid)SELECT * FROM (SELECT zip FROM reader GROUP BY zip) t1, (SELECT a.readerid,b.readerid compare_readerid,a.zip ZipFirstReader,count(*) booksInCommonFROM (SELECT r.readerid,a.bookid,r.zip FROM reader r,action a WHERE a.readerid = r.readerid and a.typex='B' GROUP BY r.readerid,a.bookid,r.zip) a , (SELECT r.readerid,a.bookid,r.zip FROM reader r,action a WHERE a.readerid = r.readerid and a.typex='B' GROUP BY r.readerid,a.bookid,r.zip) b WHERE a.bookid = b.bookid and a.readerid <> b.readerid GROUP BY a.readerid,b.readerid,a.zip) t2 --order by booksInCommon DESC) t2 Thank you |
|
|
Andrew Zwicker
Starting Member
11 Posts |
Posted - 2011-05-22 : 03:23:20
|
Hi,The following query should help. It looks like you are on the right track in terms of the query you have - it is using zip as a baseline and going from there to do the comparisons between readers.For the solution, here's some data to work with:create table book(bookid int, title varchar(30), author varchar(20), category varchar(50))create table reader(readerid int, [name] varchar(30), address varchar(70), zip int)create table action(actionid int, date datetime, type varchar(10), bookid int, readerid int)insert into book values(1, 'Book 1', 'Joe Smith', 'Fiction')insert into book values(2, 'Book 2', 'Joe Smith', 'NonFiction')insert into book values(3, 'Book 3', 'Greg Jones', 'Fiction')insert into book values(4, 'Book 4', 'Jane Gomes', 'Fiction')insert into reader values(1, 'Alex', '22 Green St',11111)insert into reader values(2, 'Bill', '4 Stone Street',11111)insert into reader values(3, 'Sue', '6 Main Street',53344)insert into reader values(4, 'Sue', '2 Hamilton Road',11111)insert into reader values(5, 'Mark', '2 Rocky Avenue',53344)insert into reader values(6, 'Sam', '22 Blue St',53344)insert into reader values(7, 'Tom', '4 Hammersmith Street',53344)insert into reader values(8, 'John', '6 Highland Street',53344)insert into reader values(9, 'Greg', '2 Ocean Road',11111)insert into reader values(10, 'Bobby', '2 Hammer Avenue',11111)insert into action values(3, '1/8/2011', 'B', 3, 2)insert into action values(4, '1/10/2011', 'B', 4, 1)insert into action values(5, '1/10/2011', 'B', 4, 2)insert into action values(6, '1/1/2011', 'B', 1, 5)insert into action values(7, '1/4/2011', 'B', 2, 6)insert into action values(8, '1/8/2011', 'B', 2, 7)insert into action values(9, '1/10/2011', 'B', 4, 8)insert into action values(10, '1/10/2011', 'B', 1, 9)insert into action values(11, '1/10/2011', 'B', 3, 3)insert into action values(12, '1/10/2011', 'B', 2, 4)insert into action values(13, '1/1/2011', 'B', 1, 5)insert into action values(14, '1/4/2011', 'B', 2, 1)insert into action values(15, '1/8/2011', 'B', 1, 6)insert into action values(16, '1/10/2011', 'B', 4, 1)insert into action values(15, '1/8/2011', 'B', 2, 5) With this in place, I decided to break up the solution into 2 parts [see below for the solution]. The first gets a list of, for each pair that has a match, how many matches there are. This is similar to the query you have, although I put in a > restriction in the join instead of <> for the readerid to prevent duplicates (for example, [reader A, reader B] is a duplicate of [reader B, reader A] ).For the second part I use a derived table to get the maximum number of matches for each zipcode, and then grab the needed information for displaying the reader information. One minor [potential] issue that remains is that, if 2+ pairs of readers have the same number of common books, they'll both show up on the list. It is just something to be aware of.Also, note that the joins below use the 'on abc=xyz' type syntax and the joins aren't done using simply commas. As queries get more complex, I think it is easier to follow/read the query using the 'on' type of join syntax.select ZipList.zip, reader1.readerid as 'r1', reader2.readerid as 'r2', count(*) as 'NumBookMatches' into #NumMatchesInfoFrom ( select distinct zip from reader) ZipList inner join reader reader1 on reader1.zip = ZipList.zip inner join reader reader2 on reader2.zip = ZipList.zip and reader1.readerid > reader2.readeridinner join action action1 on action1.readerid = reader1.readerid and action1.type='B'inner join action action2 on action2.readerid = reader2.readerid and action1.type='B' and action2.bookid = action1.bookidgroup by ZipList.zip, reader1.readerid, reader2.readeridselect MatchCount.zip, reader1.name, reader2.name From #NumMatchesInfo matchListinner join ( select zip, max(NumBookMatches) as 'NumBookMatches' from #NumMatchesInfo group by zip) MatchCount on MatchCount.NumBookMatches = matchList.NumBookMatchesinner join reader reader1 on reader1.readerid = matchList.r1inner join reader reader2 on reader2.readerid = matchList.r2drop table #NumMatchesInfo Based on the question, some other things you might want to consider [if you want to build this query out even more] are:1) Do you need to take into consideration the values of the type field? The example only has 'B', but what do the other codes represent? Might you need/want to match for other types?2) You might want to consider readers who read books from the same author to some degree even if the books they read are different. 3) You might want to consider readers who read books from the same book category to some degree as being similar. 2 readers who read books in the same category, even if different, would be considered different than readers who read books in the same category.I hope this helps.- Andrew ZwickerVisit http://www.helpwithsql.com |
 |
|
|
MichaelJSQL
Constraint Violating Yak Guru
252 Posts |
Posted - 2011-05-22 : 10:20:32
|
This was fun -- I think this should work, but it's early and I haven't had coffee yet. I used Andrew's tables as a test. Thanks Andrew;With MyBooksas(SELECT DISTINCT reader.zip,Reader.readerid, reader.name, book.bookid FROM reader INNER JOIN action ON reader.readerid = action.readerid INNER JOIN book ON action.bookid = book.bookid),MyResultsas(SELECT R1.zip,R1.name,Row_Number() Over(Partition by R1.zip order by COUNT(1) DESC) RwId,COUNT(1) cnt FROM MyBooks R1 Inner join MyBooks R2 ON R1.BOOKID = R2.BOOKID and R1.readerid !=R2.readerid AND R1.zip= R2.zip GROUP BY R1.zip,R1.name )Select zip,Name from MyResults Where RwId <=2based on the data:In zip code 53344 Mark and Sam are the most similar readersIn zip code 11111 Alex could have been similar to Bill or Sue as it is a tie. I am not sure how you want to break ties.I did a test by adding another book for Sueinsert into action values(15, '1/11/2011', 'B', 4,4)The tie was then broken and Sue became more similar to Alex |
 |
|
|
Sny
Starting Member
4 Posts |
Posted - 2011-05-22 : 13:41:18
|
| Thank you so much for solution. I will try it ,i think it is a good solution.. |
 |
|
|
|
|
|