Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
MariaLua
Starting Member
5 Posts |
Posted - 2004-12-17 : 06:36:17
|
Hello,I am very perplexed with this problem I'm having. I work for a website-based company and my job is to create a data warehouse based on usage of the site. We have a Central Logging Database that logs every single hit to the site, then i have a stored procedure that picks up these hits, does a bit of data scrubbing and transformation, and puts them into a Warehouse database. i then have an Analysis Services cube that points at this warehouse database. I then use Reporting Services and MDX to create reports based on this usage info.Scenario: the total hits in the database can be further grouped into companies, so reports can be run to show a company's usage of the site. Companies are then further divided into Users belonging to a company. I.e. reports are generally run to show a company's usage, broken down into individual users within that company.In order to check that my warehouse-load stored procedure works properly I compared the amount of hits in the Central Logging Database (where all hits are originally recorded) with hits in the final Warehouse database to see if they balance. Everything looks fine and hunky-dory. My problem is this: When I browse the cube in analysis manager, some of the companies have a higher hit count, i.e. show more hits, than actually exist in any of the original databases! In the cube one company is even showing 1,000 hits for a user who does not have ANY hits in any of the original databases!!! the total amount of hits shown in the cube is equal to the original databases, so some how it balances. but i don't understand where all this extra\incorrect data is coming from.My cube consists of a very simple Star-schema design. one main table called Hits, and UserAccount, BusinessEntity, and Date dimensions. the Hits table consists of the following columns:Hit_ID (int)DatabaseSource (int)UserAccountID (int)BusinessEntityID (int)LogDateID (int)One hit is defined by a Hit_ID and DatabaseSource (these two form a composite primary key). My Measure in the cube is the Hit_ID and I’m doing a Count on it.Does anyone know what i could possibly be doing wrong? I would be very grateful for any bit of help anyone could provide.Thanks in advance!Maria |
|
MariaLua
Starting Member
5 Posts |
Posted - 2004-12-20 : 08:44:06
|
Hellojust a follow up on this problem i was having. On the microsoft.olap forum someone suggested that i enable drillthrough on the cube so that i could browse directly back to the database to see what exactly these extra rows are. I did this and i found out this:every so often we get a request to merge all users from once companyID into another companyID (e.g. for a company merge). So in our UserAccounts table, all users who were previously assigned to CompanyID 7 (old company) will now be assigned to CompanyID 8 (new company). this is just a straight UPDATE on the table.BUT, in our Hits database, we don't update the hits to reflect this change in CompanyID. so to get a meaningful report for a company that has merged, you will have to run one report each on both CompanyID 7 and CompanyID 8, and then combine the two.THIS is exactly what Analysis Services was doing!!! I had forgotten that we had this data issue so the report i was running off the original Hits database had less hits than in the Analysis Services cube - because i was not adding on the extra hits from the old companyID!!!I am walking on air at the moment! Analysis Services rocks!!Maria |
|
|
|
|
|
|
|