Testing data in SQL and SPSS - SQL Server Forums

Mcguns
Starting Member

4 Posts

Posted - 2015-03-16 : 13:40:03

I'm new to the site and haven't really worked with SQL products, but I need technical advice on testing the same data file in both SQL and SPSS.

OK, it might be a strange question, but I'm looking for expert advice on a debate that is currently going on at work.

I'm testing a data file in SPSS using various checks: is the field blank, does it hold an unexpected value, or, if column A has a value of 1, does column B have a value of 2, etc.
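For readers unfamiliar with this kind of validation, here is a minimal Python sketch of the three checks described above, counting how many records fail each one. The column names (`A`, `B`, `status`), the allowed codes and the rule names are all hypothetical, and treating whitespace-only strings as blank is an assumption that both sides would need to agree on.

```python
from collections import Counter

# Hypothetical set of acceptable codes for the "status" field.
ALLOWED_STATUS = {"Y", "N"}

def is_blank(value):
    """Treat None, empty and whitespace-only strings as blank (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def validate(record):
    """Return the names of the (hypothetical) rules this record fails."""
    failures = []
    if is_blank(record.get("status")):
        failures.append("status_blank")
    elif record["status"] not in ALLOWED_STATUS:
        failures.append("status_unexpected")
    # Conditional rule: if A is 1, then B must be 2 (a missing B fails here).
    if record.get("A") == 1 and record.get("B") != 2:
        failures.append("A1_requires_B2")
    return failures

def count_failures(records):
    """Count how many records fail each rule."""
    counts = Counter()
    for record in records:
        counts.update(validate(record))
    return counts

records = [
    {"A": 1, "B": 2, "status": "Y"},
    {"A": 1, "B": 3, "status": ""},
    {"A": 2, "B": None, "status": "maybe"},
    {"A": 1, "B": None, "status": "N"},
]
print(count_failures(records))
```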

Another person is testing the same dataset in SQL and is meant to be using the same logic.
We then compare the counts of how many records fail each validation.

I understand there are certain nuances with each package; for example, upper and lower case are treated as different values in SPSS, while SQL doesn't distinguish between them.

So if we account for all these nuances, I believe we should get the same numbers as long as our logic is the same. The others don't believe this, and I don't understand why not!

Is there any reason why we would get different results, even after accounting for the nuances, working on the same data and using the same logic behind each validation?

Thanks

Stephen McGonagle

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2015-03-16 : 13:45:37

1. What is SPSS?
2. Which collation is the SQL database using?
3. This isn't really a question we can answer for you. It is something that would have to be tested in your environment.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/

Mcguns
Starting Member

4 Posts

Posted - 2015-03-16 : 14:15:18

Tara,

SPSS is an IBM product, similar to SAS, whose main purpose is statistical analysis. http://www-01.ibm.com/software/uk/analytics/spss/

I'll have to ask regarding point 2.

Regarding point 3, the main premise is that one organisation tests a data file for errors in SQL, using an agreed set of logic for each test. The data is then passed to us; we load it into SPSS and apply the same set of validations, with the same logic behind each test.

I think we should end up with the same count for each validation, once the nuances of each package are accounted for, but others think that differences will always exist even if we test the same data, use the same logic behind each validation and account for the differences in how each software package treats data.

I'm looking for expert advice that either confirms their thinking or shows the opposite to be true.

Thanks

Stephen McGonagle

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2015-03-16 : 15:20:05

"I understand there are certain nuances with each software like upper & lower case treated as different values in SPSS, while in SQL it doesn't distinguish these. "

SQL can distinguish these. It depends on the collation: case-sensitive (CS) vs case-insensitive (CI).
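A rough Python illustration of why the collation matters for an error count: the same rule ("value must be Y or N") fails a different number of records depending on whether comparison is case-sensitive or case-insensitive. The allowed codes and sample values are hypothetical, and `str.casefold()` is only an approximation of a real CI collation, not how SQL Server implements one.

```python
# Hypothetical allowed codes for a Yes/No field.
ALLOWED = {"Y", "N"}

def count_invalid(values, case_sensitive):
    """Count values that fail the rule 'value must be Y or N'."""
    if case_sensitive:
        # Like a case-sensitive (CS) collation: "y" is not the same value as "Y".
        return sum(1 for v in values if v not in ALLOWED)
    # Roughly like a case-insensitive (CI) collation. str.casefold() only
    # approximates real collation rules; accent and width sensitivity are not modelled.
    allowed = {a.casefold() for a in ALLOWED}
    return sum(1 for v in values if v.casefold() not in allowed)

responses = ["Y", "y", "N", "n", "yes"]
print(count_invalid(responses, case_sensitive=True))   # 3
print(count_invalid(responses, case_sensitive=False))  # 1
```

Both counts are "correct" for their comparison rule, which is why the collation has to be known before the two sets of counts can be compared.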

Also, if you use SSIS, you can do this sort of thing with a Data Profiling task.

Mcguns
Starting Member

4 Posts

Posted - 2015-03-17 : 05:57:24

gbritton, the version of SQL the other organisation is using does not distinguish between upper and lower case, but that is not the issue I'm trying to resolve.

I don't use SQL, so forgive my ignorance.

The upper/lower case issue was just an example of how two different software packages can treat the same data differently.

This question is not about how SQL can deal with certain data items.

My question is: if we account for the known differences in how SPSS and SQL treat the same data, and we test the same data using the same logic behind each validation/test, should we end up with the same error counts in both SPSS and SQL?

Thanks

Stephen McGonagle

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2015-03-17 : 09:58:20

Well, the reason you are getting different results is probably that the assumption "they are meant to be using the same logic" is false.

You will need to carefully compare the logic in both approaches. At some point SPSS fails a test case while the SQL code passes it, or vice versa. Take that one test case through the logic step by step and see where one spits out an error where the other does not.
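One way to find that test case, sketched in Python under assumptions: have each side export a record ID plus a pass/fail flag per test, then list the records where the two disagree. The two hypothetical rule implementations below show a common culprit. In SQL, `B <> 2` evaluates to UNKNOWN when `B` is NULL, so a query such as `WHERE A = 1 AND B <> 2` does not return that row, whereas a check written to treat a missing value as a failure does flag it. The record IDs and data are made up for illustration.

```python
# None stands in for SQL NULL / a missing value. Hypothetical records keyed by ID.
records = {
    101: {"A": 1, "B": 2},
    102: {"A": 1, "B": 3},
    103: {"A": 1, "B": None},
    104: {"A": 2, "B": None},
}

def fails_sql_style(rec):
    """Mimics WHERE A = 1 AND B <> 2 under SQL three-valued logic."""
    a, b = rec["A"], rec["B"]
    if a is None or b is None:
        # A comparison with NULL is UNKNOWN, so the WHERE clause is never TRUE.
        return False
    return a == 1 and b != 2

def fails_missing_is_error(rec):
    """Treats a missing B as failing the rule whenever A is 1."""
    return rec["A"] == 1 and rec["B"] != 2

def disagreements(records, check_x, check_y):
    """Return the sorted IDs of records on which the two implementations disagree."""
    return sorted(rid for rid, rec in records.items() if check_x(rec) != check_y(rec))

print(disagreements(records, fails_sql_style, fails_missing_is_error))  # [103]
```

Record 103 is then the one to walk through step by step. Agreeing up front on how blanks and NULLs are handled in each rule removes this particular source of difference.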

Mcguns
Starting Member

4 Posts

Posted - 2015-03-17 : 17:44:01

Gbritton, thanks again.

I totally agree with you.

The other organisation says the differences may be down to the different software and doesn't believe our error counts will ever match.

I've always said that if the data file is the same, the logic is the same and any differences in how each package handles data are accounted for, then there is no reason to end up with different counts.

Thanks, I just needed some advice confirming that my thinking was sound.

Thanks

Stephen McGonagle
