Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Beginner question

Author  Topic 

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-26 : 17:27:35
Hey everyone. I'm primarily a C++/C# developer but for this particular assignment I've been thrust into SQL databases, which I've never even touched before. What I need to do is generate a new table based off of records from two other tables. This is for a college, so my goal is to grab all students for the current quarter... I need their name and student ID number, and that's it.

However, this information is tough to get to. The student's name (STU_NAME) and number (SID) is stored in a table called SM_STU_D. However, this table has the entire history of students. What I need are just the ones for the current quarter, which is not stored in this table. It's stored in another table, called SM_STU_CLASS_D. Inside this, I've got another column called STU_QTR, which has the student's ID number with the quarter code appended onto the end of each number.

What I need to do is pass in a quarter code, and have it find all of the matching records in the SM_STU_CLASS_D table, and then use the ID number and pair up to each name in the SM_STU_D table. I need to use substring from STU_QTR to extract the actual ID, since the quarter code is appended onto the end.

This is the code I've got... pardon the uglyness with the duplicate substring, but I couldn't seem to get variables to work.


CREATE PROCEDURE st_GenerateValidID
@iCurrentQuarter VARCHAR(50)
AS

SELECT SUBSTRING(STU_QTR, 1, LEN(STU_QTR) - LEN(@iCurrentQuarter)) AS StudentID,
STU_NAME AS StudentName
FROM SM_STU_CLASS_D

JOIN SM_STU_D
ON SUBSTRING(STU_QTR, 1, LEN(STU_QTR) - LEN(@iCurrentQuarter)) = SM_STU_D.SID

WHERE STU_QTR LIKE '%' + @iCurrentQuarter


This will run, but it goes very slowly and I get multiple duplicate records in the output, so something is obviously going wrong. Does anyone have any tips to point me in the right direction?

Thanks,
Marshall

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-26 : 17:58:34
[code]
CREATE PROCEDURE dbo.st_GenerateValidID
(
@iCurrentQuarter VARCHAR(50)
)
AS

SET NOCOUNT ON

SELECT w.SID AS StudentID,
w.STU_NAME AS StudentName
FROM (
SELECT SUBSTRING(STU_QTR, 1, LEN(STU_QTR) - LEN(@iCurrentQuarter)) AS StudentID,
FROM SM_STU_CLASS_D
WHERE RIGHT(STU_QTR, LEN(@iCurrentQuarter)) = @iCurrentQuarter
) AS q
INNER JOIN SM_STU_D AS w ON w.SID = q.StudentID[/code]


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-26 : 20:00:48
Fixed an extra comma to get that to compile, but it's giving me 0 results as an output. It does execute far far faster, however.

Everything in there looks logical sound though. I'm not sure where it's going wrong. :-/
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 02:23:35
Yes. There is an extra comma after the StudentID column alias, in the derived table.

The reason for this to run faster, is that the RIGHT part can be hashed and thus joined much faster, than using LIKE.
Can you post how long it took before, and how long it takes now?

Also, please post some sample data for both tables for verification of query.


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-27 : 03:19:36
Yeah I noticed that RIGHT was much faster than LIKE when I was doing testing. Before it took at least several minutes, and I never let it keep going long enough to find out how long it really took. It does it now in under 2 seconds. Of course, it's giving me 0 results, so maybe that has something to do with it. :P

So in the SM_STU_D table, I've got data like this:

quote:

SID STU_NAME
925073715 AHLMAN KARIN A
925086006 GOE JOSEPH M



And in the SM_STU_CLASS_D table, the data is like so:

quote:

STU_QTR
9251049799454
9252266659454



Where the last 4 digits are the quarter code. There's obviously a ton more records and misc columns in each table, but I've excluded those since they aren't relevant.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 03:27:28
Work perfectly for me. Maybe you didn't copy and paste the full code?
DECLARE	@SM_STU_CLASS_D TABLE
(
STU_QTR CHAR(13) NOT NULL
)

INSERT @SM_STU_CLASS_D
SELECT '9251049799454' UNION ALL
SELECT '9250737159454'

DECLARE @SM_STU_D TABLE
(
SID CHAR(9) NOT NULL,
STU_NAME VARCHAR(20)
)

INSERT @SM_STU_D
SELECT '925073715', 'AHLMAN KARIN A' UNION ALL
SELECT '925086006', 'GOE JOSEPH M'

DECLARE @iCurrentQuarter VARCHAR(50)

SET @iCurrentQuarter = '9454'

SELECT @iCurrentQuarter AS Quarter,
w.SID AS StudentID,
w.STU_NAME AS StudentName
FROM (
SELECT SUBSTRING(STU_QTR, 1, LEN(STU_QTR) - LEN(@iCurrentQuarter)) AS StudentID
FROM @SM_STU_CLASS_D
WHERE RIGHT(STU_QTR, LEN(@iCurrentQuarter)) = @iCurrentQuarter
) AS q
INNER JOIN @SM_STU_D AS w ON w.SID = q.StudentID



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-27 : 03:30:16
I got it to generate an output by adding a +1 into RIGHT.


WHERE RIGHT(STU_QTR, LEN(@iCurrentQuarter)+1) = @iCurrentQuarter


It's finding records, but it's taking just as long as before, and I've got duplicates. I'm going to let it run to completion this time and see what happens.

[edit]
Ohh you replied already. Hmm, yeah that's weird. :-/ Let me try and figure out what's going on here. I definitely pasted all my code.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 03:35:42
Adding the +1 means one of two things.

1. You have declared the STU_QTR column as CHAR with more than 13 characters
2. You have declared the STU_QTR column as VARCHAR with more than 13 characters, and data has a trailing space

In either case, you can fix them both by using this piece of code...
FROM 		(
SELECT SUBSTRING(STU_QTR, 1, LEN(STU_QTR) - LEN(@iCurrentQuarter)) AS StudentID
FROM @SM_STU_CLASS_D
WHERE RIGHT(RTRIM(STU_QTR), LEN(@iCurrentQuarter)) = @iCurrentQuarter
) AS q
...to get rid of the ugly +1 hack.


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-27 : 03:56:26
Yeah, the RTRIM took care of it. Ok, I think I figured out what's going on with the multiple records. The SM_STU_CLASS_D can have multiple records for each student ID. As far as I can tell, each record is a class they've signed up for, so it makes sense to have multiple records here, I suppose. That's where the dups are coming from. This is a pretty muddled table to pull data from... sigh. Is there a way to force it so only one record for each student ID shows up?

Also, this is still really really slow. It's probably because the SM_STU_CLASS_D table has 1.5 million records (yeesh, note to self: look at this stuff before coding). So odds are there's not really a good way to speed this up, huh?

Let me talk to the database admins again and see if they've got another table stashed away somewhere that has this data. I am curious about that 'merged' record question, though.

Also, many thanks for your help Peso. You truly are a Patron Saint of Lost Yaks.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 04:00:06
1. Rewrite "INNER JOIN" to "INNER HASH JOIN"
2. Are CurrentQuarter ALWAYS 4 digits? And StudentID always 9 characters? If so, you can probably make a indexed view for sm_stu_class_d table.



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-27 : 04:05:08
Odd, I get a compiler error when I do INNER HASH JOIN:

quote:

Server: Msg 8622, Level 16, State 1, Procedure st_GenerateValidID, Line 12
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.



I have no idea what it's talking about. I'm not using SET FORCEPLAN anywhere.

I'll look into an indexed view. Hopefully I can just find a smaller table with less redundant data, that would make things way easier.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 04:45:02
The HASH JOIN was only a suggestion from my part.
What about the character lengths?


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-27 : 05:15:16
As far as I know the character lengths are fixed, although I was told that could theoretically change in the future. I'm letting it run until it finishes right now and I'll see what's acceptable for the database guys in the morning.

As far as the merging goes, it looks like GROUP BY is what I'm looking for, but it also wants an aggregate clause to merge the records together... which is somewhat redundant since the names are all duplicates too. But everything I've tried with GROUP BY returns 0 results after I stop it. Maybe I'm just not letting it run long enough.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 05:26:34
[code]FROM (
SELECT DISTINCT
SUBSTRING(STU_QTR, 1, LEN(STU_QTR) - LEN(@iCurrentQuarter)) AS StudentID
FROM @SM_STU_CLASS_D
WHERE RIGHT(RTRIM(STU_QTR), LEN(@iCurrentQuarter)) = @iCurrentQuarter
) AS q[/code]


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-27 : 05:35:09
Wow. I mean... wow.

That works, and it finishes in 3 seconds now. DISTINCT must unleash magic pixies inside the server. You killed two birds with one stone there. I doff my virtual hat to you Peso.

:-)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 05:45:03
Thank you.

The dupes are because same student is attending several times for same Quarter (probably different classes).



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 05:55:11
Now, with DISTINCT in place, try the HASH JOIN hint.
SELECT 		@iCurrentQuarter AS Quarter,
w.SID AS StudentID,
w.STU_NAME AS StudentName
FROM (
SELECT DISTINCT
SUBSTRING(STU_QTR, 1, LEN(STU_QTR) - LEN(@iCurrentQuarter)) AS StudentID
FROM @SM_STU_CLASS_D
WHERE RIGHT(RTRIM(STU_QTR), LEN(@iCurrentQuarter)) = @iCurrentQuarter
) AS q
INNER HASH JOIN @SM_STU_D AS w ON w.SID = q.StudentID



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

marshdabeachy
Starting Member

9 Posts

Posted - 2009-05-27 : 06:03:18
HASH JOIN works now. It produces the exact same result in the same amount of time. Is there a specific advantage to HASH JOIN that isn't apparent?

Also, I have another question that maybe you (or someone else) could help with. I'm using DTS to run this at a scheduled time every day. Since this procedure takes a parameter, I don't want to have to hard-code that value into the DTS package. Does SQL have something akin to a global variable that I can read/write to? I just need a variable that hangs around even after the procedure has finished. I thought of adding a table containing the data, but if there's an easier way I'll go that route.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 06:33:16
HASH JOIN will be faster for larger datasets.
How many records do the query in the derived table produce? If they are small, SQL Server will go for a MERGE JOIN, which need both sides to be sorted, which in turn introduces an extra step in the execution plan.

If you look at the execution plan now, I think SQL Server choose INNER LOOP JOIN for the query. Loop joins works best for 100-1000 records. Below 100 records, a INNER MERGE JOIN will be faster.
Over 1000 records an INNER HASH JOIN will be faster.

These numbers are not 100% accurate, but they are in the correct neighbourhood.


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 06:34:56
1. You can store the "parameter" value in a table and have the DTS package read the value.
2. You can pass the parameter value to the package when executing itm using DTSRUN and the /A option.




E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-05-27 : 06:40:51
DTSRun /S /U "*****" /P "*****" /N "*****" /A "Param1":7="Mar 30 2005" /A "Param2":8="d100"

As you can see, you should implicit convert the values when sending the parameters to the package.
It is done with the 7 (datetime) and 8 (varchar) options.
There are more numbers but I can't remember them right now.



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page
    Next Page

- Advertisement -