Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
cjp
Yak Posting Veteran
69 Posts |
Posted - 2011-09-07 : 05:33:31
|
I am using SQLS 2008 x64 SP 2 running on W 7 Pro x64 SP 1.I have been outputting manipulated data from a large medical database to txt files for statistical analysis. I have used bcp to export the data direct to the usb disk where I am holding the backups. The dataset is for a study on a serious arterial disease called telangiectasia. Here is the query (which, essentially, works fine):USE [telangiectasia]GOIF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[outtelang]') AND type in (N'P', N'PC'))DROP PROCEDURE [dbo].[outtelang]GOCREATE PROCEDURE dbo.outtelangASBEGINSELECT'combid', 'evdate', 'evdatereal', 'medcode', 'descr'Union AllSelectConvert(varchar (100), combid),Convert(varchar (100), evdate),Convert(varchar (100), evdatereal, 103),Convert(varchar (100), medcode),Convert(varchar (100), descr)FROM ##telangENDGOEXEC xp_cmdshell'bcp "EXEC telangiectasia.dbo.outtelang" queryout "L:\telangiec\telang.txt" -T -c -a32768'GOThe bcp takes its data from a temp table (##telang). This is very fast and everything outputs correctly except for some of the entries in the last field 'descr' (description). For reasons that baffle me, most of the entries are correct but some of them add spurious extra characters; this seems to be restricted to descriptions that are quite short (longer ones output correctly even though they are under 100 characters in length). Here is what it looks like:Telang_eventscombid evdate evdatereal medcode descra670600F@ 20100511 11/05/2010 P736.11 Pulmonary arterio-venous fistulaÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿa670601hD 20040000 7A60.00 Arteriovenous shuntÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿa670602Vf 20000000 G770.00 Hereditary Haemorrhagic Telangiectasiaa673201Kz 19950901 01/09/1995 G770.00 Hereditary Haemorrhagic Telangiectasiaa6734005m 20071012 12/10/2007 7A60.00 Arteriovenous shuntÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿa673407RK 20100805 05/08/2010 7A60.00 Arteriovenous shuntÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿa674400tH 19990302 02/03/1999 G770.00 Hereditary Haemorrhagic TelangiectasiaSorry for the layout problems.As you can see, the first four fields are fine and the issue is restricted to shorter entries in the final field. I should add that, if I make a saved table and then pull this into Access, there is no problem - the description field is perfect. So I don't think there are underlying problems in the data - it is the bcp process that is generating the spurious elements.By the way, if a date string cannot be parsed into date format, evdatereal is blank (eg 20040000).Can anyone help me to understand this and to fix the issue with the description field?Thanks.Chris |
|
webfred
Master Smack Fu Yak Hacker
8781 Posts |
Posted - 2011-09-07 : 08:39:40
|
can you please show the code that is filling the table ##telang with data? No, you're never too old to Yak'n'Roll if you're too young to die. |
|
|
cjp
Yak Posting Veteran
69 Posts |
Posted - 2011-09-08 : 03:22:59
|
I was about to post the requested code when I had a thought that means I have found the solution (there is nothing wrong with bcp).The lookup for diagnoses comes in a professionally prepared dictionary that has no spurious characters. For this study, however, a student had hacked the dictionary and had presented me with his hack. For some reason, he had introduced spaces (and even a couple of carriage returns) into the description field for shorter entries whilst he was preparing the lookup. Perhaps he was trying to even up the length of the field for each entry.The long and the short of it is that I tested this hypothesis by identifying and removing the spurious spaces in his lookup and, lo and behold, the procedure worked perfectly - no nonsense in the output.I suppose this shows that nothing should be taken for granted. Students have been instructed not to hack dictionaries in Excel, but they do it anyway. So... I will simply have to assume that lookups are broken and go looking for trouble before I use them.Sorry to have posted unnecessarily on this but I hope the cautionary tale will be of use: don't use other people's data until you have established that it is bullet-proof. Thanks.Chris |
|
|
|
|
|
|
|