
 SQL Server 2005 Forums
 Express Edition and Compact Edition (2005)
 Problems with Unicode comparisons


schmmd

Posted - 2007-07-24 : 17:18:43
I have a Java program that communicates with a SQL Server database. Previously, the Java program ran a BULK INSERT with CODEPAGE=65001 to import a UTF-8 encoded file. This seemed to import the data correctly, but that option is largely undocumented. I also tried converting the input file to UTF-16 (Java wouldn't write a UCS-2 file) and running BULK INSERT with DATAFILETYPE='widechar', which worked as well. Is one of these methods preferable, or is there a better method?
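For the DATAFILETYPE='widechar' route, here is a minimal Java sketch of the conversion step. It assumes the data file should be little-endian UTF-16, which is the byte order visible in the binary dump further down (U+1205 is stored as the bytes 05 12). Java's "UTF-16" charset writes big-endian with a byte-order mark, so the sketch uses UTF_16LE explicitly and leaves the BOM as a flag, since I haven't confirmed how BULK INSERT treats one. The class Utf8ToWidechar and its methods are hypothetical names. It also checks for surrogate pairs: for text entirely in the Basic Multilingual Plane (which includes the Ethiopic block, U+1200 to U+137F, used for Amharic), UTF-16LE and UCS-2 produce identical bytes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8ToWidechar {

    // Encode text as little-endian UTF-16, optionally prefixed with the
    // byte-order mark U+FEFF (which encodes as FF FE in little-endian).
    static byte[] toUtf16Le(String text, boolean writeBom) {
        String payload = writeBom ? "\uFEFF" + text : text;
        return payload.getBytes(StandardCharsets.UTF_16LE);
    }

    // True if the text has no surrogate pairs, i.e. every character is in
    // the BMP, so its UTF-16LE bytes are also valid UCS-2.
    static boolean fitsInUcs2(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (Character.isSurrogate(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Read a UTF-8 file, drop a leading UTF-8 BOM if present (Java's decoder
    // keeps it as U+FEFF), and write the text back out as UTF-16LE.
    // Returns whether the converted text is UCS-2 safe.
    static boolean convert(Path in, Path out, boolean writeBom) throws IOException {
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        Files.write(out, toUtf16Le(text, writeBom));
        return fitsInUcs2(text);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: Utf8ToWidechar <input-utf8> <output-utf16le>");
            System.exit(1);
        }
        boolean ucs2Safe = convert(Paths.get(args[0]), Paths.get(args[1]), true);
        if (!ucs2Safe) {
            System.err.println("warning: input contains characters outside the BMP");
        }
    }
}
```

Writing the BOM is the default in main only as a starting point; if the import shows a stray character at the start of the first row, try passing false instead.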

Most of the data is handled just fine inside SQL Server. However, certain text does not compare correctly. I would expect nvarchar columns to be compared as Unicode, but if I run a SELECT DISTINCT on a set of Amharic words, they all collapse into a single word! Similarly, if I join on these words, SQL Server simply matches the first Amharic word it finds, even if it isn't actually equal. Does anyone have any idea why SQL Server thinks these values are equal?

If I convert the values to binary, they do indeed show up as different:

0x00132D121812951200000000000000000000000000
0x05129512F512000000000000000000000000000000
0x091303132B1272129B120000000000000000000000
0x0A13EE122D120A1330120000000000000000000000
0x0D121D126112251320000D12951261122513000000
0x0D132A12AD129B1200000000000000000000000000
0x101219123512000000000000000000000000000000
0x1313F0129B12000000000000000000000000000000
0x15129512F5129B1200000000000000000000000000
0x1512ED12CB12951200000000000000000000000000
0x18120D13DB12751200000000000000000000000000
0x19122D122513000000000000000000000000000000
0x191234120000000000000000000000000000000000
0x1912DA124312000000000000000000000000000000
0x1B120B12ED129B1200000000000000000000000000
0x1B12AD1230129E1200000000000000000000000000
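To check that these binary values really are distinct strings, here is a small Java sketch that decodes each one as little-endian UTF-16, stopping at the first 0x0000 code unit (assumed to be padding from the conversion to a fixed-length binary type), and collects the results in a TreeSet. TreeSet's natural String ordering compares UTF-16 code units, which is roughly analogous to a binary collation, so all 16 values should survive as separate entries. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class DecodeDump {

    // The 16 values from the post, as shown after converting the column to binary.
    static final String[] DUMP = {
        "0x00132D121812951200000000000000000000000000",
        "0x05129512F512000000000000000000000000000000",
        "0x091303132B1272129B120000000000000000000000",
        "0x0A13EE122D120A1330120000000000000000000000",
        "0x0D121D126112251320000D12951261122513000000",
        "0x0D132A12AD129B1200000000000000000000000000",
        "0x101219123512000000000000000000000000000000",
        "0x1313F0129B12000000000000000000000000000000",
        "0x15129512F5129B1200000000000000000000000000",
        "0x1512ED12CB12951200000000000000000000000000",
        "0x18120D13DB12751200000000000000000000000000",
        "0x19122D122513000000000000000000000000000000",
        "0x191234120000000000000000000000000000000000",
        "0x1912DA124312000000000000000000000000000000",
        "0x1B120B12ED129B1200000000000000000000000000",
        "0x1B12AD1230129E1200000000000000000000000000",
    };

    // Decode a hex string as little-endian UTF-16 (low byte first) and stop
    // at the first 0x0000 code unit, which is treated as trailing padding.
    static String decodeUtf16Le(String hex) {
        String h = hex.regionMatches(true, 0, "0x", 0, 2) ? hex.substring(2) : hex;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= h.length(); i += 4) {
            int lo = Integer.parseInt(h.substring(i, i + 2), 16);
            int hi = Integer.parseInt(h.substring(i + 2, i + 4), 16);
            char c = (char) ((hi << 8) | lo);
            if (c == 0) {
                break;
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Render a string as space-separated code points, e.g. "U+1205 U+1295".
    static String codePoints(String s) {
        List<String> parts = new ArrayList<>();
        s.codePoints().forEach(cp -> parts.add(String.format("U+%04X", cp)));
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        TreeSet<String> distinct = new TreeSet<>(); // code-unit (ordinal) ordering
        for (String hex : DUMP) {
            String word = decodeUtf16Le(hex);
            distinct.add(word);
            System.out.println(hex + " -> " + codePoints(word));
        }
        System.out.println("distinct values: " + distinct.size());
    }
}
```

Running it should print 16 distinct values, with every decoded character in the Ethiopic block (U+1200 to U+137F) apart from a single U+0020 space in the fifth value.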

It will "work" with a binary collation, but I want results to sort lexicographically. I was using SQL_Latin1_General_CP1_CS_AS.