 Inserting HTML into SQL, getting gibberish?


JeTmAn81
Starting Member

11 Posts

Posted - 2008-09-25 : 16:56:02
I originally posted this in the SQL forums, but it looks like this is more of a .NET thing:

I've got a program where snippets of HTML code are inserted into a SQL 2000 database and then pulled out to be put together into an HTML file. The assembled HTML pieces are actually being saved as a .doc so the user can open it in Microsoft Word and essentially get a Word document that is assembled on the fly according to various sections which are specified as needed for the document.

Anyway, most of the document looks fine, but wherever there should be an apostrophe, the finished document shows three characters instead, the last of which is the trademark symbol. So for some reason the apostrophe isn't being translated correctly. The HTML is stored in the database in an ntext field, though it's converted to a varchar before being saved there. It seems to me like this is some kind of encoding issue, possibly ASCII vs. Unicode, but I don't know how to resolve it.
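
One apostrophe turning into three characters that end in a trademark sign is the pattern you get when a curly apostrophe (U+2019) is written out as UTF-8 and then read back as Windows-1252. Whether that is what happens in this application is an assumption, since the thread never shows the code that writes the file. A minimal sketch of the effect, in Python purely for illustration (the function name `misread_utf8_as_cp1252` is made up):

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252.

    This mimics a writer that emits UTF-8 and a reader that assumes the
    Western European code page. A few byte values are undefined in
    Python's cp1252 codec, so they are replaced rather than raising.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


curly = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the "smart" apostrophe
garbled = misread_utf8_as_cp1252(curly)

print(len(garbled))                 # how many characters one apostrophe became
print(garbled)                      # what a cp1252 reader would display
print(misread_utf8_as_cp1252("'"))  # a plain typed apostrophe is unaffected
```

If this is the cause, the data itself is intact and only the interpretation of the bytes is wrong, which would also explain why a typed apostrophe never shows the problem.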

When the HTML gets inserted into the database, it's done through an ASP.NET webpage which takes input from a textbox. When I copy and paste into the textbox from notepad, I notice that apostrophes show up but do not appear to be the exact same character as if I had just typed the apostrophe directly into the textbox. When I replace an apostrophe that I've pasted in by just typing the character, then it seems to display correctly after being pulled from the database.
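
The difference between the pasted and the typed apostrophe can be checked by looking at code points rather than at how the glyphs render. The sketch below assumes the pasted character is the curly apostrophe that appears in the Word-generated sample text; `describe_chars` is a hypothetical helper, not something from the application.

```python
import unicodedata


def describe_chars(text: str) -> list:
    """Return a (code point, Unicode name) pair for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


typed = "'"        # what the apostrophe key on the keyboard produces
pasted = "\u2019"  # assumed: the curly apostrophe from the Word-generated text

for label, ch in (("typed", typed), ("pasted", pasted)):
    print(label, describe_chars(ch))
```

The typed character is plain ASCII and has the same byte value in almost every encoding; the curly one is not, so it is the one that gets damaged when the encodings on either side of a read or write disagree.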

I tested it by pulling the data in Query Analyzer, and it doesn't seem to have any of the gibberish present that I see when I put the document back together. I also tried using URLEncode on the data before inserting it, and then using URLDecode when displaying it, but that didn't get rid of the garbage characters either. Has anybody encountered this or have any ideas concerning this problem?
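
Clean data in Query Analyzer and no improvement from URLEncode/URLDecode would both be consistent with the text being stored correctly and only misread when the finished file is opened. A URL-encoding round trip hands back exactly the string that went in, so it cannot change what happens at the output step. A rough Python illustration (the sample word is taken from the test section; the mismatch at the end is an assumed cause, not a confirmed one):

```python
from urllib.parse import quote, unquote

original = "College\u2019s"

encoded = quote(original)    # percent-encodes the UTF-8 bytes of the apostrophe
restored = unquote(encoded)  # decodes back to the very same string

print(encoded)
print(restored == original)

# The garbage only appears if the restored text is later written as UTF-8
# and read as Windows-1252, whether or not the round trip above took place.
print(restored.encode("utf-8").decode("cp1252"))
```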

afrika
Master Smack Fu Yak Hacker

2706 Posts

Posted - 2008-09-25 : 17:07:49
Can we see some sample data?

And how are you saving it as a .doc?

JeTmAn81
Starting Member

11 Posts

Posted - 2008-09-25 : 17:22:41
What I'm actually doing is taking the original Word file that was given to me for a proposal document, saving that as HTML, then taking the different sections of the document (introduction, conclusion, etc.) and inserting them into the database so they can be pulled out again in whatever order I want (sections can be left out, etc.). I'm then putting all the HTML back together into one text file which I save as a .doc, so Word will automatically open it. Since Word is generating the original HTML there's a lot of custom style stuff that it includes as well but I've kept that in the database and it seems to work fine. Actually this whole process works really well except for the occasional gibberish.
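
When fragments are stitched into one file like this, the bytes written to disk need to match whatever charset the document header declares. Word-generated HTML often declares windows-1252, but that is an assumption here because the header is not shown in the thread. A minimal Python sketch of the idea, with `sections`, `order`, and `encode_document` as hypothetical stand-ins for the database rows and the assembly code:

```python
def encode_document(sections: dict, order: list, charset: str = "windows-1252") -> bytes:
    """Join the chosen sections and encode them to match the declared charset.

    `sections` maps a section name to its HTML fragment; `order` lists the
    names to include, in the order they should appear.
    """
    body = "\n".join(sections[name] for name in order)
    html = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        "</head><body>\n"
        f"{body}\n"
        "</body></html>"
    )
    # Encode with the same charset the <meta> tag announces, so a reader
    # that trusts the tag decodes the bytes back to the original characters.
    return html.encode(charset)


sections = {
    "intro": "<h3>Harvard University\u2019s Ph.D. program</h3>",
    "close": "<h3>the world\u2019s most challenging problems</h3>",
}
data = encode_document(sections, ["intro", "close"])
print(data)
```

The returned bytes would then be written to the .doc file in binary mode. Passing `charset="utf-8"` instead keeps the tag and the bytes in agreement in the other direction.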

Here's the section I've been testing with to see if I can eliminate the gibberish (names have been anonymized):

<h1>Personalized Text (optional)</h1>

<h3>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec non nisl.
Nulla quis lacus. Sed quis ligula in neque dapibus commodo. Integer pede diam,
blandit et, viverra in, mollis non, dolor. Aliquam faucibus est. Nam tellus
sapien, iaculis eu, faucibus ac, fringilla non, velit. Etiam neque quam,
vulputate sit amet, dictum nec, egestas nec, est. <span style='mso-bidi-font-size:
11.0pt'><o:p></o:p></span></h3>

<h2><b style='mso-bidi-font-weight:normal'><o:p> </o:p></b></h2>

<h2>Introduction <br style='mso-special-character:line-break'>
<![if !supportLineBreakNewLine]><br style='mso-special-character:line-break'>
<![endif]></h2>

<h3>Student Student, ’08, and Student Student, ’07 are enrolled in Harvard University’s
Ph.D. program in biophysics. Both were awarded competitive full-ride
fellowships covering tuition and a living stipend from Harvard after turning
down matching offers from Yale. Student and Student are the latest in a long and
growing line of scientists, doctors, teachers, engineers and researchers who
are using their College science and math degrees to advance frontiers of
science and industry and to tackle some of the world’s most challenging
problems. <span style='mso-spacerun:yes'> </span>Equipped to embrace curiosity
and conviction as complementary rather than competing values, College alumni
are needed more than ever to address today’s most pressing ethical issues where
science, faith and public policy intersect.</h3>

<h3><o:p> </o:p></h3>

<h3>College is and always has been committed to the sciences as essential to an
excellent liberal arts education. The university renews that commitment in its
strategic vision for the future, calling for a significant investment in new
science facilities, state-of-the-art laboratory equipment and research
endowment to solidify College’s position as one of the best Christian
liberal-arts universities in the country.</h3>

<h3><!--[if gte vml 1]><v:shape id="photoRepeater__ctl0_imgJPG" o:spid="_x0000_s1030"
type="#_x0000_t75" alt="" style='position:absolute;margin-left:0;margin-top:11.35pt;
width:210.25pt;height:139.75pt;z-index:2;mso-position-horizontal:left'
o:allowoverlap="f" stroked="t" strokecolor="#272727">
<v:imagedata src="Science%20proposal_All_Variables_files/image005.jpg" o:href="http://web2/PhotoArchive/~Images/Spring2006/1463.jpg"/>
<w:wrap type="square"/>
</v:shape><![endif]--><![if !vml]><img width=282 height=188
src="Science%20proposal_All_Variables_files/image006.jpg" align=left hspace=12
v:shapes="photoRepeater__ctl0_imgJPG"><![endif]></h3>

<h3>President President, now in his 15<sup>th</sup> year as president, understands
and articulates College’s distinctive mission in a powerful way that has
inspired faculty, staff, alumni, donors and friends to lift the university to
new heights. Over the past decade, College has seen a 480 percent increase in
freshman applications; record levels of enrollment and retention; and
significant expansion of the endowment and capital facilities. <i
style='mso-bidi-font-style:normal'>U.S. News</i> consistently ranks College
one of the 10 best regional universities and values in the western United
States.</h3>

<h3><o:p> </o:p></h3>

<h3>With a rising academic profile, however, comes increased competition for
top students, even as the number of college-aged students is declining
nationally. For a tuition-driven institution like College, high-quality
academic facilities are vitally important to maintaining strong enrollment
demand and continuing the university’s positive trajectory. Nowhere is the need
for improved facilities greater than in the sciences. And nowhere is the
response more important for College, the region and beyond.</h3>

<h2><o:p> </o:p></h2>

afrika
Master Smack Fu Yak Hacker

2706 Posts

Posted - 2008-09-25 : 17:30:30
What gibberish?

The markup tags?

afrika
Master Smack Fu Yak Hacker

2706 Posts

Posted - 2008-09-25 : 18:07:45
Well, in my opinion, if the gibberish you are talking about is the <h2> stuff, those are basic markup tags for HTML formatting, which the browser removes when presenting the page to the client, so you shouldn't worry about that.

Unless I'm missing the point.

JeTmAn81
Starting Member

11 Posts

Posted - 2008-09-25 : 18:35:29
quote:
Originally posted by afrika

Well, in my opinion, if the gibberish you are talking about is the <h2> stuff, those are basic markup tags for HTML formatting, which the browser removes when presenting the page to the client, so you shouldn't worry about that.

Unless I'm missing the point.



No, it's not the markup tags; I know those are needed to preserve the formatting of the page. I will post a screenshot of what the reassembled document looks like to give you a better idea:




I know the resolution isn't large, but if you look closely you can see some ASCII characters appearing in the place of apostrophes.

afrika
Master Smack Fu Yak Hacker

2706 Posts

Posted - 2008-09-25 : 18:54:17
OK, I could barely see them.

Beats me.

Did you try removing them in the presentation layer, using Replace?

I haven't saved data as HTML files; we use Notepad and make a reference to them, or simply use XML data.

JeTmAn81
Starting Member

11 Posts

Posted - 2008-09-25 : 19:00:36
quote:
Originally posted by afrika

OK, I could barely see them.

Beats me.

Did you try removing them in the presentation layer, using Replace?

I haven't saved data as HTML files; we use Notepad and make a reference to them, or simply use XML data.



I will try running some replaces, which should fix it for now. I just wish I knew where it was coming from in the first place, because it seems like the type of problem that shouldn't be that hard to solve. Thanks for taking a look!
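
One variation on the replace idea, offered as a sketch rather than as the fix for this application: instead of stripping the garbage after it appears, convert every non-ASCII character to a numeric character reference before the HTML is written out. The result is pure ASCII, so it reads the same whichever encoding the reader guesses. In Python this is a built-in error handler; `to_ascii_html` is a hypothetical helper name.

```python
def to_ascii_html(fragment: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return fragment.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("College\u2019s position"))
```

This leaves the markup tags and ordinary typed apostrophes untouched, because they are already ASCII.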