Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Script Library
 Function: HTMLDecode

Author  Topic 

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-10-15 : 15:54:50
This will take character data that's been HTML encoded and put it back to normal text. At present, it gets any and all encoding in the form &#abc; as well as some common HTML-named characters. It would be easy to expand to get any additional HTML names. It also replaces <P> with CRLF, which I needed but which would be easy to remove.

Cheers
-b

CREATE FUNCTION dbo.f_HTMLDecode (@vcWhat varchar(8000))
RETURNS varchar(8000) AS
BEGIN
DECLARE @vcResult varchar(8000)
DECLARE @vcCrLf varchar(2)
DECLARE @siPos smallint,@vcEncoded varchar(7),@siChar smallint

set @vcCrLF=char(13) + char(10)

select @vcResult=@vcWhat
select @siPos=PatIndex('%&#___;%',@vcResult)
WHILE @siPos>0
BEGIN
select @vcEncoded=substring(@vcResult,@siPos,6)
select @siChar=cast(substring(@vcEncoded,3,3) as smallint)
select @vcResult=replace(@vcResult,@vcEncoded,nchar(@siChar))
select @siPos=PatIndex('%&#___;%',@vcResult)
END

select @siPos=PatIndex('%&#____;%',@vcResult)
WHILE @siPos>0
BEGIN
select @vcEncoded=substring(@vcResult,@siPos,7)
select @siChar=cast(substring(@vcEncoded,3,4) as smallint)
select @vcResult=replace(@vcResult,@vcEncoded,nchar(@siChar))
select @siPos=PatIndex('%&#____;%',@vcResult)
END

select @vcResult=replace(@vcResult,'"','"')
select @vcResult=replace(@vcResult,'&','&')
select @vcResult=replace(@vcResult,'©','©')
select @vcResult=replace(@vcResult,'«','«')
select @vcResult=replace(@vcResult,'»','»')
select @vcResult=replace(@vcResult,'¼','¼')
select @vcResult=replace(@vcResult,'½','½')
select @vcResult=replace(@vcResult,'¿','¿')

select @vcResult=replace(@vcResult,'<P>',@vcCrLf)

return @vcResult
END


Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-10-15 : 16:33:25
quote:

select @vcResult=replace(@vcResult,'"','"')
select @vcResult=replace(@vcResult,'&','&')
select @vcResult=replace(@vcResult,'©','©')
select @vcResult=replace(@vcResult,'«','«')
select @vcResult=replace(@vcResult,'»','»')
select @vcResult=replace(@vcResult,'¼','¼')
select @vcResult=replace(@vcResult,'½','½')
select @vcResult=replace(@vcResult,'¿','¿')


I think that was supposed to say

select @vcResult=replace(@vcResult,'&quot;','"')
select @vcResult=replace(@vcResult,'&amp;','&')
select @vcResult=replace(@vcResult,'&copy;','©')
select @vcResult=replace(@vcResult,'&laquo;','«')
select @vcResult=replace(@vcResult,'&raquo;','»')
select @vcResult=replace(@vcResult,'&frac14;','¼')
select @vcResult=replace(@vcResult,'&frac12;','½')
select @vcResult=replace(@vcResult,'&iquest;','¿')


Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2002-10-15 : 17:43:34
I'd like to know what kind of problem this is solving?

My guess: Incoming HTML email is stored as plain text in DB?

What else?

Sam

Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-10-29 : 14:21:39
Thanks, Arnold. Yes, that was what it was supposed to say.

The problem it's for is indeed to correct HTML that makes its way into the database. This can happen for a variety of reasons, but the most common one I see seems to be related non-english folks; probably do to different encoding or something, their entries in text fields often end up at least partially HTML-encoded (single quotes become &#XXXX; etc.

I've also used this to update an app that used to convert input to HTML before storing it in the db, which is an incredibly bad idea (what if you want to use that data in a non-browser environment?). Change the app to store raw data and HTML-ify it on presentation, use this function to update all of the old data, and voila.

Cheers
-b

Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2002-11-18 : 20:41:18
Hi Aiken,

I gave this function a try with

declare @test varchar (1000)
set @test= '<p>Testing > " .</p>'

Print @test
Print dbo.f_HTMLDecode (@test)

The function didn't convert anything at all. Did I do something wrong or is there a bug?

Sam

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-09-18 : 08:33:03
Also see http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=88926



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

danp129
Starting Member

1 Post

Posted - 2008-11-21 : 17:57:09
Had to change nchar(@siChar) to char(@siChar) for #153; (™) and othersto work, thanks for sharing this though, it was a time saver.

You can run this to compare the difference between nchar() and char():

print dbo.f_HTMLDecode('
128 € = #128;
129 � = #129;
130 ‚ = #130;
131 ƒ = #131;
132 „ = #132;
133 … = #133;
134 † = #134;
135 ‡ = #135;
136 ˆ = #136;
137 ‰ = #137;
138 Š = #138;
139 ‹ = #139;
140 Œ = #140;
141 � = #141;
142 Ž = #142;
143 � = #143;
144 � = #144;
145 ‘ = #145;
146 ’ = #146;
147 “ = #147;
148 ” = #148;
149 • = #149;
150 – = #150;
151 — = #151;
152 ˜ = #152;
153 ™ = #153;
154 š = #154;
155 › = #155;
156 œ = #156;
157 � = #157;
158 ž = #158;
159 Ÿ = #159;
160   = #160;
161 ¡ = #161;
162 ¢ = #162;
163 £ = #163;
164 ¤ = #164;
165 ¥ = #165;
166 ¦ = #166;
167 § = #167;
168 ¨ = #168;
169 © = #169;
170 ª = #170;
171 « = #171;
172 ¬ = #172;
173 ­ = #173;
174 ® = #174;
175 ¯ = #175;
176 ° = #176;
177 ± = #177;
178 ² = #178;
179 ³ = #179;
180 ´ = #180;
181 µ = #181;
182 ¶ = #182;
183 · = #183;
184 ¸ = #184;
185 ¹ = #185;
186 º = #186;
187 » = #187;
188 ¼ = #188;
189 ½ = #189;
190 ¾ = #190;
191 ¿ = #191;
192 À = #192;
193 Á = #193;
194 Â = #194;
195 Ã = #195;
196 Ä = #196;
197 Å = #197;
198 Æ = #198;
199 Ç = #199;
200 È = #200;
201 É = #201;
202 Ê = #202;
203 Ë = #203;
204 Ì = #204;
205 Í = #205;
206 Î = #206;
207 Ï = #207;
208 Ð = #208;
209 Ñ = #209;
210 Ò = #210;
211 Ó = #211;
212 Ô = #212;
213 Õ = #213;
214 Ö = #214;
215 × = #215;
216 Ø = #216;
217 Ù = #217;
218 Ú = #218;
219 Û = #219;
220 Ü = #220;
221 Ý = #221;
222 Þ = #222;
223 ß = #223;
224 à = #224;
225 á = #225;
226 â = #226;
227 ã = #227;
228 ä = #228;
229 å = #229;
230 æ = #230;
231 ç = #231;
232 è = #232;
233 é = #233;
234 ê = #234;
235 ë = #235;
236 ì = #236;
237 í = #237;
238 î = #238;
239 ï = #239;
240 ð = #240;
241 ñ = #241;
242 ò = #242;
243 ó = #243;
244 ô = #244;
245 õ = #245;
246 ö = #246;
247 ÷ = #247;
248 ø = #248;
249 ù = #249;
250 ú = #250;
251 û = #251;
252 ü = #252;
253 ý = #253;
254 þ = #254;
255 ÿ = #255;
')
Go to Top of Page
   

- Advertisement -