| Author |
Topic |
|
SQLNOVICE999
Yak Posting Veteran
62 Posts |
Posted - 2011-10-28 : 16:20:28
|
| Guys, I have a table with a column that holds HTML text. The column is pretty big (datatype varchar(max)). I wanted to check whether any of you have a function I can use to strip out the HTML tags. I saw a couple of versions online, but they ran too slowly. This is the one I used: http://cosier.wordpress.com/2008/10/22/tsql-strip-html-function/ Any suggestion is helpful. Thanks, Laura |
|
|
slimt_slimt
Aged Yak Warrior
746 Posts |
Posted - 2011-10-29 : 01:35:00
|
| Hi, if you are doing this only once, then it should not be a problem to wait. Functions like this one are presumably slow, but keep in mind that this is no easy job. Since HTML is a well-defined language, all the tags are well known. You can create a library that stores all the tags and use REPLACE in standard T-SQL instead of going through each word. You might as well use CLR if you are going to do this more frequently. Best |
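The "library of known tags" idea can be sketched as follows. This is only an illustration in Python, not the T-SQL the post has in mind: the tag list `KNOWN_TAGS` and the function name `strip_known_tags` are hypothetical, and it uses a regular expression built from the list rather than a literal REPLACE, so that tags with attributes are also caught.

```python
import re

# Hypothetical, deliberately incomplete tag list; a real library would
# need every tag that can occur in the stored data.
KNOWN_TAGS = ("a", "b", "br", "div", "i", "p", "span")

# Matches an opening or closing tag whose name is in KNOWN_TAGS,
# together with any attributes up to the next ">".
_TAG_RE = re.compile(
    r"</?(?:" + "|".join(KNOWN_TAGS) + r")\b[^>]*>",
    re.IGNORECASE,
)


def strip_known_tags(text: str) -> str:
    """Remove only the tags whose names appear in KNOWN_TAGS."""
    return _TAG_RE.sub("", text)


print(strip_known_tags("<p>Hello <b>world</b></p>"))  # Hello world
```

Because only listed tag names are matched, a bare "<" or ">" in the text is left alone, but any tag missing from the list stays in the output.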
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-10-29 : 03:23:11
|
| Can't you just do it in an application program? Those are more flexible and have a very rich set of functions for this kind of work. T-SQL is not optimized for something like this. PBUH |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-10-29 : 03:34:01
|
| In my experience you need something that will parse the HTML. Otherwise there is too great a risk that you remove something like this: "100 is < 200, and 300 is > 200", which should have escaped the < and >, but will probably display just fine in browsers, and thus may well exist in the code. There is also the issue of what you do with broken code, such as: "<SomeTag xxxx </SomeTag>". A reg-ex type query will be liable to remove everything between the "<" and the ">". |
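The unescaped "<" and ">" case can be reproduced with a short sketch. It assumes the stripping is done outside the database in Python, with the standard library's `html.parser` standing in for "something that will parse the HTML"; the function names `strip_with_regex` and `strip_with_parser` are made up for illustration.

```python
import re
from html.parser import HTMLParser


def strip_with_regex(text: str) -> str:
    """Naive approach: delete everything between a '<' and the next '>'."""
    return re.sub(r"<[^>]*>", "", text)


class _TextCollector(HTMLParser):
    """Collects the text nodes the parser reports and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_with_parser(text: str) -> str:
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any text still buffered
    return "".join(collector.parts)


sample = "<p>100 is < 200, and 300 is > 200</p>"
print(repr(strip_with_regex(sample)))   # '100 is  200'
print(repr(strip_with_parser(sample)))  # '100 is < 200, and 300 is > 200'
```

The regex version silently loses ", and 300 is", while the parser treats a "<" that does not start a tag as ordinary text. How a parser copes with genuinely broken markup such as the unclosed tag above depends on the parser, so that case still needs testing against real data.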
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-10-29 : 03:45:58
|
quote: Originally posted by Kristen: In my experience you need something that will parse the HTML. Otherwise too great a risk that you remove something like this: "100 is < 200, and 300 is > 200", which should have escaped < and >, but will probably display just fine in browsers, and thus may well exist in the code. There is also the issue of what you do with broken code, such as: "<SomeTag xxxx </SomeTag>". A reg-ex type query will be liable to remove everything between the "<" and the ">".
That's why I recommended NOT using T-SQL, because there are always lots of ifs and buts with this kind of thing. One example would be JavaScript or a CSS tag embedded in the middle of an HTML tag, not to mention the junk tags AJAX (if used) adds to the HTML, which of course you wouldn't want to read as part of your HTML data. PBUH |
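The embedded script and CSS case can be handled in application code along these lines. This is a rough sketch that assumes Python's standard `html.parser`; the class `_VisibleText` and the function `visible_text` are hypothetical names, and only `script` and `style` blocks are skipped.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Keeps text nodes, except those inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


page = (
    "<p>Price list</p>"
    '<script>var x = "<b>";</script>'
    "<style>p { color: red; }</style>"
    "<p>Total: 3</p>"
)
print(visible_text(page))
```

A plain tag stripper would keep the JavaScript and CSS source as if it were page text; tracking which element the parser is currently inside is what lets this version drop it.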
|
|