Avoid using "<![CDATA[ ... ]]>" in RSS

Published on
Updated on

<![CDATA[ ... ]]> is very commonly used in RSS (and Atom) feeds to avoid escaping XML special characters. At first glance it looks very convenient. You simply wrap content in <![CDATA[ ... ]]> blocks and write almost anything inside them without worrying about escaping:

<item>
	<title><![CDATA[Using <CDATA> in Titles]]></title>
	<link>http://example.com</link>
	<description>
		<![CDATA[
			<p>This description contains <strong>HTML markup</strong>.</p>
			<p>It allows us to use characters like "<b>&</b>" and brackets directly.</p>
		]]>
	</description>
</item>

Why not CDATA?

CDATA seems perfect, doesn't it? Except that some character sequences cannot be escaped inside a single CDATA block, in particular ]]> (the sequence that ends the block). To include it, you have to split the CDATA block into multiple parts:

<text>
	<![CDATA[hello ]]]]><![CDATA[> world]]>
</text>

The encoded text is "hello ]]> world". As you can see, the XML code is less readable now. CDATA loses most of its simplicity advantage.
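If you are stuck with CDATA anyway, the splitting rule can be automated. Here is a minimal sketch; the function name cdataWrap is made up for this example and does not come from any library:

```javascript
// Hypothetical helper (not from any library): wraps text in CDATA,
// splitting every "]]>" across two CDATA blocks so it cannot end a block early.
function cdataWrap(text) {
	// "]]>" becomes "]]" + end of block + start of new block + ">"
	return "<![CDATA[" + text.replaceAll("]]>", "]]]]><![CDATA[>") + "]]>";
}

const wrapped = cdataWrap("hello ]]> world");
console.log(wrapped);
```

For the input "hello ]]> world" this produces the same markup as the hand-written example above. Note that every generator has to remember this extra step, which is exactly the kind of special case plain escaping avoids.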

Even though splitting makes the encoding of ]]> possible, I would say it's still not worth using CDATA:

What to do instead?

Just escape these characters (works for HTML too):

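// Escapes the five XML special characters in text content or attribute values.
function xmlEscape(text) {
	return text
		.replaceAll("&", "&amp;") // must run first, otherwise the "&" of the entities below gets escaped again
		.replaceAll("<", "&lt;")
		.replaceAll(">", "&gt;")
		.replaceAll('"', "&quot;")
		.replaceAll("'", "&#39;");
}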

Normal escaping is simpler and more uniform.
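To show how this looks in a feed generator, here is a small sketch that builds the item from the first example with escaping instead of CDATA. The renderItem helper and its parameter names are assumptions made for this illustration; xmlEscape is repeated so the snippet runs on its own:

```javascript
function xmlEscape(text) {
	return text
		.replaceAll("&", "&amp;")
		.replaceAll("<", "&lt;")
		.replaceAll(">", "&gt;")
		.replaceAll('"', "&quot;")
		.replaceAll("'", "&#39;");
}

// Hypothetical helper: renders one RSS <item> from plain strings.
// Every value goes through the same xmlEscape call, with no special cases.
function renderItem({ title, link, descriptionHtml }) {
	return [
		"<item>",
		`\t<title>${xmlEscape(title)}</title>`,
		`\t<link>${xmlEscape(link)}</link>`,
		`\t<description>${xmlEscape(descriptionHtml)}</description>`,
		"</item>",
	].join("\n");
}

const item = renderItem({
	title: "Using <CDATA> in Titles",
	link: "http://example.com",
	descriptionHtml: "<p>Contains ]]> and <b>&</b> safely.</p>",
});
console.log(item);
```

The ]]> sequence needs no extra handling here: its > is escaped like any other, so the same code path covers every input.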

OK, but some people might say that CDATA makes the RSS content smaller on average, since characters inside it don't need escaping (the escaped forms take more characters) and ]]> is rarely encountered. Fair point, however:
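How big the difference is depends on the content, and it is easy to measure instead of guessing. The following is only a rough sketch: the sample strings are illustrative, cdataWrapUnsafe is a made-up helper that assumes the text contains no ]]>, and lengths are counted in characters (UTF-16 code units) before any compression, not in bytes on the wire:

```javascript
function xmlEscape(text) {
	return text
		.replaceAll("&", "&amp;")
		.replaceAll("<", "&lt;")
		.replaceAll(">", "&gt;")
		.replaceAll('"', "&quot;")
		.replaceAll("'", "&#39;");
}

// Hypothetical helper: assumes `text` does not contain "]]>",
// so no splitting is performed.
function cdataWrapUnsafe(text) {
	return "<![CDATA[" + text + "]]>";
}

// Illustrative samples only; real feeds will differ.
const samples = [
	"Plain text without any special characters.",
	"<p>This description contains <strong>HTML markup</strong>.</p>",
];

// Compare the encoded length of each sample under both approaches.
const sizes = samples.map((text) => ({
	raw: text.length,
	escaped: xmlEscape(text).length,
	cdata: cdataWrapUnsafe(text).length,
}));
console.table(sizes);
```

CDATA always adds a fixed 12 characters per block, while escaping adds 3 characters for each < or >, 4 for each & or ', and 5 for each ". So plain text is smaller when escaped, and markup-heavy text is smaller in CDATA.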

Conclusion

Here I have listed the reasons why you should avoid CDATA. This is especially true if you are going to implement your own RSS / Atom feed generator. Many libraries, frameworks, and CMSs still generate CDATA for RSS / Atom feeds, and many of them handle the ]]> sequence in their own ways. They are perfectly fine to use if you have to rely on them. CDATA is common because it is convenient for legacy feed generators and visually cleaner for embedded HTML. But for new code, ordinary XML escaping is usually cleaner and more uniform.

See you later.





