How to sanitize HTML text using only the vanilla DOM API

Published on 07/05/2025

What is sanitization and why is it important?

Many websites, especially content management systems, rely heavily on dynamic text content such as saved rich text, comments, and posts. Such content is often stored as raw HTML that is embedded directly into the page's markup. This creates security issues if the website does not validate that HTML, and those issues are commonly exploited in Cross-Site Scripting (XSS) attacks. Some common problems are:

- Inline scripts, which can contain malicious code. For example: <script> alert("Potentially malicious code"); </script>
- Inline styles, which don't execute code by themselves but can still modify or break the website UI. For example: <style> body { background: red; } </style>
- Dangerous HTML tags and attributes. This overlaps with the previous two points to some extent: some HTML elements, especially when combined with certain attributes, can also behave dangerously. For example:
  <a href="javascript:alert('Potentially malicious link');">Potentially malicious link</a>
  <button onclick="alert('Potentially malicious button');">Potentially malicious button</button>
  <img src="some_invalid_url" onerror="alert('Potentially malicious error callback');">

Sanitization is the removal of such dangerous HTML fragments, or their replacement with safe HTML text. This is why HTML sanitization matters so much for websites with dynamic content.
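
To make the "replacement" option concrete, here is a minimal sketch that escapes the characters with special meaning in HTML, so that markup is displayed as text instead of being interpreted. The name escapeHtml and the escape table are illustrative assumptions and are not part of the sanitizer built later in this article, which removes or rebuilds elements rather than escaping them.

```javascript
// Hypothetical helper: maps each HTML-significant character to its entity.
const HTML_ESCAPES = {
	'&': '&amp;',
	'<': '&lt;',
	'>': '&gt;',
	'"': '&quot;',
	"'": '&#39;',
};

function escapeHtml(text) {
	// replace every special character in a single pass
	return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<b onclick="x()">hi</b>'));
// prints: &lt;b onclick=&quot;x()&quot;&gt;hi&lt;/b&gt;
```

Escaping is the simplest defense, but it discards all formatting, which is why a sanitizer that keeps a safe subset of HTML is useful for rich text.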

Implementing a basic HTML sanitizer

Here I'll implement a basic HTML sanitizer using a JavaScript generator function and DOMParser. It does the following:

Supports the following elements:
- <a>. The href attribute must resolve to a valid URL with the http:// or https:// scheme; otherwise the element is considered invalid. The element also supports the target attribute: if it is specified as _blank or blank, the target in the sanitized HTML will be _blank; otherwise the attribute is not added.
- <img>. The src attribute must resolve to a valid URL with the http:// or https:// scheme; otherwise the element is considered invalid.
- <font>. It supports the color and size attributes. The attribute values are not validated, since they are harmless even when invalid.
- <br>.
- <b>.
- <strong>.
- <i>.
- <em>.
- <del>.
- <s>.
- <u>.
- <p>.
- <hr>.
- <li>.
- <ul>.
- <ol>.
- Text nodes.
If an element is unsupported or invalid, the element itself is dropped, but its nested elements and text nodes are still sanitized recursively and added in its place.
Only the supported tags and attributes are copied; unsupported attributes are always ignored.
The output never contains unclosed tags.
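
The "drop the wrapper, keep the children" rule is easier to see without the DOM. The sketch below applies the same idea to a made-up tree of plain objects, using a generator and yield* just as the real sanitizer does. The tree shape ({ tag, children } for elements, strings for text), the shortened allowlist, and the name sanitizeTree are all assumptions for illustration only.

```javascript
// Hypothetical, shortened allowlist for the illustration.
const ALLOWED_TAGS = new Set(['b', 'i', 'p']);

function* sanitizeTree(nodes) {
	for (const node of nodes) {
		if (typeof node === 'string') {
			// text is always kept
			yield node;
		} else if (ALLOWED_TAGS.has(node.tag)) {
			// supported: copy only the tag name (no other properties),
			// then sanitize the children into the copy
			yield { tag: node.tag, children: [...sanitizeTree(node.children)] };
		} else {
			// unsupported: drop the wrapper and splice its
			// sanitized children into the parent's output
			yield* sanitizeTree(node.children);
		}
	}
}

const input = [
	{
		tag: 'div',
		onclick: 'steal()',
		children: ['Hello ', { tag: 'b', style: 'color:red', children: ['world'] }],
	},
];

const output = [...sanitizeTree(input)];
console.log(JSON.stringify(output));
// prints: ["Hello ",{"tag":"b","children":["world"]}]
```

Because a new node is built from scratch instead of cloning the original, anything not explicitly copied (here onclick and style) cannot survive.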

The code

Both the parsing and the construction of new elements go through an inactive document created by DOMParser, because elements built with the methods of the active document (window.document) or with innerHTML can still send HTTP requests and execute callbacks such as onload or onerror.

function sanitizeHtml(html) {
	// construct an inactive document by 
	// parsing the html with DOMParser 
	// in order to prevent any possible 
	// code execution and http requests
	const inactiveDocument = new DOMParser()
		.parseFromString(html, 'text/html');

	const inputElement = inactiveDocument.documentElement;

	// construct the output element via the
	// inactive document in order to prevent
	// any possible code execution and http requests
	const outputElement = inactiveDocument.createElement('div');

	function* sanitizeRecursively(root) {
		for (const child of root.childNodes) {
			if (child instanceof HTMLElement) {
				try {
					// check if the element is in the list 
					// of the supported types
					if (![
						'A', 'IMG', 'FONT', 'BR', 'B',
						'STRONG', 'I', 'EM', 'DEL', 'S', 'U',
						'P', 'HR', 'LI', 'UL', 'OL'
					].includes(child.tagName)) {
						throw new Error(`${child.tagName} is not supported`);
					}

					// construct the new child via the inactive document
					// in order to prevent any possible
					// code execution and http requests
					const newChild = inactiveDocument
						.createElement(child.tagName);

					// handling the <a> tag
					if (
						newChild instanceof HTMLAnchorElement &&
						child instanceof HTMLAnchorElement
					) {
						const url = new URL(child.href);

						// validate URL
						if (url.protocol !== 'https:' && url.protocol !== 'http:') {
							throw new Error(
								`href ${url.protocol} is not supported`
							);
						}

						newChild.href = url.href;

						// set target _blank if valid
						if (child.target === 'blank' || child.target === '_blank') {
							newChild.target = '_blank';
						}
					}
					// handling the <img> tag
					else if (
						newChild instanceof HTMLImageElement &&
						child instanceof HTMLImageElement
					) {
						const url = new URL(child.src);

						// validate URL
						if (url.protocol !== 'https:' && url.protocol !== 'http:') {
							throw new Error(
								`src ${url.protocol} is not supported`
							);
						}

						newChild.src = url.href;
					}
					// handling the <font> tag
					else if (
						newChild instanceof HTMLFontElement &&
						child instanceof HTMLFontElement
					) {
						// set size if present
						if (child.size) {
							newChild.size = child.size;
						}

						// set color if present
						if (child.color) {
							newChild.color = child.color;
						}
					}

					// append children
					newChild.append(...sanitizeRecursively(child));
					yield newChild;
				} catch (e) {
					console.error(e);
					// if some validation error occurred just try 
					// to recursively copy the children
					yield* sanitizeRecursively(child);
				}
			} else if (
				child instanceof Node && 
				child.nodeType === Node.TEXT_NODE
			) {
				// copying text nodes
				yield child.cloneNode(true);
			}
		}
	}

	// filling with the copied children
	outputElement.append(...sanitizeRecursively(inputElement));
	return outputElement.innerHTML;
}
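
The protocol check used in the <a> and <img> branches above can also be tried in isolation. The following is a minimal sketch with a hypothetical helper name, toSafeHttpUrl; it is not part of the function above. It relies only on the standard URL constructor: parse first, then compare the parsed protocol, instead of testing the raw string for a prefix.

```javascript
// Hypothetical helper: returns the normalized URL string
// or null when the value is not an absolute http(s) URL.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

function toSafeHttpUrl(value) {
	let url;
	try {
		// throws a TypeError for anything that is not an absolute URL
		url = new URL(value);
	} catch {
		return null;
	}
	return ALLOWED_PROTOCOLS.has(url.protocol) ? url.href : null;
}

console.log(toSafeHttpUrl('https://example.com/a b')); // prints: https://example.com/a%20b
console.log(toSafeHttpUrl('javascript:alert(1)'));     // prints: null
console.log(toSafeHttpUrl('/relative/path'));          // prints: null
```

Note one difference from the sanitizer: there, child.href and child.src have already been resolved to absolute URLs by the DOM, so a relative value is resolved against the document's base URL before the check runs. The standalone helper receives the raw string and therefore rejects relative values.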

Since Node.js has no native DOM API, you can use the jsdom package if you want to run this on Node.js.
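
One possible way to wire this up is sketched below. sanitizeHtml reads DOMParser, Node, and several element constructors from the global scope, so the sketch copies them from a DOM window onto globalThis. The helper name installDomGlobals is an assumption made for this example and is not part of jsdom; only the JSDOM constructor and its window property, shown in the usage comment, come from the jsdom package.

```javascript
// Constructors that sanitizeHtml expects to find in the global scope.
const REQUIRED_DOM_GLOBALS = [
	'DOMParser', 'Node', 'HTMLElement',
	'HTMLAnchorElement', 'HTMLImageElement', 'HTMLFontElement',
];

// Hypothetical helper: copies the constructors from a DOM window
// (for example a jsdom window) onto `target`.
function installDomGlobals(domWindow, target = globalThis) {
	for (const name of REQUIRED_DOM_GLOBALS) {
		if (typeof domWindow[name] !== 'function') {
			throw new Error(`${name} is missing from the provided window`);
		}
		target[name] = domWindow[name];
	}
	return target;
}

// Usage on Node.js, assuming jsdom is installed (npm install jsdom)
// and sanitizeHtml is defined in the same module:
//
//   const { JSDOM } = require('jsdom');
//   installDomGlobals(new JSDOM('').window);
//   console.log(sanitizeHtml('<b onclick="x()">hi</b>'));
```

Taking all constructors from the same window matters, because the instanceof checks in the sanitizer only succeed for elements created by that window's DOMParser.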
