Viki

A discussion on character entities in HTML (such as `&nbsp;`, `&lt;`, etc.)

When learning to write simple web pages with HTML and CSS, we may use &nbsp; to produce several spaces in a row, because consecutive spaces in a tag's content are automatically merged into a single space.
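The merging behaviour can be imitated in a few lines of JavaScript. This is only a simplified sketch: the function name `collapseWhitespace` is made up for illustration, and it assumes that only the ASCII space, tab, line feed, form feed, and carriage return collapse, whereas real browsers follow the fuller CSS `white-space` rules.

```javascript
// Simplified model of whitespace collapsing in normal text flow.
// Assumption: only these five ASCII whitespace characters are collapsible.
function collapseWhitespace(text) {
  return text.replace(/[ \t\n\f\r]+/g, ' ');
}

// U+00A0 is the non-breaking space, the character that &nbsp; stands for.
const NBSP = '\u00A0';

// Five ordinary spaces collapse into one.
console.log(JSON.stringify(collapseWhitespace('a     b')));
// -> "a b"

// Three non-breaking spaces survive, so the string keeps all 5 characters.
console.log(collapseWhitespace(`a${NBSP}${NBSP}${NBSP}b`).length);
// -> 5
```

The non-breaking space is simply not in the collapsible set, which is why it appears to "keep" the spacing.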

Likewise, when trying to achieve a certain layout without yet being familiar with CSS properties such as margin and padding, we may stumble upon &nbsp;.

It does solve the problem of consecutive spaces being merged, and it can be used to nudge a page layout, but layout is best handled with CSS rather than with &nbsp;.

So what exactly are &nbsp;, &lt;, &gt;, and the like?

Named Character References#

In SGML, HTML, and XML documents, some characters are reserved because they have special meaning in the markup language, such as < and >. Other Unicode characters may not be directly representable in the document's current encoding (such as ISO-8859-1). In both cases, a character that cannot be written directly can be represented by either a character value reference (also called a numeric character reference) or a character entity reference. The &nbsp; mentioned earlier is a character entity reference.
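As a rough illustration of why reserved characters need references, here is a minimal sketch of an escaping helper. The name `escapeHtml` and the lookup table are hypothetical and not taken from any library; the sketch covers only the five characters that are usually escaped in HTML text and attribute values.

```javascript
// Hypothetical helper: replaces characters that are significant in HTML markup
// with character references so they display as literal text.
const RESERVED = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;', // numeric form; the name &apos; is not defined in HTML 4
};

function escapeHtml(text) {
  // Every match is one of the five keys above, so the lookup always succeeds.
  return text.replace(/[&<>"']/g, (ch) => RESERVED[ch]);
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// -> &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```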

The HTML standard explains entity names in its section on named character references. The standard also publishes the full list as a stable JSON file that developers can download and use.
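To give a feel for how such JSON data might be used, the sketch below builds two lookup tables from a small hand-copied excerpt. The excerpt follows the shape of the published file as I understand it (entity name as key, with `codepoints` and `characters` fields); the variable names are my own, and the real file is far larger than these five entries.

```javascript
// Hand-copied excerpt in the shape of the standard's JSON data:
// each key is an entity name, each value lists its code points and characters.
const entityExcerpt = {
  '&nbsp;': { codepoints: [160], characters: '\u00A0' },
  '&lt;': { codepoints: [60], characters: '<' },
  '&gt;': { codepoints: [62], characters: '>' },
  '&amp;': { codepoints: [38], characters: '&' },
  '&copy;': { codepoints: [169], characters: '\u00A9' },
};

// Forward index: entity name -> character(s).
const charactersByName = new Map(
  Object.entries(entityExcerpt).map(([name, info]) => [name, info.characters])
);

// Reverse index: character(s) -> entity name.
const nameByCharacters = new Map(
  Object.entries(entityExcerpt).map(([name, info]) => [info.characters, name])
);

console.log(charactersByName.get('&lt;')); // -> <
console.log(nameByCharacters.get('\u00A9')); // -> &copy;
console.log(charactersByName.get('&bogus;')); // -> undefined
```

Using a Map rather than plain property access avoids accidental hits on inherited object properties when the looked-up name comes from untrusted input.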

The same character can be referenced using either a character entity or a character value. For example, the non-breaking space &nbsp; mentioned earlier can also be written as the character value reference &#160; (an ordinary space would be &#32;). Entity names are easier to remember, but there is no guarantee that every browser recognizes every name. Character values do not have this concern, but they are harder to remember.
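The correspondence between a character and its character value reference can be computed directly. The following is a small sketch; `toNumericReferences` is a name invented here, and it only looks at the first code point of the string it is given.

```javascript
// Returns the decimal and hexadecimal numeric references
// for the first code point of the given string.
function toNumericReferences(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    decimal: `&#${codePoint};`,
    hex: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

// Ordinary space (U+0020)
console.log(toNumericReferences(' '));
// -> { decimal: '&#32;', hex: '&#x20;' }

// Non-breaking space (U+00A0), the character behind &nbsp;
console.log(toNumericReferences('\u00A0'));
// -> { decimal: '&#160;', hex: '&#xA0;' }

// Less-than sign (U+003C), the character behind &lt;
console.log(toNumericReferences('<'));
// -> { decimal: '&#60;', hex: '&#x3C;' }
```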

The HTML 4 DTD defines 252 named entities. If you look closely, you will notice that the character values are the Unicode code points of the characters (for the first 128 characters these coincide with the ASCII codes), written in either decimal or hexadecimal.
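Going the other way, a decimal or hexadecimal character value can be turned back into its character with `String.fromCodePoint`. This is a minimal sketch with an invented function name; it does not implement the special cases of the HTML parsing rules (for example, the remapping of some control-range values), and it leaves out-of-range values untouched.

```javascript
// Replaces decimal (&#169;) and hexadecimal (&#xA9;) references with characters.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values beyond the Unicode range are left as written instead of throwing.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences('&#169; &#xA9; &#8710; &#x2206;'));
// -> © © ∆ ∆

console.log(decodeNumericReferences('&#x110000; &copy;'));
// -> &#x110000; &copy;
```

Named references such as &copy; are deliberately ignored here, since resolving them needs a name table rather than arithmetic.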

For a full list of character entities, see also the Wikipedia article "List of XML and HTML character entity references".

How to Convert#

If you want to know how a symbol is represented as a character entity or character value, you can use this website for quick conversion. It supports setting the conversion result as either character entities or character values.

How to Handle in Actual Development#

In real projects, you can use the community-maintained html-entities npm package.

Update on August 29, 2023: In addition to html-entities, the entities and he packages can also be used.

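Here is a simple example using ESM imports:

// encode() escapes a whole string; decodeEntity() decodes a single entity
import { encode, decodeEntity } from 'html-entities'

// default mode: only the characters that are special in markup are escaped
encode('< > " \' & © ∆')
// -> '&lt; &gt; &quot; &apos; &amp; © ∆'

// nonAsciiPrintable also escapes everything outside printable ASCII
encode('< ©', { mode: 'nonAsciiPrintable' })
// -> '&lt; &copy;'

// the xml level has no &copy;, so a character value reference is used
encode('< ©', { mode: 'nonAsciiPrintable', level: 'xml' })
// -> '&lt; &#169;'

decodeEntity('&lt;')
// -> '<'

decodeEntity('&copy;', { level: 'html5' })
// -> '©'

// &copy; is not defined at the xml level, so it is returned unchanged
decodeEntity('&copy;', { level: 'xml' })
// -> '&copy;'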

I hope this article has been helpful or inspiring to you.
