Viki


An Introduction to Character Entities in HTML (such as `&nbsp;`, `&lt;`, etc.)

When you first learn to write simple web pages with HTML and CSS, you may reach for `&nbsp;` to get several spaces in a row, because consecutive spaces in an element's content are collapsed into a single space when the page is rendered.
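To make the collapsing behaviour concrete, here is a rough sketch in JavaScript. It is only a simplified model, not how a browser is actually implemented: the helper name `collapseWhitespace` is hypothetical, and it assumes that only spaces, tabs, and line breaks are collapsible. The point it illustrates is that the non-breaking space (U+00A0), the character that `&nbsp;` stands for, is not among them.

```javascript
// Simplified model of whitespace collapsing in normal text flow.
// Assumption: only space, tab, and line breaks count as collapsible here.
// Note that \s is deliberately not used, because \s also matches U+00A0.
function collapseWhitespace(text) {
  return text.replace(/[ \t\n\r]+/g, ' ');
}

const NBSP = '\u00A0'; // the character that &nbsp; stands for

const plain = 'a    b';                    // four ordinary spaces
const padded = 'a' + NBSP.repeat(4) + 'b'; // four non-breaking spaces

console.log(collapseWhitespace(plain).length);  // 3: the run collapses to one space
console.log(collapseWhitespace(padded).length); // 6: nothing is collapsed
```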

Likewise, before you are comfortable with CSS properties such as margin and padding, you may stumble on `&nbsp;` as a way to nudge a layout into place.

It does get around whitespace collapsing, and it can be used to adjust page layout, but layout is better handled with CSS than with runs of `&nbsp;`.

So what exactly are `&nbsp;` and the `&lt;`, `&gt;`, and similar sequences that we see so often?

Named Character References

In SGML, HTML, and XML documents, some characters are reserved: `<` and `>`, for example, have special meaning in markup. A character may also be impossible to write directly, either because the document's encoding (such as ISO-8859-1) cannot represent that Unicode character, or because the parser would treat the character as syntax rather than as content. Such characters can be written with one of two kinds of escape sequence: a character value reference (also called a numeric character reference) or a character entity reference. The `&nbsp;` mentioned earlier is a character entity reference.
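As a small illustration of why reserved characters need escaping, the sketch below replaces the characters that HTML treats as markup with their entity references before the text is placed into a page. The helper `escapeReserved` and the `RESERVED` table are hypothetical names used only for this example, and the set of four characters is a minimal assumption rather than a complete escaping policy.

```javascript
// Hypothetical lookup table: reserved character -> character entity reference.
const RESERVED = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

// Replace every reserved character in a single pass, so that the "&" of an
// entity produced by this function is never escaped a second time.
function escapeReserved(text) {
  return text.replace(/[&<>"]/g, (ch) => RESERVED[ch]);
}

console.log(escapeReserved('if (a < b && b > c) "ok"'));
// -> 'if (a &lt; b &amp;&amp; b &gt; c) &quot;ok&quot;'
```

For real projects, a maintained library is usually a safer choice than a hand-written helper like this one.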

The HTML standard documents entity names in its "Named character references" section. It also publishes the full list as a stable JSON file that developers can download and use.
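The JSON data maps each reference to its code points and resulting characters. The sketch below inlines a three-entry excerpt in that shape instead of downloading the real file, so it runs on its own. The helper `resolveNamed` and the constant `ENTITIES_EXCERPT` are hypothetical names for illustration.

```javascript
// Three entries written in the shape used by the standard's JSON data.
// This is an inlined excerpt for illustration, not the full file.
const ENTITIES_EXCERPT = {
  '&nbsp;': { codepoints: [160], characters: '\u00A0' },
  '&lt;': { codepoints: [60], characters: '<' },
  '&copy;': { codepoints: [169], characters: '\u00A9' },
};

// Hypothetical helper: resolve a named reference using the excerpt,
// returning the input unchanged when the name is not in the table.
function resolveNamed(reference) {
  const entry = ENTITIES_EXCERPT[reference];
  return entry ? entry.characters : reference;
}

console.log(resolveNamed('&copy;'));    // -> '©'
console.log(resolveNamed('&unknown;')); // -> '&unknown;'
```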

The same symbol can be written in either way, as a character entity reference or as a character value reference. For example, the non-breaking space behind `&nbsp;` can also be written as the character value reference `&#160;`. Entity names are easier to remember, but there is no guarantee that every browser recognizes every name. Character value references do not have that problem, but they are harder to remember.

The HTML 4 DTD defines 252 named entities. If you look closely, you will notice that the number in a character value reference is the character's Unicode code point (identical to its ASCII code for characters in the ASCII range), and that it can be written in decimal or hexadecimal.
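The relationship between a character, its code point, and the decimal and hexadecimal forms can be sketched in a few lines of JavaScript. The three helper names below are hypothetical, and the decoder only handles a single, well-formed reference. It is an illustration of the arithmetic, not a replacement for a real parser.

```javascript
// Build the decimal form, e.g. U+00A0 -> '&#160;'.
function toDecimalReference(char) {
  return `&#${char.codePointAt(0)};`;
}

// Build the hexadecimal form, e.g. U+00A0 -> '&#xA0;'.
function toHexReference(char) {
  return `&#x${char.codePointAt(0).toString(16).toUpperCase()};`;
}

// Decode one decimal or hexadecimal reference back into its character.
// Anything that is not a well-formed reference is returned unchanged.
function fromNumericReference(reference) {
  const match = /^&#(x[0-9a-f]+|[0-9]+);$/i.exec(reference);
  if (!match) return reference;
  const body = match[1];
  const codePoint = /^x/i.test(body)
    ? parseInt(body.slice(1), 16)
    : parseInt(body, 10);
  // String.fromCodePoint throws for values above U+10FFFF, so guard first.
  if (codePoint > 0x10ffff) return reference;
  return String.fromCodePoint(codePoint);
}

console.log(toDecimalReference('\u00A0'));   // -> '&#160;'
console.log(toHexReference('\u00A0'));       // -> '&#xA0;'
console.log(toDecimalReference('©'));        // -> '&#169;'
console.log(fromNumericReference('&#x3C;')); // -> '<'
```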

To browse all character entities, you can also see "List of XML and HTML character entity references" on Wikipedia.

How to Convert

If you want to know how a symbol is represented as a character entity or character value, you can use this website for quick conversion. It supports setting the conversion result as a character entity or character value.

How to Handle Entities in Actual Development

In real projects, you can use the community-maintained html-entities npm package to handle this.

Update (August 29, 2023): besides html-entities, you can also use the entities and he packages.

Here is a simple example using ESM imports:

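import { encode, decodeEntity } from 'html-entities'

// Default mode: only the HTML special characters < > " ' & are encoded
encode('< > " \' & © ∆')
// -> '&lt; &gt; &quot; &apos; &amp; © ∆'

// 'nonAsciiPrintable' also encodes everything outside printable ASCII
encode('< ©', { mode: 'nonAsciiPrintable' })
// -> '&lt; &copy;'

// At level 'xml' there is no name for ©, so a numeric reference is used
encode('< ©', { mode: 'nonAsciiPrintable', level: 'xml' })
// -> '&lt; &#169;'

// decodeEntity decodes a single entity
decodeEntity('&lt;')
// -> '<'

decodeEntity('&copy;', { level: 'html5' })
// -> '©'

// &copy; is not an XML entity, so it is left unchanged at level 'xml'
decodeEntity('&copy;', { level: 'xml' })
// -> '&copy;'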

I hope this article is helpful or inspiring to you.
