The Glyphs selection UI widget has some entries that are curated in a YAML-formatted text file, lib/assets/glyph-groups.yaml. However, the application loads the data from a JSON-formatted file, lib/assets/glyph-groups.json, which is generated from the YAML file. YAML is used for human input mainly because it supports comments, which help to describe the expected data and to explain the actual data. JSON is used for reading by the application because a browser can parse it natively, without an extra dependency.
The expected data format is also described in a comment directly in lib/assets/glyph-groups.yaml.
Character groupings can be one or two levels deep (see Latin > Symbols).
Two ways of handling “extended” character sets:
Here’s an excerpt from the file:
```yaml
Latin:
  Uppercase: ABCDEFGHIJKLMNOPQRSTUVWXYZ&
  Lowercase: abcdefghijklmnopqrstuvwxyz
  Mixed: DžLjNjlj
  ASCII: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789<([{@#$%&?!/|\\\"~`*^':;.,)]}>"
  Latin-1: " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u00A0¡¢£¤¥¦§¨©ª«¬\u00AD®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
  Symbols:
    Punctuation: ',-.:;…∙·!¡?¿–—―‐'
    Reference: '*§¶†‡•'
    Quotation: "\"'“”‚„‹›«»′″"
    Parenthetical: '()[]{}'
    Math: '+÷×−±<>≤≥≈≠=^~¬∕/'
    Commercial: '®©™#@⁒ʹʺ/\¦|_№⟨⟩µ⁄'
Greek:
  Uppercase: ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
  Lowercase: αβγδεζηθικλμνξοπρστυφχψως
Cyrillic:
  Uppercase: АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
  Lowercase: абвгдежзийклмнопрстуфхцчшщъыьэюя
# [...]
_extended:
  # YAML evaluates Y/N/y/n as true and false, so we have to put quotes around those
  A: ÀÁÂÃÄÅĀĂǺȀȂĄẠẢẤẦẨẪẬẮẰẲẴẶÆǼ
  C: ÇĆĈĊČ
  D: ĎÐĐDŽDž
  E: ÈÉÊËĒĔĖĘĚȄȆẸẺẼẾỀỂỄỆÆǼŒƏ
  G: ĜĞĠĢǦ
  H: ĤĦ
  # [...]
```
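To illustrate the one-or-two-level structure once it has been converted to JSON, here is a minimal Python sketch that walks the parsed data and lists every leaf group together with its path. It assumes, as in the excerpt, that leaves are strings and sub-groups are mappings, and it skips keys starting with an underscore such as `_extended`, since this page does not say how the widget treats them. The helpers `iter_groups` and `check_extended` and the sample data are hypothetical, not part of the application; `check_extended` only flags `_extended` entries whose key is not exactly one character, which is one way an unquoted Y or N that a parser turned into a boolean could show up.

```python
import json
from typing import Iterator


def iter_groups(data: dict, path: tuple[str, ...] = ()) -> Iterator[tuple[str, str]]:
    """Yield ("Latin > Symbols > Punctuation", characters) for every leaf group.

    A string value is a leaf; a mapping is a sub-group. Keys starting with "_"
    (such as "_extended") are skipped because they are not regular groups.
    """
    for key, value in data.items():
        if key.startswith("_"):
            continue
        key_path = path + (key,)
        if isinstance(value, str):
            yield " > ".join(key_path), value
        elif isinstance(value, dict):
            yield from iter_groups(value, key_path)
        else:
            # A boolean or number here usually means a value needed quoting in the YAML.
            raise TypeError(f"{' > '.join(key_path)}: expected a string or mapping, got {value!r}")


def check_extended(data: dict) -> list[str]:
    """Report "_extended" entries whose key is not a single character or whose value is not a string."""
    problems = []
    for base, variants in data.get("_extended", {}).items():
        if len(base) != 1 or not isinstance(variants, str):
            problems.append(f"_extended: suspicious entry {base!r}: {variants!r}")
    return problems


# Hypothetical sample shaped like the excerpt (not the real file contents).
sample = json.loads(
    '{"Latin": {"Uppercase": "AB", "Symbols": {"Math": "+÷"}},'
    ' "_extended": {"A": "ÀÁ", "true": "ÝŸ"}}'
)
for name, chars in iter_groups(sample):
    print(name, chars)
print(check_extended(sample))
```

Walking the parsed JSON rather than the YAML keeps the sketch within the standard library, matching the reason given above for shipping JSON.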
Latin-1 is also known as ISO/IEC 8859-1. An explanation can be found on Wikipedia, which also has a table of all its characters:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | | | | | | | | | | | | | | | | |
| 1x | | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | | | | | | | | | | | | | | | | |
| 9x | | | | | | | | | | | | | | | | |
| Ax | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
| Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| Dx | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| Fx | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |

Empty cells are undefined in Latin-1. Wikipedia’s table also notes for positions 0xD7 and 0xF7 that they were undefined in the first release of ECMA-94 (1985), and that in the original draft Œ was at 0xD7 and œ was at 0xF7.
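The non-empty cells of the table can be reproduced from their code points, which gives a way to cross-check the character list. Here is a minimal Python sketch, assuming the group should contain exactly the non-empty cells (0x20 to 0x7E and 0xA0 to 0xFF); `latin1_characters` is a hypothetical helper name. It relies on Python's built-in `"latin-1"` codec, which maps each byte value to the Unicode code point with the same number.

```python
def latin1_characters() -> str:
    """Return the characters in the non-empty cells of the Latin-1 table, in table order."""
    # Rows 2x-7x: 0x20 (SP) through 0x7E (~); 0x7F is an empty cell.
    # Rows Ax-Fx: 0xA0 (NBSP) through 0xFF (ÿ). Rows 0x, 1x, 8x and 9x are empty.
    codes = [*range(0x20, 0x7F), *range(0xA0, 0x100)]
    # Latin-1 maps each byte to the Unicode code point with the same value.
    return bytes(codes).decode("latin-1")


chars = latin1_characters()
print(len(chars))  # 191 = 95 + 96
print(chars[0] == " ", chars[-1] == "ÿ")  # True True
```

The result includes the invisible NBSP (U+00A0) and SHY (U+00AD), which is why they need special handling when written into the YAML.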
Some characters in the table need special treatment in the YAML string:

SP is the [Space character](https://en.wikipedia.org/wiki/Space_character); we enter it as a literal space character.

NBSP is the [Non-breaking space](https://en.wikipedia.org/wiki/NBSP); it is entered by its Unicode escape sequence `\u00A0`.

SHY is the [Soft hyphen](https://en.wikipedia.org/wiki/Soft_hyphen); it is entered by its Unicode escape sequence `\u00AD`.

`"` must be escaped as `\"`, because the string is enclosed in double quotes.

`\` must be escaped as well, as `\\`, because it introduces escape sequences such as `\u00A0`.
Altogether, we arrive at the data string, which is enclosed in double quotes:

```yaml
" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u00A0¡¢£¤¥¦§¨©ª«¬\u00AD®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
```
Within the YAML, “Latin” is the appropriate parent category for “Latin-1”, and since Latin-1 is a computer encoding, it makes sense to put it next to “ASCII”:
```yaml
Latin:
  # [...]
  ASCII: # [...]
  Latin-1: # Add data here
```
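The value to add here can also be produced mechanically from the escaping rules. A minimal Python sketch, with the hypothetical helper name `to_yaml_double_quoted`: it escapes backslashes first (so the backslashes added by later replacements are not doubled), then double quotes, and writes NBSP and SHY as `\u00A0` and `\u00AD`. YAML double-quoted scalars and JSON strings share these escape sequences, so `json.loads` serves as a check that the result decodes back to the original characters.

```python
import json

# Non-empty cells of the Latin-1 table: 0x20-0x7E and 0xA0-0xFF.
LATIN1 = bytes([*range(0x20, 0x7F), *range(0xA0, 0x100)]).decode("latin-1")


def to_yaml_double_quoted(chars: str) -> str:
    """Quote chars as a YAML double-quoted scalar using the escapes for \\, ", NBSP and SHY."""
    # Backslashes first, so the backslashes introduced below are not escaped again.
    escaped = chars.replace("\\", "\\\\").replace('"', '\\"')
    # Invisible characters become Unicode escapes so they are visible in the file.
    escaped = escaped.replace("\u00a0", "\\u00A0").replace("\u00ad", "\\u00AD")
    return f'"{escaped}"'


quoted = to_yaml_double_quoted(LATIN1)
print(quoted)
# These escapes are also valid in JSON strings, so json.loads reverses them.
assert json.loads(quoted) == LATIN1
```

Only these four characters are escaped; every other character in the set is printable and can appear literally inside YAML double quotes.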
If you don’t have the yq command on your system, we can help with generating the JSON in your pull request. Otherwise, generate it with:

```shell
$ yq glyph-groups.yaml -o json > glyph-groups.json
```
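After regenerating the JSON, a quick check can confirm that the new group ended up where intended. A minimal Python sketch using only the standard library; the function `check_latin1` and the expectation that Latin-1 holds exactly the non-empty table cells are assumptions for illustration, not an existing project script. Because it parses the JSON rather than comparing text, it does not depend on how yq chooses to escape characters in its output.

```python
import json

# Assumption: the group should hold exactly the non-empty cells of the Latin-1 table.
EXPECTED_LATIN1 = bytes([*range(0x20, 0x7F), *range(0xA0, 0x100)]).decode("latin-1")


def check_latin1(json_text: str) -> list[str]:
    """Return a list of problems with the Latin > Latin-1 group (empty if it looks right)."""
    latin = json.loads(json_text).get("Latin", {})
    problems = []
    if "ASCII" not in latin:
        problems.append('no "ASCII" group under "Latin"')
    value = latin.get("Latin-1")
    if value is None:
        problems.append('no "Latin-1" group under "Latin"')
    elif value != EXPECTED_LATIN1:
        # If both lists are empty, only the order or duplicates differ.
        missing = sorted(set(EXPECTED_LATIN1) - set(value))
        extra = sorted(set(value) - set(EXPECTED_LATIN1))
        problems.append(f"Latin-1 differs from the table: missing {missing!r}, extra {extra!r}")
    return problems


# Usage, e.g. in lib/assets after running yq:
# with open("glyph-groups.json", encoding="utf-8") as f:
#     print(check_latin1(f.read()))
```

An empty list means both groups are present under Latin and the Latin-1 string decodes to the expected characters.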