unicode-character-map
The Chinese New Year is here, and a very happy 恭喜发财 and so on to all my readers who celebrate this occasion.
My New Year plans have sadly been different from the usual due to the coronavirus. Here in Singapore, restrictions still apply on gatherings, which means Chinese New Year this time round is going to be a little quieter for my family.
Today's character is U+1F260 🉠, a stylized version of U+798F 福, pronounced fú and meaning "fortune". The character is often found on ornaments and positioned upside down as a pun between the phrases 福倒, meaning "upside-down fu", and 福到, meaning "fortune has arrived". An upside-down 福 ornament graces our front door almost every Chinese New Year. When we lived in Shanghai, my family would often buy ornaments from the local supermarket to bring back to Singapore. We would distribute them to our family members - the grandparents were the first to get them, followed by my uncles and aunts - for them to place around their houses.
We can only hope fortune will bless us with a chance to see our relatives again before the next year.
Today's post is about how the Unicode Consortium handled the Japanese imperial transition of 2019.
The story begins in 2016, when Emperor Akihito gave a televised address to the Japanese public in which he emphasized his declining health, suggesting that he intended to abdicate. In December of 2017, the Prime Minister of Japan announced the date on which the imperial transition would happen. April 30th, 2019 would be the last day of the Heisei era, and Emperor Naruhito's reign would start on the following day.
With a new emperor comes a new era name, and the name of the new era, which was then unknown to the public, needed a single-character representation in Unicode. The Unicode Consortium thus reserved the code point U+32FF for that character.
The code point was reserved rather than assigned immediately because Unicode had, and still has, very strict policies regarding changing character properties. The character needed a decomposition mapping from the single square character to its two constituent kanji, and that mapping could not be changed after it was assigned. Thus, the Unicode Consortium prepared by indicating that systems could use the code point U+32FF to represent the new era, without actually assigning a glyph, name, or decomposition until the era name was released.
On the first of April, the new era name was presented to the public: Reiwa (令和). It may have taken Japanese band Golden Bomber under a day to release a song that mentioned the new era by name, but it took the Unicode Consortium until the seventh of May, seven days into the era itself, to finish all the checks and release Unicode 12.1, assigning U+32FF the name SQUARE ERA NAME REIWA and the decomposition mapping <square> 4EE4 548C.
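For anyone who wants to poke at the mapping themselves, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is my own invention, and the results depend on which Unicode version your Python build bundles: a build older than Unicode 12.1 will still treat U+32FF as unassigned.

```python
import unicodedata

REIWA = "\u32ff"  # U+32FF, assigned in Unicode 12.1

def describe(ch):
    """Return (name, decomposition mapping, NFKC form as code points)."""
    # The fallback string is returned when this Unicode version has no name yet.
    name = unicodedata.name(ch, "<unassigned in this Unicode version>")
    decomposition = unicodedata.decomposition(ch)
    # Compatibility normalization applies the <square> mapping.
    nfkc = unicodedata.normalize("NFKC", ch)
    nfkc_points = " ".join(f"U+{ord(c):04X}" for c in nfkc)
    return name, decomposition, nfkc_points

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print(describe(REIWA))
```

The `<square>` tag marks this as a compatibility decomposition, which is why NFKC expands the character into the two kanji while NFC leaves it alone.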
U+216A9 CJK UNIFIED IDEOGRAPH-216A9
This Chinese character was used as a simplification of the character 要 (pronounced yào and roughly meaning "need") in Singapore between the promulgation of the official "502 Table of Simplified Characters" in 1969 and the scheme's abolition in favor of mainland China's simplified characters in 1976. The table consisted of 502 simplified characters, of which 11 were simplified in Singapore but not in China, 38 were simplified differently from the mainland, and 29 retained traditional radicals but were otherwise identical to the mainland simplifications.
My parents are Singapore citizens who grew up in the 1960s and 70s. Though they no longer recognize the characters, many of their generation learned these characters in school. It wasn't until 1977 when the People's Republic of China abandoned the novel second-stage simplification scheme that the simplification debacle was resolved once and for all, with Singapore adopting the same simplified characters used today in the mainland.
Here's an idea that has come up regularly when we think about numbers: Why do we use base ten, when twelve is only slightly larger and evenly divides into two, three and four?
The idea of using twelve of something as a unit has precedent: analog clocks are numbered from one to twelve; imperial units, as cumbersome as they are, have twelve inches to a foot; every Chinese New Year I recall the twelve signs of the Chinese zodiac that sort people's births into a cycle with a period of twelve years, and the twelve earthly branches associated with them. The Babylonians used a similar system, using base sixty, or sexagesimal, which also divides evenly into five, though making up sixty different cuneiform signs for digits would be unwieldy and so they resorted to splitting the sexagesimal digits into tens and ones.
Base twelve, to someone operating in base ten, would be called duodecimal, from the Latin duodecim, which in turn can be broken down into duo (two) and decem (ten): 10+2. Of course, people operating in base twelve found this unacceptable, and named it dozenal - a dozen, of course, meaning twelve of something.
In the system of notation used by the Dozenal Society of Great Britain, the upside-down three, the character we're talking about today (U+218B ↋), represents el, the digit for eleven, while its comrade ↊, the upside-down two encoded at U+218A, represents dek, the digit for ten. However, this is not the only system of notation for dek and el.
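To see the British digits in action, here is a small sketch that writes an ordinary integer in base twelve. The function name `to_dozenal` and the `DIGITS` table are my own choices for illustration, not anything published by either society.

```python
# Digits zero through nine, then U+218A (turned two, dek) and U+218B (turned three, el).
DIGITS = "0123456789\u218a\u218b"

def to_dozenal(n):
    """Write a non-negative integer in base twelve."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return DIGITS[0]
    out = []
    while n:
        # Peel off the least significant dozenal digit each time round.
        n, remainder = divmod(n, 12)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))
```

With this, twelve comes out as 10 and a gross (144) as 100, while ten and eleven each take a single digit.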
The Dozenal Society of America provides a chart of options for dozenal notation. Out of those using characters already in common use, interesting ones include using Cyrillic Ю yu and И i for ten and eleven, owing to their similarities to the decimal 10 and 11 in shape, and using * for ten and # for eleven due to their presence on the twelve-key telephone keypad. However, for quite some time the Dozenal Society of America used the Greek letter chi raised to the baseline for dek, and an upside down ʒ (ezh) for el.
When dozenal digits were added to the Unicode Standard in version 8.0, only the British digits were encoded; today, the versions used by the Dozenal Society of America remain unrepresented in Unicode.
Here's an interesting character added in Unicode 12.0. When Taiwan was under Japanese rule, the Japanese administration used a system of modified katakana to teach Japanese-speaking people the Taiwanese Hokkien dialect, a system that was used in textbooks and dictionaries of the time.
The Hokkien dialect is far more complex phonetically than Japanese. Where Japanese has five vowels and one moraic or syllabic nasal, Taiwanese Hokkien has six vowels and two syllabic nasals, /m̩/ and /ŋ̍/. In order to accommodate the extra vowel, they reused the now-obsolete vowel ヲ (wo) to represent the schwa and overloaded ム (mu) as the syllabic m.
Unlike Mandarin, Hokkien also retained stop sounds in syllable codas, of which there are four: /p/, /t/, /k/, and /ʔ/. For the first three, the small kana also found in Ainu are used: ㇷ゚, ッ, and ㇰ; the last one is represented by writing the katakana vowel at a smaller size.
The use of this character, therefore, is in combinations ending in /əʔ/. While I couldn't find any characters with this pronunciation owing to my lack of knowledge of Taiwanese Hokkien, they must have existed somewhere in order for this character to have been encoded.
Also of note is that the Taiwanese kana orthography required special tone markers, which are not encoded in Unicode 13.0.
This character is one of the most complex characters in Simplified Chinese, with 42 strokes. It is used in the famous name of a noodle dish from Shaanxi, 𰻝𰻝面 (biangbiangmian), or biangbiang noodles.
When I was living in China, I visited Shaanxi with my parents on a tour. As we were on our way to a restaurant in a small, cramped tour bus, our tour guide asked a question: what is the most complex Chinese character - that is, the one with the most strokes?
A passenger on the bus piped up, knowing exactly what the tour guide was talking about. After all, biangbiang noodles are one of the "eight famous curiosities of Shaanxi" (陕西八大怪), owing exactly to their highly complex name.
Had I known of U+2A6A5 𪚥 at the time, that character being composed of four 龍 and blowing both the simplified U+30EDD 𰻝 and its traditional form U+30EDE 𰻞 out of the water with 64 strokes, I would have challenged his answer. But I was no more than ten years old and hadn't started my journey to become a hardcore Unicode nerd at the time, so I sat satisfied that I had learned something new.
We got off the bus a few minutes later, and were treated to a serving of authentic biangbiang noodles from a local restaurant. My mother, knowing I couldn't take spicy food, requested the non-spicy version on my behalf, and soon enough a plate with a single, long piece of noodle coiled over itself myriad times was delivered to our table.
The phonetic biáng is said to derive from the sound of the chef pulling the noodles and slapping them onto the table, while the character seems to be a local invention, perhaps one invented by a noodle shop as a marketing strategy, capitalizing on the fact that no character had been assigned to the dialectal word biáng.
Though it's been so long that I can't remember the taste of the noodles, the linguistic curiosity of U+30EDD has lingered in my mind since. It wasn't until a few years back that I decided to look up the character again, and discovered that it had no Unicode codepoint assigned as of Unicode 10.0.
However, that changed with the addition of CJK Unified Ideographs Extension G in Unicode 13.0, which added both the simplified and traditional characters. While most fonts do not support the characters - in fact, as I am typing this, the character 𰻝 appears as a box with the hexadecimal codepoint inside - GlyphWiki has added a Ming style glyph for both, which should be included in their Hanazono Mincho font, and Source Han fonts already support the character.
Hello and welcome to the inaugural post on my blog. Here I'll share insights about linguistics, thoughts on character encoding and technology in general, and my personal cultural experiences through explaining Unicode codepoints.
I'm sure this character is very familiar to a majority of you, especially since you need to understand English to read this text. After all, the ISO basic Latin alphabet set is a set of what might be the most well-known graphemes in the world.
However, that's not why I chose the letter U for this opening post. Rather, it's because Unicode codepoints are notated as U+xxxx.
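As a quick illustration of the notation, here is a sketch of a helper that turns a single character into its U+xxxx form. The name `notate` is hypothetical; the only convention it relies on is that the hexadecimal value is uppercase and padded to at least four digits, with longer values written in full.

```python
def notate(ch):
    """Return the U+xxxx notation for a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # At least four uppercase hex digits; supplementary-plane values use five or six.
    return f"U+{ord(ch):04X}"
```

So the letter U itself is U+0055, while a character outside the Basic Multilingual Plane gets five digits, as in U+30EDD.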
The Unicode Standard version 13.0, the current version of Unicode as of today, defines 143,859 named characters, excluding the 2,048 surrogate code points used in pairs to form four-byte sequences in UTF-16, 65 C0 and C1 control characters left over from ISO 8859-1, 137,468 code points reserved in the three Private Use Areas, and 66 code points designated noncharacters.
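Most of those figures fall straight out of the code point ranges, so here is a rough arithmetic check. The variable names are mine, and the 143,859 figure is simply taken from the paragraph above rather than derived.

```python
# Whole code space: 17 planes of 65,536 code points each.
CODE_SPACE = 17 * 0x10000

# Surrogates: U+D800..U+DFFF.
SURROGATES = 0xDFFF - 0xD800 + 1

# Control codes: C0 (U+0000..U+001F), DEL (U+007F), C1 (U+0080..U+009F).
CONTROLS = 32 + 1 + 32

# Private use: U+E000..U+F8FF, plus planes 15 and 16 minus their last two code points.
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * (0xFFFD - 0x0000 + 1)

# Noncharacters: U+FDD0..U+FDEF, plus the last two code points of each plane.
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * 17

# Taken from the text above, not computed here.
ASSIGNED_13_0 = 143_859

# Whatever remains is unassigned (reserved) in Unicode 13.0.
UNASSIGNED = CODE_SPACE - (ASSIGNED_13_0 + SURROGATES + CONTROLS
                           + PRIVATE_USE + NONCHARACTERS)
```

The control count only reaches 65 if DEL (U+007F) is counted alongside the C0 and C1 ranges.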
The one thing the Unicode Consortium, and the Standard that it publishes, is known for among the common internet community might be the small subset of characters known as emoji. However, this blog will focus far more on letters of various writing systems than on emoji.
Of course, these posts will focus on codepoints rather than glyphs, so glyphs like flags, which are composed of two Regional Indicator Symbols, and more obscure accented letters, which require combining diacritics, will not receive posts.
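To make the distinction concrete, here is a sketch showing that a flag and a letter with a combining diacritic are each more than one codepoint. The helper names `flag` and `code_points` are hypothetical; the only facts relied on are that the Regional Indicator Symbols run from U+1F1E6 (A) to U+1F1FF (Z) and that U+0329 is the combining vertical line below.

```python
import unicodedata

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(country_code):
    """Build a flag sequence from a two-letter region code such as 'SG'."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII letters")
    # Each letter maps to the Regional Indicator Symbol at the same offset from A.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

def code_points(text):
    """List the code points of a string in U+xxxx notation."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

# Syllabic m: a base letter plus a combining mark, with no precomposed form.
SYLLABIC_M = unicodedata.normalize("NFC", "m\u0329")
```

Here `code_points(flag("SG"))` lists two codepoints for the Singapore flag, and `SYLLABIC_M` stays two codepoints long even after NFC normalization, since there is no single precomposed character for it.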
I hope you enjoy reading through what I have to say.