UTF-8 SAMPLER
€ $ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ ₭ ₮ ₯
Frank da Cruz
The Kermit Project - Columbia University
New York City
fdc@columbia.edu
Last update:
Thu Dec 20 14:43:18 2007
[ PEACE ]
[ Poetry ]
[ I Can Eat Glass ]
[ The Quick Brown Fox ]
[ HTML Features ]
[ Credits, Tools, Commentary ]
UTF-8 is an ASCII-preserving encoding method for
Unicode (ISO 10646), the Universal Character Set
(UCS). The UCS encodes most of the world's writing systems in a single
character set, allowing you to mix languages and scripts within a document
without needing any tricks for switching character sets. This web page is
encoded directly in UTF-8.
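To make "ASCII-preserving" concrete, here is a minimal Python sketch of the UTF-8 encoding rules as given in RFC 3629. Code points below 0x80 become the identical single ASCII byte. Every other code point becomes a lead byte followed by one to three continuation bytes of the form 10xxxxxx. The function names are illustrative only, not a library API, and each result is cross-checked against Python's built-in codec.

```python
def utf8_encode(code_point: int) -> bytes:
    """Illustrative helper: encode one Unicode scalar value as UTF-8 (RFC 3629)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{code_point:04X}")
    if code_point < 0x80:
        # 1 byte, 0xxxxxxx: identical to the ASCII code, hence "ASCII-preserving"
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes (beyond the BMP): 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def encode_text(text: str) -> bytes:
    """Illustrative helper: encode a whole string one code point at a time."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)


for sample in ["A", "é", "€", "\U0001033C"]:  # the last is GOTHIC LETTER MANNA
    encoded = encode_text(sample)
    assert encoded == sample.encode("utf-8")  # agrees with Python's own codec
    print(f"U+{ord(sample):04X} -> {encoded.hex(' ')}")
```

The loop prints 41, c3 a9, e2 82 ac and f0 90 8c bc. A plain ASCII file is therefore already valid UTF-8, byte for byte.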
As shown HERE,
Columbia University's Kermit 95 terminal emulation
software can display UTF-8 plain text in Windows 95, 98, ME, NT, XP, or 2000
when using a monospace Unicode font like Andale Mono WT J or Everson Mono Terminal, or the less fully
populated Courier New, Lucida Console, or Andale Mono. C-Kermit can handle it too,
if you have a Unicode
display. As many languages as are representable in your font can be seen
on the screen at the same time.
This, however, is a Web page. Some Web browsers can handle UTF-8, some can't.
And those that can might not have a sufficiently populated font to work with
(some browsers might pick glyphs dynamically from multiple fonts; Netscape 6
seems to do this).
CLICK HERE
for a survey of Unicode fonts for Windows.
The subtitle above shows currency symbols of many lands. If they don't
appear as blobs, we're off to a good start!
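When a browser does not recognize a page as UTF-8, each multi-byte character is often shown as several unrelated characters rather than as a blob. The following is a minimal Python sketch, assuming the browser falls back to Windows-1252 (Python's cp1252 codec). The names misread and repair are illustrative helpers, not part of any browser or library.

```python
def misread(text: str, wrong_charset: str = "cp1252") -> str:
    """Illustrative helper: decode UTF-8 bytes with the wrong character set."""
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


def repair(garbled: str, wrong_charset: str = "cp1252") -> str:
    """Illustrative helper: reverse misread(), if no byte was replaced."""
    return garbled.encode(wrong_charset).decode("utf-8")


blob = misread("€")
print(blob)          # â‚¬
print(repair(blob))  # €
```

The repair is only possible when the round trip was lossless. cp1252 leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined. With errors="replace", those bytes become U+FFFD, and repair() then raises UnicodeEncodeError because the original bytes are gone.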
From the Anglo-Saxon Rune Poem (Rune version):
ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬
From Laȝamon's Brut
(The Chronicles of England, Middle English, West Midlands):
An preost wes on leoden, Laȝamon was ihoten
He wes Leovenaðes sone -- liðe him be Drihten.
He wonede at Ernleȝe at æðelen are chirechen,
Uppen Sevarne staþe, sel þar him þuhte,
Onfest Radestone, þer he bock radde.
(The third letter in the author's name is Yogh, missing from many fonts;
CLICK HERE for another Middle English sample
with some explanation of letters and encoding).
From the Tagelied of
Wolfram von Eschenbach (Middle High German):
Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez.
Some lines of
Odysseus Elytis (Greek):
Monotonic:
Τη γλώσσα μου έδωσαν ελληνική
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου.
από το Άξιον Εστί
του Οδυσσέα Ελύτη
Polytonic:
Τὴ γλῶσσα μοῦ ἔδωσαν ἑλληνικὴ
τὸ σπίτι φτωχικὸ στὶς ἀμμουδιὲς τοῦ Ὁμήρου.
Μονάχη ἔγνοια ἡ γλῶσσα μου στὶς ἀμμουδιὲς τοῦ Ὁμήρου.
ἀπὸ τὸ Ἄξιον ἐστί
τοῦ Ὀδυσσέα Ἐλύτη
The first stanza of
Pushkin's Bronze Horseman (Russian):
На берегу пустынных волн
Стоял он, дум великих полн,
И вдаль глядел. Пред ним широко
Река неслася; бедный чёлн
По ней стремился одиноко.
По мшистым, топким берегам
Чернели избы здесь и там,
Приют убогого чухонца;
И лес, неведомый лучам
В тумане спрятанного солнца,
Кругом шумел.
Šota Rustaveli's Vepxis Ṭqaosani,
̣︡Th, The Knight in the Tiger's Skin (Georgian):
ვეპხის ტყაოსანი
შოთა რუსთაველი
ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა,
ცეცხლს, წყალსა და მიწასა, ჰაერთა თანა მრომასა;
მომცნეს ფრთენი და აღვფრინდე, მივჰხვდე მას ჩემსა ნდომასა,
დღისით და ღამით ვჰხედვიდე მზისა ელვათა კრთომაასა.
Tamil poetry of Cupiramaniya Paarathiyar,
சுப்பிரமணிய பாரதியார் (1882-1921):
யாமறிந்த மொழிகளிலே தமிழ்மொழி போல் இனிதாவது எங்கும் காணோம்,
பாமரராய் விலங்குகளாய், உலகனைத்தும் இகழ்ச்சிசொலப் பான்மை கெட்டு,
நாமமது தமிழரெனக் கொண்டு இங்கு வாழ்ந்திடுதல் நன்றோ? சொல்லீர்!
தேமதுரத் தமிழோசை உலகமெலாம் பரவும்வகை செய்தல் வேண்டும்.
And from the sublime to the ridiculous, here is a
certain phrase¹ in an assortment of languages:
- Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥
- Sanskrit (standard transcription): kācaṃ śaknomyattum; nopahinasti mām.
- Classical Greek: ὕαλον ϕαγεῖν δύναμαι· τοῦτο οὔ με βλάπτει.
- Greek (monotonic): Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
- Greek (polytonic): Μπορῶ νὰ φάω σπασμένα γυαλιὰ χωρὶς νὰ πάθω τίποτα.
Etruscan: (NEEDED)
- Latin: Vitrum edere possum; mihi non nocet.
- Old French: Je puis mangier del voirre. Ne me nuit.
- French: Je peux manger du verre, ça ne me fait pas mal.
- Provençal / Occitan: Pòdi manjar de veire, me nafrariá pas.
- Québécois: J'peux manger d'la vitre, ça m'fa pas mal.
- Walloon: Dji pou magnî do vêre, çoula m' freut nén må.
Champenois: (NEEDED)
Lorrain: (NEEDED)
- Picard: Ch'peux mingi du verre, cha m'foé mie n'ma.
Corsican: (NEEDED)
Jèrriais: (NEEDED)
- Kreyòl Ayisyen: Mwen kap manje vè, li pa blese'm.
- Basque: Kristala jan dezaket, ez dit minik ematen.
- Catalan / Català: Puc menjar vidre, que no em fa mal.
- Spanish: Puedo comer vidrio, no me hace daño.
- Aragones: Puedo minchar beire, no me'n fa mal .
- Galician: Eu podo xantar cristais e non cortarme.
- European Portuguese: Posso comer vidro, não me faz mal.
- Brazilian Portuguese (8): Posso comer vidro, não me machuca.
- Caboverdiano: M' podê cumê vidru, ca ta maguâ-m'.
- Papiamentu: Ami por kome glas anto e no ta hasimi daño.
- Italian: Posso mangiare il vetro e non mi fa male.
- Milanese: Sôn bôn de magnà el véder, el me fa minga mal.
- Roman: Me posso magna' er vetro, e nun me fa male.
- Napoletano: M' pozz magna' o'vetr, e nun m' fa mal.
- Sicilian: Puotsu mangiari u vitru, nun mi fa mali.
- Venetian: Mi posso magnare el vetro, no'l me fa mae.
- Zeneise (Genovese): Pòsso mangiâ o veddro e o no me fà mâ.
- Romansch (Grischun): Jau sai mangiar vaider, senza che quai fa donn a mai.
Romany / Tsigane: (NEEDED)
- Romanian: Pot să mănânc sticlă și ea nu mă rănește.
- Esperanto: Mi povas manĝi vitron, ĝi ne damaĝas min.
Pictish: (NEEDED)
Breton: (NEEDED)
- Cornish: Mý a yl dybry gwéder hag éf ny wra ow ankenya.
- Welsh: Dw i'n gallu bwyta gwydr, 'dyw e ddim yn gwneud dolur i mi.
- Manx Gaelic: Foddym gee glonney agh cha jean eh gortaghey mee.
- Old Irish (Ogham): ᚛᚛ᚉᚑᚅᚔᚉᚉᚔᚋ ᚔᚈᚔ ᚍᚂᚐᚅᚑ ᚅᚔᚋᚌᚓᚅᚐ᚜
- Old Irish (Latin): Con·iccim ithi nglano. Ním·géna.
- Irish: Is féidir liom gloinne a ithe. Ní dhéanann sí dochar ar bith dom.
- Scottish Gaelic: S urrainn dhomh gloinne ithe; cha ghoirtich i mi.
- Anglo-Saxon (Runes): ᛁᚳ᛫ᛗᚨᚷ᛫ᚷᛚᚨᛋ᛫ᛖᚩᛏᚪᚾ᛫ᚩᚾᛞ᛫ᚻᛁᛏ᛫ᚾᛖ᛫ᚻᛖᚪᚱᛗᛁᚪᚧ᛫ᛗᛖ᛬
- Anglo-Saxon (Latin): Ic mæg glæs eotan ond hit ne hearmiað me.
- Middle English: Ich canne glas eten and hit hirtiþ me nouȝt.
- English: I can eat glass and it doesn't hurt me.
- English (IPA): [aɪ kæn iːt glɑːs ænd ɪt dɐz nɒt hɜːt miː] (Received Pronunciation)
- English (Braille): ⠊⠀⠉⠁⠝⠀⠑⠁⠞⠀⠛⠇⠁⠎⠎⠀⠁⠝⠙⠀⠊⠞⠀⠙⠕⠑⠎⠝⠞⠀⠓⠥⠗⠞⠀⠍⠑
- Lalland Scots / Doric: Ah can eat gless, it disnae hurt us.
Glaswegian: (NEEDED)
- Gothic (4): 𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿𐌽𐌳𐌰𐌽 𐌱𐍂𐌹𐌲𐌲𐌹𐌸.
- Old Norse (Runes): ᛖᚴ ᚷᛖᛏ ᛖᛏᛁ ᚧ ᚷᛚᛖᚱ ᛘᚾ ᚦᛖᛋᛋ ᚨᚧ ᚡᛖ ᚱᚧᚨ ᛋᚨᚱ
- Old Norse (Latin): Ek get etið gler án þess að verða sár.
- Norsk / Norwegian (Nynorsk): Eg kan eta glas utan å skada meg.
- Norsk / Norwegian (Bokmål): Jeg kan spise glass uten å skade meg.
- Føroyskt / Faroese: Eg kann eta glas, skaðaleysur.
- Íslenska / Icelandic: Ég get etið gler án þess að meiða mig.
- Svenska / Swedish: Jag kan äta glas utan att skada mig.
- Dansk / Danish: Jeg kan spise glas, det gør ikke ondt på mig.
- Sønderjysk: Æ ka æe glass uhen at det go mæ naue.
- Frysk / Frisian: Ik kin glês ite, it docht me net sear.
- Nederlands / Dutch: Ik kan glas eten, het doet mij geen kwaad.
- Kirchröadsj/Bôchesserplat: Iech ken glaas èèse, mer 't deet miech jing pieng.
- Afrikaans: Ek kan glas eet, maar dit doen my nie skade nie.
- Lëtzebuergescht / Luxemburgish: Ech kan Glas iessen, daat deet mir nët wei.
- Deutsch / German: Ich kann Glas essen, ohne mir zu schaden.
- Ruhrdeutsch: Ich kann Glas verkasematuckeln, ohne dattet mich wat jucken tut.
- Langenfelder Platt: Isch kann Jlaas kimmeln, uuhne datt mich datt weh dääd.
- Lausitzer Mundart ("Lusatian"): Ich koann Gloos assn und doas dudd merr ni wii.
- Odenwälderisch: Iech konn glaasch voschbachteln ohne dass es mir ebbs daun doun dud.
- Sächsisch / Saxon: 'sch kann Glos essn, ohne dass'sch mer wehtue.
- Pfälzisch: Isch konn Glass fresse ohne dasses mer ebbes ausmache dud.
- Schwäbisch / Swabian: I kå Glas frässa, ond des macht mr nix!
- Bayrisch / Bavarian: I koh Glos esa, und es duard ma ned wei.
- Allemannisch: I kaun Gloos essen, es tuat ma ned weh.
- Schwyzerdütsch: Ich chan Glaas ässe, das tuet mir nöd weeh.
- Hungarian: Meg tudom enni az üveget, nem lesz tőle bajom.
- Suomi / Finnish: Voin syödä lasia, se ei vahingoita minua.
- Sami (Northern): Sáhtán borrat lása, dat ii leat bávččas.
- Erzian: Мон ярсан суликадо, ды зыян эйстэнзэ а ули.
- Northern Karelian: Mie voin syvvä lasie ta minla ei ole kipie.
- Southern Karelian: Minä voin syvvä st'oklua dai minule ei ole kibie.
Vepsian: (NEEDED)
Votian: (NEEDED)
Livonian: (NEEDED)
- Estonian: Ma võin klaasi süüa, see ei tee mulle midagi.
- Latvian: Es varu ēst stiklu, tas man nekaitē.
- Lithuanian: Aš galiu valgyti stiklą ir jis manęs nežeidžia
Old Prussian: (NEEDED)
Sorbian (Wendish): (NEEDED)
- Czech: Mohu jíst sklo, neublíží mi.
- Slovak: Môžem jesť sklo. Nezraní ma.
- Polska / Polish: Mogę jeść szkło i mi nie szkodzi.
- Slovenian: Lahko jem steklo, ne da bi mi škodovalo.
- Croatian: Ja mogu jesti staklo i ne boli me.
- Serbian (Latin): Mogu jesti staklo a da mi ne škodi.
- Serbian (Cyrillic): Могу јести стакло а да ми не шкоди.
- Macedonian: Можам да јадам стакло, а не ме штета.
- Russian: Я могу есть стекло, оно мне не вредит.
- Belarusian (Cyrillic): Я магу есці шкло, яно мне не шкодзіць.
- Belarusian (Lacinka): Ja mahu jeści škło, jano mne ne škodzić.
- Ukrainian: Я можу їсти скло, й воно мені не пошкодить.
- Bulgarian: Мога да ям стъкло, то не ми вреди.
- Georgian: მინას ვჭამ და არა მტკივა.
- Armenian: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։
- Albanian: Unë mund të ha qelq dhe nuk më gjen gjë.
- Turkish: Cam yiyebilirim, bana zararı dokunmaz.
- Turkish (Ottoman): جام ييه بلورم بڭا ضررى طوقونمز
- Bangla / Bengali: আমি কাঁচ খেতে পারি, তাতে আমার কোনো ক্ষতি হয় না।
- Marathi: मी काच खाऊ शकतो, मला ते दुखत नाही.
- Hindi: मैं काँच खा सकता हूँ और मुझे उससे कोई चोट नहीं पहुंचती.
- Tamil: நான் கண்ணாடி சாப்பிடுவேன், அதனால் எனக்கு ஒரு கேடும் வராது.
- Urdu(2): میں کانچ کھا سکتا ہوں اور مجھے تکلیف نہیں ہوتی ۔
- Pashto(2): زه شيشه خوړلې شم، هغه ما نه خوږوي
- Farsi / Persian: .من می توانم بدونِ احساس درد شيشه بخورم
- Arabic(2): أنا قادر على أكل الزجاج و هذا لا يؤلمني.
Aramaic: (NEEDED)
- Hebrew(2): אני יכול לאכול זכוכית וזה לא מזיק לי.
- Yiddish(2): איך קען עסן גלאָז און עס טוט מיר נישט װײ.
Judeo-Arabic: (NEEDED)
Ladino: (NEEDED)
Gʼz: (NEEDED)
Amharic: (NEEDED)
- Twi: Metumi awe tumpan, ɜnyɜ me hwee.
- Hausa (Latin): Inā iya taunar gilāshi kuma in gamā lāfiyā.
- Hausa (Ajami) (2): إِنا إِىَ تَونَر غِلَاشِ كُمَ إِن غَمَا لَافِىَا
- Yoruba(3): Mo lè je̩ dígí, kò ní pa mí lára.
- Lingala: Nakokí kolíya biténi bya milungi, ekosála ngáí mabé tɛ́.
- (Ki)Swahili: Naweza kula bilauri na sikunyui.
- Malay: Saya boleh makan kaca dan ia tidak mencederakan saya.
- Tagalog: Kaya kong kumain nang bubog at hindi ako masaktan.
- Chamorro: Siña yo' chumocho krestat, ti ha na'lalamen yo'.
- Javanese: Aku isa mangan beling tanpa lara.
- Burmese: က္ယ္ဝန္‌တော္‌၊က္ယ္ဝန္‌မ မ္ယက္‌စားနုိင္‌သည္‌။ ၎က္ရောင္‌့ ထိခိုက္‌မ္ဟု မရ္ဟိပာ။ (9)
- Vietnamese (quốc ngữ): Tôi có thể ăn thủy tinh mà không hại gì.
- Vietnamese (nm) (4): 些 𣎏 世 咹 水 晶 𦓡 空 𣎏 害 咦
Khmer: (NEEDED)
Lao: (NEEDED)
- Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
- Mongolian (Cyrillic): Би шил идэй чадна, надад хортой биш
- Mongolian (Classic) (5):
ᠪᠢ ᠰᠢᠯᠢ ᠢᠳᠡᠶᠦ ᠴᠢᠳᠠᠨᠠ ᠂ ᠨᠠᠳᠤᠷ ᠬᠣᠤᠷᠠᠳᠠᠢ ᠪᠢᠰᠢ
Dzongkha: (NEEDED)
Nepali: (NEEDED)
- Tibetan: ཤེལ་སྒོ་ཟ་ནས་ང་ན་གི་མ་རེད།
- Chinese: 我能吞下玻璃而不伤身体。
- Chinese (Traditional): 我能吞下玻璃而不傷身體。
- Taiwanese(6): Góa ē-tàng chia̍h po-lê, mā bē tio̍h-siong.
- Japanese: 私はガラスを食べられます。それは私を傷つけません。
- Korean: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
- Bislama: Mi save kakae glas, hemi no save katem mi.
- Hawaiian: Hiki iaʻu ke ʻai i ke aniani; ʻaʻole nō lā au e ʻeha.
- Marquesan: E koʻana e kai i te karahi, mea ʻā, ʻaʻe hauhau.
- Chinook Jargon: Naika məkmək kakshət labutay, pi weyk ukuk munk-sik nay.
- Navajo: Tsésǫʼ yishą́ągo bííníshghah dóó doo shił neezgai da.
Cherokee (and Cree, Ojibwa, Inuktitut, Náhuatl, Quechua,
and other American languages): (NEEDED)
Garifuna: (NEEDED)
Gullah: (NEEDED)
- Lojban: mi kakne le nu citka le blaci .iku'i le se go'i na xrani mi
- Nórdicg: Ljœr ye caudran créneþ ý jor cẃran.
(Additions, corrections, completions,
gratefully accepted.)
For testing purposes, some of these are repeated in a monospace font . . .
- Euro Symbol: €.
- Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
- Íslenska / Icelandic: Ég get etið gler án þess að meiða mig.
- Polish: Mogę jeść szkło, i mi nie szkodzi.
- Romanian: Pot să mănânc sticlă și ea nu mă rănește.
- Ukrainian: Я можу їсти скло, й воно мені не пошкодить.
- Armenian: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։
- Georgian: მინას ვჭამ და არა მტკივა.
- Hindi: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती.
- Hebrew(2): אני יכול לאכול זכוכית וזה לא מזיק לי.
- Yiddish(2): איך קען עסן גלאָז און עס טוט מיר נישט װײ.
- Arabic(2): أنا قادر على أكل الزجاج و هذا لا يؤلمني.
- Japanese: 私はガラスを食べられます。それは私を傷つけません。
- Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
Notes:
- The "I can eat glass" phrase and initial translations (about 30 of them)
were borrowed from Ethan Mollick's I Can Eat Glass page
(which disappeared on or about June 2004) and converted to UTF-8. Since
Ethan's original page is gone, I should mention that his purpose was to offer
travelers a phrase they could use in any country that would command a
certain kind of respect, or at least get attention. See Credits for the many additional contributions since
then. When submitting new entries, the word "hurt" (if you have a choice)
is used in the sense of "cause harm", "do damage", or "bother", rather than
"inflict pain" or "make sad". In this vein Otto Stolz comments (as do
others further down; personally I think it's better for the purpose of this
page to have extra entries and/or to show a greater repertoire of characters
than it is to enforce a strict interpretation of the word "hurt"!):
This is the meaning I have translated to the Swabian dialect.
However, I just have noticed that most of the German variants
translate the "inflict pain" meaning. The German example should
read:
"Ich kann Glas essen ohne mir zu schaden."
rather than:
"Ich kann Glas essen, ohne mir weh zu tun."
(The comma fell victim to the 1996 orthographic reform,
cf. http://www.ids-mannheim.de/reform/e3-1.html#P76.
You may wish to contact the contributors of the following translations
to correct them:
- Lëtzebuergescht / Luxemburgish: Ech kan Glas iessen, daat deet mir nët wei.
- Lausitzer Mundart ("Lusatian"): Ich koann Gloos assn und doas dudd merr ni wii.
- Sächsisch / Saxon: 'sch kann Glos essn, ohne dass'sch mer wehtue.
- Bayrisch / Bavarian: I koh Glos esa, und es duard ma ned wei.
- Allemannisch: I kaun Gloos essen, es tuat ma ned weh.
- Schwyzerdütsch: Ich chan Glaas ässe, das tuet mir nöd weeh.
In contrast, I deem the following translations *alright*:
- Ruhrdeutsch: Ich kann Glas verkasematuckeln, ohne dattet mich wat jucken tut.
- Pfälzisch: Isch konn Glass fresse ohne dasses mer ebbes ausmache dud.
- Schwäbisch / Swabian: I kå Glas frässa, ond des macht mr nix!
(However, you could remove the commas, on account of
http://www.ids-mannheim.de/reform/e3-1.html#P76
and
http://www.ids-mannheim.de/reform/e3-1.html#P72, respectively.)
I guess, also these examples translate the wrong sense of "hurt",
though I do not know these languages well enough to assert them
definitely:
- Nederlands / Dutch: Ik kan glas eten; het doet mij geen
pijn. (This one has been changed)
- Kirchröadsj/Bôchesserplat: Iech ken glaas èèse, mer 't deet miech jing pieng.
In the Romanic languages, the variations on "fa male" (it) are probably
wrong, whilst the variations on "hace daño" (es) and "damaĝas" (Esperanto) are probably correct; "nocet" (la) is definitely right.
The northern Germanic variants of "skada" are probably right, as are
the Slavic variants of "škodi/шкоди" (se); however the Slavic variants
of "boli" (hv) are probably wrong, as "bolena" means "pain/ache", IIRC.
That was from July 2004. In December 2007, Otto writes again:
Hello Frank,
in days of yore, I had written:
> "Ich kann Glas essen ohne mir zu schaden."
> (The comma fell victim to the 1996 orthographic reform,
cf. http://www.ids-mannheim.de/reform/e3-1.html#P76.
The latest revision (2006) of the official German orthography
has revived the comma around infinitive clauses commencing with
ohne, or 5 other conjunctions, or depending from a noun or
from an announcing demonstrative
(http://www.ids-mannheim.de/reform/regeln2006.pdf, §75).
So, it's again: Ich kann Glas essen, ohne mir zu schaden.
Best wishes,
Otto Stolz
- The numbering of the samples is arbitrary, done only to keep track of how
many there are, and can change any time a new entry is added. The
arrangement is also arbitrary but with some attempt to group related
examples together. Note: All languages not listed are wanted, not just the
ones that say (NEEDED).
- Correct right-to-left display of these languages
depends on the capabilities of your browser. The period should
appear on the left. In the monospace Yiddish example, the Yiddish digraphs
should occupy one character cell.
- Yoruba: The third word is Latin letter small 'j' followed by
small 'e' with U+0329, Combining Vertical Line Below. This displays
correctly only if your Unicode font includes the U+0329 glyph and your
browser supports combining diacritical marks. The Lingala and Indic examples
also include combining sequences.
- Includes Unicode 3.1 (or later) characters beyond Plane 0.
- The Classic Mongolian example should be vertical, top-to-bottom and
left-to-right. But such display is almost impossible. Also no font yet
exists which provides the proper ligatures and positional variants for the
characters of this script, which works somewhat like Arabic.
- Taiwanese is also known as Holo or Hoklo, and is related to Southern
Min dialects such as Amoy.
Contributed by Henry H. Tan-Tenn, who comments, "The above is
the romanized version, in a script current among Taiwanese Christians since
the mid-19th century. It was invented by British missionaries and saw use in
hundreds of published works, mostly of a religious nature. Most Taiwanese did
not know Chinese characters then, or at least not well enough to read. More
to the point, though, a written standard using Chinese characters has never
developed, so a significant minority of words are represented with different
candidate characters, depending on one's personal preference or etymological
theory. In this sentence, for example, "ē-tàng", "chia̍h",
"mā" and "bē" are problematic using Chinese characters.
"Góa" (I/me) and "po-lê" (glass) are as written in other Sinitic
languages (e.g. Mandarin, Hakka)."
- Wagner Amaral of Pinese & Amaral Associados notes that
the Brazilian Portuguese sentence for
"I can eat glass" should be identical to the Portuguese one, as the word
"machuca" means "inflict pain", or rather "injuries". The words "faz
mal" would more correctly translate as "cause harm".
- Burmese: In English, the first-person pronoun "I" serves both
genders, male and female. In Burmese (except in the central part of Burma),
kyundaw (က္ယ္ဝန္‌တော္) is used by male speakers and kyanma (က္ယ္ဝန္‌မ) by female speakers.
The Burmese sample needs a fully compliant Unicode Burmese font; sadly, only one
such font exists, Padauk Graphite, which renders through the Graphite engine.
CLICK HERE to test Burmese
characters.
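Several of the notes above (right-to-left display, the Yoruba combining mark, characters beyond Plane 0) describe properties that can be inspected with Python's standard unicodedata module. This is a minimal sketch; the sample characters are chosen only for illustration. How a browser actually lays them out still depends on its implementation of the bidirectional algorithm and on the fonts it has.

```python
import unicodedata

# Right-to-left text: letters carry a strong direction, while the full stop
# is a weak separator ("CS") whose placement depends on the surrounding text.
for ch in ["\u05d0", "\u0627", "A", "."]:  # Hebrew alef, Arabic alef, Latin A, period
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")

# Combining sequence: Yoruba "je̩" is j + e + U+0329. Unicode has no
# precomposed "e with vertical line below", so NFC keeps three code points.
yoruba = unicodedata.normalize("NFC", "je\u0329")
print(len(yoruba), unicodedata.name("\u0329"), unicodedata.combining("\u0329"))

# Beyond Plane 0: GOTHIC LETTER MANNA (U+1033C) takes 4 bytes in UTF-8
# and a surrogate pair (two 16-bit code units) in UTF-16.
manna = "\U0001033C"
print(unicodedata.name(manna))
print(len(manna.encode("utf-8")), len(manna.encode("utf-16-le")) // 2)
```

The bidi classes printed are R, AL, L and CS. The combining class of U+0329 is 220 (attached below).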
The "I can eat glass" sentences do not necessarily show off the orthography of
each language to best advantage. In many alphabetic written languages it is
possible to include all (or most) letters (or "special" characters) in
a single (often nonsense) pangram. These were traditionally used in
typewriter instruction; now they are useful for stress-testing computer fonts
and keyboard input methods. Here are a few examples (SEND MORE):
- English: The quick brown fox jumps over the lazy dog.
- Irish: "An ḃfuil do ċroí ag bualaḋ ḟaitíos an ġrá a ṁeall lena ṗóg éada ó
ṡlí do leasa ṫú?"
"D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór Éava agus Áḋaiṁ."
- Dutch: Pa's wijze lynx bezag vroom het fikse aquaduct.
- German: Falsches Üben von Xylophonmusik quält jeden größeren Zwerg. (1)
- German: Im finſteren Jagdſchloß am offenen Felsquellwaſſer patzte der affig-flatterhafte kauzig-höf‌liche Bäcker über ſeinem verſifften kniffligen C-Xylophon. (2)
- Swedish: Flygande bäckasiner söka strax hwila på mjuka tuvor.
- Icelandic: Sævör grét áðan því úlpan var ónýt.
- Polish: Pchnąć w tę łódź jeża lub ośm skrzyń fig.
- Czech: Příliš žluťoučký kůň úpěl ďábelské kódy.
- Slovak: Starý kôň na hŕbe kníh žuje tíško povädnuté ruže, na stĺpe sa ďateľ učí kvákať novú ódu o živote.
- Greek (monotonic): ξεσκεπάζω την ψυχοφθόρα βδελυγμία
- Greek (polytonic): ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία
- Russian: В чащах юга жил-был цитрус? Да, но фальшивый экземпляр! ёъ.
- Bulgarian: Жълтата дюля беше щастлива, че пухът, който цъфна, замръзна като гьон.
- Sami (Northern): Vuol Ruoŧa geđggiid leat máŋga luosa ja čuovžža.
- Hungarian: Árvíztűrő tükörfúrógép.
- Spanish: El pingüino Wenceslao hizo kilómetros bajo exhaustiva lluvia y frío, añoraba a su querido cachorro.
- Portuguese: O próximo vôo à noite sobre o Atlântico, põe freqüentemente o único médico. (3)
- French: Les naïfs ægithales hâtifs pondant à Noël où il gèle sont sûrs d'être
déçus et de voir leurs drôles d'œufs abîmés.
- Esperanto: Eĥoŝanĝo ĉiuĵaŭde.
- Hebrew: זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן.
- Japanese (Hiragana):
いろはにほへど ちりぬるを
わがよたれぞ つねならむ
うゐのおくやま けふこえて
あさきゆめみじ ゑひもせず
(4)
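Whether a sentence really is a pangram for a given alphabet can be checked mechanically. This is a small Python sketch. The alphabet strings are assumptions that you would adjust for each language. For German, the sketch uses the letters a to z plus ä, ö, ü and ß, and the helper names are illustrative.

```python
import string


def missing_letters(sentence: str, alphabet: str) -> set[str]:
    """Illustrative helper: letters of `alphabet` absent from `sentence`.

    str.lower() is used instead of str.casefold(): casefold() turns "ß"
    into "ss", so a sentence containing ß would wrongly be reported as
    missing it.
    """
    return set(alphabet) - set(sentence.lower())


def is_pangram(sentence: str, alphabet: str) -> bool:
    return not missing_letters(sentence, alphabet)


ENGLISH = string.ascii_lowercase
GERMAN = string.ascii_lowercase + "äöüß"  # assumption: a-z plus ä, ö, ü, ß

print(is_pangram("The quick brown fox jumps over the lazy dog.", ENGLISH))  # True
print(missing_letters("Falsches Üben von Xylophonmusik quält jeden größeren Zwerg.", GERMAN))  # set()
# One of the German phrases from the notes: all of a-z, but no umlauts or esszet.
print(sorted(missing_letters("Franz jagt im komplett verwahrlosten Taxi quer durch Bayern", GERMAN)))
# ['ß', 'ä', 'ö', 'ü']
```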
Notes:
- Other phrases commonly used in Germany include: "Ein wackerer Bayer
vertilgt ja bequem zwo Pfund Kalbshaxe" and, more recently, "Franz jagt im
komplett verwahrlosten Taxi quer durch Bayern", but both lack umlauts and
esszet. Previously, going for the shortest sentence that has all the
umlauts and special characters, I had
"Grüße aus Bärenhöfe
(und Óechtringen)!"
Acute accents are not used in native German words, so I was surprised to
discover "Óechtringen" in the Deutsche Bundespost
Postleitzahlenbuch:
It's a small village in eastern Lower Saxony.
The "oe" in this case
turns out to be the Lower Saxon "lengthening e" (Dehnungs-e), which makes the
previous vowel long (used in a number of Lower Saxon place names such as Soest
and Itzehoe), not the "e" that indicates umlaut of the preceding vowel.
Many thanks to the Óechtringen-Namenschreibungsuntersuchungskomitee
(Alex Bochannek, Manfred Erren, Asmus Freytag, Christoph Päper, plus
Werner Lemberg who serves as
Óechtringen-Namenschreibungsuntersuchungskomiteerechtschreibungsprüfer)
for their relentless pursuit of the facts in this case. Conclusion: the
accent almost certainly does not belong on this (or any other native German)
word, but neither can it be dismissed as dirt on the page. To add to the
mystery, it has been reported that other copies of the same edition of the
PLZB do not show the accent! UPDATE (March 2006): David Krings was
intrigued enough by this report to contact the mayor of Ebstorf, of which
Oechtringen is a borough, who responded:
Dear Mr. Krings,
if Oechtringen has been written anywhere with an accent on the O,
that can only be a misprint. In any case, the official spelling is
„Oechtringen“.
Yours sincerely,
The Mayor of the Samtgemeinde
p.p. Lothar Jessel
- From Karl Pentzlin (Kochel am See, Bavaria, Germany):
"This German phrase is suited for display by a Fraktur (broken letter)
font. It contains: all common three-letter ligatures: ffi ffl fft and all
two-letter ligatures required by the Duden for Fraktur typesetting: ch ck ff
fi fl ft ll ſch ſi ſſ ſt tz (all in a
manner such they are not part of a three-letter ligature), one example of f-l
where German typesetting rules prohibit ligating (marked by a ZWNJ), and all
German letters a...z, ä, ö, ü, ß, ſ [long s]
(all in a manner such that they are not part of a two-letter Fraktur
ligature)."
Otto Stolz notes that "'Schloß' is now spelled 'Schloss', in
contrast to 'größer' (example 4) which has kept its
'ß'. Fraktur has been banned from general use, in 1942, and long-s
(ſ) has ceased to be used with Antiqua (Roman) even earlier (the
latest Antiqua-ſ I have seen is from 1913, but then
I am no expert, so there may well be a later instance." Later Otto confirms
the latter theory, "Now I've run across a book “Deutsche
Rechtschreibung” (edited by Lutz Mackensen) from 1954 (my reprint
is from 1956) that has kept the Antiqua-ſ in its dictionary part (but
neither in the preface nor in the appendix)."
- Diaeresis is not used in Iberian Portuguese.
- From Yurio Miyazawa: "This poetry contains all the sounds in the
Japanese language and used to be the first thing for children to learn in
their Japanese class. The Hiragana version is particularly neat because it
covers every character in the phonetic Hiragana character set." Yurio also
sent the Kanji version:
色は匂へど 散りぬるを
我が世誰ぞ 常ならむ
有為の奥山 今日越えて
浅き夢見じ 酔ひもせず
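Karl Pentzlin's note mentions a ZWNJ that blocks an f-l ligature, and Otto Stolz's notes contrast long s and ß with current spellings. Both can be examined in Python. This is a sketch. The word "höfliche" is used as an assumed example of a ligature that would straddle the boundary between höf- and -lich. Casefolding is only one way of comparing old and new spellings.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: invisible, but asks the renderer not to ligate

word = "höf" + ZWNJ + "liche"  # assumed example: f and l meet across a morpheme boundary
print(unicodedata.name(ZWNJ), unicodedata.category(ZWNJ))  # ZERO WIDTH NON-JOINER Cf
print(len(word), word.replace(ZWNJ, "") == "höfliche")     # 9 True

# Full case folding maps long s (U+017F) to "s" and sharp s to "ss",
# so the Fraktur-era and current spellings match when compared this way.
print("Jagdſchloß".casefold())                              # jagdschloss
print("Jagdſchloß".casefold() == "Jagdschloss".casefold())  # True
```

The ZWNJ is a format character (category Cf). It changes rendering, not spelling, so search or comparison code typically strips it first.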
Accented Cyrillic:
(This section contributed by Vladimir Marinov.)
In Bulgarian it is desirable, customary, or in some cases required to
write accents over vowels. Unfortunately, no computer character set offers
precomposed forms of all the accented Cyrillic letters. With Unicode,
however, it is possible to combine any Cyrillic letter with any combining
accent. The appearance of the result depends on the font and the rendering
engine. Here are two examples.
- Той вид бŴаĐ коÐ по главаĐ и и коÑÐ на амоĐ и, и еÑذ да и
еذ: "ааÑĐ по паÑи о паÑаĐ, не ミ паи!", но Ð помиÑÐи: "Хей,
помиÐи Ð! и ека, а е Ðоذла в Đзи ека, коÑĐ ミ� да Đذ,
а не ĐÑذ."
- о п�ÑŃ п�ŃÌÐа кÌÑди и ÐоÐавÌÐи.
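The combining approach described above can be sketched in Python. A combining accent is appended after the base letter and the result is normalized to NFC. If Unicode happens to have a precomposed letter, NFC uses it. Otherwise the two code points remain, and the font and rendering engine must position the mark. The helper names are illustrative.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def accent(letter: str, mark: str = ACUTE) -> str:
    """Illustrative helper: attach a combining accent, then normalize to NFC."""
    return unicodedata.normalize("NFC", letter + mark)


def code_points(text: str) -> list[str]:
    return [f"U+{ord(c):04X}" for c in text]


# Cyrillic а (U+0430) has no precomposed acute form: NFC keeps two code points.
print(code_points(accent("\u0430")))         # ['U+0430', 'U+0301']
# Cyrillic и (U+0438) + grave does have one, U+045D, so NFC composes it.
print(code_points(accent("\u0438", GRAVE)))  # ['U+045D']
```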
Here is the Russian alphabet (uppercase only) coded in three
different ways, which should look identical:
- АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
(Literal UTF-8)
- АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
(Decimal numeric character reference)
- АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
(Hexadecimal numeric character reference)
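The two numeric-reference forms above can be generated from the literal text. This is a Python sketch; to_ncr is a hypothetical helper. The standard library's html.unescape confirms that all three forms decode to the same characters.

```python
import html


def to_ncr(text: str, hexadecimal: bool = False) -> str:
    """Hypothetical helper: replace non-ASCII characters with HTML numeric references."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)             # ASCII passes through unchanged
        elif hexadecimal:
            parts.append(f"&#x{cp:X};")  # e.g. &#x410;
        else:
            parts.append(f"&#{cp};")     # e.g. &#1040;
    return "".join(parts)


# The 32 letters А (U+0410) through Я (U+042F), matching the lines above (no Ё).
alphabet = "".join(chr(cp) for cp in range(0x0410, 0x0430))
decimal = to_ncr(alphabet)
hexa = to_ncr(alphabet, hexadecimal=True)
print(decimal[:21])  # &#1040;&#1041;&#1042;
print(hexa[:21])     # &#x410;&#x411;&#x412;

# All three forms decode to the same text.
assert html.unescape(decimal) == html.unescape(hexa) == alphabet
```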
In another test, we use HTML language tags to distinguish Bulgarian, Russian,
and Serbian, which have different italic forms for lowercase
б, г, д, п, and/or т:
Bulgarian:
[ бгдпт ]
[ бгдпт ]
Мога да ям стъкло и не ме боли.
Russian:
[ бгдпт ]
[ бгдпт ]
Я могу есть стекло, это мне не вредит.
Serbian:
[ бгдпт ]
[ бгдпт ]
Могу јести стакло а да ми не шкоди.
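The markup for this test can be produced by wrapping identical text in elements that differ only in their lang attribute. This is a Python sketch; lang_span is a hypothetical helper. Whether the language-specific italic forms actually appear is an assumption that depends on the font (for example, OpenType localized forms) and on the browser.

```python
import html


def lang_span(text: str, lang: str) -> str:
    """Hypothetical helper: wrap text in an HTML span carrying a language tag."""
    return f'<span lang="{html.escape(lang)}">{html.escape(text)}</span>'


letters = "бгдпт"
for tag in ["bg", "ru", "sr"]:  # Bulgarian, Russian, Serbian
    # Same code points in every row; only the lang attribute differs, which a
    # capable browser and font may use to choose language-specific italics.
    print(f"<i>{lang_span(letters, tag)}</i>")
```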
- Credits:
-
The "I can eat glass" phrase and the initial collection of translations:
Ethan Mollick.
Transcription / conversion to UTF-8: Frank da Cruz.
Albanian: Sindi Keesan.
Afrikaans: Johan Fourie, Kevin Poalses.
Anglo Saxon: Frank da Cruz.
Arabic: Najib Tounsi.
Armenian: Vaçe Kundakçı.
Belarusian: Alexey Chernyak.
Bengali: Somnath Purkayastha, Deepayan Sarkar.
Bislama: Dan McGarry.
Braille: Frank da Cruz.
Bulgarian: Sindi Keesan, Guentcho Skordev, Vladimir Marinov.
Burmese: "cetanapa".
Cabo Verde Creole: Cláudio Alexandre Duarte.
Catalán: Jordi Bancells.
Chinese: Jack Soo, Wong Pui Lam.
Chinook Jargon: David Robertson.
Cornish: Chris Stephens.
Croatian: Marjan Baće.
Czech: Stanislav Pecha, Radovan Garabík.
Dutch: Peter Gotink, Pim Blokland, Rob Daniel, Rob de Wit.
Erzian: Jack Rueter.
Esperanto: Franko Luin, Radovan Garabík.
Estonian: Meelis Roos.
Faroese: Jón Gaasedal.
Farsi/Persian: Payam Elahi.
Finnish: Sampsa Toivanen.
French: Luc Carissimo, Anne Colin du Terrail, Sean M. Burke.
Galician: Laura Probaos.
Georgian: Giorgi Lebanidze.
German: Christoph Päper, Otto Stolz, Karl Pentzlin, David Krings,
Frank da Cruz.
Gothic: Aurélien Coudurier.
Greek: Ariel Glenn, Constantine Stathopoulos, Siva Nataraja, Christos Georgiou.
Hebrew: Jonathan Rosenne, Tal Barnea.
Hausa: Malami Buba, Tom Gewecke.
Hawaiian: na Hauʻoli Motta, Anela de Rego, Kaliko Trapp.
Hindi: Shirish Kalele, Nitin Dahra.
Hungarian: András Rácz, Mark Holczhammer.
Icelandic: Andrés Magnússon, Sveinn Baldursson.
International Phonetic Alphabet (IPA): Siva Nataraja / Vincent Ramos.
Irish: Michael Everson, Marion Gunn, James Kass, Curtis Clark.
Italian: Thomas De Bellis.
Japanese: Makoto Takahashi, Yurio Miyazawa.
Karelian: Aleksandr Semakov.
Kirchröadsj: Roger Stoffers.
Kreyòl: Sean M. Burke.
Korean: Jungshik Shin.
Langenfelder Platt: David Krings.
Lëtzebuergescht: Stefaan Eeckels.
Lingala: Denis Moyogo Jacquerye
(Nkóta ya Kɔ́ngɔ míbalé).
Lithuanian: Gediminas Grigas.
Lojban: Edward Cherlin.
Lusatian: Ronald Schaffhirt.
Macedonian: Sindi Keesan.
Malay: Zarina Mustapha.
Manx: Éanna Ó Brádaigh.
Marathi: Shirish Kalele.
Marquesan: Kaliko Trapp.
Middle English: Frank da Cruz.
Milanese: Marco Cimarosti.
Mongolian: Tom Gewecke.
Napoletano: Diego Quintano.
Navajo: Tom Gewecke.
Nórdicg: Yẃlyan Rott.
Norwegian: Herman Ranes.
Odenwälderisch: Alexander Heß.
Old Irish: Michael Everson.
Old Norse: Andrés Magnússon.
Papiamentu: Bianca and Denise Zanardi.
Pashto: N.R. Liwal.
Pfälzisch: Dr. Johannes Sander.
Picard: Philippe Mennecier.
Polish: Juliusz Chroboczek, Paweł Przeradowski.
Portuguese: "Cláudio" Alexandre Duarte, Bianca and Denise
Zanardi, Pedro Palhoto Matos, Wagner Amaral.
Québécois: Laurent Detillieux.
Roman: Pierpaolo Bernardi.
Romanian: Juliusz Chroboczek, Ionel Mugurel.
Romansch: Alexandre Suter.
Ruhrdeutsch: "Timwi".
Russian: Alexey Chernyak, Serge Nesterovitch.
Sami: Anne Colin du Terrail, Luc Carissimo.
Sanskrit: Siva Nataraja / Vincent Ramos.
Sächsisch: André Müller.
Schwäbisch: Otto Stolz.
Scots: Jonathan Riddell.
Serbian: Sindi Keesan, Ranko Narancic, Boris Daljevic, Szilvia Csorba.
Slovak: G. Adam Stanislav, Radovan Garabík.
Slovenian: Albert Kolar.
Spanish: Aleida Muñoz, Laura Probaos.
Swahili: Ronald Schaffhirt.
Swedish: Christian Rose, Bengt Larsson.
Taiwanese: Henry H. Tan-Tenn.
Tagalog: Jim Soliven.
Tamil: Vasee Vaseeharan.
Tibetan: D. Germano, Tom Gewecke.
Thai: Alan Wood's wife.
Turkish: Vaçe Kundakçı, Tom Gewecke, Merlign Olnon.
Ukrainian: Michael Zajac.
Urdu: Mustafa Ali.
Vietnamese: Dixon Au, [James] Đỗ Bá Phước 杜 伯 福.
Walloon: Pablo Saratxaga.
Welsh: Geiriadur Prifysgol Cymru (Andrew).
Yiddish: Mark David.
Zeneise: Angelo Pavese.
- Tools Used to Create This Web Page:
- The UTF-8-aware Kermit 95 terminal emulator on
Windows, to a Unix host with the EMACS text editor. Kermit
95 displays UTF-8 and also allows keyboard entry of arbitrary Unicode BMP
characters as 4 hex digits, as shown HERE. Hex codes
for Unicode values can be found in The Unicode
Standard (recommended) and the online code charts. When
submissions arrive by email encoded in some other character set (Latin-1,
Latin-2, KOI, various PC code pages, JEUC, etc), I use the TRANSLATE command
of C-Kermit on the Unix host (where I read my mail) to convert the character set to
UTF-8 (I could also use Kermit 95 for this; it has the same TRANSLATE
command). That's it -- no "Web authoring" tools, no locales, no "smart"
anything. It's just plain text, nothing more. By the way, there's nothing
special about EMACS -- any text editor will do, providing it allows entry of
arbitrary 8-bit bytes as text, including the 0x80-0x9F "C1" range. EMACS 21.1
actually supports UTF-8; earlier versions don't know about it and display the
octal codes; either way is OK for this purpose.
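The character-set conversion and hex-code entry described above can be approximated in Python. This is a sketch under assumptions. Python's codec names (iso8859_2, koi8_r) stand in for the Latin-2 and KOI character sets mentioned. The helpers are hypothetical, and this is not how Kermit's TRANSLATE command is implemented. The last line shows why an editor must accept the 0x80-0x9F range: the UTF-8 form of € contains the byte 0x82, and an editor that dropped C1 bytes would corrupt it.

```python
def translate_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Hypothetical helper: re-encode text from a legacy character set as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


def from_hex(digits: str) -> str:
    """Hypothetical helper: turn 4 hex digits naming a BMP code point into that character."""
    return chr(int(digits, 16))


def c1_range_bytes(data: bytes) -> list[int]:
    """Bytes in 0x80-0x9F, which UTF-8 uses as continuation bytes."""
    return [b for b in data if 0x80 <= b <= 0x9F]


latin2_mail = "Żółć".encode("iso8859_2")  # e.g. a Polish word arriving in Latin-2
koi8_mail = "стекло".encode("koi8_r")     # a Russian word arriving in KOI8-R
print(translate_to_utf8(latin2_mail, "iso8859_2").decode("utf-8"))  # Żółć
print(translate_to_utf8(koi8_mail, "koi8_r").decode("utf-8"))       # стекло
print(from_hex("20AC"))                                             # €
print([hex(b) for b in c1_range_bytes(from_hex("20AC").encode("utf-8"))])  # ['0x82']
```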
- Commentary:
- Date: Wed, 27 Feb 2002 13:21:59 +0100
From: "Bruno DEDOMINICIS" <b.dedominicis@cite-sciences.fr>
Subject: Je peux manger du verre, cela ne me fait pas mal.
I just found out your website and it makes me feel like proposing an
interpretation of the choice of this peculiar phrase.
Glass is transparent and can hurt as everyone knows. The relation between
people and civilisations is sometimes effusional and more often rude. The
concept of breaking frontiers through globalization, in a way, is also an
attempt to deny any difference. Isn't "transparency" the flag of modernity?
Nothing should be hidden any more, authority is obsolete, and the new powers
are supposed to reign through loving and smiling and no more through
coercion...
Eating glass without pain sounds like a very nice metaphor of this attempt.
That is, frontiers should become glass transparent first, and be denied by
incorporating them. On the reverse, it shows that through globalization,
frontiers undergo a process of displacement, that is, when they are not any
more speakable, they become repressed from the speech and are therefore
incorporated and might become painful symptoms, as for example what happens
when one tries to eat glass.
The frontiers that used to separate bodies one from another tend to divide
bodies from within and make them suffer.... The chosen phrase then appears
as a denial of the symptom that might result from the destitution of
traditional frontiers.
Best,
Bruno De Dominicis, Paris, France
UTF-8 Sampler / The Kermit Project /
Columbia University /
kermit@columbia.edu