CMCL to Unicode bidirectional converter





Help and documentation

Unicode Coptic diacritics best practices

  • A single-letter superlinear stroke must be encoded with the combining character U+0304.
    ⲁ̄ = U+2C81 U+0304
  • A conjoining superlinear stroke (also known as “Bindestrich”) must be encoded with the combining character U+FE24 after the first character and U+FE25 after the last character.
    ⲓ︤ⲥ︥ = U+2C93 U+FE24 U+2CA5 U+FE25
    If the Bindestrich covers more than two characters, each in-between character should be followed by U+FE26.
    ⲓ︤ⲏ︦ⲙ︥ = U+2C93 U+FE24 U+2C8F U+FE26 U+2C99 U+FE25
  • U+0305 might be used to obtain a single-letter superlinear stroke that is exactly as wide as the letter below it, so that two or more such strokes join into a Bindestrich over two or more letters. This practice is not recommended and, if adopted, should be explicitly declared.
    ⲁ̅ = U+2C81 U+0305
    ⲓ̅ⲥ̅ = U+2C93 U+0305 U+2CA5 U+0305
    ⲓ̅ⲏ̅ⲙ̅ = U+2C93 U+0305 U+2C8F U+0305 U+2C99 U+0305
  • The correct use of U+0305 is to mark letters as numerals.
    ⲁ̅ = U+2C81 U+0305
    ⲃ̅ = U+2C83 U+0305
    ⲅ̅ = U+2C85 U+0305
    ⲇ̅ = U+2C87 U+0305
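
The recommended encodings in the list above can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not the converter's own code; the function name superlinear_stroke and the constant names are hypothetical.

```python
# Combining characters named in the best practices above.
STROKE_SINGLE = "\u0304"  # single-letter superlinear stroke
STROKE_FIRST = "\uFE24"   # Bindestrich: after the first character
STROKE_LAST = "\uFE25"    # Bindestrich: after the last character
STROKE_INNER = "\uFE26"   # Bindestrich: after each in-between character


def superlinear_stroke(letters: str) -> str:
    """Return `letters` with a superlinear stroke over the whole run.

    Hypothetical helper: `letters` is assumed to contain only base
    letters (no combining characters already attached).
    """
    if not letters:
        return ""
    if len(letters) == 1:
        return letters + STROKE_SINGLE
    inner = "".join(ch + STROKE_INNER for ch in letters[1:-1])
    return letters[0] + STROKE_FIRST + inner + letters[-1] + STROKE_LAST


if __name__ == "__main__":
    for run in ("\u2C81", "\u2C93\u2CA5", "\u2C93\u2C8F\u2C99"):
        print(superlinear_stroke(run))
```

With these rules, a run of one letter gets U+0304, a run of two gets U+FE24 and U+FE25, and any longer run additionally gets U+FE26 after each in-between letter.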

Note: U+0305 looks very similar to U+FE26 (at least in the Antinoou font), but the two diacritics serve different purposes and must not be used interchangeably.
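
Because the two diacritics are hard to tell apart on screen, it can help to inspect the code points of a string directly. The following is a minimal sketch using Python's standard unicodedata module; the helper name codepoints is hypothetical and is not part of cmcl2unicode.

```python
import unicodedata


def codepoints(text: str) -> str:
    """Format each character of `text` as U+XXXX, space separated."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    # Two strings that may look alike but are encoded differently.
    for sample in ("\u2C8F\u0305", "\u2C8F\uFE26"):
        print(codepoints(sample))
        for ch in sample:
            # Fall back to a placeholder if the local Unicode
            # database does not know the character.
            print("   ", unicodedata.name(ch, "<unnamed>"))
```

Printing the code points in the same U+XXXX notation used on this page makes it easy to compare a string against the examples above.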

Caveats, known issues and “won't-fixes”

Special attention must be paid to diacritics, particularly to superlinear strokes (see above). When converting from CMCL to Unicode, the converter works properly and picks the correct form in most cases. The reverse conversion, however, will often fail to restore the original form, especially in the most complex cases.

For example, CMCL a_ (Coptonew: a_) is correctly converted to Antinoou: ⲁ̄ (Unicode U+2C81 U+0304), but the reverse does not hold: Antinoou: ⲁ̄ (Unicode U+2C81 U+0304) is converted to CMCL a+ (Coptonew: a+). This should not be considered a bug, and no fix will be provided in the future.

The same is true for other combinations, e.g.:

CMCL (Coptonew) Antinoou (Unicode) CMCL (Coptonew)
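
The asymmetry described above can be checked mechanically. The sketch below uses two toy lookup tables that contain only the single pair documented on this page (a_ to ⲁ̄, and ⲁ̄ back to a+); the table and function names are hypothetical, and the real converter's mappings are much larger and are not reproduced here.

```python
# Toy tables holding only the documented example; NOT the converter's data.
CMCL_TO_UNICODE = {"a_": "\u2C81\u0304"}
UNICODE_TO_CMCL = {"\u2C81\u0304": "a+"}


def round_trip(cmcl: str) -> str:
    """Convert CMCL to Unicode and back again using the toy tables."""
    return UNICODE_TO_CMCL[CMCL_TO_UNICODE[cmcl]]


def survives_round_trip(cmcl: str) -> bool:
    """True if converting to Unicode and back yields the same CMCL."""
    return round_trip(cmcl) == cmcl


if __name__ == "__main__":
    print(round_trip("a_"), survives_round_trip("a_"))
```

A check of this kind can be useful before relying on a Unicode to CMCL conversion for texts that contain superlinear strokes.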

Coptonew nomina sacra shortcuts

ASCII shortcut Unicode output

CMCL's entities

Regex | Verbose explanation | Meaning | Replace policy | Examples
&([0-9]{1,2})n; | an integer of one or two digits followed by n | Lacuna of known length | plus-minus (±, U+00B1) character followed by the number of missing characters, enclosed in square brackets | &2n; = [±2]
&([0-9]{1,2})\?; | an integer of one or two digits followed by ? | Lacuna of supposed length | space and dot, repeated as many times as the supposed length, enclosed in parentheses | &2?; = ( . .)
&\?(cap|capitale); | ? followed by the string cap or capitale | Unknown capital character | space followed by dot (same output as entity &1?;) | &?cap; = .
&[0-9]{1,2}b; | an integer of one or two digits followed by b | Blank space of known length | Not rendered | &2b; =
&([a-z]{1})\?; | one alphabetic character followed by a question mark | Uncertain alphabetic character | The alphabetic character followed by a combining dot below (U+0323) | &a?; =
&coppa; | string: coppa | Character coppa | Character coppa (U+03D9) | &coppa; = ϙ
 | One of the following strings (comma separated): basilios, Crs, Cs, eiote, ekklHsia, fq, i:lHm, iHl, iHs, ilHm, is, isrl, iws, js, monaCos, oute, pna | | The same string (CMCL encoding system) converted to Unicode | &ekklHsia; = ⲉⲕⲕⲗⲏⲥⲓⲁ
&ebol_compresso; | string: ebol_compresso | CMCL's ebol equivalent in Unicode | ⲉⲃⲟⲗ | &ebol_compresso; = ⲉⲃⲟⲗ
&etcompresso; | string: etcompresso | CMCL's et equivalent in Unicode | ⲉⲧ | &etcompresso; = ⲉⲧ
&Hspir; | string: Hspir | Heta with combining dot above (U+2C8F U+0307) | ⲏ̇ | &Hspir; = ⲏ̇
&.b; | | Simple dot | . | &.b; = .
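
As an illustration of the replace policies in the table above, here is a minimal Python sketch built on the standard re module. It covers only the entities whose output is fully specified by the table. The function name expand_entities is hypothetical. It is assumed that the number in &Nn; is copied to the output as written, and the letter in entities such as &a?; is left in its CMCL form, since transliteration to Coptic is outside the scope of this sketch.

```python
import re


def expand_entities(text: str) -> str:
    """Expand a subset of CMCL entities (hypothetical helper)."""
    # Lacuna of known length: &2n; -> [±2]
    text = re.sub(r"&([0-9]{1,2})n;", lambda m: f"[\u00B1{m.group(1)}]", text)
    # Lacuna of supposed length: &2?; -> ( . .)
    text = re.sub(
        r"&([0-9]{1,2})\?;",
        lambda m: "(" + " ." * int(m.group(1)) + ")",
        text,
    )
    # Unknown capital character: space followed by dot.
    text = re.sub(r"&\?(cap|capitale);", " .", text)
    # Blank space of known length: not rendered.
    text = re.sub(r"&[0-9]{1,2}b;", "", text)
    # Uncertain letter: the letter followed by combining dot below.
    text = re.sub(r"&([a-z]{1})\?;", lambda m: m.group(1) + "\u0323", text)
    # Fixed-string entities.
    text = text.replace("&coppa;", "\u03D9")
    text = text.replace("&.b;", ".")
    return text


if __name__ == "__main__":
    for entity in ("&2n;", "&2?;", "&?cap;", "&2b;", "&a?;", "&coppa;", "&.b;"):
        print(entity, "=", repr(expand_entities(entity)))
```

The numeric patterns are applied before the single-letter pattern only for readability; the patterns do not overlap, so the order does not change the result.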

Tracking Papyrus and Parchment Paths.

An Archaeological Atlas of Coptic Literature Literary Texts in their Geographical Context: Production, Copying, Usage, Dissemination and Preservation

cmcl2unicode is open-source software, available for download or fork on GitHub. Please report any issue you might encounter here.