[Pending] Adding support for Right-to-left languages (Hebrew, Arabic...)

velis · October 23, 2014

Libraries for this seem hard to come by. Besides yesterday's Pango (apparently used in GNOME), further research only turned up D-Type (commercial), which seems pretty powerful.

I found this page where basics are explained.
It's not particularly hard to do, but honestly I believe using existing functions, such as the suggested DirectWrite, might prove easier and perform better.
I have no idea why the current approach was taken, though. Perhaps there's a specific need for it that makes using such functions impossible.

In case it is decided to implement this on top of the current renderer, here's an SO question with an excellent answer on how to determine whether a character is RTL or not.
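As a minimal sketch of such a check (not the code from the linked answer), Python's standard `unicodedata` module exposes the Unicode bidirectional class of a character. The helper names `is_rtl_char` and `first_strong_direction` are made up for illustration; the assumption is that "RTL" means the strong classes R (e.g. Hebrew) and AL (Arabic letters).

```python
import unicodedata

def is_rtl_char(ch):
    """Return True if ch has a strong right-to-left bidi class (R or AL)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

def first_strong_direction(text):
    """Return 'rtl' or 'ltr' for the first strongly directional character, else None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return None  # only neutral characters (digits, spaces, punctuation)
```

Whitespace, digits and most punctuation are not strongly directional, so `is_rtl_char` returns False for them; they have to be resolved from their neighbours.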

morpheus_xx · October 24, 2014

Thanks @velis, this is very useful information!

I also spent more time on the actual rendering part:
My preferred solution would work directly with (D3D9Ex) textures, so we can render them directly. Unfortunately, none of the available text renderers supports this directly. A common way would be to render text to a bitmap, then create a texture from it and render that into our scene (GUI controls).

But if you look at this approach, I'm afraid it's much less efficient than the current rendering: right now we only have one 1024x1024 texture per font/size and collect vertices (the corner coordinates of glyphs in our texture), which are then rendered using that same texture.
With a bitmap->texture per string, this would require many textures. I'm not sure whether one big texture is better than many smaller ones, but I guess so.
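To illustrate the "one texture, many quads" idea described above, here is a rough sketch. It is not the MediaPortal renderer: the `Glyph` record, the `atlas` dictionary and `build_quads` are hypothetical names, and the texture coordinates are invented values.

```python
from collections import namedtuple

# Hypothetical glyph record: texture coordinates inside the shared font
# texture plus the glyph size in pixels.
Glyph = namedtuple("Glyph", "u0 v0 u1 v1 width height")

def build_quads(text, atlas, x=0.0, y=0.0):
    """Return one (screen_rect, texture_rect) pair per character.

    All quads reference the same texture, so a whole string can be drawn
    with a single texture bound.
    """
    quads = []
    for ch in text:
        g = atlas[ch]
        screen_rect = (x, y, x + g.width, y + g.height)
        texture_rect = (g.u0, g.v0, g.u1, g.v1)
        quads.append((screen_rect, texture_rect))
        x += g.width  # advance the pen for the next glyph
    return quads
```

With a bitmap per string, each string would instead need its own texture, which is the trade-off discussed in this post.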

I tried to find a way to let DirectWrite render directly into a texture. If that worked, we could save a lot of copying, bitmap format changes, etc.:
There is an example that uses shared resources (surfaces, textures) between DX and DirectWrite: https://github.com/enix/SharpDXSharedResources.
Unfortunately I only found this DX10-based example, although I've read that DX9Ex should be able to do the same. The major reason why we can't switch to DX10 is the lack of DXVA decoding support (at least I've read about this issue), which sounds like a no-go for a video-oriented application. (see http://devgurus.amd.com/thread/146158 and http://stackoverflow.com/questions/...ideo-h-264-decoding-with-directx-11-and-windo)

(edit: example for DirectWrite to bitmap: https://gist.github.com/Lyynx/3834740)

velis · October 24, 2014

I'm afraid I don't have much C# experience and I've never tried core MP development, so the overhead of writing a full solution myself would be too high.
But, if you find it useful, I could write some pseudo-code that would effectively be a proper mixed RTL-LTR text renderer and then you could translate that to proper function calls in the actual renderer.
Would that be helpful to you?

morpheus_xx · October 24, 2014

velis said:
But, if you find it useful, I could write some pseudo-code that would effectively be a proper mixed RTL-LTR text renderer and then you could translate that to proper function calls in the actual renderer.

That would be great!

The place I changed (for experimental reversal) was only this method: https://github.com/MediaPortal/Medi...nagement/AssetCore/FontAssetCore.cs#L442-L468

But I think more changes might be required (e.g. measuring the size of partial text).

velis · October 24, 2014

I'm almost done with this, but I encountered a very serious issue:
my sample text includes some Farsi, and it seems to render significantly differently when characters are combined than when they stand alone. I have no idea how the actual character fusion is performed. I will have to look it up. I believe the same problem exists at least for Arabic, but I have no idea which other languages have the same issue.
You mentioned the problem in post #9.
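A small sketch of why this happens, using only Python's standard `unicodedata` module: letters of the Arabic script (which Farsi also uses) have up to four contextual shapes, and Unicode lists them as separate "presentation form" code points that decompose back to the base letter. The helper `contextual_forms` is a made-up name; a real renderer would normally let a shaping engine pick the shape rather than use these code points directly.

```python
import unicodedata

def contextual_forms(first_cp, last_cp):
    """List (code point, name, decomposition) for a range of presentation forms."""
    forms = []
    for cp in range(first_cp, last_cp + 1):
        ch = chr(cp)
        forms.append((f"U+{cp:04X}", unicodedata.name(ch), unicodedata.decomposition(ch)))
    return forms

# U+FE8F..U+FE92 are the presentation forms of ARABIC LETTER BEH (U+0628):
# isolated, final, initial and medial.
for code, name, decomposition in contextual_forms(0xFE8F, 0xFE92):
    print(code, name, decomposition)
```

A renderer that draws each character of a string on its own always shows the isolated shape, which would explain why joined text looks different.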

Anyway, I'll finish this up and then start researching the character fusion. I bet it's easy once you find the algorithm

P.S.: I also found a page with example Hebrew text, but it's heavily accented and my current algorithm also has issues rendering that.

velis · October 24, 2014

I found this discouraging page explaining the basics of character fusion. This needs lots of work still

I'm attaching the sample program. It's written in Python 3.3 and seems to render text correctly. I say "seems" because the Farsi characters seem to render differently depending on the order of rendering (even though each is rendered separately). I believe this is tkinter-specific and should not show up in the MePo renderer.

I tried to comment the code a lot and also didn't use any of the nice Python stuff to make the code more understandable to a C# programmer. I hope it will do, especially because this implementation is really short (some 50 lines of code). It will however fail immediately when somebody makes a translation containing fused characters (well, this is actually also possible with western scripts and it fails with them as well).

What is done:
1. Processes LTR / RTL text
2. Properly handles direction changes within a string
3. Makes no attempt at character fusion / combining (I don't know the correct term)
4. Incorrectly handles whitespace characters as LTR (they should be direction-neutral). This results in words being rendered in reverse order if the main language setting is LTR and there are multiple consecutive RTL words in a string.
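The whitespace problem in item 4 can be sketched as follows. This is not the attached program and not the full Unicode Bidirectional Algorithm (UAX #9), only a simplified illustration: neutral characters between two strong characters of the same direction take that direction, otherwise they take the base direction. All function names are hypothetical, and treating digits as LTR is a simplifying assumption.

```python
import unicodedata

def char_direction(ch):
    """Classify a character as 'R', 'L' or 'N' (neutral)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "R"
    if bidi in ("L", "EN", "AN"):  # simplification: digits keep LTR order
        return "L"
    return "N"

def resolve_directions(text, base="L"):
    """Give every neutral character a direction based on its neighbours."""
    dirs = [char_direction(ch) for ch in text]
    i, n = 0, len(dirs)
    while i < n:
        if dirs[i] != "N":
            i += 1
            continue
        j = i
        while j < n and dirs[j] == "N":
            j += 1
        before = dirs[i - 1] if i > 0 else base
        after = dirs[j] if j < n else base
        fill = before if before == after else base
        dirs[i:j] = [fill] * (j - i)
        i = j
    return dirs

def visual_order(text, base="L"):
    """Return the characters in left-to-right drawing order."""
    runs = []
    for ch, d in zip(text, resolve_directions(text, base)):
        if runs and runs[-1][0] == d:
            runs[-1][1].append(ch)
        else:
            runs.append((d, [ch]))
    if base == "R":
        runs.reverse()  # an RTL paragraph lays out its runs right to left
    return "".join("".join(reversed(c)) if d == "R" else "".join(c) for d, c in runs)
```

Because the space between two RTL words stays inside the RTL run, consecutive RTL words keep their relative order even when the base direction is LTR.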

Basically this takes care of RTL scripts without you having to worry about it. You could also not provide the language global RTL flag but then the strings would be left-aligned (but still rendered correctly) which doesn't look so nice in RTL languages.

If I may suggest two paths for a better solution:
1. Research the system renderer functions
2. Use one of the libraries to handle proper character fusion. I believe proper implementation requires a bit more code than provided here...

velis · October 25, 2014

Ahhh, according to this, DirectWrite DOES have an API that enables you to render text directly to a texture. (wipes sweat)
Here's the relevant MSDN article with a sample on how to do it. I'm not sure if the equivalent Direct2D page isn't even more appropriate for MePo.

I've also found the correct term for character fusion: typographic ligature (combining characters), and the rest (combining accents, etc.) can be found here (together with a listing of current major implementations)
- just in case somebody wants to educate themselves some more
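For the combining accents part, a minimal sketch with Python's standard `unicodedata` module: combining marks carry a general category starting with "M", so a simple reordering step can at least keep each mark attached to its base character. The function name `base_plus_marks` is made up, and this is only an approximation of proper grapheme cluster segmentation; it does nothing about ligatures.

```python
import unicodedata

def base_plus_marks(text):
    """Split text into clusters of one base character plus its combining marks."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the preceding base character
        else:
            clusters.append(ch)
    return clusters
```

Reversing a list of such clusters, rather than individual characters, keeps accents and Hebrew points on the letter they belong to.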

morpheus_xx · October 25, 2014

I've started the C# implementation, based on the full Unicode table (taken from your link).

I'm now trying to convert your Python code (I've never used Python before, so it's not that easy).

The first thing that might be problematic is the initialization of the RTL start to a fixed 1000:

Code:

        # initial settings
        if self.lang_is_rtl:
            x = 1000  # right bound of texture
            dir = -1

Shouldn't the start index be based on the actual string length (and its size)? As far as I understand our text rendering, it's based on a collection of "rects", one for each character to be rendered. The coordinates point into the character bitmap (texture).

The later positioning of the constructed text (left- or right-aligned) is up to the containing control.

Do you think this works without the fixed "1000" as start?

velis · October 25, 2014

That 1000 is supposed to be the right border of the texture we're rendering into (text should be right-aligned). The current function just sets x to 0 (LTR), but there's no hint (that I could see) in the renderer as to how wide the texture is.

Oh, re-reading made me think of one thing: that 1000 IS NOT an index into the string to be rendered. It is the rendering offset in device units / pixels. The string always renders from first to last character. The widths array contains the character widths, i.e. the offsets by which the rendering offset has to move when a character is drawn.

P.S. Isn't the CreateQuad() function the one that renders the desired glyph to the destination canvas at x,y? If so, the only piece of info you're missing right now is the texture width.
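The pen movement described here can be sketched like this for a single-direction string. It is a simplified illustration rather than the attached program; `glyph_left_edges` and its parameters are hypothetical names.

```python
def glyph_left_edges(widths, rtl, right_bound):
    """Return the left x coordinate of each glyph, in logical (first to last) order.

    widths      -- pixel width of each character, in string order
    rtl         -- True to lay the characters out from right to left
    right_bound -- x coordinate where RTL layout starts (e.g. the texture width)
    """
    x = right_bound if rtl else 0
    edges = []
    for w in widths:
        if rtl:
            x -= w          # move left first, then draw at the new position
            edges.append(x)
        else:
            edges.append(x)  # draw, then move right
            x += w
    return edges
```

If the texture width is not known, one option is to pass `sum(widths)` as `right_bound`: the text then occupies exactly the range from 0 to its own width, and the containing control can align it afterwards.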

morpheus_xx · October 26, 2014

I ran many tests today, based on arranging the characters in the proper order for rendering. This gives the same result as rendering the glyphs in a different order, so I sort the string the way it has to be rendered.

My latest algorithm is the shortest and most efficient yet, but it still lacks minor details (see the places marked red in the screenshot).

Code:

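    // Requires: using System.Text; (for StringBuilder)
    // Reorders the characters of a logical string into the order in which they are drawn.
    public static string ReverseRTLCharsForRendering(string text)
    {
      StringBuilder sb = new StringBuilder(text.Length);

      // Insert position in the output; it only advances past LTR characters.
      int caretPosition = 0;
      for (int index = 0; index < text.Length; index++)
      {
        // Note: ConvertToUtf32 throws if index points at the second half of a surrogate pair.
        var codepoint = char.ConvertToUtf32(text, index);
        bool isRTL = _characterTable.IsRandALCat(codepoint);
        char c = text[index];
        // RTL characters are all inserted at the same position, which reverses their order.
        sb.Insert(caretPosition, c);
        if (!isRTL && !IsWhiteSpaceFollowedByRTL(text, index))
          caretPosition++;
      }
      return sb.ToString();
    }

    // Returns true if the first non-whitespace character at or after index is RTL.
    static bool IsWhiteSpaceFollowedByRTL(string text, int index)
    {
      for (int i = index; i < text.Length; i++)
      {
        var codepoint = char.ConvertToUtf32(text, i);
        char c = text[i];
        if (char.IsWhiteSpace(c))
          continue;
        return _characterTable.IsRandALCat(codepoint);
      }
      return false;
    }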

In summary, single-direction texts render properly, but mixed-direction texts still have small issues at the start or end (and around whitespace). Good progress anyway.
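To make the behaviour easier to experiment with, here is a rough Python transliteration of the two C# methods above. It is a sketch under assumptions: `unicodedata.bidirectional` stands in for `_characterTable.IsRandALCat`, and the Python function names are made up.

```python
import unicodedata

def is_rtl(ch):
    """Stand-in for _characterTable.IsRandALCat (assumption: bidi class R or AL)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

def first_non_space_is_rtl(text, index):
    """Mirror of IsWhiteSpaceFollowedByRTL: look at the first non-space from index on."""
    for i in range(index, len(text)):
        if text[i].isspace():
            continue
        return is_rtl(text[i])
    return False

def reverse_rtl_chars_for_rendering(text):
    """Mirror of ReverseRTLCharsForRendering, using a list instead of StringBuilder."""
    out = []
    caret = 0
    for index, ch in enumerate(text):
        out.insert(caret, ch)
        if not is_rtl(ch) and not first_non_space_is_rtl(text, index):
            caret += 1
    return "".join(out)
```

In this port, a pure RTL string comes out fully reversed and a pure LTR string is unchanged. For a mixed string such as LTR, RTL, LTR, the trailing LTR word is inserted before the RTL run, because the caret never moves past the RTL characters. That may be one source of the issues seen at direction changes.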

For completeness, this is the check if the selected culture is RTL:

Code:

bool cultureIsRTL = ServiceRegistration.Get<ILocalization>().CurrentCulture.TextInfo.IsRightToLeft;
