Globalization string management tool

Albert · December 3, 2010

There was already some talk with davidf about a new string management tool. But I think development talk should be more public, so I'm writing down some requirements for the new tool here.
This thread should be used for further discussions about the requirements for that tool. Basically, davidf and I are the ones talking.
I don't want to see unrelated messages in this thread. VERY good ideas are allowed, but they can also be sent to me by PM.

We'll see if that works. If not, the next development discussions will be internal again.

MP2 globalization basics
The resource management is quite simple. Technically, it works like this:

The localization system scans all plugins for language folders (language folders are registered in the plugin.xml file; the string management tool could, for example, look into the plugin.xml and find the language directory automatically). In the language folders, the files strings_XX.xml are read, where XX is the current language.
String files look like this:

Code:

<Section Name="Abc">
  <String Name="Xyz" Text="Hallo"/>
</Section>

Section names and string names are case-insensitive, but should be written in the correct case.

Some string names also look like this, which is valid:

Code:

<Section Name="Abc">
  <String Name="Rst.Uvw.Xyz" Text="Hallo"/>
</Section>

A section name with dots in it, however, is not valid.
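These two naming rules are easy to check mechanically. A minimal Python sketch, assuming that valid name characters are letters, digits and underscores (the thread does not define the allowed character set):

```python
import re

# Assumption: names consist of "word" characters (letters, digits, underscore).
SECTION_NAME = re.compile(r"\w+")           # dots are not allowed in section names
STRING_NAME = re.compile(r"\w+(?:\.\w+)*")  # dotted string names like Rst.Uvw.Xyz are fine


def is_valid_section_name(name: str) -> bool:
    return SECTION_NAME.fullmatch(name) is not None


def is_valid_string_name(name: str) -> bool:
    return STRING_NAME.fullmatch(name) is not None


print(is_valid_section_name("Abc"), is_valid_section_name("Ab.c"))       # True False
print(is_valid_string_name("Rst.Uvw.Xyz"), is_valid_string_name("Xyz."))  # True False
```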

The files are read into a two-level map: the first level maps section names to section instances, and each section instance contains another map from string names to string values.
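As a rough illustration of that two-level map, here is a Python sketch that reads one strings_XX.xml document. It assumes the <Section> elements may be wrapped in some root element (its name isn't given in this thread) and lower-cases all keys to get case-insensitive lookup; a real tool would also keep the original spelling for display.

```python
import xml.etree.ElementTree as ET


def load_strings(xml_text: str) -> dict[str, dict[str, str]]:
    """Parse one strings_XX.xml document into {section: {string name: text}}."""
    root = ET.fromstring(xml_text)
    result: dict[str, dict[str, str]] = {}
    # iter() also yields the root element itself, so a bare <Section> root works too.
    for section in root.iter("Section"):
        # Keys are lower-cased because section and string names are case-insensitive.
        strings = result.setdefault(section.get("Name").lower(), {})
        for string in section.iter("String"):
            strings[string.get("Name").lower()] = string.get("Text")
    return result


doc = '<Section Name="Abc"><String Name="Rst.Uvw.Xyz" Text="Hallo"/></Section>'
print(load_strings(doc))  # {'abc': {'rst.uvw.xyz': 'Hallo'}}
```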

String values can also contain placeholders, like this:

Code:

  <String Name="StringWithPlaceholders" Text="Hallo {0}"/>

But that should not matter for the development of the resource management tool.

After all files are read, we have one big map of sections, each containing a map of strings. All resources are merged into this one big map (scanning for collisions here would be a sensible job for your tool).
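A possible shape for that merge step with collision detection, sketched in Python for the files of one language. The input layout ({file: {section: {name: text}}}) and the rule that the first definition wins are assumptions; the thread does not say which definition MP2 keeps.

```python
def merge_resources(files):
    """Merge {file: {section: {name: text}}} for ONE language into one map.

    Returns (merged, collisions); each collision is
    (section, name, first_file, second_file).
    Assumption: the first definition wins; the real loader may behave differently.
    """
    merged = {}
    origin = {}
    collisions = []
    for file_id, sections in files.items():
        for section, strings in sections.items():
            target = merged.setdefault(section, {})
            for name, text in strings.items():
                key = (section, name)
                if key in origin:
                    collisions.append((section, name, origin[key], file_id))
                else:
                    origin[key] = file_id
                    target[name] = text
    return merged, collisions


# Hypothetical file paths, for illustration only.
files = {
    "Media/Language/strings_en.xml": {"media": {"title": "Media"}},
    "Other/Language/strings_en.xml": {"media": {"title": "Videos"}},
}
merged, collisions = merge_resources(files)
print(collisions)
# [('media', 'title', 'Media/Language/strings_en.xml', 'Other/Language/strings_en.xml')]
```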

Requirements for the new string management tool
The idea for the new tool is that every plugin brings all strings which are needed in that plugin. It is also allowed to access strings of referenced plugins (your tool could check that no strings are accessed from plugins which are not explicitly referenced).
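The reference check could look roughly like this Python sketch. It assumes the tool has already collected string usages per plugin, the defining plugin of each string, and each plugin's references from plugin.xml; keys are assumed to be normalized consistently, and only direct references are considered.

```python
def find_unreferenced_access(usages, owners, dependencies):
    """Find string uses that reach into plugins which are not referenced.

    usages:       (using plugin, "Section.Name") pairs found in code or XAML
    owners:       {"Section.Name": plugin that defines the string}
    dependencies: {plugin: set of plugins it references in its plugin.xml}
    Only direct references are considered (an assumption).
    """
    problems = []
    for plugin, key in usages:
        owner = owners.get(key)
        if owner is None:
            continue  # undefined strings belong to a separate "missing strings" check
        if owner != plugin and owner not in dependencies.get(plugin, set()):
            problems.append((plugin, key, owner))
    return problems


owners = {"Weather.Clouds": "Weather", "Media.PlaylistLoadNoPlaylistText": "Media"}
usages = [("Weather", "Weather.Clouds"), ("Weather", "Media.PlaylistLoadNoPlaylistText")]
print(find_unreferenced_access(usages, owners, {"Weather": set()}))
# [('Weather', 'Media.PlaylistLoadNoPlaylistText', 'Media')]
```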

My idea now is that the tool should show ALL strings, in different grouping modes: per plugin, per section, per language, etc.
If you select one string, the tool shows where it is defined and where it is used. I also want to see the code which uses it, with the string reference highlighted.
Strings which are concatenated in the code, like "[" + SectionName + "." + StringName + "]", COULD be found by your tool and displayed as potential matches, but that isn't needed for the first version.

I want functions like "Find missing strings", "Find unused strings", etc.; I'm sure there are many more sensible functions.
I want refactoring functions (renaming a string, moving a string to a different section, to a different file, to a new file, ...).
I want a function to create missing strings in a target language, where the string names are copied from a source language (English, for example). That is needed if someone wants to create or update the language resources of another language. The new strings should be saved in files located next to the files which contain the original strings.
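The "create missing strings" function could work like this Python sketch for one plugin. The "TODO: " marker on new entries is a hypothetical choice so translators can spot untranslated text; writing the result to a file next to the original is not shown.

```python
def create_missing_strings(source, target, marker="TODO: "):
    """Copy every string that exists in `source` but not in `target`.

    source/target: {section: {name: text}} for one plugin, e.g. English and
    German. New entries get the source text prefixed with `marker` so that a
    translator can spot them. Returns the added "section.name" keys.
    """
    added = []
    for section, strings in source.items():
        target_section = target.setdefault(section, {})
        for name, text in strings.items():
            if name not in target_section:
                target_section[name] = marker + text
                added.append(f"{section}.{name}")
    return added


english = {"weather": {"clouds": "Clouds", "storm": "Storm"}}
german = {"weather": {"clouds": "Wolken"}}
print(create_missing_strings(english, german))  # ['weather.storm']
print(german)  # {'weather': {'clouds': 'Wolken', 'storm': 'TODO: Storm'}}
```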

I'm sure you can imagine many more useful functions if you think about what work someone has to do to edit localization resources effectively.

You see, the final tool needs much more complex data structures than the current simple tool. I guess you need a sort of database for the strings in the background. I would hold all data in memory, track what was changed, and write the changes to the target files on "save".
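One way to hold everything in memory and only write changed files on "save" is sketched below in Python. The class, its methods and the file-keyed layout are hypothetical; the `write` callback stands in for whatever actually serializes a strings file.

```python
class StringStore:
    """Hypothetical in-memory store: {file: {section: {name: text}}}.

    Every edit marks its file as dirty; save() writes only dirty files.
    """

    def __init__(self, data):
        self._data = data
        self._dirty = set()

    def set_text(self, file, section, name, text):
        strings = self._data.setdefault(file, {}).setdefault(section, {})
        if strings.get(name) != text:  # unchanged values do not dirty the file
            strings[name] = text
            self._dirty.add(file)

    def rename(self, file, section, old, new):
        strings = self._data[file][section]
        if new in strings:
            raise ValueError(f"{section}.{new} already exists in {file}")
        strings[new] = strings.pop(old)
        self._dirty.add(file)

    def save(self, write):
        """Call write(file, sections) for each changed file, then reset tracking."""
        saved = sorted(self._dirty)
        for file in saved:
            write(file, self._data[file])
        self._dirty.clear()
        return saved


store = StringStore({"a.xml": {"s": {"x": "1"}}, "b.xml": {"s": {"y": "2"}}})
store.rename("a.xml", "s", "x", "z")
print(store.save(lambda file, sections: None))  # ['a.xml']
```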

Much, much work.

davidf · December 3, 2010

Albert,

I can think of tons of things to do with them - if there is a valid source of data.

The strings_<lang>.xml are currently the only reliable source of data. Take for instance the code from WeatherDotComCatcher:

Code:

        ILocalization localization = ServiceRegistration.Get<ILocalization>();
        // handle only mappings of different spellings
        if (string.Compare(tokenSplit, "T-Storms", true) == 0 || string.Compare(tokenSplit, "T-Storm", true) == 0)
          localizedWord = localization.ToString("[Weather.TStorm]");
        else if (string.Compare(tokenSplit, "Cloudy", true) == 0)
          localizedWord = localization.ToString("[Weather.Clouds]");
        else if (string.Compare(tokenSplit, "Shower", true) == 0 ||
                 string.Compare(tokenSplit, "T-Showers", true) == 0)
          localizedWord = localization.ToString("[Weather.Showers]");
        else if (string.Compare(tokenSplit, "Isolated", true) == 0)
          localizedWord = localization.ToString("[Weather.Scattered]");
        else if (string.Compare(tokenSplit, "Gale", true) == 0 ||
                 string.Compare(tokenSplit, "Tempest", true) == 0)
          localizedWord = localization.ToString("[Weather.Storm]");
        else 
          // for all other tokens do a direct lookup
          localizedWord = localization.ToString("[Weather."+tokenSplit+"]");

        localizedLine.AppendFormat("{0} ", localizedWord ?? tokenSplit); //if not found, let fallback
      }
      return localizedLine.ToString();

With all the code inspection in the world, there is no way to determine that [Weather.DewPoint] is valid (or invalid, for that matter). One way to get around this is to make the use of localized strings explicit:

[LocalizedString("Weather.DewPoint")] or similar, somewhere in the code (or, more likely, the assembly), where it can be examined and compared against the strings_<lang>.xml files to determine whether any strings are missing or extra. Similarly, [ImportedLocalizedString("Media.MusicMenuItem")] could be used to declare uses of imported strings.

So far, all we have achieved is a chore for devs, so to make it work, a carrot (or two) is needed.

1. Carrot (and stick): a tool to run (after compilation?) which will prompt on new and missing strings and test for the existence of imported strings.
2. Another tool to manage translations/editing of existing files (this is pretty close to what's there). It would need the functionality of the tool in 1 to be a complete, well-rounded solution.

Without a second source of data, most of the really useful functionality is not possible, or it would take a disproportionate amount of time (and would never be accurate, as can be seen).

Achieving a second source of data to make string management work correctly needs a change; it simply cannot work with only one data source. My preference would be a method which can be used on assemblies rather than on code, as sometimes the code will not be easily available.

Skin files are a slightly different proposition and would more easily be parsed.

Opinions/alternatives welcome

David

Albert · December 4, 2010

I absolutely understand your problem with those strings which are composed at runtime. You're absolutely right: it is not computable which of those dynamic occurrences are globalized strings. That's the reason why such a "Find unused strings" function can never be guaranteed to find all occurrences. A user always has to look over the result, and the tool must be built so that the user can check each single search result.
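In terms of data, "Find missing strings" and "Find unused strings" boil down to two set differences; the "unused" side is only a candidate list for the user to review. A minimal Python sketch, assuming keys of the form "Section.Name" have already been collected from the strings files and from code/XAML:

```python
def find_missing_and_unused(defined, used):
    """Compare defined "Section.Name" keys with keys referenced in code/XAML.

    Comparison is case-insensitive. "Unused" is only a list of candidates:
    keys composed at runtime (like "[Weather." + token + "]") are never found,
    so a user has to review each hit.
    """
    defined_keys = {key.lower() for key in defined}
    used_keys = {key.lower() for key in used}
    missing = sorted(used_keys - defined_keys)
    unused_candidates = sorted(defined_keys - used_keys)
    return missing, unused_candidates


missing, unused = find_missing_and_unused(
    {"Weather.Clouds", "Weather.DewPoint"}, {"weather.clouds", "Weather.Storm"})
print(missing, unused)  # ['weather.storm'] ['weather.dewpoint']
```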

The tool should be a help for devs who use localization resources as described in the guidelines (which are not written yet...). That means, in code we have localization string references like this:

Code:

    public const string RES_PLAYLIST_LOAD_NO_PLAYLIST = "[Media.PlaylistLoadNoPlaylistText]";

    ...

    dialogManager.ShowDialog(SkinBase.General.Consts.RES_SYSTEM_ERROR, Consts.RES_PLAYLIST_LOAD_NO_PLAYLIST, DialogType.OkDialog, false, null);

That example is taken from the Media plugin.
Another way to reference localized strings is to use those placeholders as literal constants in the code like

Code:

  SomeMethod("[SomeSection.SomeString]");

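To scan both of those strings, you simply need to scan for "[ at the beginning, n alpha-num chars, a dot, n alpha-num chars and ]" at the end. Something like the regex "\"\[\w+\.[\w.]+\]\"" or similar (the brackets must be escaped, and the part after the first dot may contain further dots, since string names like Rst.Uvw.Xyz are valid).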
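A Python sketch of this scan, run on the two examples above plus the weather hack. The pattern is an assumption: "alpha-num" is approximated by \w, section names are taken to contain no dots, and string names may contain dots but may not end with one.

```python
import re

# A string literal of the form "[Section.Name]". Section names contain no dots;
# string names may (e.g. Rst.Uvw.Xyz). "Alpha-num" is approximated by \w.
RESOURCE_REF = re.compile(r'"\[(\w+)\.(\w+(?:\.\w+)*)\]"')


def find_references(source_text):
    """Return "Section.Name" for every literal resource reference in the text."""
    return [f"{m.group(1)}.{m.group(2)}" for m in RESOURCE_REF.finditer(source_text)]


code = '''
public const string RES_PLAYLIST_LOAD_NO_PLAYLIST = "[Media.PlaylistLoadNoPlaylistText]";
SomeMethod("[SomeSection.SomeString]");
localizedWord = localization.ToString("[Weather."+tokenSplit+"]");
'''
print(find_references(code))
# ['Media.PlaylistLoadNoPlaylistText', 'SomeSection.SomeString']
```

The composed weather reference is silently skipped, which is exactly the behaviour described for strings that don't match the pattern.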

Simply ignore strings which don't match that pattern, for example strings which are composed in code like in the weather plugin:

Code:

  localizedWord = localization.ToString("[Weather."+tokenSplit+"]");

Here, the person using the tool must know what he is doing, i.e. if the string manager tool says "String Weather.XYZ doesn't seem to be used", he must know that the weather plugin does such a hack. But that's the problem of code which doesn't explicitly declare its globalization strings like the Media plugin does.
Later, a function which finds potential matches would be nice. It could scan for a looser regex pattern like "\"\[.*?\]\"", for example.
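A possible "potential matches" scan, sketched in Python: it reports every literal that starts with "[ and ends with ]" on the same line, minus the exact references a strict pattern already finds. Both patterns are assumptions, not an agreed format.

```python
import re

STRICT = re.compile(r'"\[\w+\.\w+(?:\.\w+)*\]"')
# Looser: any literal that starts with "[ and ends with ]" on the same line,
# which also catches references composed by string concatenation.
LOOSE = re.compile(r'"\[[^\n]*?\]"')


def find_potential_matches(source_text):
    """Return loose matches that are not already exact resource references."""
    exact_spans = {m.span() for m in STRICT.finditer(source_text)}
    return [m.group(0) for m in LOOSE.finditer(source_text)
            if m.span() not in exact_spans]


code = ('title = "[Media.Title]";\n'
        'word = localization.ToString("[Weather."+tokenSplit+"]");\n')
print(find_potential_matches(code))  # ['"[Weather."+tokenSplit+"]"']
```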

In the XAML files, parsing should be simpler; there, the simple regex pattern should always work.

In your PM, you asked how to identify plugin dependencies. Such dependencies are defined in the plugin.xml file, you can see an example in the Media plugin. So the string management tool also needs to parse plugin descriptors (the plugin.xml files are called plugin descriptors). The parser class is PluginDirectoryDescriptor.

davidf · January 25, 2011

Albert,

I've done the initial screens for what I envisaged the tool would need. Now I've just got lots of functionality to add. I'm working with a number of assumptions at the minute:

English is the base language: I want to highlight mismatches in the list of available languages against English, and potentially always show the English version of a string.
The tool needs to be able to run against compiled binaries (this is how I think most people will use it - you are an exception).
The tool should not manage Plugin.xml (i.e. to set the language directory).
All changes to be committed together (I'm not sure about that one).
More than one plugin can be worked on at the same time (or at least viewed).
An option should be available to view all strings simultaneously.
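With English as the base, highlighting mismatches per language could be as simple as this Python sketch for one plugin. Language codes such as "en"/"de" and the coverage percentage are illustrative assumptions.

```python
def compare_with_base(languages, base="en"):
    """Compare each language of one plugin with the base (English) master set.

    languages: {language code: set of "Section.Name" keys}
    Returns {language: (missing keys, extra keys, coverage in percent)}.
    """
    master = languages[base]
    report = {}
    for lang, keys in languages.items():
        if lang == base:
            continue
        covered = len(master & keys)
        coverage = 100.0 * covered / len(master) if master else 100.0
        report[lang] = (sorted(master - keys), sorted(keys - master), coverage)
    return report


print(compare_with_base({"en": {"A.x", "A.y", "B.z", "B.w"}, "de": {"A.x", "A.y", "C.q"}}))
# {'de': (['B.w', 'B.z'], ['C.q'], 50.0)}
```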

Make sense?

I'm just refactoring for an in-memory data store before I go any further, so feel free to comment as heavily as you want, as lots will be getting changed anyway. Is it worth allowing Excel to be used as the editing grid, since it comes with all of the filtering functionality by default (and most people have it), falling back to a more basic interface if it's not available?

Albert · January 25, 2011

Hey David, that looks pretty good!
I would say all your assumptions are correct. And I like the UI very much: to have a base form for the complete project and a separate form per plugin.
The only problem I currently see is this: if resources for a new language are to be added, the developer isn't forced to add them to the same plugin as the English language. In fact, a new language will most probably be provided in a separate plugin which covers the strings for one or multiple other plugins.

Example 1:
The default plugins Media, SkinBase, Weather, ... contain the default (English) language.
A second language plugin for German will provide German resources for all those default plugins.

Example 2:
If a user has installed a plugin OnlineVideos (for example), there also exists a German localization plugin which covers the string resources for OnlineVideos (and maybe for more). That means this user will probably have multiple plugins installed containing the German language, and all those plugins are different from the plugins where the base English resources are located.

To make that manageable, I think we need another kind of dependency concept: the dependency of a language plugin (e.g. German for OnlineVideos) on the one (or more) "parent" plugins (here OnlineVideos) where the default (English) language is contained. Only if the StringManagement tool knows that dependency can it check all German strings against the default strings.
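If such a parent declaration existed, the check could look like this Python sketch. The parent list is hypothetical (no such declaration exists yet); the function reports, per parent, the English keys the language plugin lacks, plus keys that belong to no declared parent.

```python
def check_language_plugin(translated, parents, english_by_plugin):
    """Check one language plugin against the English strings of its parents.

    translated:        "Section.Name" keys found in the language plugin
    parents:           plugins the language plugin declares as its "parents"
                       (hypothetical: no such declaration exists yet)
    english_by_plugin: {plugin: set of its English keys}
    Returns ({parent: missing keys}, keys that belong to no parent).
    """
    missing = {}
    known = set()
    for parent in parents:
        master = english_by_plugin.get(parent, set())
        known |= master
        gap = sorted(master - translated)
        if gap:
            missing[parent] = gap
    orphans = sorted(translated - known)
    return missing, orphans


english = {"OnlineVideos": {"OnlineVideos.Title", "OnlineVideos.Search"}}
print(check_language_plugin({"OnlineVideos.Title", "Old.Key"}, ["OnlineVideos"], english))
# ({'OnlineVideos': ['OnlineVideos.Search']}, ['Old.Key'])
```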

Smeulf · January 25, 2011

Albert said:
To make that manageable, I think we need another kind of dependency concept: the dependency of a language plugin (e.g. German for OnlineVideos) on the one (or more) "parent" plugins (here OnlineVideos) where the default (English) language is contained. Only if the StringManagement tool knows that dependency can it check all German strings against the default strings.

Albert,

I think this is possible only if you do not permit any language other than the default English one in all plugins; otherwise you risk it becoming very hard to understand how languages work...

And then I assume you want to get the "parent" plugin by its GUID...

But maybe you should introduce a naming convention for the localized string plugins; otherwise you could have a plugin named "arhejoj" being the language plugin for the German translation... Once again, hard to understand, and hard to find any problems...

Idea (I don't know if it's possible or OK in the MP2 concept): you could have a folder named "Localisation", containing plugins for the translations...

@davidf: Your tool looks good. I like it.
A question: will the tool produce the full language plugins by itself, including a new GUID, plugin.xml file...? That could be very cool.

Cheers.

Smeulf.

davidf · February 1, 2011

I don't think that it's as simple as a Localization folder, as you'd need multiple plugins there, i.e. one strings_de contains 90% of what you want, so you need another. It actually makes a lot of sense to push the language files to where they should be (in the plugin which the translation is for), but again there are a number of issues with that approach: what do you do if one of the plugins is not present when the language pack is installed?

I have a few assumptions based on the fact that English strings should be present for each plugin, which may be nonsensical for some plugins, but something consistent is needed to build everything else on.

Producing a full plugin would not really be an issue, but one of the things I don't like is that dependencies could not be done properly. For example, a plugin with the German language for OnlineVideos could not depend on OnlineVideos, because if it also contains a translation for my TV series plugin, that translation would not be loaded when OnlineVideos was not present (that's why pushing the translations to the plugins seems nice).

The best way I can come up with at the minute is to put an attribute in the section which says which plugin is referred to, and this defaults to the current plugin where the attribute is not specified. That seems to give the flexibility needed while still making languages identifiable to their plugin. It also makes identifying missing/extra translation strings easy.

I had been using English as the base for that idea, but it could have gone very wrong.

davidf · February 1, 2011

Just realised that putting the attribute into the section means that two translation plugins for the same language could be combined into one strings file. This would make the localization folder a possibility.
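To make the section-attribute idea concrete, here is a Python sketch that groups the sections of one strings file by the plugin they translate. The attribute name "Plugin" and the sample file are hypothetical; sections without the attribute fall back to the plugin containing the file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def sections_by_target(xml_text, current_plugin, attr="Plugin"):
    """Group the sections of one strings file by the plugin they translate.

    Hypothetical format: a <Section> may carry an attribute (called "Plugin"
    here) naming the plugin it belongs to; if it is missing, the section
    belongs to the plugin that contains the file.
    """
    result = defaultdict(dict)
    for section in ET.fromstring(xml_text).iter("Section"):
        target = section.get(attr, current_plugin)
        result[target][section.get("Name")] = {
            s.get("Name"): s.get("Text") for s in section.iter("String")}
    return dict(result)


doc = ('<Root>'
       '<Section Name="OnlineVideos" Plugin="OnlineVideos">'
       '<String Name="Title" Text="Titel"/></Section>'
       '<Section Name="Pack"><String Name="About" Text="Deutsch"/></Section>'
       '</Root>')
print(sections_by_target(doc, "GermanPack"))
# {'OnlineVideos': {'OnlineVideos': {'Title': 'Titel'}}, 'GermanPack': {'Pack': {'About': 'Deutsch'}}}
```

Because each section names its target plugin, a tool could skip just the sections whose plugin is not installed rather than the whole language plugin, and strings for several plugins could live in one file.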

Albert · February 2, 2011

davidf said:
I don't think that it's as simple as a Localization folder, as you'd need multiple plugins there, i.e. one strings_de contains 90% of what you want, so you need another. It actually makes a lot of sense to push the language files to where they should be (in the plugin which the translation is for), but again there are a number of issues with that approach: what do you do if one of the plugins is not present when the language pack is installed?

"it makes a lot of sense to push the strings to where they should be" -> What exactly do you mean by that? During translation time or during runtime?
At translation time, I don't want to change the original plugins (which are being localized). Why? Because our community members will do translations in their favorite language and send us the localization as a new plugin. That's why I think localizations will most probably be in separate plugins. And of course, the translator doesn't want to create N plugins; he will probably create ONE single plugin which contains his translation.

davidf said:
I have a few assumptions based on the fact that English strings should be present for each plugin, which may be nonsensical for some plugins, but something consistent is needed to build everything else on.

That assumption is absolutely correct. Every plugin MUST provide English strings for each localized resource, so you can see the set of English strings as the master set: the set which consists of 100% of the strings which need to be localized. If a plugin dev puts other strings files into his plugin, say an English strings file and a French one, and they are different, the English one counts.

davidf said:
Producing a full plugin would not really be an issue, but one of the things I don't like is that dependencies could not be done properly. For example, a plugin with the German language for OnlineVideos could not depend on OnlineVideos, because if it also contains a translation for my TV series plugin, that translation would not be loaded when OnlineVideos was not present (that's why pushing the translations to the plugins seems nice).

The best way I can come up with at the minute is to put an attribute in the section which says which plugin is referred to, and this defaults to the current plugin where the attribute is not specified. That seems to give the flexibility needed while still making languages identifiable to their plugin. It also makes identifying missing/extra translation strings easy. I had been using English as the base for that idea, but it could have gone very wrong.

My idea is something like that:
Let's say we define a standard way of doing translations, maybe the way I explained above: normal plugins contain English strings, and there are additional language plugins containing languages for multiple other plugins.

When the string manager tool is started, it sees multiple plugins, multiple languages and multiple dependencies. The tool's job is to support the translator in his work, so it would be good if it could give as much help as possible. But the problem is that, because our system is so flexible, the tool cannot really decide which situation is OK and which is not. And here the "standard way" comes into play: the tool can compare the situation it finds in the plugins and strings files with the standard way things should be done. Then it can give the user hints about the aspects in which the current situation differs from how it should be done.

The user can also do it completely differently, but then the tool cannot help him as much as if he did it the expected way.

Would that be a solution?

Smeulf · September 2, 2011

Hi davidf, Hi everybody

davidf, have you made any progress with that tool? Can I help you in any way?

Cheers.

Smeulf.
