MyLyrics 0.15 - Comments and bug reporting
[QUOTE="Gaukaren, post: 129520, member: 29966"]
It shouldn't be that hard. I'm useless in .NET, but I'm sure there must be a common library to do this.

I'll give an example, though. [url=http://www.lyricswiki.org/artist_b/bjorn_eidsvag_lyrics/skyfri_himmel_lyrics.html]Take a look at this page[/url]: it's a Norwegian song (and the first I came across that had lots of errors in the lyrics) which uses the 'special' characters æ, ø, and å.

The text obviously isn't stored as UTF8 at Lyricswiki, yet the server and metadata say it is. This results in the special characters being misinterpreted. In Unicode, the code point for an 'unknown character' is U+FFFD (0xFFFD), which Firefox renders as a '?' inside a black diamond shape. Now, this string is already compromised, so if we convert it to plain text, the Unicode character we want to convert isn't there anymore. One guess what happens if we do UTF8->ISO-8859-1 on this? Yes, we get our infamous question mark.

What we want is for our program to do the same thing the browser does if we go back to the page and choose View->Character Encoding->Western (ISO-8859-1) in Firefox. Goodbye ugly black diamond '?', welcome weird Norwegian vowels.

As I mentioned, my C# skills are beyond pitiful, but I can post how to do it with a few lines of Perl (in the vague hope that you might glean something useful from it).

[CODE]use strict;
use warnings;
use Unicode::String;
use LWP::Simple;

# Grab the lyric page, with its wrongly encoded 'æ', 'ø' and 'å', and store it in a scalar.
my $string = get("http://www.lyricswiki.org/artist_b/bjorn_eidsvag_lyrics/skyfri_himmel_lyrics.html");
die "Could not fetch page." unless defined $string;

# Create a new Unicode string object.
my $us = Unicode::String->new;
# The lyric in the data grabbed from the internet is a Latin-1 string mistakenly sent as UTF8,
# so read the bytes as Latin-1.
$us->latin1($string);
# Then turn it back into a proper string using the default conversion encoding (UTF8).
$string = $us->as_string;

# Drop the extra HTML. And, no, this isn't a proper parser. Sue me...
$string =~ m|<pre>(.*?)</pre>|is or die "No <pre> block found.";
$string = $1;

# Write the fixed lyric to a file.
open(my $fixed, ">", "fixed.txt") or die "Could not write file.";
print $fixed $string;
close($fixed);[/CODE]

Good luck, and thanks for your work so far on a great plugin.

Edit: Not using [url=http://www.lyricwiki.org/Blind_Guardian:Bj%C3%B8Rn_Eidsv%C3%A5g_-_Skyfri_Himmel]the 'proper' Lyricwiki site[/url] was by design, as this particular lyric seemed to be genuinely busted there. However, I just noticed that this tag (which had previously eluded me - yes, I'm slow sometimes) [url=http://www.lyricwiki.org/Category:Requests_For_Edits/Accented_Characters]is on a shitload of songs over there[/url]...

The reason for this is probably that they made the very mistake I attributed to My Lyrics earlier (sorry about that) when they imported these lyrics into their system in the first place. Thus, it's not easily fixable. The information is already lost.

I'll check out the other sites available in 0.15 and see what's going on there.
[/QUOTE]
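For anyone who reads neither Perl nor C#, the two paths described in the quoted post can be sketched in Python (chosen here purely for illustration; it is not what the plugin uses). The sample text is just the three characters named in the post, and the function names are made up for this example. The sketch shows why trusting the UTF8 label loses the characters for good, while reading the same bytes as ISO-8859-1 recovers them.

```python
# Illustrative sample only: the three characters mentioned in the post.
SAMPLE = "æ, ø, å"

# Assumption: the server stores the text as ISO-8859-1 bytes but labels it UTF8.
raw = SAMPLE.encode("iso-8859-1")


def decode_trusting_label(data: bytes) -> str:
    """Decode as UTF8, as a client does when it believes the server's label.

    Bytes that are not valid UTF8 become U+FFFD, the replacement character.
    """
    return data.decode("utf-8", errors="replace")


def decode_as_latin1(data: bytes) -> str:
    """Ignore the label and read the bytes as ISO-8859-1 (hypothetical helper)."""
    return data.decode("iso-8859-1")


# Path 1: trust the label. The original bytes are gone after this step.
damaged = decode_trusting_label(raw)
# Converting the damaged text to ISO-8859-1 can only produce '?' placeholders.
flattened = damaged.encode("iso-8859-1", errors="replace")

# Path 2: do what Firefox does when you pick Western (ISO-8859-1) by hand.
repaired = decode_as_latin1(raw)

print(damaged == "\ufffd, \ufffd, \ufffd")  # True
print(flattened)                            # b'?, ?, ?'
print(repaired == SAMPLE)                   # True
```

The point of the first path is that the repair has to happen on the raw bytes as fetched. Once the text has been decoded with the wrong encoding and saved, as seems to have happened during the Lyricwiki import mentioned in the edit, there is nothing left to recover.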