FilmInfo+ - A German movie details scraper with auto grouping

Helios61

Retired Team Member
  • Premium Supporter
  • January 30, 2008
    4,587
    873
    62
    NRW
    Home Country
    Germany
    Hello @Merlyn,

    Unfortunately, I'm having trouble importing "Der Hobbit - Smaugs Einöde" correctly. Despite the correct IMDb number in the filename, the film is imported as the next (and final) part of the trilogy, "Der Hobbit - Hin und zurück". You'll find the log file attached. It would be great if you could help.

    Regards,
    Helios
     

    Attachments

    • movingpictures.rar
      7.3 KB

    badboyxx

    Portal Pro
    June 15, 2012
    728
    97
    Home Country
    Germany
    Since you can't create a matching image in MovPic for the category "Kinder-/Familienfilm" (the file has to be named exactly like the category and would therefore contain special characters), I wanted it to be imported automatically as "Kinder- & Familienfilm".
    I changed the following in the XML file:
    Code:
    <set name="genre_translation_table">
                    <![CDATA[
                        Adventure#             Abenteuer#               
                        Biography#             Biographie#              
                        Comedy#                Komödie#                 
                        Crime#                 Krimi#                   
                        Documentary#           Dokumentation#           
                        Family#                Kinder- & Familienfilm#    
                        Film-Noir#             Komödie#                 
                        History#               Historienfilm#                
                        Music#                 Musikfilm#               
                        Musical#               Musikfilm#               
                        Romance#               Liebe & Romantik#          
                        Sci-Fi#                Science-Fiction#         
                        Sport#                 Sportfilm#               
                        War#                   Krieg#                   
                        Biografie#             Biographie#              
                        Familie#               Kinder- & Familienfilm#    
                        Martial Arts#          Kampfsport#              
                        Monumentalfilm#        Historienfilm#           
                        Musik#                 Musikfilm#               
                        Romanze#               Liebe & Romantik#          
                        Sci-Fi#                Science-Fiction#         
                        Spionage#              Krimi#                   
                        Tragikomödie#          Komödie#           
                        Zeichentrick#          Zeichentrick# 
                        filmfilm#              film#                    
                        Kinder- & Kinder- & Familienfilmnfilm#     Kinder- & Familienfilm#               
                        Musikfilmal#           Musikfilm#
                        Animation#             Animation#                      
                    ]]>
                </set>


    Is there a flaw in my reasoning, or could it be that a film that was already imported with "Kinder-/Familienfilm" gets recognized incorrectly again when I re-import it?
    Changing it by hand every time is quite annoying.
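    The odd-looking rows near the end of that table (filmfilm#, Musikfilmal#, Kinder- & Kinder- & Familienfilmnfilm#) suggest that the rows are applied one after another as plain substring replacements over the genre text. The Python sketch below models that behaviour on a subset of the rows, kept in their original order. This is an assumption for illustration, not MovPic's actual code, and `parse_table`/`translate` are hypothetical helpers. It shows why a source such as Music or Familie, which also occurs inside other words or earlier results (Musical, Familienfilm), needs a follow-up correction row.

```python
# Hypothetical model of FilmInfo+'s genre_translation_table, ASSUMING each
# "Source# Target#" row is applied in order as a plain substring replacement.
# This is not MovPic's actual implementation.

TABLE = """
Family#                                  Kinder- & Familienfilm#
Music#                                   Musikfilm#
Musical#                                 Musikfilm#
Familie#                                 Kinder- & Familienfilm#
Musik#                                   Musikfilm#
filmfilm#                                film#
Kinder- & Kinder- & Familienfilmnfilm#   Kinder- & Familienfilm#
Musikfilmal#                             Musikfilm#
"""


def parse_table(text: str) -> list[tuple[str, str]]:
    """Split each non-empty row at '#' into a (source, target) pair."""
    rows = []
    for line in text.strip().splitlines():
        parts = [part.strip() for part in line.split("#")]
        if len(parts) >= 2 and parts[0]:
            rows.append((parts[0], parts[1]))
    return rows


def translate(genre: str, rows: list[tuple[str, str]]) -> str:
    """Apply every row, in table order, as a substring replacement."""
    for source, target in rows:
        genre = genre.replace(source, target)
    return genre


if __name__ == "__main__":
    rows = parse_table(TABLE)
    for genre in ("Music", "Musical", "Musik", "Family", "Familie"):
        # e.g. Musical -> Musikfilmal -> Musikfilmfilmal -> Musikfilmal -> Musikfilm
        print(genre, "->", translate(genre, rows))
```

    Under this model the correction rows only work if they come after the rows that produce the broken text, so the order of the table matters as much as its content.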
     

    Brainiac

    Member
    August 10, 2014
    2
    6
    54
    Home Country
    Germany
    Hello everyone,

    When importing from IMDb, the Directors and Writers fields often contain some HTML fragments next to the names.
    Since Merlyn apparently doesn't have time for it at the moment, I took a closer look at the script and modified it.
    The error no longer seems to occur.
     

    Attachments

    • 180736_FilmInfo_V1.3.9.xml
      74 KB
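    The modified script isn't quoted in the thread, so what exactly Brainiac changed isn't visible here. As a rough illustration of the kind of cleanup involved, this hypothetical Python helper (`clean_name` is not part of FilmInfo+ or MovPic) decodes HTML entities, replaces leftover tags with spaces and collapses whitespace, assuming the fragments are ordinary tags and entities around the name.

```python
import html
import re

# Hypothetical cleanup helper, NOT Brainiac's actual change to FilmInfo+.
TAG = re.compile(r"<[^>]*>")


def clean_name(raw: str) -> str:
    """Decode entities, replace HTML tags with spaces and tidy whitespace."""
    text = html.unescape(raw)                 # "&nbsp;" -> "\xa0", "&amp;" -> "&"
    text = TAG.sub(" ", text)                 # drop tags such as <a ...>, </a>, <br/>
    return re.sub(r"\s+", " ", text).strip()  # \s also matches "\xa0"


if __name__ == "__main__":
    print(clean_name('<a href="#">Christopher Miller</a>&nbsp;'))
    print(clean_name("Phil Lord</a> <br/>"))
```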

    Helios61

    Brainiac wrote: The error no longer seems to occur.

    Now that's what I call a debut ;)! Welcome, and have fun with MediaPortal, Brainiac! I'll start using the script first thing tomorrow. I had noticed the error too. Many thanks for your work!

    Helios
     

    badboyxx

    Could someone extend the file so that the text in parentheses after Writer names isn't imported?
    E.g. Max Mustermann (2 Series...)
    I know it's not that simple, but maybe you could say: if there are parentheses, drop everything from the space before the opening parenthesis onward.
    My programming skills are too weak, otherwise I'd try it myself. So far this is the only thing that still has to be changed by hand.
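    The rule described above (cut from the space before the opening parenthesis) maps onto a single regular expression. A minimal Python sketch with a hypothetical `strip_paren_suffix` helper; it assumes the rule is applied to each writer name separately, because on the whole pipe-joined list the `.*$` part would also swallow every name after the first parenthesis.

```python
import re

# Matches from the whitespace before the first "(" to the end of the string.
PAREN_SUFFIX = re.compile(r"\s*\(.*$")


def strip_paren_suffix(name: str) -> str:
    """Hypothetical helper: 'Max Mustermann (2 Series...)' -> 'Max Mustermann'."""
    return PAREN_SUFFIX.sub("", name).strip()


if __name__ == "__main__":
    for name in ("Max Mustermann (2 Series...)", "Jonathan M. Goldstein"):
        print(repr(strip_paren_suffix(name)))
```

    Names without parentheses pass through unchanged, since the pattern only matches when a "(" is present.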
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    badboyxx wrote: E.g. Max Mustermann (2 Series...)

    If you can get me the Moving-Pictures scraper debug results on the raw HTML input before it gets added to $movies.writers, then I can give you the Regular Expression based corrections to fix that. It is actually easy to do, but I need to know the existing expression used and an example of the raw HTML code that it currently gets. All this info is in the MovPic scraper debug results (with the little green bug-icon enabled), so just get me the movingpictures.log file as per the instructions I've made for IMDb+:

    Activate scraper-debug mode = http://code.google.com/p/imdbplus/wiki/WikiInstallScraper
    And use the "Getting the log file" section @ http://code.google.com/p/imdbplus/wiki/DebugIMDb to get me an easier to read log file please.
     

    badboyxx

    RoChess wrote: Activate scraper-debug mode = http://code.google.com/p/imdbplus/wiki/WikiInstallScraper

    I hope I did it right. Here is the log (movingpictures.zip).


    RoChess wrote: And use the "Getting the log file" section @ http://code.google.com/p/imdbplus/wiki/DebugIMDb to get me an easier to read log file please.

    I'm not using IMDb+, but I tried it with FilmInfo+ (movingpictures2.zip). If I did something wrong, let me know and I'll give you a new log file.
     

    Attachments

    • movingpictures.zip
      188.5 KB
    • movingpictures2.zip
      43.6 KB

    RoChess

    movingpictures.zip:

    Code:
    18-Aug-2014 10:45:06 Debug [  ScraperNode]: executing parse: <parse name="writers" input="${details[0].drehbuch}" xpath="//name" />
    18-Aug-2014 10:45:06 Debug [  ScraperNode]: Assigned variable: writers.count = 1
    18-Aug-2014 10:45:06 Debug [  ScraperNode]: Assigned variable: writers[0] = Christopher Miller
    18-Aug-2014 10:45:06 Debug [  ScraperNode]: Assigned variable: movie.writers =
    18-Aug-2014 10:45:06 Debug [  ScraperNode]: Assigned variable: writer = Christopher Miller
    18-Aug-2014 10:45:06 Debug [  ScraperNode]: Assigned variable: count = 0
    18-Aug-2014 10:45:06 Debug [  ScraperNode]: Assigned variable: movie.writers = |Christopher Miller

    So that should result in "Christopher Miller", and does not demonstrate the failure you mentioned.

    movingpictures2.zip gives me:

    Code:
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: rx_writers_block = (?<=Writing\scredits)(?<WritersBlock>.*?)(?=</table>)
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: rx_writers = (?:>)([^<]+?)(?:</)
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: rx_cmnt = (\(\s*WGA\s*\))|(\(in alphabetical order\))|(\sand\s)|(&)
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: executing parse: <parse name="writers_block" input="${cast_page:htmldecode}" regex="${rx_writers_block}" />
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: name: writers_block ||| pattern: (?<=Writing\scredits)(?<WritersBlock>.*?)(?=</table>) ||| input: [not logged due to size]
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: writers_block.count = 1
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: movie.writers =
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: executing replace: <replace name="writers" input="${writers_block}" pattern="${rx_tag}" with=" " />
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: writers =  Erica Rivinoja  ...  (screenplay) &  John Francis Daley  ...  (screenplay) &  Jonathan M. Goldstein  ...  (screenplay) (as Jonathan Goldstein)  Phil Lord  ...  (story) &  Christopher Miller  ...  (story) &  Erica Rivinoja  ...  (story)  Judi Barrett  ...  (characters) &  Ron Barrett  ...  (characters)
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: executing replace: <replace name="writers" input="${writers}" pattern="${rx_cmnt}" with=" " />
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: writers =  Erica Rivinoja  ...  (screenplay)  John Francis Daley  ...  (screenplay)  Jonathan M. Goldstein  ...  (screenplay) (as Jonathan Goldstein)  Phil Lord  ...  (story)  Christopher Miller  ...  (story)  Erica Rivinoja  ...  (story)  Judi Barrett  ...  (characters)  Ron Barrett  ...  (characters)
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: executing replace: <replace name="writers" input="${writers}" pattern="\s+" with=" " />
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: writers =  Erica Rivinoja ... (screenplay) John Francis Daley ... (screenplay) Jonathan M. Goldstein ... (screenplay) (as Jonathan Goldstein) Phil Lord ... (story) Christopher Miller ... (story) Erica Rivinoja ... (story) Judi Barrett ... (characters) Ron Barrett ... (characters)
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: executing replace: <replace name="writers" input="${writers}" pattern="\)" with=")|" />
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: writers =  Erica Rivinoja ... (screenplay)| John Francis Daley ... (screenplay)| Jonathan M. Goldstein ... (screenplay)| (as Jonathan Goldstein)| Phil Lord ... (story)| Christopher Miller ... (story)| Erica Rivinoja ... (story)| Judi Barrett ... (characters)| Ron Barrett ... (characters)|
    18-Aug-2014 11:35:08 Debug [  ScraperNode]: Assigned variable: movie.writers =  Erica Rivinoja ... (screenplay)| John Francis Daley ... (screenplay)| Jonathan M. Goldstein ... (screenplay)| (as Jonathan Goldstein)| Phil Lord ... (story)| Christopher Miller ... (story)| Erica Rivinoja ... (story)| Judi Barrett ... (characters)| Ron Barrett ... (characters)|

    So we got a winner there.

    The regular expression "rx_cmnt" is what is used to clean up the results (a weird way of doing it, though); its current value is "(\(\s*WGA\s*\))|(\(in alphabetical order\))|(\sand\s)|(&)"

    No idea why Merlyn only matched fixed text between parentheses (WGA, in alphabetical order), as I cannot think of any reason to do it that way.

    So change "rx_cmnt" into: (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)

    And "<replace name="writers" input="${writers}" pattern="\)" with=")|" />" into: <replace name="writers" input="${writers}" pattern="\)" with="|" />

    And you will be good to go.
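    To check the suggested change without a MediaPortal install, the three `<replace>` steps that follow the tag-stripping step in the movingpictures2.zip log can be replayed in Python on the logged `writers` value. This is a sketch: `run_chain` is a hypothetical helper, Python's `re` stands in for the .NET regex engine MovPic uses (the constructs in these particular patterns should behave the same in both), and splitting on `|` is an assumption about how `movie.writers` is turned into a list.

```python
import re

# rx_cmnt before and after the suggested change, copied from the thread.
OLD_RX_CMNT = r"(\(\s*WGA\s*\))|(\(in alphabetical order\))|(\sand\s)|(&)"
NEW_RX_CMNT = r"(?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)"

# Value of "writers" after the rx_tag replace, as logged in movingpictures2.zip.
LOGGED_WRITERS = (
    " Erica Rivinoja  ...  (screenplay) &  John Francis Daley  ...  (screenplay) &"
    "  Jonathan M. Goldstein  ...  (screenplay) (as Jonathan Goldstein)"
    "  Phil Lord  ...  (story) &  Christopher Miller  ...  (story) &"
    "  Erica Rivinoja  ...  (story)  Judi Barrett  ...  (characters) &"
    "  Ron Barrett  ...  (characters)"
)


def run_chain(writers: str, rx_cmnt: str, paren_replacement: str) -> list[str]:
    """Hypothetical replay of the three <replace> steps that follow rx_tag."""
    writers = re.sub(rx_cmnt, " ", writers)              # pattern="${rx_cmnt}" with=" "
    writers = re.sub(r"\s+", " ", writers)               # pattern="\s+" with=" "
    writers = re.sub(r"\)", paren_replacement, writers)  # pattern="\)"
    # Assumption: movie.writers is later split on "|" into separate entries.
    return [entry.strip() for entry in writers.split("|") if entry.strip()]


if __name__ == "__main__":
    print(run_chain(LOGGED_WRITERS, OLD_RX_CMNT, ")|"))  # current script
    print(run_chain(LOGGED_WRITERS, NEW_RX_CMNT, "|"))   # suggested fix
```

    With the old settings, role labels such as "... (screenplay)" stay attached to each name and "(as Jonathan Goldstein)" becomes an entry of its own; with the new rx_cmnt and the "|" replacement, only the names remain (Erica Rivinoja appears twice because she is credited for both screenplay and story).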

    Doing this over multiple lines is a really weird approach, as it can all be done in a single regular expression, but that would take more work to explain, so you'll have to settle for this easy 'ugly' fix.

    PS: the parts in bold are the only things you need to change.
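    The single-expression version isn't spelled out in the thread. One possible pattern, shown as a Python sketch that is not RoChess's approach: working on the text after tags have been replaced with spaces (as in the log), it assumes every name is directly followed by IMDb's "..." separator and contains no "&" or parentheses, and simply collects that text with `findall` (`extract_writers` is a hypothetical helper).

```python
import re

# One possible single-pass pattern (an illustration, not RoChess's version):
# a name starts with a non-space character, contains no "&" or parentheses,
# and is immediately followed by whitespace and IMDb's "..." separator.
NAME_BEFORE_DOTS = re.compile(r"([^&()\s][^&()]*?)\s+\.{3}")


def extract_writers(block: str) -> list[str]:
    """Hypothetical helper: return the captured names in order of appearance."""
    return NAME_BEFORE_DOTS.findall(block)


if __name__ == "__main__":
    sample = (
        " Jonathan M. Goldstein  ...  (screenplay) (as Jonathan Goldstein)"
        "  Phil Lord  ...  (story) &  Christopher Miller  ...  (story)"
    )
    print(extract_writers(sample))
```

    Parenthesized comments and "&" can never be part of a match, so no separate cleanup pass is needed; a credit without the "..." separator, however, would be skipped entirely.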
     

    badboyxx

    RoChess wrote: So change "rx_cmnt" into: (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)


    When I change that value in my file, I get an error message in the MovPic configuration under Movie Details Data Source: "The script is malformed or not a moving pictures script."

    I attached my latest script. Maybe you can take a look.
     

    Attachments

    • FilmInfo.zip
      14 KB

    RoChess

    The & symbol in XML needs to be entered as &amp; to prevent XML errors during parsing. Did the original "rx_cmnt" use &amp;? If so, replace (?:&) in mine with (?:&amp;) as well.
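    A quick way to see the problem is to let an XML parser read the regex inside an element. This Python sketch wraps rx_cmnt in a hypothetical `<set name="rx_cmnt">` element (the real layout of the FilmInfo+ script may differ), shows that the bare `&` makes the snippet ill-formed, and uses `xml.sax.saxutils.escape` to produce the `&amp;` form, which the parser turns back into the original pattern.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The suggested rx_cmnt exactly as posted (contains a bare "&").
RX_CMNT = r"(?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)"


def is_well_formed(xml_text: str) -> bool:
    """Return True if ElementTree can parse the snippet."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical element layout; the real script may store the value differently.
raw_snippet = f'<set name="rx_cmnt">{RX_CMNT}</set>'
escaped_snippet = f'<set name="rx_cmnt">{escape(RX_CMNT)}</set>'  # & -> &amp;

if __name__ == "__main__":
    print(is_well_formed(raw_snippet))
    print(is_well_formed(escaped_snippet))
    print(escape(RX_CMNT))
```

    The genre table earlier in the thread avoids this issue because its rows sit inside a <![CDATA[ ... ]]> section, where & needs no escaping.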
     
