FilmInfo+ - A German movie details scraper with auto grouping

fischy667

Super User
  • Team MediaPortal
  • Super User
  • May 5, 2010
    958
    283
    41
    Rostock
    Home Country
    Germany Germany
    Ok,
    actually there are two mirrors which should work, and the balancer chooses one of these two.

    balancer: http://ofdbgw.org
    server 1: http://ofdbgw.scheeper.de
    server 2: http://ofdbgw.home-of-root.de

    But, for example, I tried many times to get the OFDb ID for "The Dark Knight Rises" from its IMDb ID and I never got a result.

    The address is: http://ofdbgw.org/imdb2ofdb/tt1345836

    The result should be the OFDb ID, but I always get error code 2 ("Timeout").

    The same happens with the movie info: http://ofdbgw.org/movie/225533
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    Ok, this is what the FilmInfo+ code tries:

    1st = http://ofdbgw.home-of-root.de/imdb2ofdb/tt1345836 = FAIL (timeout)
    2nd = http://ofdbgw.lirzg.net/imdb2ofdb/tt1345836 = WORKS
    3rd = http://ofdbgw.scheeper.net/imdb2ofdb/tt1345836 = FAIL (timeout)

    So yes, right now they are having issues, but one is still working. Of course the data is only as good as what they can obtain from OFDb, and looking at the timeout errors it appears that things are not connecting properly. Perhaps access got blocked, the scrape code needs to be adjusted, or there is some other issue.

    Aside from starting my own mirror there is really not much I can do about this though :(
     

    RoChess

    It already does though.

    FilmInfo+ cycles through those 3 in order. It tries all 3 and, if every one of them fails, I guess it falls back to IMDb English as the final solution.

    So keep a close eye on your logfile. It is of course possible that your German IP gives you a totally different experience, or that at that particular moment the OFDb mirror servers are all having issues.
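
    The in-order fallback described above can be sketched roughly as follows. This is only a Python illustration, not the actual FilmInfo+ scraper script: `first_working_mirror` and `fetch` are hypothetical names, and treating an empty response as a failure is an assumption.

```python
from typing import Callable, Iterable, Optional


def first_working_mirror(
    mirrors: Iterable[str],
    path: str,
    fetch: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try each mirror in order and return the first non-empty response.

    `fetch` is a hypothetical helper: it takes a full URL and returns the
    response body, returns None or "" on failure, or raises on timeout.
    """
    for base in mirrors:
        url = base.rstrip("/") + "/" + path
        try:
            body = fetch(url)
        except Exception:
            continue  # timeout or connection error: try the next mirror
        if body:
            return body
    return None  # every mirror failed; the caller falls back to another source
```

    A `None` result is the point where the caller would switch to the final fallback (IMDb English in the case described here).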
     

    Merlyn

    Portal Pro
    July 8, 2011
    250
    322
    Home Country
    Germany Germany
    First of all a very huge thank you, RoChess, for jumping in, making all these changes and assisting!

    Second, my apologies for being off the radar. I remember a while back getting a single notification email from the forum telling me someone had posted something in a thread, and I didn't bother about it much. But ever since the forum was updated I only get further notifications after I actually open the website. If I skip one, I won't get any more emails despite the notification option... So, again, I'm sorry!

    Third, I always name my movies title (year) (imdb id).ext, so I totally missed the search result changes IMDb implemented. Again a big, warm thank you, RoChess, for jumping in!

    As for the OFDb part in the latest posts: OFDbgw is and was a spare-time project of one guy, and has not been maintained for a long while now. The timeout issue comes up on both of the remaining mirrors, and sometimes goes away after a few minutes and sometimes persists for days. The GW admin does not answer emails, either.
    As RoChess said, it would be better if only the load balancer were used, but that one is buggy, too. I've had test runs on the balancer where it would direct me to the same mirror 20 times in a row, and always give me the timeout. Manually accessing all mirrors like I did gave me better overall results.

    Nonetheless that wasn't enough, so I implemented the filmstarts.de website. Actually I haven't checked the code in a while, so it is possible that there have been changes there, too.

    I will review the fixes made by RoChess (did I say thank you yet?) and check why no German summary is loaded as soon as possible!
     

    RoChess

    @Merlyn, one solution I thought of for replicating the load balancer was to scrape the mediaportal.log file itself (the folder location is now a variable in MovPic). RegExp the timestamp and find a random value of 0, 1 or 2. This will then determine which mirror you hit first, out of an array you can build inside your scraper script.

    Something like:

    Code:
      <set name="ofdb_mirrors_array">
       <![CDATA[
      #http://ofdbgw.home-of-root.de/#
      #http://ofdbgw.lirzg.net/#
      #http://ofdbgw.scheeper.net/#
      #http://ofdbgw.home-of-root.de/#
      #http://ofdbgw.lirzg.net/#
       ]]>
      </set>
    <parse name="ofdb_mirrors" input="${ofdb_mirrors_array}" regex="#([^#]+)#" />

    and then you can use "${ofdb_mirrors[0]}movie/" and "${ofdb_mirrors[0]}imdb2ofdb/" as before to use the "http://ofdbgw.home-of-root.de/" site, but you can also use [1] and [2]. By replacing the 0, 1 or 2 (arrays start at '0', so keep that in mind) with the random value obtained from the mediaportal.log file, you can do ${ofdb_mirrors[${random_value}]}, then use random_value + 1 on failure and add 1 once more (random_value + 2) on another failure.

    That is why I repeated the 1st and 2nd mirror in the array. Of course you can also add smarter logic to the script so that when random_value + 1 > 2 it becomes '0', but you get the idea.
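
    The "smarter logic" for wrapping back to 0 is just modulo arithmetic, which makes the repeated array entries unnecessary. A minimal Python sketch of the idea (not scraper-script syntax; `mirror_order` is a made-up name):

```python
from typing import List


def mirror_order(start: int, count: int = 3) -> List[int]:
    """Return every mirror index once, beginning at `start` and wrapping around."""
    return [(start + step) % count for step in range(count)]


# Starting at the last mirror wraps around to the first two.
print(mirror_order(2))  # [2, 0, 1]
```

    Each index is tried exactly once, so all 3 mirrors are still hit in the worst case; only the starting point changes.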

    Of course you can also stick with what you've got if it is no problem for the OFDb mirrors, but the fact that they have a load balancer in place makes me think they want to see an even spread of users hitting them. Since the data is not reliable on each mirror you have to hit all 3 in order, but the above code should help to make the starting point random.

    You will have to figure out a way to get a 0/1/2 value out of the mediaportal.log file though. My guess is you simply look for (?:0|1|2) in the seconds+milliseconds part of the timestamp, cuz the 2012 year will otherwise ruin the random part. Whatever match hits first is then your random value.
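
    To illustrate the timestamp idea, here is a rough Python sketch. The timestamp layout (`HH:MM:SS.mmm` somewhere in the line) is an assumption, so the pattern would need adjusting to the real mediaportal.log format, and `random_index_from_log_line` is a hypothetical name. The result is only roughly random.

```python
import re
from typing import Optional

# Assumed timestamp shape: "... HH:MM:SS.mmm ..." (the real log layout may differ).
# Only seconds and milliseconds are captured, so the date and hour are ignored.
_SECONDS_MS = re.compile(r"\d{2}:\d{2}:(\d{2})[.,](\d+)")


def random_index_from_log_line(line: str) -> Optional[int]:
    """Return the first 0, 1 or 2 found in the seconds+milliseconds digits."""
    stamp = _SECONDS_MS.search(line)
    if stamp is None:
        return None  # no timestamp found in this line
    digits = stamp.group(1) + stamp.group(2)
    hit = re.search(r"[012]", digits)
    return int(hit.group()) if hit else None
```

    If a line yields no 0, 1 or 2 at all, the function returns `None`, and the caller would move on to the next log line or default to 0.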

    PS: Glad you are back, cuz I didn't see myself having time to dig into the details-node to fix the summary problem.
     

    Merlyn

    Actually it's down to two mirrors by now. The lirzg mirror seems to be dead and was removed from ofdbgw.
    The other two remain somewhat operational, but most times return the timeout error. Actually, while I was MIA I was working on an updated scraper that improved the ofdb part. But I couldn't figure out a way to obtain a random number, and hence started going through a loop 20 times, contacting ofdbgw each time. This did not yield the desired result and also put more load on the balancer, so in the end it was not really an improvement. That's why it was not yet released.

    I did some tests on the details part last night, and ran into another nice problem. OFDb returns no error, but an empty ofdb id string. Yay!

    Right now I am very close to ditching ofdbgw altogether and using Filmstarts.de and ofdb.de directly instead. This will prolly be a pretty big hit on scraper speed, but at least it should again return usable results...
     

    fischy667


    Merlyn

    Funny... seems to be a new development, cause it was dead for a while... maybe someone is still maintaining the ofdbgw after all...
     
