CSFD scraper script 0.2.3 [CZ]
[QUOTE="JiRo, post: 710602, member: 91312"]
[b]Re: CSFD scraper script 0.1.9 [CZ] - 100% hit rate (558 movies)[/b]

[LIST]
First of all, Trottel, many :thx: for your perfect work. But...

When I first used your scraper script, I got 40% successful hits. That was better than the previous version of the scraper, but still poor. A friend of mine gets 100% hits, but he uses English names for his movie files and the IMDB scraper. My goal was 100% hits with Czech names and the CSFD scraper too :D. I started reading your script and found the first small problem:

<set name="rx_search_results_block">
<![CDATA[
>v originálních názvech</td>.+</body>
]]>
</set>

Because the expression starts at ">v originálních názvech", the Czech movie names listed under ">v českých názvech" are skipped. I therefore replaced ">v originálních názvech" with ">v českých názvech". The result was much better than before, but some Czech movies that had matched before now got no hit. I then read your script more carefully and ran test searches on the CSFD web page. I found that some Czech movies are not listed in the ">v českých názvech" section but in ">v originálních názvech" :o, and for those the Czech section is absent.

So I changed the regular expression part to:

[B]<set name="rx_search_results_block">
<![CDATA[
>v českých názvech</td>.+</body>
]]>
</set>

<set name="rx_search_results_block2">
<![CDATA[
>v originálních názvech</td>.+</body>
]]>
</set>[/B]

and this part of the code to:

...
<parse name="search_results_block" input="${search_page}" regex="${rx_search_results_block}"/>
[B] <if test="${search_results_block}=">
<parse name="search_results_block" input="${search_page}" regex="${rx_search_results_block2}"/>
</if>[/B]
<if test="${search_results_block}!=">
<loop name="search_results_verified" on="search_results_block">
...

The last change I made was to the number of search results handled, from the previous 20 to 100. A few movies have a very long search result list...

...
<set name="movie[${counter}].details_url" value="${site}film/${curr_details[0]}"/>
<subtract name="movie[${counter}].popularity" value1="[B]100[/B]" value2="${counter}" />
</loop>
...

Now I'm satisfied. The 100% hit target is achieved! :P And your condition:
[LIST]
Movie name should be in original or English language
[/LIST]
can be extended to:
[LIST]
Movie name should be in Czech, original or English language
[/LIST]
Maybe we should find out whether there are movies with an English name only :eek:

Currently I have a private 0.1.10 version of the CSFD scraper :oops:, but the official release is up to you. You are the author!

JiRo.
[/LIST]
[/QUOTE]
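
For anyone who wants to try the fallback idea outside MediaPortal, here is a minimal Python sketch of the same logic. It is not part of the scraper script and does not use the scraper engine. The function names `find_results_block` and `popularity` are made up for illustration, and the `re.DOTALL` flag is an assumption about how the engine lets `.+` run across line breaks. It tries the Czech-titles pattern first, falls back to the original-titles pattern only when the first finds nothing, and mirrors the `<subtract>` arithmetic.

```python
import re

# The two patterns from the post. re.DOTALL is an assumption so that ".+"
# can span line breaks in a downloaded search page.
RX_CZECH = re.compile(r">v českých názvech</td>.+</body>", re.DOTALL)
RX_ORIGINAL = re.compile(r">v originálních názvech</td>.+</body>", re.DOTALL)


def find_results_block(search_page: str) -> str:
    """Return the Czech-titles block if present, else the original-titles block."""
    for pattern in (RX_CZECH, RX_ORIGINAL):
        match = pattern.search(search_page)
        if match:
            return match.group(0)
    # An empty string stands in for a <parse> that matched nothing.
    return ""


def popularity(counter: int, ceiling: int = 100) -> int:
    """Same arithmetic as the <subtract> node: earlier results score higher."""
    return ceiling - counter


if __name__ == "__main__":
    # Hypothetical page fragment that has only the original-titles section.
    page = "<td>v originálních názvech</td><a>Film</a></body>"
    print(find_results_block(page))
    print(popularity(30))
```

With the old ceiling of 20, this arithmetic gives a score of zero or less for any result past the 20th. The post does not say how the engine treats such values, only that some result lists are long enough to need 100.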