
I am trying to learn web scraping and chose https://www.betfair.com as an example. I have successfully fetched data from many pages, but when I request https://www.betfair.com/sport/horse-racing I do not get the full source. However, if I view the page source in the browser, the data is there, so it is out of the question that the content is generated by JavaScript or similar. Here is my code:

    $url = 'https://www.betfair.com/sport/horse-racing';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);               // do not include response headers in the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);       // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);       // return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt"); // send cookies read from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");  // save received cookies to this file
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);   // skips TLS certificate verification (insecure)
    $page = curl_exec($ch);
    curl_close($ch);
    echo $page;

When viewing the source in the browser, you can find this:

    <a href="/sport/horse-racing?action=loadRacingSpecials&tab=SPECIALS&modules=multipick-horse-racing" class="ui-nav link ui-clickselect ui-ga-click" data-dimension3="sports-header" data-dimension4="Specials" data-dimension5="Horse Racing" data-gacategory="Interface" data-gaaction="Clicked Horse Racing Header" data-galabel="Specials"
    data-loader=".multipick-content-container > div, .antepost-content-container > div, .future-racing-content-container > div, .bet-finder-content-container > div, .racing-specials-content-container > div, .future-racing-market-content-container > div"
    >
    Specials</a>

But curl is not getting these elements.
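One way to check whether the markup is really missing from the cURL response (rather than just not visible when the page is echoed) is to parse the returned HTML and search for the link. The sketch below uses PHP's built-in DOMDocument and DOMXPath classes. The function name findSpecialsLink is hypothetical, and it assumes the link can be identified by its data-galabel="Specials" attribute, as in the snippet above.

```php
<?php
// Hypothetical helper (not from the question): look for the "Specials" header
// link in an HTML string, e.g. the string returned by curl_exec().
// Assumes the link is identified by data-galabel="Specials", as in view-source.
function findSpecialsLink(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//a[@data-galabel="Specials"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // the link is not present in this HTML
    }

    // Return the link target; entities such as &amp; are decoded by the parser.
    return $nodes->item(0)->getAttribute('href');
}

// Usage with a small sample instead of a live request:
$sample = '<a href="/sport/horse-racing?action=loadRacingSpecials&amp;tab=SPECIALS" '
        . 'data-galabel="Specials">Specials</a>';
var_dump(findSpecialsLink($sample));
```

In practice you would call findSpecialsLink($page) right after curl_exec(); a null result would mean the element really is absent from what the server sent to cURL.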

CC BY-SA 3.0

2 Answers


First of all, Betfair does not welcome spidering of its site (although people do this on a regular basis).

I am not an expert in JavaScript or HTML, but it can happen that the content was generated by an AJAX call. If you use the Firebug tool for Mozilla Firefox, you can see which requests the page makes to get the data.

But above all, my suggestion is to use the API they provide. That is legal, and there is a free version with some limitations. API link: https://developer.betfair.com/
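As a rough illustration of what using the API instead of scraping might look like: Betfair's Exchange API (API-NG) accepts JSON-RPC requests over HTTPS. The sketch below only builds the request body and headers for a listEventTypes call. The endpoint URL, method name and header names reflect my understanding of the API-NG documentation and are assumptions to verify at https://developer.betfair.com/ before use. You need your own application key and a session token obtained by logging in; the values shown are placeholders, and the function name buildListEventTypesRequest is hypothetical.

```php
<?php
// Hypothetical helper: build a JSON-RPC request for the Betfair Exchange API (API-NG).
// ASSUMED endpoint, method name and headers: verify against developer.betfair.com.
function buildListEventTypesRequest(string $appKey, string $sessionToken): array
{
    $payload = [
        'jsonrpc' => '2.0',
        'method'  => 'SportsAPING/v1.0/listEventTypes',
        'params'  => ['filter' => new stdClass()], // empty filter object: all event types
        'id'      => 1,
    ];

    return [
        'url'     => 'https://api.betfair.com/exchange/betting/json-rpc/v1',
        'headers' => [
            'X-Application: ' . $appKey,          // your application key
            'X-Authentication: ' . $sessionToken, // session token from a login
            'Content-Type: application/json',
            'Accept: application/json',
        ],
        'body'    => json_encode($payload, JSON_UNESCAPED_SLASHES),
    ];
}

// Usage (placeholders, not real credentials):
$req = buildListEventTypesRequest('YOUR_APP_KEY', 'YOUR_SESSION_TOKEN');

// Sending it with cURL would look roughly like this (needs network access and valid credentials):
// $ch = curl_init($req['url']);
// curl_setopt($ch, CURLOPT_POST, true);
// curl_setopt($ch, CURLOPT_HTTPHEADER, $req['headers']);
// curl_setopt($ch, CURLOPT_POSTFIELDS, $req['body']);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $response = json_decode(curl_exec($ch), true);
// curl_close($ch);
```

Building the request separately from sending it keeps the credential handling in one place and makes the payload easy to inspect before any network call is made.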

  • Actually, if I view the page source on the website, it is written there, so it is not generated by an AJAX call. Feb 26, 2017 at 7:07

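Try saving the response to a file; you will notice that the code you are looking for is in there. This is likely because when you echo $page, the browser renders the HTML, so the raw tags themselves are not displayed.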

    $url = 'https://www.betfair.com/sport/horse-racing';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $page = curl_exec($ch);
    curl_close($ch);

    // Write the raw HTML to a file instead of echoing it, so the markup can be
    // inspected as text. "w" overwrites the file on each run ("a" would append).
    $file = fopen("1.txt", "w");
    fwrite($file, $page);
    fclose($file);
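If you would rather inspect the raw markup in the browser than in a file, a minimal option is to escape the response before printing it, so tags such as the Specials <a> element are shown as text instead of being rendered. The helper name showSource is hypothetical; htmlspecialchars() is a standard PHP function, and UTF-8 input is assumed.

```php
<?php
// Hypothetical helper: wrap raw HTML so the browser displays it as text
// instead of rendering it, which makes tags like <a ...> visible.
function showSource(string $html): string
{
    // ENT_QUOTES escapes both double and single quotes; UTF-8 input is assumed.
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Usage: replace `echo $page;` with `echo showSource($page);`
echo showSource('<a href="/sport/horse-racing">Specials</a>');
```

Another option is to call header('Content-Type: text/plain') before echo $page, which tells the browser to show the response as plain text.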
