Using PHP To Scrape Websites Generated By JavaScript, jQuery, AJAX & JSON

Scraping websites generated by JavaScript or jQuery using PHP is a topic that I’ve received many requests for and one that I’ve been wanting to cover for a while now. More often than not, it’s just a single page or form that people are having issues with, but I wanted to wait until I found an entire site that is generated using JavaScript where at no point would traditional PHP web scraping techniques work.

Today is that day, and the site is NCR Silver, a Point-of-Sale (POS) system with a web management interface generated entirely by JavaScript.

You’ll need to sign up for an account, which comes with a free 14-day trial. That’s more than enough time to work through this material and learn the techniques involved.

NCR POS Signup Form

Sign up for a free 14-day trial at NCR POS.

Once you receive your welcome email, we’ll be ready to get started!

Now, let’s navigate to the main login page and take a look. At first glance everything looks normal, wouldn’t you agree?

NCR POS JavaScript Login Page

When we take a first glance at the login page, even in the DOM inspector, everything looks normal.

But when we view the page source we see something else entirely.

We see <noscript> </noscript> tags surrounding some HTML that is displayed to clients without JavaScript enabled, informing them that they can’t access the website without it. This could prove a problem for our web bot written in PHP & cURL, since cURL cannot process JavaScript.

For clients with JavaScript enabled we see a series of document.write() statements to display the HTML code for the login page. Now this could cause an issue for us if the HTML was dynamically generated and we needed JavaScript enabled to actually view it (more on this later). But, as it is, the HTML is hardcoded into the page and we can see the HTML that would be displayed if we had JavaScript enabled.
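Because the markup is hardcoded inside the document.write() calls, we could pull it out of the raw page source with a regular expression if we ever needed it. Here’s a minimal sketch of that idea. The sample source and the extractDocumentWrites() helper are made up for illustration, and it assumes every call takes a single-quoted string literal, which may not match the real page exactly.

```php
<?php
// Hypothetical page source: the login markup is wrapped in document.write() calls
$source = <<<'HTML'
<script type="text/javascript">
document.write('<form id="loginForm" method="post">');
document.write('<input type="text" name="username" />');
document.write('</form>');
</script>
HTML;

// Hypothetical helper: collects the HTML fragments written by document.write()
function extractDocumentWrites($source) {
    // Capture the single-quoted argument of each document.write() call
    preg_match_all("/document\.write\('((?:[^'\\\\]|\\\\.)*)'\);/", $source, $matches);

    // Undo JavaScript backslash escapes (such as \') and join the fragments
    return implode("\n", array_map('stripslashes', $matches[1]));
}

echo extractDocumentWrites($source) . "\n";
```

The result is plain HTML that can be handed to whatever parser you normally use for scraping.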

noscript login

Source of page showing what is displayed to clients without JavaScript.

script login

Source of page showing what is displayed to clients with JavaScript.

From studying the HTML login form using Tools > Web Developer > Inspector, we can ascertain what information we need in order to submit the login form and authenticate, and build an array from that data:

$credentials = array(
	'username' => $userEmail,       // Your email address
	'password' => $userPass,        // Your password
	'RememberMe' => 'true',         // Staying logged in
	'IsAjaxRequest' => 'false'      // Whether request is AJAX
);

At this point I’m going to introduce a new method of determining that data, as it is one we will be using heavily once we get into the admin area. First you need to download and install the Live HTTP Headers plugin: Firefox, Chrome. There are Internet Explorer alternatives, but since I’m not familiar with them and Internet Explorer is a piece of shit, they won’t be covered here.

With the Live HTTP Headers plugin installed we can fire it up from Tools > Live HTTP Headers and make sure the Capture checkbox is selected. Now we manually submit the web login form and you should see the HTTP Headers window begin to fill up with data.

All we have to do now is navigate to the POST request for the login form, POST /app/Account/LogOn HTTP/1.1 and look at the data being submitted.

Live HTTP Headers Login

Live HTTP Headers plugin showing the headers sent when we submit the login form, including our login details.
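The POST data shown in the headers window is just a URL-encoded string, so PHP can turn it into an array for us. Here’s a small sketch of that using parse_str(). The captured string is a placeholder with made-up values, not a real capture.

```php
<?php
// Hypothetical POST body as copied from the headers window (placeholder values)
$captured = 'username=me%40example.com&password=secret&RememberMe=true&IsAjaxRequest=false';

// Decoding the URL-encoded pairs into an associative array
parse_str($captured, $fields);
print_r($fields);

// http_build_query() performs the reverse step, ready to pass to cURL
$rebuilt = http_build_query($fields);
echo $rebuilt . "\n";
```

This is a handy way to check that the array you build by hand matches what the browser actually sent.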

Now that we have the required info we can just make a simple cURL POST request to get ourselves logged in.

// Function to submit form using cURL POST method
function curlPost($postUrl, $postFields) {
	$useragent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3';	// Setting useragent of a popular browser
	$cookie = 'cookie.txt';	// Setting a cookie file to store cookie
	$ch = curl_init();	// Initialising cURL session

	// Setting cURL options
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);	// Prevent cURL from verifying SSL certificate (insecure, for testing only)
	curl_setopt($ch, CURLOPT_FAILONERROR, TRUE);	// Fail on HTTP status codes of 400 and above
	curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);	// Start a new cookie session, ignoring old session cookies
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);	// Follow Location: headers
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);	// Returning transfer as a string
	curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);	// Setting cookiefile
	curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);	// Setting cookiejar
	curl_setopt($ch, CURLOPT_USERAGENT, $useragent);	// Setting useragent
	curl_setopt($ch, CURLOPT_URL, $postUrl);	// Setting URL to POST to
	curl_setopt($ch, CURLOPT_POST, TRUE);	// Setting method as POST
	curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);	// Setting POST fields as array
	$results = curl_exec($ch);	// Executing cURL session
	curl_close($ch);	// Closing cURL session
	return $results;
}

$url = '';	// Login POST URL

// Array built from login credentials
$credentials = array(
	'username' => $userEmail,       // Your email address
	'password' => $userPass,        // Your password
	'RememberMe' => 'true',         // Staying logged in
	'IsAjaxRequest' => 'false'      // Whether request is AJAX
);

// Performing the login!
$request = curlPost($url, $credentials);
$login = json_decode($request); // Decoding the JSON response
if (!is_object($login) || !isset($login->success)) {
    // Response was not the JSON object we expected
    $message = 'Unknown login error.';  // Assigning unknown login error message
    echo $message . "\n";
    exit(); // Ending program
} elseif ($login->success) {
    // Successful login
    $message = 'Successful login.'; // Assigning successful login message
    echo $message . "\n";
} else {
    $message = $login->error;    // Assigning login error message returned by server
    echo $message . "\n";
    exit(); // Ending program
}

Now, you may be surprised to find out that what is returned from the server is not the usual web page that we would expect from a form submission. Instead, the response is a JSON encoded string intended for the JavaScript application to handle our login request.

I’ve added a couple of print_r() statements in the code so we can actually see what is being returned by the server.

For an unsuccessful login we should receive:

{"success":false,"errorCode":"I","error":"The User Name or Password you entered is not correct. Please try again."}

For a successful login we should receive:


If you’re not familiar with JSON, it’s actually pretty simple: it’s a string of keys and values, much the same as an array. In our case it’s the “success” key we are looking for, and its value of true or false lets us know whether our login was successful.
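To make that concrete, here’s a short sketch that decodes the failed-login response shown above and reads its keys. The describeLogin() helper is a name I’ve made up for illustration. It also checks that the decoded value really is an object, since json_decode() returns null when the server sends back something that isn’t valid JSON.

```php
<?php
// The failed-login response shown above
$response = '{"success":false,"errorCode":"I","error":"The User Name or Password you entered is not correct. Please try again."}';

// Hypothetical helper: turns a raw login response into a message
function describeLogin($response) {
    $login = json_decode($response);    // Decoding the JSON string into an object

    // json_decode() returns null for invalid JSON, so check before using it
    if (!is_object($login) || !isset($login->success)) {
        return 'Unexpected response: ' . json_last_error_msg();
    }

    if ($login->success === true) {
        return 'Successful login.';
    }

    return isset($login->error) ? $login->error : 'Login failed.';
}

echo describeLogin($response) . "\n";
echo describeLogin('<html>not json</html>') . "\n";
```

Checking for a non-object up front avoids the “Trying to get property of non-object” notice you would otherwise hit on an unexpected response.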

In our PHP script we decode the JSON string using the json_decode() function and store the object in $login. From this we can determine if our login was successful (true / 1) or if it failed (false / 0). With any luck, we should receive a successful login and our PHP scraper script will echo a success message:

Successful login.

…happy fucking days! Now we’re getting to the fun stuff >:)

Now we’re in. What do we want to do? How about get all the customer information?

In your browser navigate to CUSTOMERS > CUSTOMERS or just follow this link.

NCR Silver Customers

The customers admin panel with only one customer in it. This page is generated entirely from JavaScript.

Oh shit, there’s only one customer there, this is going to be boring. I guess we should add a few customers to work with.

Since what we’re really interested in is the scraping of data from a JavaScript page, we’re just going to use the import function of the website to add a bunch of customers. All you have to do is download this CSV file and import it on the site.

Import Button

Here is where we import the customers.

Importing Customers Into The POS

Screen showing the importing of our customer base to be scraped.

Now we’ve got some customer data visible in our browser, all displayed by the website using JavaScript and JSON.

Customers To Scrape

Here’s our customer base rendered with JavaScript ready to be scraped.

As you can see by viewing the source code of the page, nowhere can we see the information about the customers, all we see is lots of JavaScript includes which are doing the rendering of the customer information. So where is this information coming from? Well, when the page is loaded in your browser the web page makes a request to get all of the customer data which is returned as a JSON object, which is then rendered in your browser using JavaScript.

Customers Page JavaScript Source Code

All page content is being rendered using a collection of JavaScript applications. Nowhere can we see the actual rendered page content.

You may be thinking, well, if we can’t see the customer information on the page when we request it using cURL like we usually do, how can we scrape the data? It’s actually quite simple: we pretend to be the JavaScript web application requesting the data, and we get back a JSON object of all the data we require, which we can mine and scrape to our heart’s content.
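One extra touch when pretending to be the JavaScript application is to send the headers that browser AJAX libraries usually add. I haven’t verified whether this particular site checks them, so treat the following as an optional sketch. The ajaxRequestHeaders() helper is a made-up name, and the header values are typical of jQuery-style requests rather than copied from this site.

```php
<?php
// Hypothetical helper: headers commonly sent by browser AJAX requests
function ajaxRequestHeaders() {
    return array(
        'X-Requested-With: XMLHttpRequest',                         // Marks the request as AJAX
        'Accept: application/json, text/javascript, */*; q=0.01',   // Asking for a JSON response
    );
}

print_r(ajaxRequestHeaders());

// Usage with an existing cURL handle (requires the cURL extension):
// curl_setopt($ch, CURLOPT_HTTPHEADER, ajaxRequestHeaders());
```

If a request works in the browser but not from cURL, comparing the headers the browser sent against yours is a good first thing to try.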

In order to do this, we must first figure out the request being made by the page, which is the one we want to imitate. Back we go to our trusty Live HTTP Headers plugin. I figure the best way to do this is not to ‘load the page’, as this will return lots of extraneous data such as markup and styling. Instead, we can mimic a search, as this should only return data about the customers. Maybe if we perform a search with no search string we get a list of all of the customers? Let’s give it a shot!

Live HTTP Headers for JavaScript Search Form

From the HTTP headers we can see the POST URL for the search form and the data being sent.

There we have it – our URL to make the POST request and all of the data to pass along with it. Let’s start building this up and hopefully we should see positive results.

    class NCRSilverScraper {

        private $useragent; // User agent string sent with every request
        private $cookie;    // Path to the cookie file
        private $timeout;   // Timeout in seconds
        private $loginUrl;  // Login POST URL

        // Class constructor method
        function __construct() {

            $this->useragent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3';    // Setting useragent of a popular browser

            $handle = fopen('cookie.txt', 'w') or exit('Unable to create or open cookie.txt file.'."\n");   // Creating (or emptying) cookie file
            fclose($handle);    // Closing cookie file
            $this->cookie = 'cookie.txt';    // Setting a cookie file to store cookie
            $this->timeout = 30; // Setting connection timeout in seconds

            $this->loginUrl = '';
        }

        // User login method
        public function login($emailAddress, $password) {

            // Login values to POST as a URL-encoded string
            $postValues = http_build_query(
                array(
                    'username' => $emailAddress,
                    'password' => $password,
                    'RememberMe' => 'true',
                    'IsAjaxRequest' => 'false'
                )
            );

            $request = $this->curlPostFields($this->loginUrl, $postValues);   // Making cURL POST request

            $login = json_decode($request); // Decoding the JSON response

            if (!is_object($login) || !isset($login->success)) {
                // Response was not the JSON object we expected
                $message = 'Unknown login error.';  // Assigning unknown login error message
                echo $message;
                exit(); // Ending program
            } elseif ($login->success) {
                // Successful login
                $message = 'Successful login.'; // Assigning successful message
                echo $message;
            } else {
                $message = $login->error;    // Assigning login error message returned by server
                echo $message;
                exit(); // Ending program
            }
        }

        // User logout method
        public function logout() {
            $request = $this->curlPostFields('', null);  // Logging out
        }

        // Method to search and scrape existing members details
        public function scrapePersons($searchString = '') {

            $searchUrl = '';

            $postValues = array(
                'PageRowCount' => 1000,
                'RequestedPageNum' => 1,
                'TotalRowCount' => -1,
                'SearchArg' => $searchString,
                'SortDirection' => 'ASC',
                'SortColumn' => 'Name',
                'page' => 1,
                'start' => 0,
                'limit' => 1000,
                'sort' => '[{"property":"Name","direction":"ASC"}]',
                'isAjaxRequest' => true,
            );

            $search = $this->curlPostFields($searchUrl, $postValues);

            return $search;
        }

        // Method to make a POST request using form fields
        public function curlPostFields($postUrl, $postValues) {
            $_ch = curl_init(); // Initialising cURL session

            // Setting cURL options
            curl_setopt($_ch, CURLOPT_SSL_VERIFYPEER, FALSE);   // Prevent cURL from verifying SSL certificate (insecure, for testing only)
            curl_setopt($_ch, CURLOPT_FAILONERROR, TRUE);   // Fail on HTTP status codes of 400 and above
            curl_setopt($_ch, CURLOPT_COOKIESESSION, TRUE); // Start a new cookie session, ignoring old session cookies
            curl_setopt($_ch, CURLOPT_FOLLOWLOCATION, TRUE);    // Follow Location: headers
            curl_setopt($_ch, CURLOPT_RETURNTRANSFER, TRUE);    // Returning transfer as a string
            curl_setopt($_ch, CURLOPT_COOKIEFILE, $this->cookie);    // Setting cookiefile
            curl_setopt($_ch, CURLOPT_COOKIEJAR, $this->cookie); // Setting cookiejar
            curl_setopt($_ch, CURLOPT_USERAGENT, $this->useragent);  // Setting useragent
            curl_setopt($_ch, CURLOPT_URL, $postUrl);   // Setting URL to POST to
            curl_setopt($_ch, CURLOPT_CONNECTTIMEOUT, $this->timeout);   // Connection timeout
            curl_setopt($_ch, CURLOPT_TIMEOUT, $this->timeout); // Request timeout

            curl_setopt($_ch, CURLOPT_POST, TRUE);  // Setting method as POST
            curl_setopt($_ch, CURLOPT_POSTFIELDS, $postValues); // Setting POST fields (URL-encoded string or array)

            $results = curl_exec($_ch); // Executing cURL session
            curl_close($_ch);   // Closing cURL session

            return $results;
        }

        // Class destructor method
        function __destruct() {
            // Empty
        }
    }

    // Let's run this baby and scrape us some data!
    $testScrape = new NCRSilverScraper();   // Instantiating new object

    $testScrape->login($userEmail, $userPass);    // Logging into server with your email address and password

    $data = json_decode($testScrape->scrapePersons());   // Scraping people records

    $testScrape->logout();   // Logging out

    print_r($data);   // Printing out the contents of our PHP object


And with that run we should have us some nice data scraped from a JavaScript and JSON website using nothing more than PHP and a little common sense.
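The search request asks for up to 1000 rows on page 1. If you had more customers than that, you would presumably need to request further pages. Here’s a rough sketch of how the paging fields might be built for any page. The buildSearchFields() helper is a made-up name, and the formula for 'start' is my assumption, inferred only from page 1 pairing with start 0 in the captured request. I haven’t tested it against a multi-page result.

```php
<?php
// Hypothetical helper: builds the search POST fields for a given page
function buildSearchFields($page, $limit, $searchString = '') {
    return array(
        'PageRowCount'     => $limit,
        'RequestedPageNum' => $page,
        'TotalRowCount'    => -1,
        'SearchArg'        => $searchString,
        'SortDirection'    => 'ASC',
        'SortColumn'       => 'Name',
        'page'             => $page,
        'start'            => ($page - 1) * $limit,  // Assumed: zero-based row offset
        'limit'            => $limit,
        'sort'             => '[{"property":"Name","direction":"ASC"}]',
        'isAjaxRequest'    => true,
    );
}

$fields = buildSearchFields(3, 100);    // Third page of 100 rows
echo $fields['start'] . "\n";
```

With page 1 and a limit of 1000 this produces the same values as the array in the scrapePersons() method.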

Final Scraped JSON Data

Here’s the output of our scraper, printing out the contents of our PHP object.

Here we have the customer’s code, full name, email address and phone number. It’s one small step for web scraping, one giant leap for something or other. I don’t know where I was going with that.

Of course, we don’t have to store it as an object, we could always parse it into an array if you prefer working with your data like that. Or whatever your preferred data structure is.
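Switching between the two is a one-argument change: passing true as the second argument to json_decode() returns associative arrays instead of objects. Here’s a quick sketch. The JSON below is a made-up stand-in for the server response, and the key names "Rows", "Id", "Name" and "Phone" are invented for illustration, so check the real response for the actual names.

```php
<?php
// Hypothetical response: the key names here are invented for illustration
$json = '{"Rows":[{"Id":192,"Name":"Andrea Fernandez","Phone":"9-(362)056-0581"}]}';

$asObject = json_decode($json);         // Objects (stdClass)
$asArray  = json_decode($json, true);   // Associative arrays

echo $asObject->Rows[0]->Name . "\n";       // Object property access
echo $asArray['Rows'][0]['Name'] . "\n";    // Array key access
```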

Now that we have the data, it’s up to you what to do with it. Personally, and just for the purposes of this post, I’m going to write a little method to format it in a nice HTML table to display below. You might want to do something more useful with your data, like store it in a database, a CSV file or something else, which I might cover in a future post.
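A formatting method along those lines could look something like the sketch below. The renderCustomerTable() helper and the 'id', 'name' and 'phone' keys are names I’ve made up for illustration, so map them to whatever keys the real response uses.

```php
<?php
// Hypothetical helper: renders customer rows as a simple HTML table
function renderCustomerTable(array $customers) {
    $html = "<table>\n<tr><th>User ID</th><th>Full Name</th><th>Phone Number</th></tr>\n";

    foreach ($customers as $customer) {
        // Escaping every value so scraped data can't break the markup
        $html .= '<tr>'
            . '<td>' . htmlspecialchars((string) $customer['id']) . '</td>'
            . '<td>' . htmlspecialchars((string) $customer['name']) . '</td>'
            . '<td>' . htmlspecialchars((string) $customer['phone']) . '</td>'
            . "</tr>\n";
    }

    return $html . "</table>\n";
}

// Two sample rows using the hypothetical keys
$customers = array(
    array('id' => 192, 'name' => 'Andrea Fernandez', 'phone' => '9-(362)056-0581'),
    array('id' => 142, 'name' => 'Ann Thomas', 'phone' => '6-(538)141-2725'),
);

echo renderCustomerTable($customers);
```

Escaping with htmlspecialchars() matters here because the values come from a third-party site and shouldn’t be trusted as markup.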

Anyways, I hope this post has been somewhat informative and answered most of your questions regarding scraping JavaScript sites and JSON using PHP. As always, comments and questions are welcome. You know what to do.

Happy web scraping!

User ID  Full Name         Email Address  Phone Number
192      Andrea Fernandez                 9-(362)056-0581
142      Ann Thomas                       6-(538)141-2725
145      Ann Walker                       7-(670)470-3724
203      Anna Carr                        1-(382)463-0119
183      Ashley Kelly                     1-(112)543-9709
184      Benjamin Dean                    9-(780)063-9572
111      Bonnie Alvarez                   7-(240)691-0590
141      Brandon Murray                   7-(612)179-5480
156      Carolyn Foster                   7-(614)558-2275
187      Cheryl Burke                     8-(119)283-2599
135      Christine Wells                  7-(415)042-8205
130      Craig Harper                     3-(092)318-1942
104      Daniel Gonzales                  9-(313)370-0380
136      Denise Kelly                     2-(435)951-9920
178      Denise Vasquez                   6-(800)841-4073
166      Diana Gardner                    9-(653)558-6654
200      Diana Nguyen                     7-(016)965-4256
198      Diana Richards                   4-(783)241-6445
118      Diane Harvey                     0-(422)620-9113
128      Diane Porter                     8-(493)442-8581
201      Donald Roberts                   2-(883)548-2431
115      Donna Reyes                      2-(529)344-1126
127      Doris Berry                      3-(364)519-6194
106      Dorothy Andrews                  5-(727)310-0492
180      Dorothy Kelly                    3-(400)221-6843

23 thoughts on “Using PHP To Scrape Websites Generated By JavaScript, jQuery, AJAX & JSON”

  1. Hi jacob, its great to following all of your tutorial. Do yo know how to triggering ads from admob sdk that will show up in the web, this image maybe will give you some idea what i mean about it.

    thanks and you know what, i learn about web scraping for my job from your tut majority.

    1. Thanks for your kind remarks.

      With regards to your question, I can’t see the image in question. I get the error message:

      An error occured while processing your request.
      Reference #50.7cbad040.1417643859.1f1bb212

      This is likely because the image was originally cached with akamai and they’ve rebuilt their cache since then.

      If you could save the image or take a screenshot and upload it to somewhere like, I’ll be able to take a look for you.

  2. AdMob ads can appear only in the phone application, but there is someone who managed to bring AdMob ads in web pages with php curl, but she would not tell me how, do you have an idea about this?

    1. I’m not familiar with AdMobs ads, so I can’t tell you for sure, but I would imagine she is just pulling in the script or banners or whatever with cURL and echoing them out on to the page.

      Have you tried that yet?

      For me to really be of any help I need to know URLs, etc… so I can actually see what the problem is and create a solution.

      You can email me direct if you wish to pass these on privately.

  3. Hi, excellent tutorial!
    I found that the cookie.txt will not be created. Does it require absolute path? Another minor mistake: $this->timeout need to be declared in the constructor first in case someone needs to know.

    Keep it up! You are one of the best scrapper on the net!

    1. You’re right, thanks for spotting that and letting me know. I’ve amended the code to assign the timeout in the constructor.

      cookie.txt should be created in the directory you’re running the script from, assuming it has write permission. You can however use an absolute, or relative path to any directory it has write permission in to create it.

    1. The website is written in .NET and uses the __doPostBack() function, which passes a parameter generated on each page load. You need to scrape that and pass it through with your request.

  4. Hi jacob,
    i still have probs to login and after that to create multiple mail accounts on

    i don’t need captcha, but i tried different curl versions and configurations but still didn’t got the successfull login.

    for my clients a i have register different pages and for the stuff the mail accounts.

    best regards


  5. Hi jacob,
    over all i wish to thank you to clear my doubt about web scraping. You shown topic as clear as well! But i have a doubt yet: i have built a webapp that i wil use only for personal purpose. I have to search a medicine by an external site. I used your classes and methods with live http header Chrome plugin but i didn’t find the word i have to include in the context string (your $credentials array).
    Can you help me, please?
    Thank you so much for this and other job you release

    1. Hi, I’m glad I’ve been able to help you out. Please send me a link to the website and explain to me exactly what it is you need to do and I will get back to you as soon as possible 🙂

  6. Hello this error appears I do not understand you can help me
    erreur:Trying to get property of non-object in ligne 38 (if ($login->success == 1) )

    COde :
    $request = $this->curlPostFields($this->loginUrl, $postValues); // Making cURL POST request

    $login = json_decode($request); // Decoding the JSON response

    if ($login->success == 1)

    1. This would mean that $login is not an object, thus there is a problem with the json_decode($request).

      Could you var_dump or print_r the $request variable and let me see the response please?
