Scraping Links to Movies

Maybe I was a bit of a bitch, after promising some free scraping classes and then lunching it out, so here’s one while you wait for my next posts. Essentially, it scrapes all of the alternative movie links from

Disclaimer: I in no way endorse the illegal watching or downloading of movies. Go buy the fucking DVD or watch it on Netflix or whatever…

It’s a heavily simplified version of part of something I was working on for a client recently.


	<?php
	class Putlocker {
		// Method to find a movie by title and return its alternative links
		public function searchMovieLinks($title) {
			$movieLinks = array();	// Links collected from the movie page
			$moviePageLink = null;	// Link to the matching movie page, once found
			$decodedTitle = htmlspecialchars_decode($title, ENT_QUOTES);
			$query = str_replace(' ', '+', $decodedTitle);
			$searchPage = $this->curlGet('' . $query);	// The search URL goes in the empty string
			$searchPageXPath = $this->returnXPathObject($searchPage);
			$searchPageLinks = $searchPageXPath->query('//div[@class="content-box"]/table[last()]/*/*/a/@href');
			$searchPageNames = $searchPageXPath->query('//div[@class="content-box"]/table[last()]/*/*/div/a');
			// Looking for the search result whose name matches the title
			$resultCount = min($searchPageLinks->length, $searchPageNames->length);
			for ($i = 0; $i < $resultCount; $i++) {
				if (trim(strtolower($searchPageNames->item($i)->nodeValue)) == trim(strtolower($decodedTitle))) {
					$moviePageLink = $searchPageLinks->item($i)->nodeValue;
					break;
				}
			}
			if ($moviePageLink === null) {
				return $movieLinks;	// No matching movie found
			}
			$moviePage = $this->curlGet($moviePageLink);
			$moviePageXPath = $this->returnXPathObject($moviePage);
			$moviePageLinks = $moviePageXPath->query('//td[@class="entry"]/a/@href');
			// Skipping the first two links and collecting the rest
			for ($i = 2; $i < $moviePageLinks->length; $i++) {
				$movieLinks[] = $moviePageLinks->item($i)->nodeValue;
			}
			return $movieLinks;
		}

		// Method to return XPath object
		public function returnXPathObject($item) {
			$xmlPageDom = new DOMDocument();	// Instantiating a new DOMDocument object
			@$xmlPageDom->loadHTML($item);	// Loading the HTML from downloaded page
			$xmlPageXPath = new DOMXPath($xmlPageDom);	// Instantiating new XPath DOM object
			return $xmlPageXPath;	// Returning XPath object
		}

		// Method for making a GET request using cURL
		public function curlGet($url) {
			$ch = curl_init();	// Initialising cURL session
			// Setting cURL options
			curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);	// Returning transfer as a string
			curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);	// Follow Location: headers
			curl_setopt($ch, CURLOPT_URL, $url);	// Setting URL
			$results = curl_exec($ch);	// Executing cURL session
			curl_close($ch);	// Closing cURL session
			return $results;	// Return the results
		}
	}

Save it as putlocker.php
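
One detail in the class worth a closer look is how the title becomes a search query: it decodes any HTML entities and then swaps spaces for plus signs. That is fine for plain titles, but a title containing something like an ampersand would probably confuse the query string. Here's a rough sketch comparing that approach with PHP's built-in `urlencode()`. The title is made up for illustration, and swapping in `urlencode()` is my suggestion here, not something the class above does.

```php
<?php
// Hypothetical title, as it might arrive HTML-encoded from another page
$title = 'Fast &amp; Furious';
$decoded = htmlspecialchars_decode($title, ENT_QUOTES);

// Approach used in the class: only spaces are replaced
$simple = str_replace(' ', '+', $decoded);

// Alternative: urlencode() also escapes characters such as "&"
$encoded = urlencode($decoded);

echo $simple, "\n";
echo $encoded, "\n";
```

The first line printed still contains a raw `&`, which a server would normally read as the start of a new parameter; the second has it escaped as `%26`.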

To use it:



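	<?php
	require_once 'putlocker.php';	// The class file saved above

	$putlocker = new Putlocker();

	$title = 'whatever movie you want';
	$links = array();	// Stays empty if the search fails

	try {
		$links = $putlocker->searchMovieLinks($title);
	} catch (Exception $e) {
		// Add your error handling class in here...
	}

	if ($links) {
		// Do something with the links in here...
	}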


…and you’ll get an array of the ‘alternative movie links’ from

See, I told you it was pointless posting uncommented code (well, where the only comments are from previously posted methods) without any reference or instructions to the new methods…now you can scrape a few links, but you probably don’t know how you did it.
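
So, to make up for that a little, here's a rough sketch of the bit that does the real work: loading HTML into a DOM, running an XPath query for the link attributes, and skipping the first two results. It runs against a made-up chunk of HTML so it works offline. The markup and the example.com URLs are invented for illustration, and the real pages will almost certainly look different.

```php
<?php
// Hypothetical markup standing in for a movie page (an assumption, not the real site)
$html = '<html><body><table>
<tr><td class="entry"><a href="http://example.com/nav-1">Nav 1</a></td></tr>
<tr><td class="entry"><a href="http://example.com/nav-2">Nav 2</a></td></tr>
<tr><td class="entry"><a href="http://example.com/mirror-a">Mirror A</a></td></tr>
<tr><td class="entry"><a href="http://example.com/mirror-b">Mirror B</a></td></tr>
<tr><td class="other"><a href="http://example.com/ignored">Ignored</a></td></tr>
</table></body></html>';

// Loading the HTML into a DOM, collecting parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Querying for the href attribute of every link inside a td with class "entry"
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('//td[@class="entry"]/a/@href');

// Skipping the first two matches, as the class does, and keeping the rest
$movieLinks = array();
for ($i = 2; $i < $hrefs->length; $i++) {
	$movieLinks[] = $hrefs->item($i)->nodeValue;
}

print_r($movieLinks);
```

With this sample markup, four links match the query and the loop keeps the last two (the "mirror" ones). The link in the `td` with class "other" never matches, which is the whole point of the `@class="entry"` predicate.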

