Instant PHP Web Scraping Book Now Available!

If you’ve been following me on Twitter or have contacted me privately, you probably know this day has been approaching: Instant PHP Web Scraping was published on 26th July and is now available to buy!

For those who don’t already know, the book covers essentially where I had originally intended to take the Web Scraping With PHP & CURL series I started. Aimed at novice PHP programmers who are new to web scraping, it guides readers through the basics, provides a toolset for completing a number of web scraping tasks, and gives a firm basis for further learning on the subject.

NOTE: This book is intended to serve as a brief introduction to web scraping with PHP. I was working under strict instructions and constraints from the publisher. The target audience of this book is the absolute beginner. If you have experience working with PHP, cURL, MySQL, etc., this book is not for you.

The book is available as an ebook from Packt Publishing or as a paperback from Amazon. In addition to the recipes contained in the book, there are also a number of bonus recipes which will be available online for anybody who has purchased the book, providing even more coverage of the subject matter. I will also be setting up an online forum here, where anybody who has read the book can post questions or ask for help from me personally.

Win A Free Copy!

Packt Publishing have 3 free copies of Instant PHP Web Scraping in ebook format which you can win. I will be putting a competition together in the coming days, so stay tuned to find out how to enter and be in with a chance to win!

Own A Website And Want A Free Copy?

If you own a website or blog and would like to review this book, please send me your details via my contact form and I will respond as soon as possible with full details.

Book Overview

Who this book is for

This book is aimed at those new to web scraping, with little or no previous programming experience. Basic knowledge of HTML and the Web is useful, but not necessary.

What you will learn from this book

  • Scrape and parse data from web pages using a number of different techniques
  • Create custom scraping functions
  • Download and save images and documents
  • Retrieve and scrape data from emails
  • Save scraped data into a MySQL database
  • Submit login and file upload forms
  • Use regular expressions for pattern matching
  • Process and validate scraped data
  • Crawl and scrape multiple pages of a website

In Detail

With the proliferation of the web, there has never been a larger body of data freely available for common use. Harvesting and processing this data can be a time-consuming task if done manually. However, web scraping can provide the tools and framework to accomplish this with the click of a button. It’s no wonder, then, that web scraping is a desirable weapon in any programmer’s arsenal.

Instant Web Scraping With PHP How-to uses practical examples and step-by-step instructions to guide you through the basic techniques required for web scraping with PHP. This will provide the knowledge and foundation upon which to build web scraping applications for a wide variety of situations, such as data monitoring, research, and data integration, all relevant to today’s online, data-driven economy.

After setting up a suitable PHP development environment, you will quickly move on to building web scraping applications. Beginning with the simple task of retrieving a single web page, you will then gradually build on this by learning various techniques for identifying specific data, crawling through numerous web pages to retrieve large volumes of data, and processing and then saving it for future use. You will learn how to submit login forms to access password-protected areas, along with downloading images, documents, and emails. Learning to schedule the execution of scrapers achieves the goal of complete automation, and the final introduction of basic object-oriented programming (OOP) in the development of a scraping class provides the template for future projects.

Armed with the skills learned in the book, you will be set to embark on a wide variety of web scraping projects.

Approach

Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. Short, concise recipes to learn a variety of useful web scraping techniques using PHP.

Table of contents

  • Preparing your development environment (Simple)
  • Making a simple cURL request (Simple)
  • Scraping elements using XPath (Simple)
  • The custom scraping function (Simple)
  • Scraping and saving images (Simple)
  • Submitting a form using cURL (Intermediate)
  • Traversing multiple pages (Intermediate)
  • Saving scraped data to a database (Intermediate)
  • Scheduling scrapes (Simple)
  • Building a reusable scraping class (Advanced)
  • + online bonus content covering a number of other topics!

21 thoughts on “Instant PHP Web Scraping Book Now Available!”

  1. Jacob,

    I have enjoyed the excellent guides on your site. If you were in the States, I would buy you a beer for being such a kick-ass guy!

    Your “Hire me” form is on the fritz.

    Will you drop me a line?

    Bert

  2. Hi Jacob,

    I bought the book yesterday. Exercise 12-saving-database.php uses code from an exercise that is not documented in the book (11-multithreaded-scraping.php), so there is no explanation for some of the code used in exercise 12.

    Also, the code in exercise 11, and therefore 12, has issues (Undefined variable: freeEbooksUrls).

    Please post an errata,

    Thanks,
    Brian.

  3. Jacob,
    I tried to contact you on Twitter but got no reply. I purchased your book, but when you get to number 11 on scraping multiple pages, the example files that you provide don’t work and give several error messages. Can you please help with this?

    Thanks

    Mike

    1. This is due to the code on the target page having changed. I’ve spoken to the publishers and the fix for this will be included in the online errata sometime next week.

  4. Also, in this one I am getting a notice on line 162 saying that “authors” is an undefined index, followed on the next line by a PHP warning that I am passing invalid arguments to the implode() function, and finally a fatal error, a PDOException saying “Column ‘ebook_authors’ cannot be null”.

    http://pastebin.com/v7znWyiV

    I have spent more than 3 hours comparing the sample files with the code in the book, and everything matches. Maybe this needs updating? Or maybe I am missing something else? Please send me a tip on how to get past these errors. Sorry to write to you here, but the contact form on this website doesn’t work.

  5. Jacob, after only a few minutes of viewing your content, I am impressed. These free tutorials have cleared a lot of stuff up for me. I am basically more of a coding enthusiast, not really advanced by any means. I will probably purchase your book. I am curious: since your book was released in 2013, how dated are the examples? Are they still pretty much how you would do it today, or is there an updated version coming out soon?

    Right now I am mostly trying to learn the methods to navigate through product lists and product details and grab pricing, rankings, reviews, etc. from sites like Amazon, Walmart and other large online retail spaces. I would, of course, like to disguise my attempts to avoid IP blocks. I would especially like to do it in a way that works like a human browsing, such as putting random time delays between pages, but I don’t know if that is possible.

    Will the book give me more details on how to do this?

    1. Hi Timothy, so glad to hear you like my content. In response to your questions:

      Over the past few years, JavaScript solutions have become more common than ever on the front end, and this is a topic not covered in the book at all. However, for other sites, where the data is requested and displayed in the ‘traditional’ manner, the recipes in the book are pretty much how I would handle the tasks today.

      Fortunately, you intend to scrape ecommerce stores, where it’s in their best interest to make the data easily crawlable for SEO purposes, and the examples in the book deal with scraping the Packt online store. This is exactly how you would go about scraping sites like Amazon or Walmart. The only catch is that the Packt online store has been redesigned slightly since the book was published, so there have been a few recent reports of some of the recipes in the book not working ‘out of the box’ and requiring a little common sense to fix. This, unfortunately, is the way with web scraping: sometimes your target just decides to change something and you have to adapt. I think this is a valuable lesson (one I have blogged about in the past), and as such, I will not be updating the book. I will, however, offer help and advice to anybody who has bought the book and is really stuck on something.

      With regard to disguising your bot as a human, this was considered as a topic for the book but didn’t make it in because it was deemed too unethical. There are hints in the book, though, such as changing your user agent string and sleeping for random intervals between requests, and there are posts on here about topics like using proxies with cURL.

      I hope I have answered all of your questions; if not, please feel free to let me know.

    1. Great. If you buy it, be sure to let me know how you’re getting along with it, and don’t hesitate to ask any questions in this thread. I’ll do my best to respond in a timely manner.
