Ideas For Web Scraping & Automation Projects Or Posts

My blog has remained rather stagnant for the last few months, in part due to uni work, but mostly because I can’t think of any interesting topics to cover or tutorials to write. I’ve so far covered some really basic web scraping topics and gone slightly more in depth in my book, but I don’t know what it is you want to learn more about.

So, all suggestions are currently welcome! Please leave them in the comments section and if they’re appropriate I’ll cover them in future posts. So far I have a few ideas (feel free to leave some feedback):

  • Auto-tweeting images from RSS feeds – So far, none of the automation tools out there natively support tweeting images. They all use third-party URL shortening services, which means your image doesn’t appear in your Twitter stream or on the ‘Photos and videos’ page of your Twitter profile. This pisses me off, and I’m working on something to fix it.
  • Object-oriented PHP – With the basics down, I think from now on all my posts and tutorials are going to use Object-Oriented Programming (OOP), as I do in my personal and clients’ projects. It’s far easier and cleaner when working on larger projects, and it makes it easy to scale our applications as we add more features. This will also entail using classes such as DOMDocument and PDO, among others, to make our applications more robust and easier to maintain.
  • Automating and scraping AJAX – With more websites than ever now using AJAX, I think automating and scraping these using PHP might be an interesting project to cover.
  • Basic captcha ‘cracking’ – I know this may be a somewhat ‘grey area’ topic, but I’ll approach it from a neutral perspective, using PHP and Optical Character Recognition (OCR) to crack basic, but commonly implemented, captchas.

If any of these topics take your fancy, or there’s something else you want covered, leave your responses in the comments below.

