Chunk Data for Easier Scraping

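Before you spend an hour writing an elaborate regular expression, try chunking the data first and then matching several small expressions against it; the scrape ends up much simpler (and faster). Assuming you're using PHP, once you've pulled the data (e.g. with file_get_contents()), run preg_replace() with the following regex to strip every space, tab, carriage return, and newline. That collapses the page into a single-line "soup" that is much easier to work with. Keep in mind that spaces inside tags and text disappear too, so <a href="..."> becomes <ahref="..."> and your later patterns need to match that form.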

[ \t\r\n]

Here's an example of how to use this with PHP:

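<?php
	// Fetch the page; file_get_contents() returns false on failure.
	$data = file_get_contents("http://www.adammoro.com/");
	if ($data === false) {
		exit("Could not fetch the page.\n");
	}
	// Remove every space, tab, carriage return, and newline.
	$data = preg_replace('~[ \t\r\n]~', '', $data);
	echo $data;
?>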
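
Once the data is chunked, the follow-up matching can stay short. The sketch below is only an illustration: the sample markup, the chunk_data() helper name, and the "item" class are all made up for the example, and an inline string stands in for a fetched page so it runs without a network connection. Notice that the pattern matches <liclass=...> rather than <li class=...>, because the space inside the tag was stripped along with everything else.

```php
<?php
// Hypothetical helper (not from the original post): wraps the whitespace-stripping regex.
function chunk_data(string $html): string
{
    return preg_replace('~[ \t\r\n]~', '', $html);
}

// Made-up sample markup standing in for a fetched page.
$html = "<ul>\n\t<li class=\"item\">\n\t\tFirst\n\t</li>\n\t<li class=\"item\">\r\n\t\tSecond\r\n\t</li>\n</ul>";

// Collapse the markup into one line with no whitespace at all.
$soup = chunk_data($html);

// No \s* or multiline handling needed: the pattern can be written literally.
preg_match_all('~<liclass="item">(.*?)</li>~', $soup, $matches);

// $matches[1] holds the text captured by the first group.
$items = $matches[1];
print_r($items);
```

Because all whitespace is removed, this approach suits values that don't contain spaces you care about (IDs, prices, URLs); multi-word text comes back with its words run together.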