<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>&#60;title&#62; &#187; scraping</title>
	<atom:link href="http://www.adammoro.com/blog/tag/scraping/feed" rel="self" type="application/rss+xml" />
	<link>http://www.adammoro.com/blog</link>
	<description>Internet Marketing, Web Development and Programming Stuff</description>
	<lastBuildDate>Fri, 19 Nov 2010 19:58:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Chunk Data for Easier Scraping</title>
		<link>http://www.adammoro.com/blog/chunk-data-for-easier-scraping.html</link>
		<comments>http://www.adammoro.com/blog/chunk-data-for-easier-scraping.html#comments</comments>
		<pubDate>Sun, 21 Feb 2010 22:41:32 +0000</pubDate>
		<dc:creator>Adam Moro</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://www.adammoro.com/blog/?p=56</guid>
		<description><![CDATA[Before you spend an hour writing one elaborate regular expression, try chunking the data and matching several short expressions instead, which makes for a much simpler (and faster) scrape. The first step, assuming you're using PHP: after you've pulled the data (e.g. with file_get_contents()), use preg_replace() with the following regex to strip out every space, tab, carriage return and newline, which leaves a much easier "soup" ...]]></description>
			<content:encoded><![CDATA[<p>Before you spend an hour writing one elaborate regular expression, try chunking the data and matching several short expressions instead, which makes for a much simpler (and faster) scrape. The first step, assuming you're using PHP: after you've pulled the data (e.g. with file_get_contents()), use preg_replace() with the following regex to strip out every space, tab, carriage return and newline, which leaves a much easier "soup" to work with.</p>
<pre>[ \t\r\n]</pre>
<p>Here's an example of how to use this with PHP:</p>
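<pre>&lt;?php
	// Fetch the raw HTML of the page
	$data = file_get_contents("http://www.adammoro.com/");
	if ($data === false) {
		die("Could not fetch the page\n");
	}
	// Strip every space, tab, carriage return and newline
	$data = preg_replace('~[ \t\r\n]~', '', $data);
	echo $data;
?&gt;</pre>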
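<p>The snippet above only flattens the page. The "match several expressions" half of the idea might look something like the minimal sketch below, which runs offline against a made-up page so you can see the effect. The sample markup, the <code>menu</code> id and both patterns are hypothetical, so adapt them to whatever you're scraping. Keep in mind that stripping every space also joins tag names to their attributes (<code>&lt;ul id="menu"&gt;</code> becomes <code>&lt;ulid="menu"&gt;</code>) and glues multi-word text together, so your follow-up patterns have to be written against the squashed form.</p>
<pre>&lt;?php
	// A minimal sketch of "chunk, then match", run against a made-up page.
	// The markup, the "menu" id and both patterns are hypothetical.
	$html = '&lt;html&gt;
	&lt;body&gt;
		&lt;ul id="menu"&gt;
			&lt;li&gt;&lt;a href="/about"&gt;About&lt;/a&gt;&lt;/li&gt;
			&lt;li&gt;&lt;a href="/contact"&gt;Contact&lt;/a&gt;&lt;/li&gt;
		&lt;/ul&gt;
		&lt;ul id="footer"&gt;
			&lt;li&gt;&lt;a href="/privacy"&gt;Privacy&lt;/a&gt;&lt;/li&gt;
		&lt;/ul&gt;
	&lt;/body&gt;
	&lt;/html&gt;';

	// Step 1: strip spaces, tabs, carriage returns and newlines.
	// This also removes the space inside tags: &lt;ul id="menu"&gt; becomes &lt;ulid="menu"&gt;.
	$soup = preg_replace('~[ \t\r\n]~', '', $html);

	// Step 2: grab only the chunk you care about with one short pattern.
	$links = array();
	if (preg_match('~&lt;ulid="menu"&gt;(.*?)&lt;/ul&gt;~', $soup, $chunk)) {
		// Step 3: run a second short pattern inside that chunk only.
		preg_match_all('~&lt;ahref="([^"]*)"&gt;~', $chunk[1], $matches);
		$links = $matches[1];
	}

	print_r($links);
</pre>
<p>Running it prints only <code>/about</code> and <code>/contact</code>. The footer link never comes into play because the second pattern only ever sees the menu chunk. Two short patterns like these are usually easier to write and debug than one expression that has to anchor on the menu and capture every link at once.</p>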
]]></content:encoded>
			<wfw:commentRss>http://www.adammoro.com/blog/chunk-data-for-easier-scraping.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

