Simple HTML DOM baked CakePHP component
Simple HTML DOM Parser ported to CakePHP
We love the Simple HTML DOM PHP class, and we wanted to use it on our CakePHP sites as well. Instead of feeling sorry for ourselves, we ported the Simple HTML DOM parser (which uses a delightful jQuery-like DOM element filter) to CakePHP as a component!
On top of that, we added the ability to fetch pages with the PHP cURL library instead of the default file_get_contents() calls. Take a look at how easy our Simple HTML DOM baked CakePHP component is to use.
Update – 7/10/11
7/10/2011: Ralf (in the comments below) made us aware of some changes to how components are used in CakePHP, so please take a look at the new implementation of our component below (it’s a minor change).
Load the component into your controller like so:
[php]
<?php
class SampleController extends AppController {
    var $name = 'Sample';
    var $helpers = array('Html', 'Form');
    // Load the Simple HTML DOM baked component
    var $components = array('SimpleHtmlDomBaked');
}
?>
[/php]
This then allows you to access the component’s functions throughout the controller using:
[php]
$this->SimpleHtmlDomBaked;
[/php]
So altogether you can use it just like this:
[php]
$url = "http://www.lolcats.com";
// $html is just a shorter alias; you can also call
// $this->SimpleHtmlDomBaked directly if you'd like!
$html = $this->SimpleHtmlDomBaked;
// curl it
$html->curl_and_load($url, true);
// get page title
$title = $html->find('title', 0)->innertext;
// get first picture src
$firstImage = $html->find('img', 0)->src;
[/php]
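To show the difference between the two fetching strategies mentioned above, here is a minimal sketch of a helper that grabs a page body with either cURL or file_get_contents(). Note that fetch_html() is a hypothetical name we made up for illustration; it is not part of the component and is not necessarily how curl_and_load() works internally.

[php]
<?php
// Hypothetical helper (NOT part of the component): fetch the body of a URL
// with cURL when requested and available, otherwise with file_get_contents().
function fetch_html($url, $useCurl = true) {
    if ($useCurl && function_exists('curl_init')) {
        $ch = curl_init($url);
        // return the body as a string instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // follow HTTP redirects
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // give up after 10 seconds
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        return curl_exec($ch); // false on failure
    }
    return file_get_contents($url); // false on failure
}

// Usage (assumes network access):
// $body = fetch_html('http://www.lolcats.com');
[/php]

The main reason to prefer cURL is control: timeouts, redirects and headers can all be set per request, and it still works on hosts where allow_url_fopen is switched off.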
Using the CakePHP HTML parser to do almost anything!
And any normal call that you would make using the Simple HTML DOM parser still works too!
[php]
// Find all anchors and images with the "title" attribute
$ret = $html->find('a[title], img[title]');
// Traverse the DOM tree with children()
echo $html->find('#div1', 0)->children(1)->children(1)->children(2)->id;
// or do the same with the W3C-style method names
echo $html->getElementById('div1')->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
[/php]
You can find more calls in the Simple HTML DOM manual.
Update – 1/13/11
1/13/2011: Thanks to a comment from Eeks, we realized that the Simple HTML DOM function set_callback() was broken. We have updated the component with the fix!
Here is the proper usage for the set_callback() function within CakePHP:
[php]
$url = "http://www.lolcats.com";
$html = $this->SimpleHtmlDomBaked;
// curl it
$html->curl_and_load($url, true);
// set callback
// the first argument is the object reference (or class name) of the controller,
// the second is the name of the function within that controller
$html->set_callback($this, 'my_callback');
// output the DOM; this is the point where the callback fires
echo $html;
[/php]
When the echo statement runs, this function fires from within the same CakePHP controller as the code above:
[php]
function my_callback($element) {
    // Hide all <b> tags
    if ($element->tag == 'b') {
        $element->outertext = '';
    }
}
[/php]
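If you are wondering how an object plus a method name turns into a function call, here is a small self-contained sketch of the idea. The SketchDom, SketchElement and SketchController classes are hypothetical stand-ins we wrote for this illustration; they are not the component's real code, and we are only assuming the real parser dispatches in a similar way.

[php]
<?php
// Hypothetical stand-in for a parsed DOM element.
class SketchElement {
    public $tag;
    public $outertext;

    public function __construct($tag, $outertext) {
        $this->tag = $tag;
        $this->outertext = $outertext;
    }
}

// Hypothetical stand-in for the DOM object.
class SketchDom {
    private $elements = array();
    private $callback = null;

    public function add(SketchElement $element) {
        $this->elements[] = $element;
    }

    // Store the (object, method name) pair as a PHP callable.
    public function set_callback($object, $method) {
        $this->callback = array($object, $method);
    }

    // Converting the DOM to a string runs the callback once per element.
    public function __toString() {
        $out = '';
        foreach ($this->elements as $element) {
            if ($this->callback !== null) {
                call_user_func($this->callback, $element);
            }
            $out .= $element->outertext;
        }
        return $out;
    }
}

// Hypothetical stand-in for the CakePHP controller that owns the callback.
class SketchController {
    public function my_callback($element) {
        // Hide all <b> tags
        if ($element->tag == 'b') {
            $element->outertext = '';
        }
    }
}

$dom = new SketchDom();
$dom->add(new SketchElement('p', '<p>kept</p>'));
$dom->add(new SketchElement('b', '<b>hidden</b>'));
$dom->set_callback(new SketchController(), 'my_callback');
echo $dom; // prints "<p>kept</p>"
[/php]

Because PHP objects are passed by handle, the change the callback makes to $element->outertext is visible when the DOM builds its output, which is why the <b> tag disappears.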
And that should help you get started on the next great search engine or site parser!
Download Simple HTML DOM baked
Support and questions
We were using CakePHP version 1.3.6 when we created this component. We expect it to work with most versions of CakePHP from 1.2 upwards.
You can find the CakePHP framework download archive here:
Download CakePHP
Please post any questions about this CakePHP component in the comments below.