Importing Comments From flickr.com

Disclaimer

These instructions worked for me on CentOS 4.0. They may very well work for you on Red Hat-like or other distributions. Please note that if you decide to use these instructions on your machine, you do so entirely at your own discretion; neither this site, sgowtham.com, nor its author is responsible for any damage, intellectual or otherwise.

As much as I would like to see my photographs commented on in my own photoblog, I have recently come to realize (thanks to suggestions from friends, Nagesh & Kyle) that it’s not a bad idea to post the same pictures on flickr.com. For one, the latter is a multi-user platform with users ranging from beginners to advanced professionals, so there is a better chance of attracting useful comments. Copying comments by hand from flickr.com to my photoblog is easy while there are only a few, but I expect it to become tedious and time-consuming as they accumulate. To that end, I searched Google for an XML/RSS parser and modified it to meet my requirements. The procedure and edits follow:

XML/RSS Parser with PHP

flickr.com generates an RSS feed for comments that others (or I) make on my photos. This feed contains all the required information: author name, date and time, comment text, title of the image, etc. In my photoblog, image_name is the unique identifier, and when I upload images to flickr.com I keep the image name as part of the title. The following script, flickr2showcase.php (not originally written by me, but with parts heavily modified), does exactly what I want: it extracts the information from the RSS feed and arranges it so that it can be inserted into my photoblog’s comments table (one may refer to the original script & its documentation).

#!/usr/bin/php
<?php
# Connect to the database
$host     = "localhost";
$dbuser   = "MYSQL_USERID";
$dbpasswd = "MYSQL_PASSWD";
$database = "MYSQL_DB";
$connect  = mysql_connect($host, $dbuser, $dbpasswd) or die(mysql_error());
mysql_select_db($database,$connect) or die(mysql_error());
 
class FlickrRSSParser {
 
  # In Flickr's RSS file, all the information needed is contained in the <item> 
  # tags in the document. So the first global variable defined will be $insideitem, 
  # which will be set to true when entering an <item> tag and false when exiting one.
  var $insideitem  = false;
  var $tag         = "";
  var $title       = "";
  var $description = "";
  var $link        = "";
  var $pubdate     = "";
 
  # This function will be called by the XML parser whenever an opening tag is 
  # encountered
  # $parser will be passed a reference to the XML parser that is being used to 
  # parse the document
  # $tagName is the ALL-UPPERCASE (the PHP manual calls this 'case-folded')
  # version of the name of the opening tag that triggered the event
  # $attrs is an associative array of the attributes that are present in the tag
  # that triggered the event
  function startElement($parser, $tagName, $attrs) {
    if ($this->insideitem) {
      $this->tag = $tagName;
    } elseif ($tagName == "ITEM") {
      $this->insideitem = true;
    }
  }
 
  # $parser will be passed a reference to the XML parser that is being used to
  # parse the document
  # $tagName is the case-folded name of the closing tag that triggered the event
  function endElement($parser, $tagName) {
    if ($tagName == "ITEM") {
 
      # Image name: flickr.com displays the title as 'Comment on dsc_100-1234'
      # The filename is the 3rd array element; it is escaped because it is
      # placed inside the SQL query below
      $title        = htmlspecialchars(trim($this->title));
      $title        = explode(" ", $title);
      $filename     = mysql_real_escape_string($title[2]);
 
      # Date/Time the comment was made (yyyy-mm-dd hh:mm:ss format)
      $pubdate      = htmlspecialchars(trim($this->pubdate));
      $pubdate      = strtotime($pubdate);
      $pubdate      = date("Y-m-d H:i:s", $pubdate);
 
      $paragraphs   = htmlspecialchars(trim($this->description));
      $paragraphs   = explode("&lt;/p&gt;", $paragraphs);
 
      # The description contains the link to comment-author's flickr profile and 
      # comment-author's name (first <p>...</p> section)
      # Both values are escaped because they are placed inside the SQL query below
      $paragraph0   = $paragraphs[0];
      $authorurl    = explode("&quot;", $paragraph0);
      $author_url   = mysql_real_escape_string($authorurl[1]);
 
      $authorname0  = explode("&gt;", $paragraph0);
      $authorname1  = $authorname0[2];
      $authorname2  = explode("&lt;", $authorname1);
      $author_name  = mysql_real_escape_string($authorname2[0]);
 
      # The description also contains the comment-text (second <p>...</p> section)
      # A basic substitution is done, via str_replace(), to turn the escaped
      # line breaks back into <br> tags
      # mysql_real_escape_string() is used to make sure comment_text is in MySQL friendly fashion
      $paragraph1   = $paragraphs[1];
      $commenttext  = explode("&lt;p&gt;", $paragraph1);
      $comment_text = $commenttext[1];
      $comment_text = str_replace("&lt;br /&gt;", "<br>\r\n", $comment_text);
      $comment_text = mysql_real_escape_string($comment_text);
 
      # The description also contains a link to image-thumbnail (second <p>...</p> section)
      # but it's not required in this process - as such, it's ignored

      # flickr.com's IP address
      $flickr_ip    = gethostbyname('www.flickr.com');
 
      # comments_table structure
      # CREATE TABLE IF NOT EXISTS `comments_table` (
      #  `id`              int(11)      NOT NULL auto_increment,
      #  `datetime`        DATETIME     NOT NULL default '0000-00-00 00:00:00',
      #  `imagename`       varchar(254) NOT NULL,
      #  `authorip`        varchar(254) NOT NULL default '000.000.000.000',
      #  `authorhostname`  varchar(254) NOT NULL default 'localhost.localdomain',
      #  `authorname`      varchar(254) NOT NULL default 'Unknown',
      #  `authoremail`     varchar(254) NOT NULL default 'Unknown',
      #  `authorurl`       varchar(254) NOT NULL default 'Unknown',
      #  `comments`        text,
      #  `status`          varchar(3) NOT NULL default 'No',
      #
      #  UNIQUE KEY `id` (`id`)
      # ) ENGINE=MyISAM DEFAULT CHARSET latin1 AUTO_INCREMENT=1 ;
      #
      # CREATE UNIQUE INDEX author_datetime ON `comments_table` (`authorname`,`datetime`);
      #

      # Enter into the comments_table in database
      # INSERT IGNORE makes sure that duplicate entries, when they exist, 
      # are ignored during insertion
      $sql_q        = "INSERT IGNORE INTO `MYSQL_DB`.`comments_table` ";
      $sql_q       .= "VALUES ('', '$pubdate', '$filename.jpg', '$flickr_ip', ";
      $sql_q       .= "'flickr.com', '$author_name', 'flickr@your-domain.com', ";
      $sql_q       .= "'$author_url', '$comment_text', 'Yes'); ";
      $result       = mysql_query($sql_q);
 
      if (!$result) {
        die('Invalid query: ' . mysql_error());
      }
 
      $this->title       = "";
      $this->description = "";
      $this->link        = "";
      $this->pubdate     = "";
      $this->insideitem  = false;
    }
  }
 
  # $parser will be passed a reference to the XML parser that is being used to
  # parse the document
  # $data is a string of text appearing between XML tags in the document. 
  # The text between two tags will not necessarily trigger a single event. 
  # Blocks of text spread over multiple lines will cause one event per line, 
  # with each event being passed the $data for that line.
  function characterData($parser, $data) {
    if ($this->insideitem) {
      switch ($this->tag) {
        case "TITLE":
        $this->title .= $data;
        break;
        case "DESCRIPTION":
        $this->description .= $data;
        break;
        case "PUBDATE":
        $this->pubdate .= $data;
        break;
        case "LINK":
        $this->link .= $data;
        break;
      }
    }
  }
}
 
# Create an XML parser
# Just as one must create a database connection in PHP to interact with a database, 
# one must create an XML parser to read in an XML file. In this case, a reference to 
# the parser is stored in $xml_parser.
$xml_parser = xml_parser_create();
 
$rss_parser = new FlickrRSSParser();
 
# Tell the parser to look up the handler functions as methods of $rss_parser
xml_set_object($xml_parser, $rss_parser);
 
# This function specifies the functions that an XML parser should 
# use to process the events generated by opening and closing tags. 
# In this case, the parser is the one stored in our $xml_parser variable, 
# while the functions are called startElement() and endElement()

xml_set_element_handler($xml_parser, "startElement", "endElement");
 
# This function specifies the function that the XML parser should use 
# to process character data appearing between tags in an XML document. 
# The function chosen to process character data is called characterData()
xml_set_character_data_handler($xml_parser, "characterData");
 
# Flickr's RSS Feed Comments URL must be entered here
$rss = "FLICKR_RSS_FEED_FOR_COMMENTS";
 
# Open the specified URL for reading
$fp = fopen($rss, "r")
  or die("Error reading RSS data.");
 
while ($data = fread($fp, 4096)) {
  # This function sends all or part of an XML document to the parser for it 
  # to process. The endOfDocument parameter should be set to true if the 
  # data marks the end of the XML document, or false if more of the document 
  # will follow in a subsequent call to xml_parse(). This allows the parser to 
  # correctly catch unclosed tags at the end of the document and so forth. 
  # In this case, the parser is once again $xml_parser. The $data variable 
  # (up to 4KB in size) retrieved from the file with fread() is passed as the 
  # data to be processed, while the feof() is used to determine whether 
  # PHP has reached the end of the XML file or not, thus providing the 
  # required endOfDocument parameter. If an error occurs in the parsing of 
  # the document, the error message is printed out along with the line of the 
  # file at which it occurs with xml_error_string, xml_get_error_code() and 
  # xml_get_current_line_number()
  xml_parse($xml_parser, $data, feof($fp))
    or die(sprintf("XML error: %s at line %d",
      xml_error_string(xml_get_error_code($xml_parser)),
      xml_get_current_line_number($xml_parser)));
}
 
fclose($fp);
 
# Although all memory resources are freed at the end of a PHP script, 
# one may wish to free up the memory used by the XML parser if the 
# script will perform other potentially memory-intensive tasks after it 
# parses the XML data. This function destroys the specified XML parser, 
# thus freeing up resources and memory it may have allocated for parsing.
xml_parser_free($xml_parser);
 
?>
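
The event-driven flow that the script relies on can be seen in isolation with a much smaller example. The following is only a minimal sketch and not part of flickr2showcase.php: it assumes a newer PHP (5.3 or later, for closures) with the XML extension enabled, parses a made-up two-item feed held in a string instead of the real Flickr URL, and collects only the title and pubDate of each item. The feed content and the variable names ($items, $current, $start, $end, $chars) are illustrative assumptions.

```php
<?php
// Made-up feed for illustration only; the real Flickr feed has more fields.
$rss = <<<XML
<rss version="2.0"><channel>
<title>Feed title (outside any item, so it is ignored)</title>
<item><title>Comment on dsc_100-1234</title><pubDate>Sat, 01 Mar 2008 10:15:00 +0000</pubDate></item>
<item><title>Comment on dsc_100-5678</title><pubDate>Sun, 02 Mar 2008 18:30:00 +0000</pubDate></item>
</channel></rss>
XML;

$items   = array();   // one entry per <item>
$current = array();   // fields of the item being read
$inside  = false;     // true while between <item> and </item>
$tag     = "";        // case-folded name of the tag being read

// Opening tag: remember the tag name, or start a new item
$start = function ($parser, $name, $attrs) use (&$inside, &$tag, &$current) {
    if ($inside) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $inside  = true;
        $current = array("TITLE" => "", "PUBDATE" => "");
    }
};

// Closing tag: store the finished item, and forget the current tag name
$end = function ($parser, $name) use (&$inside, &$tag, &$current, &$items) {
    if ($name == "ITEM") {
        $items[] = array_map('trim', $current);
        $inside  = false;
    }
    $tag = "";
};

// Character data: append to the field named by the current tag, if tracked
$chars = function ($parser, $data) use (&$inside, &$tag, &$current) {
    if ($inside && isset($current[$tag])) {
        $current[$tag] .= $data;
    }
};

$parser = xml_parser_create();
xml_set_element_handler($parser, $start, $end);
xml_set_character_data_handler($parser, $chars);

// The whole document is passed in one call, so the third argument is true
xml_parse($parser, $rss, true)
    or die(xml_error_string(xml_get_error_code($parser)));

print_r($items);
```

Unlike the full script, this sketch clears the current tag name in the end-element handler, so text that appears between tags is not appended to the previous field.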

To make sure that I don’t miss any comments, I run flickr2showcase.php via a cron job twice a day. To see this in action, one may compare the same image entry in my photoblog and on flickr: Photoblog Entry | Flickr.com Entry

Things seem to be working without any problems so far, but that doesn’t necessarily mean the code is free of errors or bugs. As they show up, I will try to post workarounds for them.
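
Because the field extraction in endElement() depends on the exact markup inside each feed item, it can help to try those string operations on a single item without touching the database. The following is a hedged sketch rather than the script itself: extract_comment_fields() is a hypothetical helper, the sample title, date and description are made up to match the layout that the explode() calls assume, gmdate() is used in place of date() so the result does not depend on the server's time zone, and ENT_COMPAT is passed explicitly because the default flags of htmlspecialchars() differ between PHP versions.

```php
<?php
// Hypothetical helper, not part of flickr2showcase.php: applies the same
// explode() steps as endElement() to the fields of one feed item.
function extract_comment_fields($title, $pubdate, $description) {
    // 'Comment on dsc_100-1234': the third word is the image name
    $words    = explode(" ", trim($title));
    $filename = $words[2];

    // RFC 822 date to 'yyyy-mm-dd hh:mm:ss' (in UTC for this sketch)
    $datetime = gmdate("Y-m-d H:i:s", strtotime(trim($pubdate)));

    // Escape the markup, then split it at every (escaped) </p>
    $escaped    = htmlspecialchars(trim($description), ENT_COMPAT);
    $paragraphs = explode("&lt;/p&gt;", $escaped);

    // First paragraph: the profile URL is the first quoted attribute value
    $quoted     = explode("&quot;", $paragraphs[0]);
    $author_url = $quoted[1];

    // First paragraph: the name follows the second '>' and ends at the next '<'
    $after_gt    = explode("&gt;", $paragraphs[0]);
    $name_parts  = explode("&lt;", $after_gt[2]);
    $author_name = $name_parts[0];

    // Second paragraph: the comment text follows the (escaped) <p>
    $text_parts = explode("&lt;p&gt;", $paragraphs[1]);
    $comment    = str_replace("&lt;br /&gt;", "<br>\r\n", $text_parts[1]);

    return array(
        "filename"    => $filename,
        "datetime"    => $datetime,
        "author_url"  => $author_url,
        "author_name" => $author_name,
        "comment"     => $comment,
    );
}

// Made-up sample item, for illustration only
$fields = extract_comment_fields(
    "Comment on dsc_100-1234",
    "Sat, 01 Mar 2008 10:15:00 +0000",
    '<p><a href="http://www.flickr.com/people/example/">Example User</a> has posted a comment:</p>' . "\n" .
    '<p>Nice colours!<br />Where was this taken?</p>'
);

print_r($fields);
```

If Flickr changes the markup of the description, the array indexes used here (and in the script) point at the wrong pieces, so a check like this is a quick way to notice it.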
