
Fun digging into WordPress XML RPC

October 18th, 2009

Once we reached 20K visits per day at asianetindia.com (maintained by Saturn SPL), we decided to push harder and set a target of 30K within three months. We have achieved that (view the stats). Twitter and the blog helped a lot with this. We had added both auto blogging and WordTwit to the existing asianetindia.com, which means that each post, when published, is tweeted and is also posted to the blog as an excerpt with a back link to the original.

We also wanted to get as many links back into the blog as possible, and decided that trackbacks would be the best method, choosing them over auto commenting systems. A full-time tester was linking related news and posts from other sites to our blog posts. This became tiring once we considered the volume being posted each day. At this point, we at Saturn started to think about automating the linking drive. The outcome is autolinker, which is run using the random cron.

The policy was to run a script at a random time between 6 and 7 am, which would handle the linking for all posts made on the previous day. The flow of the system is as follows. First, collect the latest posts. Pass each title to Google Blog Search, asking for the output as RSS and querying only the top 10. Parse the feed, take the title as well as the description, and compute a weighted rank of our title's words in the returned content. Order the links by rank. Choose the top five and add them to the blog content with a more tag and list elements.
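One way to get a random start inside a fixed window is to let cron start the script at the beginning of the window and have the script wait a random number of seconds. The following is only a sketch of that idea, not the random cron mentioned above; the helper name randomDelaySeconds and the cron entry in the comment are assumptions made for illustration.

```php
<?php
// Hypothetical helper: pick a random delay (in seconds) inside a time window.
function randomDelaySeconds(int $windowSeconds): int
{
    // mt_rand() bounds are inclusive, so subtract 1 to stay inside the window.
    return mt_rand(0, $windowSeconds - 1);
}

// Assumed cron entry (not from the original setup):
//   0 6 * * * php autolinker.php
$delay = randomDelaySeconds(3600);   // one hour window: 6 am to 7 am

// In a real run the script would wait here before doing any work:
// sleep($delay);
echo "Would start after {$delay} seconds\n";
```

With this arrangement the cron table stays static, and the randomness lives entirely in the script.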

The system was split into several functions for ease of use. My personal preference for fetching web content is command-line curl, and I use it whenever possible, since it can be spawned into the background if needed. There may be advocates who argue against using exec, but since we planned to run this only on a dedicated machine, we were not too bothered about the security implications. Also, the script was strictly command line and would be invoked by cron.
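If the security concern around exec ever becomes relevant, one common mitigation is to quote every dynamic part of the command with escapeshellarg. The sketch below assumes a Unix shell; the helper name buildCurlCommand, the example URL and the output path are placeholders, and it uses curl's -o option instead of shell redirection.

```php
<?php
// Hypothetical helper: build a curl command line with every dynamic part quoted.
function buildCurlCommand(string $url, string $outFile, string $userAgent): string
{
    return '/usr/bin/curl -s'
        . ' -A ' . escapeshellarg($userAgent)   // user agent string
        . ' -o ' . escapeshellarg($outFile)     // write the response to this file
        . ' ' . escapeshellarg($url);           // URL, safe even with & or spaces
}

$cmd = buildCurlCommand(
    'http://example.com/feed?q=a b&x=1',   // placeholder URL
    '/tmp/out_example.dat',                // placeholder output path
    'Saturn Bot; http://www.saturn.in'
);

// The command is only printed here; pass it to exec() to actually run it.
echo $cmd, "\n";
```

Quoting this way keeps characters such as & and ; in a URL or title from being interpreted by the shell.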

function getFromWp(){

   $postbody = '<' .'?xml version="1.0" encoding="iso-8859-1"?' . '>'
        .'<methodCall>'
        .'<methodName>mt.getRecentPostTitles</methodName>'
        .'<params>'
        .'<param><value><int>1</int></value></param>'
        .'<param><value><string>[username]</string></value></param>'
        .'<param><value><string>[password]</string></value></param>'
        .'<param><value><int>150</int></value></param>'
        .'</params>'
        .'</methodCall>';

   $rpcurl = BLOG_URL . '/xmlrpc.php';
   $tmp_in = '/dev/shm/' . uniqid("in_") . '.dat';
   $tmp_out = '/dev/shm/' . uniqid("out_") . '.dat';

   file_put_contents($tmp_in, $postbody);
   exec('/usr/bin/curl -s -A "Saturn Bot; http://www.saturn.in" --data-binary @'.$tmp_in.' '.$rpcurl.' > ' . $tmp_out);

   $xml = simplexml_load_file($tmp_out);
   unlink($tmp_out);
   unlink($tmp_in);

   $data = $xml->params->param->value->array->data->value;
   $rv = array();
   foreach($data as $item){
     $tmp = getPostData($item);
     if($tmp)
        $rv[] = $tmp;
   }
   return $rv;
}

This function calls mt.getRecentPostTitles to get the latest 150 posts from the blog, and uses the SimpleXML library to parse the XML response from the WordPress XML RPC endpoint. It then calls getPostData, listed below, to extract the post data, in our case the title, postid and date. If the date is not yesterday, the post is skipped by returning false.

function getPostData(&$sXmlObj){
   $rv = array();
   foreach($sXmlObj->struct->member as $props){
       $p = json_decode(json_encode($props), true);
       switch($p['name']){
        case 'dateCreated':
              list($ds,$ts) = explode('T', $p['value']['dateTime.iso8601']);
              if($ds <> date("Ymd", strtotime("yesterday")))
                  return false;
        break;
        case 'postid':
          $rv['postid'] = $p['value']['string'];
        break;
        case 'title':
          $rv['title'] = $p['value']['string'];
        break;
       }
   }
   return $rv;
}
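The member parsing and date filter can be tried without a live blog by feeding SimpleXML a hand-written entry. The sketch below is a variation on getPostData, not a replacement for it: the sample XML is made up to resemble one entry of an mt.getRecentPostTitles reply, the function name extractPost is hypothetical, and the wanted day is passed in explicitly so the result does not depend on the current date.

```php
<?php
// Made-up sample of one <value> entry from an mt.getRecentPostTitles reply.
$sample = '<value><struct>'
    . '<member><name>dateCreated</name><value>'
    . '<dateTime.iso8601>20091017T10:15:00</dateTime.iso8601></value></member>'
    . '<member><name>postid</name><value><string>42</string></value></member>'
    . '<member><name>title</name><value><string>Sample title</string></value></member>'
    . '</struct></value>';

// Hypothetical variant of getPostData: returns the post fields,
// or false when the post was not created on $wantedDay ("YYYYMMDD").
function extractPost(SimpleXMLElement $value, string $wantedDay)
{
    $post = array();
    foreach ($value->struct->member as $member) {
        $name = (string) $member->name;
        if ($name === 'dateCreated') {
            // Element names containing a dot need the {'...'} property syntax.
            $iso = (string) $member->value->{'dateTime.iso8601'};
            $day = substr($iso, 0, 8);   // the date part before the "T"
            if ($day !== $wantedDay) {
                return false;
            }
        } elseif ($name === 'postid' || $name === 'title') {
            $post[$name] = (string) $member->value->string;
        }
    }
    return $post;
}

$value   = simplexml_load_string($sample);
$post    = extractPost($value, '20091017');   // matching day
$skipped = extractPost($value, '20091018');   // different day
var_dump($post, $skipped);
```

In the real script the wanted day would be date("Ymd", strtotime("yesterday")), as in getPostData above.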

The list of posts built by the above functions is then iterated, handling each post on its own. The processing flow is shown in the code below.


$list = getFromWp();   // yesterday's posts, as returned by the function above
$cc = 0;               // number of posts that were updated
foreach($list as $post){
   $links = getExternLinks($post['title']);
   $content = getPostFromWp($post['postid']);
 
   // Skip posts with no related links or whose content could not be fetched
   if(empty($links) or !$content) continue;
 
   echo $post['title'] . "\n";
 
   $extraContent = "\n<!--more-->\n" .'<strong>Possibly Related</strong>' . "\n" .'<ul>';
   foreach($links as $pingb){
      echo "\t" . $pingb['link'] . "\n";
      $extraContent .= '<li><a href="'.$pingb['link'].'">'.$pingb['title'].'</a><!-- score: '.$pingb['rank'].' --></li>' . "\n";
   }
   $extraContent .= "</ul>\n";
   $cc++;
 
   editPostWp($post['postid'], $content . $extraContent);
   sleep(2);
 
}
 

The flow above uses some more functions, which are listed here. The function getExternLinks sanitises the title and applies a simple URL encoding (i.e. spaces are replaced with a ‘+’), then passes it to Google Blog Search with options to return the results as a feed, 20 items per page. Using the RSS description and title, a weighted rank is calculated in the function getWtRank. The array of external links, each carrying its rank, is sorted using the user-defined comparison function cmp.

The function getPostFromWp, shown further down, uses the XML RPC method blogger.getPost to get the post content from our blog. Once the content is retrieved and the links to be added are identified, they are merged together with the <!--more--> tag and applied to the blog using the function editPostWp.


function getExternLinks($title){
 
  // Keep only letters, digits and spaces, then join the words with '+'
  $title = str_replace('++','+',str_replace(' ','+',preg_replace("@([^a-zA-Z0-9 ])@",'',$title)));
  $tmp_out = '/dev/shm/' . uniqid("out_") . '.dat';
  $blogSearch = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q='.$title.'&ie=utf-8&num=20&output=rss';
  exec('/usr/bin/curl -s -A "Saturn Bot; http://www.saturn.in" "'.$blogSearch.'" > ' . $tmp_out);
  $xml = simplexml_load_file($tmp_out);
  unlink($tmp_out);
  if($xml === false) return array();   // feed could not be fetched or parsed
  
  $links = array();
  // Title words used for ranking, minus blanks and stop words
  $wList = explode('+', $title);
  global $stopWords;
  foreach($wList as $k => $v){
    if(trim($v) == '' or in_array($v, $stopWords))
      unset($wList[$k]);
  }
  global $ddlist;
  foreach($xml->channel->item as $so){
     $link = (string) $so->link;
     // Skip any link matching one of the patterns in $ddlist
     if(preg_match('@('.join('|', $ddlist) . ')@', $link)) continue;
     $links[] = array('link' => $link, 'title' => (string) $so->title, 'rank' => getWtRank($wList, $so));
  }
  // Highest rank first, then keep the top five
  usort($links, 'cmp');
  $links = array_slice($links, 0, 5);
  return $links;  
}
 
function getWtRank($wList, &$so){
   $txt = (string) $so->title . ' ' . (string) $so->description;
   $wt = 0;
   foreach($wList as $k){
     if(trim($k) == '') continue;
     $wt += substr_count($txt, $k);
   }
   return $wt;
}
 
function cmp($a, $b){
    if ($a['rank'] == $b['rank']) {
        return 0;
    }
    return ($a['rank'] < $b['rank']) ? 1 : -1;
}
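
The ranking step can also be tried in isolation, without calling the search feed. The sketch below is a variation on the functions above rather than the same code: it lowercases both the keywords and the text, so the match is case-insensitive, whereas getWtRank uses a case-sensitive substr_count. The function names, the stop word list and the candidate texts are all made up for illustration.

```php
<?php
// Reduce a title to lowercase keywords, dropping punctuation and stop words.
function titleKeywords(string $title, array $stopWords): array
{
    $clean = preg_replace('/[^a-zA-Z0-9 ]/', '', $title);
    $words = preg_split('/ +/', strtolower($clean), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_diff($words, $stopWords));
}

// Count how often each keyword occurs in the candidate text (case-insensitive).
function weightedRank(array $keywords, string $text): int
{
    $rank = 0;
    $haystack = strtolower($text);
    foreach ($keywords as $word) {
        $rank += substr_count($haystack, $word);
    }
    return $rank;
}

$stopWords = array('the', 'in', 'of', 'a');   // assumed sample list
$keywords  = titleKeywords('Floods in Kerala: the aftermath', $stopWords);

// Made-up candidates standing in for feed items (title plus description).
$candidates = array(
    array('link' => 'http://example.com/a', 'text' => 'Kerala floods: relief work after the floods'),
    array('link' => 'http://example.com/b', 'text' => 'Cricket scores'),
    array('link' => 'http://example.com/c', 'text' => 'Aftermath of the Kerala rains'),
);
foreach ($candidates as $i => $c) {
    $candidates[$i]['rank'] = weightedRank($keywords, $c['text']);
}

// Highest rank first, then keep at most five links.
usort($candidates, function ($a, $b) { return $b['rank'] - $a['rank']; });
$top = array_slice($candidates, 0, 5);

foreach ($top as $c) {
    echo $c['rank'], "\t", $c['link'], "\n";
}
```

Note that this is a plain substring count, so a keyword also matches inside longer words.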
 

function getPostFromWp($postId){
  $postbody = '<' .'?xml version="1.0" encoding="iso-8859-1"?' . '>'
        .'<methodCall>'
        .'<methodName>blogger.getPost</methodName>'
        .'<params>'
        .'<param><value><int>1</int></value></param>'
        .'<param><value><int>'.$postId.'</int></value></param>'
        .'<param><value><string>[blog username]</string></value></param>'
        .'<param><value><string>[blog password]</string></value></param>'
        .'</params>'
        .'</methodCall>';
 
   $rpcurl = BLOG_URL . '/xmlrpc.php';
   $tmp_in = '/dev/shm/' . uniqid("in_") . '.dat';
   $tmp_out = '/dev/shm/' . uniqid("out_") . '.dat';
 
   file_put_contents($tmp_in, $postbody);
   exec('/usr/bin/curl -s -A "Saturn Bot; http://www.saturn.in" --data-binary @'.$tmp_in.' '.$rpcurl.' > ' . $tmp_out);
   $xml = simplexml_load_file($tmp_out);
   unlink($tmp_in);
   unlink($tmp_out);
   
   $data = json_decode(json_encode($xml), true); 
 
   $str = false;
   foreach($data['params']['param']['value']['struct']['member'] as $o){
      if($o['name'] == 'content'){
         $str = $o['value']['string'];
      }
   }
   return $str;
}
 

function editPostWp($id, $body){

   $XML = htmlentities($body);

   $postbody = '<' .'?xml version="1.0" encoding="iso-8859-1"?' . '>'
        .'<methodCall>'
        .'<methodName>blogger.editPost</methodName>'
        .'<params><param><value><string/></value></param>'
        .'<param><value><int>'.$id.'</int></value></param>'
        .'<param><value><string>[username]</string></value></param>'
        .'<param><value><string>[password]</string></value></param>'
        .'<param><value><string>'.$XML.'</string></value></param>'
        .'<param><value><string>publish</string></value></param></params>'
        .'</methodCall>';

   $rpcurl = BLOG_URL . '/xmlrpc.php';
   $tmpnam = '/dev/shm/' . uniqid("") . '.dat';
   file_put_contents($tmpnam, $postbody);
   exec('(/usr/bin/curl -s -A "Saturn Bot; http://www.saturn.in" --data-binary @'.$tmpnam.' '.$rpcurl.' >> /tmp/autolinker.log; rm -f '.$tmpnam.') &');
}
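
The three request bodies above share the same shape, so they could be produced by one small helper. The sketch below is an illustration under assumptions, not the code we run: the name buildMethodCall is hypothetical, every parameter is sent as a string for simplicity (the functions above send some as int), and it escapes with htmlspecialchars and ENT_XML1 rather than htmlentities, since XML only predefines a handful of entities and named HTML entities may not be accepted by an XML parser.

```php
<?php
// Hypothetical helper: wrap string parameters in an XML-RPC <methodCall>.
function buildMethodCall(string $method, array $stringParams): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>'
        . '<methodCall><methodName>'
        . htmlspecialchars($method, ENT_XML1 | ENT_QUOTES, 'UTF-8')
        . '</methodName><params>';
    foreach ($stringParams as $p) {
        // Escape &, <, > and quotes so the post body cannot break the envelope.
        $xml .= '<param><value><string>'
            . htmlspecialchars($p, ENT_XML1 | ENT_QUOTES, 'UTF-8')
            . '</string></value></param>';
    }
    return $xml . '</params></methodCall>';
}

// Placeholder values in the same order as editPostWp above.
$body    = "Text & more\n<!--more-->\n<ul><li>link</li></ul>";
$request = buildMethodCall(
    'blogger.editPost',
    array('', '42', '[username]', '[password]', $body, 'publish')
);

// Parsing the request back shows the body survives the round trip.
$parsed = simplexml_load_string($request);
echo (string) $parsed->methodName, "\n";
echo (string) $parsed->params->param[4]->value->string, "\n";
```

The resulting string can be written to a temporary file and posted with curl exactly as in the functions above.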
