2013-09-23

Splitting XML-reports zip & CIFS copy them to Windows

The other day I had a meeting with the project manager for an application that receives files from the Data Warehouse . The procedure for sending files to this application is interesting. First we create a complex XML file, then we zip it and lastly we connect to a Windows server share and copy the zip file. Three jobs to create the file, zip it and send it. The first job creates the XML file by execute a SQL Select and then hand result over to the converter  sqlconverter_xml02.php .

 

The second job zips the output file report0.xml.

The third job ships the zip file ItemMessage.xml to the Windows share:

The job navettiCopy2WIN.xml mounts the windows share as a CIFS file and the copy the zip file to the share and finally unmounts the Windows share:

 If you follow the init actions  you see how it’s done. Note I put in a sleep before unmount, in the past we have had some timing problems. If something goes wrong we execute exit failure .

XML wrong tool for data-interchange.

This job schedule is complex and I was not happy when the project manager of the receiving application told me ‘ The file is too large, can you please chop it up in smaller chunks ’. It’s probably their XML parser that cannot cope with large input. Most XML encoding/decoding I’ve seen is done in memory, so very large XML files tend to exhaust RAM. Actually XML code large data streams is a bad idea since XML encoding is verbose and parsing inefficient. I had problems with my SQL to XML conversion. First I based it on PHP simplexml  but when I exhausted the memory, I took the very bad decision to create my own XML coder sqlconverter_xml02.php (see above), it’s a piece of crap but it works. But I should have said no to large XML data-interchange, Google Buffers or CSV or even JSON are better alternatives.

Job Iterator to the rescue.

Anyway to make the other application work I had to split the XML file. Due to the XML mess I thought the split would be problematic, but after some thinking I realized I could use a  job iterator  and it turned out to be very easy to chop the file. The XML converter sqlconverter_xml02.php do not append the result to an existing file but creates a new sequenced numbered file. If you look below you see the target for the sqlconverter is  ItemMessage . The sqlconverter_xml02.php will name the first result file ItemMessage 0 .xml, the next ItemMessage 1 .xml etc. Subsequently the zipfiles job sweep them all up by the glob wildcard ‘ItemMessage * ’. This test was  done in minutes and worked right out of the box. In production the iterator would be SQL generated based on the actual result size.

First I created a job iterator <forevery> with offset and limit for each chunk, then I added the limit clause  to the SELECT statement and changed the SQL target to ItemMessage. (This SQL converter adds a sequence number to the filename if the file already exists.)

Then I just had to slurp up all created output tables in the zip job and send the zipfile to the Windows share.

If you scrutinize the schedule XML  you find there are some functionality under the hood. I’m still very happy with my PHP job scheduler  and the Integration Tag Language.

I end this post with the zipfile.php  script. The ZipArchive and Zipper classes I found somewhere on the net but I forgot to credit the authors, if you know who should have the honor for these classes please tell me.

$ziplib = fullPathName($context,$job['library']);

$ziplibcreate = $job['create'];

$log->enter('Info',"Zip library=$ziplib, create=$ziplibcreate");

$zip = new Zipper;

$ok = $zip->open("$ziplib", ZipArchive::CREATE);

if ($ok === FALSE) {

        $log->logit('Error',"Failed to create/open zip library=$ziplib, create=$ziplibcreate");

        return FALSE;

}

foreach($job['file'] as $finx => $file) {

  foreach (glob($file['name']) as $globfilename) {

      if ($file['name'] == "$globfilename") $localname=$file['localname'];

      else $localname = '';

      $path=$globfilename;

      if ($localname == '') {

              $path_parts = pathinfo("$path");

              $localname = $path_parts['filename'].'.'.$path_parts['extension'];

      }

      if (is_file($path)) {

          $log->logit('Note',"Zip $path into $localname");

          $zip->addFile("$path", "$localname");

      } else {

          $zip->addDir("$path");

      }        

  }

}

$zip->close();

$_RESULT = TRUE;

return $_RESULT;

class Zipper extends ZipArchive  {

    public function addDir($path) {

//        print 'adding ' . $path . "\n";

        $this->addEmptyDir($path);

        $nodes = glob($path . '/*');

        foreach ($nodes as $node) {

            print $node . '<br>';

            if (is_dir($node)) {

                $this->addDir($node);

            } else if (is_file($node))  {

                $this->addFile($node);

            }

        }

    }

} // class Zipper

No comments:

Post a Comment