2018-08-27

Importing free text from SAP

The other week I was asked to help with importing a free text column from SAP to The Data Warehouse.
Easy i thought, but it proved to be a nightmare due to the free text, the text included control characters
and ‘LOAD DATA’ delimiters and embedding characters. The SAP connector writes the data to a file
before LOAD data into the Data Warehouse MySQL database. I first tried to squeeze out or
replace problem characters, but all attempts failed one way or another.

Finally I realised the best way to solve the problem was to add embedding characters in the extraction
before I write the data to a file. But the SAP connector is a complex piece of code and I really did not
want change it as testing is very tedious. While scrutinizing the code this line caught my attention:


$collambda = getColumnClosures($context,$job,$fce, $id);
The lambda in there started my memory, didn’t I wrote something to transform columns and sure enough,the getColumnClosures code did create a function from the ITL xml code,

function getColumnClosures($context,$job,$fce, $id) {
$log = $GLOBALS['log'];
$fceDef = $fce->def[$id];
$sapTable = $fceDef['name'];
$log->enter('Trace',"for table=$sapTable, id=$id");
$tabColFunc = [];
if (array_key_exists('columnfunctions',$job['sap'][0])){
 $cft = $job['sap'][0]['columnfunctions'][0]['table'];
 for ($i=0;$i<count($cft);$i++){
   if ($cft[$i]['name'] == "$sapTable"){
     foreach($cft[$i] as $key => $tabcol){
if ($key == 'name') continue;
$colnm = key($tabcol);
$fstr = trim($tabcol[0]['value']);
$tabColFunc[$key] = create_function('$saprow, $value',$fstr.';return $value;');
     }
     break;
   } //table
 }
}
return $tabColFunc;
}
So instead of being ridiculously difficult,it became childs play just add a column function that embedd
the freetext column and replace the embedding character and field delimiter.  


<job name='getZ' type='script' pgm='sap2.php'>
  <tag name='ED' value='"' cmt='The column embedder used in SQL load'/>
  <tag name='DL' value=';' cmt='The column delimiter'/>
   <sap>
     <columnfunctions>
   <table name='DATA'>
<WA><![CDATA[
$va = explode('@DL',$value,5);
$va[4] = str_replace('@DL','?',$va[4]);
$va[4] = str_replace('@ED','?',$va[4]);
$va[4] = '"'.$va[4].'@ED';
return implode('@DL',$va);
]]></WA>
     </table>
</columnfunctions>
  <rfc>
   <name>Z_LJ_READ_TABLE</name>
   <import>
('QUERY_TABLE','ZTDC_DTR_TEXT_XL')
,('DELIMITER','@DL')
,('NO_DATA','')
,('ROWSKIPS',0)
,('ROWCOUNT',0)
,('OPTIONS',array(TEXT => "DTR_NUMBER GT ' ' and MANDT EQ '300'"))
   </import>
</rfc>
   </sap>
   <sql>
<autoload>replace</autoload><database>@C_DB</database><truncate>yes</truncate>
<table>
   <name>ZTDC_DTR_TEXT_XL=ALJZTDC_DTR_TEXT_XL</name>
   <fieldclosure>@ED</fieldclosure>
</table>
   </sql>
</job>

It’s sad though it took me too long to find this out, the guys asking for my help had already created a
workaround when I found my ‘lambda function” but better late than never.  

Finally the right column below is free text from 2011, today there are +500,000 rows in there.

300
0000000004
FIN
54
06.12.2011 :

This part will be in stock 13/12/2...
300
0000000004
NOT
175
01.12.2011 :
Hoi Wendy,

Zou je aub kunnen beve...
300
0000000005
FIN
69
12.12.2011 :

Dear,

Mid next week in our whs....
300
0000000005
INT
39
12.12.2011 : Waiting for info from prod

No comments:

Post a Comment