The other week I was asked to help with importing a free text column from SAP to The Data Warehouse.
Easy i thought, but it proved to be a nightmare due to the free text, the text included control characters
and ‘LOAD DATA’ delimiters and embedding characters. The SAP connector writes the data to a file
before LOAD data into the Data Warehouse MySQL database. I first tried to squeeze out or
replace problem characters, but all attempts failed one way or another.
Finally I realised the best way to solve the problem was to add embedding characters in the extraction
before I write the data to a file. But the SAP connector is a complex piece of code and I really did not
want change it as testing is very tedious. While scrutinizing the code this line caught my attention:
The lambda in there started my memory, didn’t I wrote something to transform columns and sure enough,the getColumnClosures code did create a function from the ITL xml code,
$collambda = getColumnClosures($context,$job,$fce, $id);
|
function getColumnClosures($context,$job,$fce, $id) {
$log = $GLOBALS['log'];
$fceDef = $fce->def[$id];
$sapTable = $fceDef['name'];
$log->enter('Trace',"for table=$sapTable, id=$id");
$tabColFunc = [];
if (array_key_exists('columnfunctions',$job['sap'][0])){
$cft = $job['sap'][0]['columnfunctions'][0]['table'];
for ($i=0;$i<count($cft);$i++){
if ($cft[$i]['name'] == "$sapTable"){
foreach($cft[$i] as $key => $tabcol){
if ($key == 'name') continue;
$colnm = key($tabcol);
$fstr = trim($tabcol[0]['value']);
$tabColFunc[$key] = create_function('$saprow, $value',$fstr.';return $value;');
}
break;
} //table
}
}
return $tabColFunc;
}
|
So instead of being ridiculously difficult,it became childs play just add a column function that embedd
the freetext column and replace the embedding character and field delimiter.
<job name='getZ' type='script' pgm='sap2.php'>
<tag name='ED' value='"' cmt='The column embedder used in SQL load'/>
<tag name='DL' value=';' cmt='The column delimiter'/>
<sap>
<columnfunctions>
<table name='DATA'>
<WA><![CDATA[
$va = explode('@DL',$value,5);
$va[4] = str_replace('@DL','?',$va[4]);
$va[4] = str_replace('@ED','?',$va[4]);
$va[4] = '"'.$va[4].'@ED';
return implode('@DL',$va);
]]></WA>
</table>
</columnfunctions>
<rfc>
<name>Z_LJ_READ_TABLE</name>
<import>
('QUERY_TABLE','ZTDC_DTR_TEXT_XL')
,('DELIMITER','@DL')
,('NO_DATA','')
,('ROWSKIPS',0)
,('ROWCOUNT',0)
,('OPTIONS',array(TEXT => "DTR_NUMBER GT ' ' and MANDT EQ '300'"))
</import>
</rfc>
</sap>
<sql>
<autoload>replace</autoload><database>@C_DB</database><truncate>yes</truncate>
<table>
<name>ZTDC_DTR_TEXT_XL=ALJZTDC_DTR_TEXT_XL</name>
<fieldclosure>@ED</fieldclosure>
</table>
</sql>
</job>
|
It’s sad though it took me too long to find this out, the guys asking for my help had already created a
workaround when I found my ‘lambda function” but better late than never.
Finally the right column below is free text from 2011, today there are +500,000 rows in there.
300
|
0000000004
|
FIN
|
54
|
06.12.2011 :
This part will be in stock 13/12/2...
|
300
|
0000000004
|
NOT
|
175
|
01.12.2011 :
Hoi Wendy,
Zou je aub kunnen beve...
|
300
|
0000000005
|
FIN
|
69
|
12.12.2011 :
Dear,
Mid next week in our whs....
|
300
|
0000000005
|
INT
|
39
|
12.12.2011 : Waiting for info from prod
|