It has been a long time since I wrote here about The Data Warehouse. I have spend time upgrading PHP from 5.6
to 7.4 and Mysql latin1 to utf8. None of this is in production yet. I also spent time on the long overdue Mysql from
5.6 to 5.7, which makes me go bonkers. In the old days when I had full control of the environment I just downloaded
and installed the MySQL version I wanted. Now I try to follow Ubuntu's rules, and I just fail, I can upgrade a test server,
but the prod server refuses to be upgraded. I have no clue what goes wrong. This is really the backside of complex system
procedures that intercept you from compromising the system, in this case the Ubuntu system. I'm not interested in digging into the
finer details of Ubutu's APT system, but it seems I will in the end.
Anyhow I did some SQL tuning lately, this is boring laborious work, but at the same time satisfyingly rewarding.
Some weeks ago I stumbled upon this code:
insert into MAEX_US ( SELECT A.MATNR, A.ALAND, A.GEGRU, A.ALNUM FROM MASTERDATA.MAEX A WHERE A.GEGRU='US' ); insert into MAEX_EU ( SELECT A.MATNR, A.ALAND, A.GEGRU, A.ALNUM FROM MASTERDATA.MAEX A WHERE A.GEGRU='EU' ); insert into MAEX_E1 ( SELECT A.MATNR, A.ALAND, A.GEGRU, A.ALNUM FROM MASTERDATA.MAEX A WHERE A.GEGRU='E1' );
INSERT INTO LEGAL_REGULATION_BOM SELECT A.PLANT ,A.MATNR ,A.DESCRIPTION ,A.EXT_MAT_GRP ,A.MATNR_COST ,A.MATNR_COST_CURR ,A.PARENT ,A.COMPONENT ,A.BOM_LEVEL ,A.QTY ,A.COMPONENT_COST ,A.COMPONENT_COST_CURR ,A.UNIT_OF_MEASURE ,A.COMP_DESCRIPTION ,A.ORIGIN ,A.DUTYNO ,A.BOM_PLANT_IS_PTD_0101_ACQPLANT ,CASE WHEN A.PLANT IN ('0101','0104') THEN B.ALNUM WHEN A.PLANT='6100' THEN C.ALNUM WHEN A.PLANT='6130' THEN D.ALNUM WHEN A.PLANT IN ('1100','1110') THEN E.ALNUM WHEN A.PLANT IN ('3110','3115') THEN N.ALNUM WHEN A.PLANT='3125' THEN Q.ALNUM ELSE 'MISSING_LEGI' END AS 'E1_LEG' ,CASE WHEN A.PLANT IN ('0101','0104') THEN F.ALNUM WHEN A.PLANT='6100' THEN G.ALNUM WHEN A.PLANT='6130' THEN H.ALNUM WHEN A.PLANT IN ('1100','1110') THEN I.ALNUM WHEN A.PLANT IN ('3110','3115') THEN O.ALNUM WHEN A.PLANT='3125' THEN R.ALNUM ELSE 'MISSING_LEGI' END AS 'EU_LEG' ,CASE WHEN A.PLANT IN ('0101','0104') THEN J.ALNUM WHEN A.PLANT='6100' THEN K.ALNUM WHEN A.PLANT='6130' THEN L.ALNUM WHEN A.PLANT IN ('1100','1110') THEN M.ALNUM WHEN A.PLANT IN ('3110','3115') THEN P.ALNUM WHEN A.PLANT='3125' THEN S.ALNUM ELSE 'MISSING_LEGI' END AS 'US_LEG'
FROM LEGAL_REGULATION_TEMP_BOM A LEFT JOIN MAEX_E1 B ON A.COMPONENT=B.MATNR AND B.ALAND='BE' LEFT JOIN MAEX_E1 C ON A.COMPONENT=C.MATNR AND C.ALAND='HU' LEFT JOIN MAEX_E1 D ON A.COMPONENT=D.MATNR AND D.ALAND='IT' LEFT JOIN MAEX_E1 E ON A.COMPONENT=E.MATNR AND E.ALAND='SE' LEFT JOIN MAEX_E1 N ON A.COMPONENT=N.MATNR AND N.ALAND='FR' LEFT JOIN MAEX_E1 Q ON A.COMPONENT=Q.MATNR AND Q.ALAND='US' LEFT JOIN MAEX_EU F ON A.COMPONENT=F.MATNR AND F.ALAND='BE' LEFT JOIN MAEX_EU G ON A.COMPONENT=G.MATNR AND G.ALAND='HU' LEFT JOIN MAEX_EU H ON A.COMPONENT=H.MATNR AND H.ALAND='IT' LEFT JOIN MAEX_EU I ON A.COMPONENT=I.MATNR AND I.ALAND='SE' LEFT JOIN MAEX_EU O ON A.COMPONENT=O.MATNR AND O.ALAND='FR' LEFT JOIN MAEX_EU R ON A.COMPONENT=R.MATNR AND R.ALAND='US' LEFT JOIN MAEX_US J ON A.COMPONENT=J.MATNR AND J.ALAND='BE' LEFT JOIN MAEX_US K ON A.COMPONENT=K.MATNR AND K.ALAND='HU' LEFT JOIN MAEX_US L ON A.COMPONENT=L.MATNR AND L.ALAND='IT' LEFT JOIN MAEX_US M ON A.COMPONENT=M.MATNR AND M.ALAND='SE' LEFT JOIN MAEX_US P ON A.COMPONENT=P.MATNR AND P.ALAND='FR' LEFT JOIN MAEX_US S ON A.COMPONENT=S.MATNR AND S.ALAND='US' |
It took me quite a while to figure out the meaning of all these left joins.It is a nifty way of transpose ALNUM from the MAEX table over MATNR and ALAND into E1_LEG, EU_LEG and US_LEG like:
This code is clever or rather it was clever when constructed but now the data volumes have grown to large numbers
disclosing the intrinsic weakness of the many joins, today they take a very long time to execute (some 1100 seconds).
I failed to come up with a better way to do the transpose in the last SQL. I then ‘parked’ this problem.
Two days ago while jogging I realized I attacked the problem wrongly, it was not the last query that was the root problem,
but the first three queries splitting the MAEX table. If I do a full transpose over MATNR, I end up with one row per MATNR
which is what I need in the next phase.I remembered I read about a similar problem some years ago where the max
function was combined with group by.
insert into MAEX_PIVOT ( SELECT A.MATNR ,max(if (A.GEGRU='E1' and A.ALAND = 'BE', A.ALNUM, NULL)) as E1BE_ALNUM ,max(if (A.GEGRU='E1' and A.ALAND = 'HU', A.ALNUM, NULL)) as E1HU_ALNUM ,max(if (A.GEGRU='E1' and A.ALAND = 'IT', A.ALNUM, NULL)) as E1IT_ALNUM ,max(if (A.GEGRU='E1' and A.ALAND = 'SE', A.ALNUM, NULL)) as E1SE_ALNUM ,max(if (A.GEGRU='E1' and A.ALAND = 'FR', A.ALNUM, NULL)) as E1FR_ALNUM ,max(if (A.GEGRU='E1' and A.ALAND = 'US', A.ALNUM, NULL)) as E1US_ALNUM ,max(if (A.GEGRU='EU' and A.ALAND = 'BE', A.ALNUM, NULL)) as EUBE_ALNUM ,max(if (A.GEGRU='EU' and A.ALAND = 'HU', A.ALNUM, NULL)) as EUHU_ALNUM ,max(if (A.GEGRU='EU' and A.ALAND = 'IT', A.ALNUM, NULL)) as EUIT_ALNUM ,max(if (A.GEGRU='EU' and A.ALAND = 'SE', A.ALNUM, NULL)) as EUSE_ALNUM ,max(if (A.GEGRU='EU' and A.ALAND = 'FR', A.ALNUM, NULL)) as EUFR_ALNUM ,max(if (A.GEGRU='EU' and A.ALAND = 'US', A.ALNUM, NULL)) as EUUS_ALNUM ,max(if (A.GEGRU='US' and A.ALAND = 'BE', A.ALNUM, NULL)) as USBE_ALNUM ,max(if (A.GEGRU='US' and A.ALAND = 'HU', A.ALNUM, NULL)) as USHU_ALNUM ,max(if (A.GEGRU='US' and A.ALAND = 'IT', A.ALNUM, NULL)) as USIT_ALNUM ,max(if (A.GEGRU='US' and A.ALAND = 'SE', A.ALNUM, NULL)) as USSE_ALNUM ,max(if (A.GEGRU='US' and A.ALAND = 'FR', A.ALNUM, NULL)) as USFR_ALNUM ,max(if (A.GEGRU='US' and A.ALAND = 'US', A.ALNUM, NULL)) as USUS_ALNUM FROM MASTERDATA.MAEX A WHERE A.GEGRU in ('E1','EU','US') and A.ALNUM is not NULL group by A.MATNR ); 220720 075316.546624 8628 Note SQLSTATE=00000, ERRORNO=0 220720 075316.546695 8628 Note SQLINFO Records: 32606 |
This request replacing the three original requests splitting the MAEX table gave MATNR with 18 ‘transpose’ columns
on each row:
Then it was just a matter of remove joins and map the ‘transpose’ columns to the right position:
INSERT INTO LEGAL_REGULATION_BOM SELECT A.PLANT ,A.MATNR ,A.DESCRIPTION ,A.EXT_MAT_GRP ,A.MATNR_COST ,A.MATNR_COST_CURR ,A.PARENT ,A.COMPONENT ,A.BOM_LEVEL ,A.QTY ,A.COMPONENT_COST ,A.COMPONENT_COST_CURR ,A.UNIT_OF_MEASURE ,A.COMP_DESCRIPTION ,A.ORIGIN ,A.DUTYNO ,A.BOM_PLANT_IS_PTD_0101_ACQPLANT ,CASE WHEN A.PLANT IN ('0101','0104') THEN B.E1BE_ALNUM WHEN A.PLANT='6100' THEN B.E1HU_ALNUM WHEN A.PLANT='6130' THEN B.E1IT_ALNUM WHEN A.PLANT IN ('1100','1110') THEN B.E1SE_ALNUM WHEN A.PLANT IN ('3110','3115') THEN B.E1FR_ALNUM WHEN A.PLANT='3125' THEN B.E1US_ALNUM ELSE 'MISSING_LEGI' END AS 'E1_LEG' ,CASE WHEN A.PLANT IN ('0101','0104') THEN B.EUBE_ALNUM WHEN A.PLANT='6100' THEN B.EUHU_ALNUM WHEN A.PLANT='6130' THEN B.EUIT_ALNUM WHEN A.PLANT IN ('1100','1110') THEN B.EUSE_ALNUM WHEN A.PLANT IN ('3110','3115') THEN B.EUFR_ALNUM WHEN A.PLANT='3125' THEN B.EUUS_ALNUM ELSE 'MISSING_LEGI' END AS 'EU_LEG' ,CASE WHEN A.PLANT IN ('0101','0104') THEN B.USBE_ALNUM WHEN A.PLANT='6100' THEN B.USHU_ALNUM WHEN A.PLANT='6130' THEN B.USIT_ALNUM WHEN A.PLANT IN ('1100','1110') THEN B.USSE_ALNUM WHEN A.PLANT IN ('3110','3115') THEN B.USFR_ALNUM WHEN A.PLANT='3125' THEN B.USUS_ALNUM ELSE 'MISSING_LEGI' END AS 'US_LEG' FROM LEGAL_REGULATION_TEMP_BOM A LEFT JOIN MAEX_PIVOT B ON B.MATNR = A.COMPONENT ; 220720 081115.168632 19727 Note SQLSTATE=00000, ERRORNO=0 220720 081115.168706 19727 Note SQLINFO Records: 2583726 |
This reduced the time with 1012 seconds (from 1099 seconds to 87)