2018-09-24

Checking the database, take 2

Last week I tried to DBCHECK all tables in the Data Warehouse. I did the MEDIUM check, but it took too
long. This time I did a QUICK check, and it was definitely quicker than MEDIUM: just one and a
half hours for 3583 tables. But I had some unexpected errors:
Exec CHECK TABLE PTDDW.PTD_OTIS_OPEN_DOCUMENTS_2018-01-05 QUICK;
   
Note SQLSTATE=42000, ERRORNO=1064    
Note You have an error in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near '-01-05 QUICK' at line 1
There were some 20 tables with hyphens in their names; otherwise the quick check was a success,
with no other errors. I have to add the missing backticks:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<schedule>
  <job name='checkTables'>
    <forevery chunks='10' parallel='yes' rowdirectory='no'>
      <sql>
        SELECT a.table_schema DB, a.table_name TB
        FROM information_schema.tables a
        WHERE a.TABLE_TYPE = 'BASE TABLE';
      </sql>
    </forevery>
    <myiterator piggyback='forevery'/>
    <sql>
      CHECK TABLE `@DB`.`@TB` QUICK;
    </sql>
  </job>
</schedule>
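The quoting fix can be sketched in isolation. This is a minimal Python sketch (the function names are my own, not part of the ITL tool) that builds a CHECK TABLE statement which survives hyphens in table names by backtick-quoting each identifier, doubling any embedded backticks as MySQL requires:

```python
def quote_ident(name: str) -> str:
    # MySQL identifier quoting: wrap in backticks, double embedded backticks
    return "`" + name.replace("`", "``") + "`"

def check_table_stmt(schema: str, table: str, mode: str = "QUICK") -> str:
    # Build a CHECK TABLE statement safe for hyphenated names
    return f"CHECK TABLE {quote_ident(schema)}.{quote_ident(table)} {mode};"

print(check_table_stmt("PTDDW", "PTD_OTIS_OPEN_DOCUMENTS_2018-01-05"))
# → CHECK TABLE `PTDDW`.`PTD_OTIS_OPEN_DOCUMENTS_2018-01-05` QUICK;
```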
Now the remaining question is: is the QUICK check meaningful for me? I have to read the manual.

2018-09-20

Checking the database

I’m writing a series of posts about migrating our Data Warehouse database from MySQL version 5.6 to 5.7
(ultimately 5.8). Yesterday I did a test check of the present database: I wrote a script checking all tables
(a little fewer than 4000) in 10 parallel streams. It did not go well; after some 4 hours, with 2500 tables still to
check, I cancelled the check tables script. Some tables took more than 30 minutes to MEDIUM check.
Maybe I’m checking too many in parallel. Anyway, I decided not to go for a MEDIUM check but a QUICK check,
which I will test this Sunday night.
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<schedule>
  <job name='checkTables'>
    <forevery chunks='10' parallel='yes'>
      <sql>
        SELECT a.table_schema DB, a.table_name TB
        FROM information_schema.tables a
        WHERE a.TABLE_TYPE = 'BASE TABLE';
      </sql>
    </forevery>
    <myiterator piggyback='forevery'/>
    <sql>
      CHECK TABLE @DB.@TB MEDIUM;
    </sql>
  </job>
</schedule>
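The forevery chunks='10' parallel='yes' pattern above can be sketched in plain Python. This is a minimal sketch of my own (not the ITL implementation): the table list is split round-robin into 10 chunks and each chunk is checked in its own stream; run_check stands in for whatever actually issues the CHECK TABLE statement:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, n):
    # round-robin split into n chunks, like forevery chunks='10'
    return [items[i::n] for i in range(n)]

def check_chunk(tables, run_check):
    # one stream: check each table in its chunk sequentially
    return [run_check(schema, table) for schema, table in tables]

def check_all(tables, run_check, streams=10):
    # run the chunks in parallel, like parallel='yes'
    with ThreadPoolExecutor(max_workers=streams) as pool:
        results = pool.map(lambda c: check_chunk(c, run_check), chunk(tables, streams))
    return [r for part in results for r in part]
```

With slow tables mixed into the list, one 30-minute table only blocks its own stream, which is why the number of streams matters so much for total runtime.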


Why do I do this? Before the migration it is good to know that all tables are OK, but I cannot wait some 10 to
15 hours for the check to finish.

2018-09-10

Upgrade MySQL 5.5 temporals

This weekend I converted all temporals in the Data Warehouse from the pre-5.6 format to the version 5.6 internal
format. It affected about 600 tables, and the conversion went hassle free: ‘ALTER TABLE FORCE’
on each of those 600 tables did the trick. Happily I went to the office this morning. First thing after
starting the PC I got a chat message: ‘We have a problem with an application, all part names are missing
and there is no vendor info either.’ I almost died, what had I done this time? I had checked as carefully as
I could for any errors during the temporals conversion and there were none. When I had calmed down a bit I
dug into the application and found that the material master for a factory was empty. At the same time
the BI developer who had contacted me called and told me the material master was empty because we had
archived the old material master; it is actually used when we update the material master, so could we restore it?
Oh, what a relief, this time I didn’t fuck up.

It turned out the old material master was from my own old ERP system, which was replaced by SAP in 2006.
It came as a big surprise that it was still in use. We restored the old material master, reran the material master
update job, and we were back in business. But we will remove the old material master from all jobs and
then archive it again as part of the preparation for the MySQL 5.6 to 5.7 conversion.


The weekend temporal conversion was simple, but it took quite a lot of reading: finding what the problem
was, how to identify the affected tables, and then running this long-running conversion. This SQL script finds the
affected tables and creates an ‘ALTER TABLE’ statement for each of them:
SET show_old_temporals = ON;
SELECT distinct concat('ALTER TABLE ',a.table_schema,'.',a.table_name,' FORCE;')
FROM information_schema.columns a
join information_schema.tables b on b.table_schema = a.table_schema and b.table_name = a.table_name
WHERE a.column_type LIKE '%time% /* 5.5 binary format */' and b.TABLE_TYPE = 'BASE TABLE'
;
Then it was just a matter of cutting and pasting these statements into an ITL job and running it:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<schedule mustcomplete='yes' logresult='yes' >
  <job name='alterTables' type='sql'>    
   <sql>
     ALTER TABLE FEVENTEST.PD013PTD_STOCK FORCE;                             
     ALTER TABLE FEVENTEST.PD014PTD_OPEN_ORDERS FORCE;                       
     ALTER TABLE FEVENTEST.PD015PTD_REC_ORDERS FORCE;                        
     ALTER TABLE FEVENTEST.PD016PTD_STOCK FORCE;                             
     ALTER TABLE FEVENTEST.PD017PTD_INV_ORDERS FORCE;                        
     ...
   </sql>
  </job>
</schedule>
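Instead of cut and paste, the statement generation itself is trivial to script. A minimal Python sketch (my own helper, assuming the (schema, table) rows come from the information_schema query above):

```python
def alter_force(rows):
    # Turn (schema, table) rows into ALTER TABLE ... FORCE statements,
    # one per affected table, same as the SELECT concat(...) query
    return [f"ALTER TABLE {schema}.{table} FORCE;" for schema, table in rows]

rows = [("FEVENTEST", "PD013PTD_STOCK"), ("FEVENTEST", "PD014PTD_OPEN_ORDERS")]
for stmt in alter_force(rows):
    print(stmt)
# → ALTER TABLE FEVENTEST.PD013PTD_STOCK FORCE;
# → ALTER TABLE FEVENTEST.PD014PTD_OPEN_ORDERS FORCE;
```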
With this conversion done, I hope the 5.7 upgrade conversion time is cut by a lot.

2018-09-09

Upgrade MySQL from 5.6 to 5.7

When I first created the Data Warehouse I envisioned it as a “poor man’s Data Warehouse”,
due to the small budget I had. I expected it to be good and well working, but nevertheless on the cheap side.
But it didn’t take long before I realised I could build a top-class Data Warehouse on scrapped hardware and
Mandrake Linux. Already after a year I realised that the prototype I had built was in most aspects better than
anything on the market. At that time the MySQL MyISAM storage engine was a dream for the Data Warehouse
database: primitive, yes, but it had all the features we needed and it was fast, very fast on the scrapped PCs we ran it on.
Easy to copy and back up; actually, we didn’t do any database administrative tasks at all apart from backups.
Ulf, the first developer, never took any notice of my warnings: “You must pay attention to indexes, they are very important.”
“I have no time for your diddle-daddle, I add single-column indexes I know will be used and that’s it,” he replied.
And that’s basically the database administration we have done for eighteen years now. (There have been
a few cases where we have been forced to create composite indexes.)
We do not have a test environment, on purpose; when needed we create a test database on the production
database server. The single production environment is one of the key success factors for the Data Warehouse;
it cuts down development and debugging time a lot. There is a slight drawback with this approach: as we never
clean out rubbish and unused tables, we have piled up a lot of ‘dead’ databases and tables over the years.
It is much like DNA: the Data Warehouse evolves, tables are replaced or become obsolete, test databases are abandoned, etc.
Of the 1 terabyte of data in the Data Warehouse, about 30 to 60 percent is unused, or rubbish that should not
be used. But who cares, it’s just disk space and bigger backups. We do not waste time on administration;
we maintain and develop new apps 100 percent of our working hours. The waste did not do much harm,
or rather, not up until now: we are now preparing to upgrade MySQL from version 5.6 to 5.7, and a test migration took over 30 hours.
The mysql_upgrade script just never stopped, or so it seemed. More about that in the next post.