In the previous posts I described how I moved the mail recipients from SharePoint to a database in the Data Warehouse. Now it’s time to deal with the mail templates: five templates in eight languages, forty in all. I asked for one English template, and to my utter horror the reply was “Oh, the templates are Outlook documents.”
I began to sweat. “Outlook documents? I cannot handle Outlook documents, surely the templates are simple, proper HTML scripts,” I stuttered in a whisper. “No, we work with Outlook documents, it must be Outlook documents, but they can easily be converted into HTML.” Just when I thought I was more or less done, I realised I was out in a quagmire of mojibake hell. I could well imagine what Outlook HTML looked like, but of course it was worse. I could not believe my eyes when I saw the shitload of crap the HTML conversion cranked out; this is the reason every sane person prefers JSON notation. Structural fascists have kidnapped and abused XML/HTML by introducing an inconceivably complex standard, which some interpret as “everything in the standard must be used in every document”. To my surprise the first test template could be converted to UTF-8 by:
iconv -f iso-8859-1 -t utf-8 templatename > utf-8templatename
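A quick way to check whether a file really is UTF-8 after a conversion like this is to let iconv decode it as UTF-8 and look at the exit status. This is only a minimal sketch: the helper name `is_utf8` and the sample file names are made up for illustration, and it says nothing about what a meta tag inside the file claims.

```shell
#!/bin/sh
# is_utf8 FILE: succeeds only if FILE decodes cleanly as UTF-8.
# (Hypothetical helper, not part of the original process.)
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Two tiny sample files: "café" with the é stored as a single Latin-1 byte,
# and the same text after conversion to UTF-8.
printf 'caf\351\n' > sample-latin1.txt
iconv -f ISO-8859-1 -t UTF-8 sample-latin1.txt > sample-utf8.txt

is_utf8 sample-latin1.txt || echo "sample-latin1.txt is not valid UTF-8"
is_utf8 sample-utf8.txt && echo "sample-utf8.txt is valid UTF-8"
```

A check like this only proves that the bytes are well-formed UTF-8. It cannot tell whether the source encoding given to `-f` was the right one.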
After many hours and harsh words I had replaced the “personal text” with my ‘@’ symbols like ‘@xQname’, removed distorting HTML code and reapplied the images, and I was ready to produce my first mail. Of course the mail looked like shit: anything other than English letters was utterly destroyed. I was in deep despair. I went through everything I had done to fix the Outlook document; I was sure I had converted the entire document to UTF-8 and processed it as UTF-8 all the way, and still it was trashed by some stupid translation process. I called a friend, Andreas, who told me there was probably a meta tag in the document specifying the character set or encoding used when the document was created. And yes, I found charset=windows-1252, a character set I had never heard of. “What is that and why is it there?”
“That you have to ask Microsoft about. Welcome to the wonderful world of email encodings,” Andreas replied. I changed the tag to UTF-8 and, lo and behold, a correct mail entered my Outlook mailbox. I was so happy I almost started to cry. I use PHPMailer for the mail delivery, a fine piece of software that most likely hides away a lot of encoding details. I do not think for a second that my ordeal is over; each document will certainly expose new HTML features I will trip on, but until then I can enjoy my small victory over encodings.
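The meta tag change can also be scripted instead of done by hand. The sketch below assumes the template has already been converted to UTF-8 and that the tag contains the literal text charset=windows-1252; the function name `fix_charset` and the file names are hypothetical.

```shell
#!/bin/sh
# fix_charset FILE: print FILE with the declared charset changed to utf-8.
# (Hypothetical helper; it only rewrites the declaration, not the bytes.)
fix_charset() {
    sed 's/charset=windows-1252/charset=utf-8/' "$1"
}

# A one-line sample standing in for a converted Outlook template.
printf '%s\n' '<meta http-equiv=Content-Type content="text/html; charset=windows-1252">' > sample-template.htm

fix_charset sample-template.htm > sample-template-fixed.htm
grep 'charset=' sample-template-fixed.htm
```

Since the declared charset was windows-1252, `iconv -f WINDOWS-1252` may be a closer match for such a file than `-f iso-8859-1`. The two differ in the byte range 0x80 to 0x9F, where windows-1252 holds characters such as curly quotes and the euro sign.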
In this post I describe my encoding problems and how I deal with them. I found several encodings in the Outlook msg templates sent to me; one of them was UTF-16LE. None of the guys sending templates to me knew about the concept of text encoding, and neither did the lady who sent the wonderful UTF-16LE template, but I did not care. I just requested that she, and no one else, send all templates to me. With all templates in UTF-16LE I could build a reliable process from Outlook msg template to an HTML UTF-8 version I can work with, substitute personalised text such as email addresses, and deliver to recipients via my mail generator. Here you see part of one template in simplified Chinese.
My process is manual:
- Replace ‘personal text’ with my symbolics; the red above is replaced by @xQmigrationDate
- Convert the Outlook msg template to HTM
- FTP relevant HTM parts to the Data Warehouse
- Convert the template from UTF-16LE to UTF-8.
- Change meta tag charset to UTF-8
- Remove some distorting crap from the templates.
- 'Reapply images'
- Save the template in a folder
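The conversion step and the later substitution of an ‘@’ symbol can be sketched as follows. This is an illustration under assumptions, not the actual process: the helper name `to_utf8`, the file names, the sample text and the date value are all made up, and the real substitution happens in the mail generator.

```shell
#!/bin/sh
# Sketch only: file names, sample text and the date value are made up.

# to_utf8 FILE: print a UTF-16LE template as UTF-8.
to_utf8() {
    iconv -f UTF-16LE -t UTF-8 "$1"
}

# Build a small UTF-16LE sample containing one placeholder.
printf '%s\n' '<p>迁移日期: @xQmigrationDate</p>' |
    iconv -f UTF-8 -t UTF-16LE > sample-utf16le.htm

# Convert, and keep the UTF-8 version for the mail generator.
to_utf8 sample-utf16le.htm > sample-utf8.htm

# Later on, each @xQ symbol is swapped for a real value.
sed 's/@xQmigrationDate/2014-05-01/' sample-utf8.htm
```

If a template starts with a byte order mark, `-f UTF-16` instead of `-f UTF-16LE` should let iconv detect and consume it rather than carry it into the UTF-8 output.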
That’s it. Now I only have to create the mail generator, which is finally the topic of the next post.