2014-12-09

Stop write BOM in UTF-8 encoded files


I’m writing my first Powershell script, which converts CSV files to XML files.
So far it has been an easy ride, but this kept me occupied for some hours.
I looked at my newly created XML file in Google Chrome:

Looks quite pretty, doesn't it?
Now look at the same file in Microsoft Internet Explorer:

Yes that's right, just blanks. Why is that?
This is a snippet from my Powershell script, I think the comments say it all:
# Out-File writes a %&@£€ BOM header
# transXSL "$L" "$xsl" | Out-File $out # DOES NOT WORK! since Out-File writes a %&@£€ BOM header
# So I have to write the file like this so MS Internet Explorer can read the file.

$myXML = transXSL "$L" "$xsl"
[System.IO.File]::WriteAllLines($out, $myXML)

# Google Chrome has no problems mit or mitout the %&@£€ BOM header
# Why in Gods name does anyone force a BOM header onto UTF-8 files

The BOM character is valid in UTF-8 encoding. Just because you are allowed to litter does not mean you have to. Stop put crap into UTF-8 files. And when you read a BOM in UTF-8 files, just throw it away and proceed with next character.
I would like to have a Out-File writing UTF-8 files without BOM characters.

Here you can see the bugger, the first characters are the BOM character, confusing lot of softwares.


No comments:

Post a Comment