Hi, I'm ThadeusB.

I code stuff. I raise bees. I game.

The UTF-8 BOM-B

UTF-8 has this little feature called a byte order mark (BOM). The thing about this mark, however, is that it is usually only Windows programs that actually place it into saved files. Leave it to a Windows “feature” to trash my web server =0.

The funny thing? I was using OpenOffice.org Writer to write my blog post instead of my usual gedit! I can't believe OpenOffice.org actually implements Windows-only things!

Many Windows programs (including Windows Notepad) add BOMs to UTF-8 files by default. However, in Unix-like systems (which make heavy use of text files for file formats as well as for inter-process communication), this practice will interfere with correct processing of important codes such as the shebang at the start of an interpreted script.
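To make the shebang problem concrete, here is a small sketch. The script contents and the `starts_with_shebang` helper are made up for illustration; the only assumption is that whatever launches the script checks the first two raw bytes of the file for `#!`.

```python
import codecs

# Hypothetical script contents, as raw bytes on disk.
clean = b"#!/usr/bin/env python\nprint('hi')\n"
# The same script after an editor saved it with a UTF-8 BOM in front.
bombed = codecs.BOM_UTF8 + clean


def starts_with_shebang(data):
    """Return True only if the very first two bytes are '#!'."""
    return data[:2] == b"#!"


print(codecs.BOM_UTF8)              # b'\xef\xbb\xbf'
print(starts_with_shebang(clean))   # True
print(starts_with_shebang(bombed))  # False
```

The three BOM bytes push `#!` away from the start of the file, so the check fails even though the file looks identical in an editor.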

Even the UTF-8 spec doesn't recommend using a BOM!!!

While the Unicode standard allows a BOM in UTF-8, it does not require or recommend it.

Since this site is written in Python, it is a trivial fix, but an annoying one to say the least.

text = open('myBOMbedfile.rst', encoding='utf-8-sig').read()  # 'utf-8-sig' skips a leading BOM, if there is one
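If the file has to be handled as raw bytes, a slightly more careful sketch removes the mark only when it is the exact prefix of the data and leaves everything else alone. The helper names `strip_utf8_bom` and `read_text_without_bom` are my own, not from any library; `codecs.BOM_UTF8` is the standard library constant for the three BOM bytes.

```python
import codecs


def strip_utf8_bom(data):
    """Remove one leading UTF-8 BOM from a bytes object, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_text_without_bom(path):
    """Read a UTF-8 file as text, dropping a leading BOM if there is one."""
    with open(path, "rb") as f:
        return strip_utf8_bom(f.read()).decode("utf-8")
```

Calling `read_text_without_bom('myBOMbedfile.rst')` should give the same text whether or not the editor added a BOM. Bytes such as `\xbf` elsewhere in the file are not touched.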