Python: Reading UTF-8 files on Windows

The error "UnicodeEncodeError: 'cp932' codec can't encode character '\xe2' in position 4703: illegal multibyte sequence" occurs when trying to process a UTF-8 file with Python on Windows.

Here is how to fix it.

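On Windows with a Japanese locale, the default character encoding for files is 'cp932'.

If you handle a 'utf-8' file in this state, reading it can raise a UnicodeDecodeError, and writing its contents back out can raise the UnicodeEncodeError shown above.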
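Both failures can be reproduced on any OS by naming 'cp932' explicitly, which roughly stands in for what a Japanese-locale Windows does by default. This is only a minimal sketch; the sample strings are assumptions chosen for illustration.

```python
# 'â' (U+00E2, the '\xe2' from the error message) has no mapping in cp932.
try:
    "ch\u00e2teau".encode("cp932")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# UTF-8 bytes are frequently not valid cp932 sequences.
# 'あ' (U+3042) is three bytes in UTF-8, which cp932 cannot pair up.
utf8_bytes = "\u3042".encode("utf-8")
try:
    utf8_bytes.decode("cp932")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(encode_error).__name__, type(decode_error).__name__)
```

Note that decoding UTF-8 bytes as cp932 does not always fail. Some byte sequences decode without error into garbled text, which is harder to notice.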

The fix is to specify the encoding explicitly when opening the file:

f_add = open(out_file + '.tmp', 'a', encoding='utf-8')

With the encoding specified, the result is the same on Linux, Mac, and Windows.
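As a slightly fuller sketch, the same idea applies to writing, appending, and reading. The file name and the sample text here are hypothetical, made up for this example.

```python
import os
import tempfile

# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write with an explicit encoding instead of relying on the OS default.
with open(path, "w", encoding="utf-8") as f:
    f.write("ch\u00e2teau\n")

# Append with the same encoding.
with open(path, "a", encoding="utf-8") as f_add:
    f_add.write("\u65e5\u672c\u8a9e\n")

# Read it back, again naming the encoding.
with open(path, "r", encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

Using `with` also closes the file automatically, so nothing is left half-written if an exception occurs.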

By the way, when reading UTF-8 with a BOM, use 'utf-8-sig'.

UTF-8 with a BOM is the variant of UTF-8 used by Microsoft Excel and similar applications. It is rarely used outside Windows.
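The difference between the two codecs can be seen with a small experiment. This sketch fakes a BOM-prefixed file by writing the bytes directly; the file name and CSV content are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "with_bom.csv")

# Simulate a file saved as "UTF-8 with BOM": bytes EF BB BF precede the text.
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf" + "name,price\n".encode("utf-8"))

# Plain 'utf-8' keeps the BOM as an invisible U+FEFF at the start.
with open(path, encoding="utf-8") as f:
    with_bom = f.read()

# 'utf-8-sig' strips the BOM when it is present.
with open(path, encoding="utf-8-sig") as f:
    without_bom = f.read()

print(repr(with_bom), repr(without_bom))
```

The leftover U+FEFF is easy to miss because it is invisible, but it ends up attached to the first column name or the first line of the data.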

When developing on Windows, and not only in Python, it is a good idea to configure your tools in advance to use 'utf-8' as the character encoding and 'LF' as the line ending.

Microsoft's defaults are the exception here.
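On the Python side, one way to get LF line endings regardless of OS is the newline parameter of open(). This is a minimal sketch with a hypothetical file name; editor and version-control settings are a separate matter and are not covered here.

```python
import os
import tempfile

# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "lf_only.txt")

# newline="\n" disables the translation of "\n" to "\r\n" on Windows.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("first line\nsecond line\n")

# Read the raw bytes to check what was actually written.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
```

Without newline="\n", text mode on Windows writes each "\n" as "\r\n".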

So should you just use a Mac in the first place?

That is sometimes said, but at Japanese companies the company-issued PC is usually a Windows machine.

One reason is that most end users run Windows, so a Windows machine is convenient for document creation, testing, and so on.

Japanese end users overwhelmingly want documents created in Word, Excel, and the like, don't they?

The other reason is that Macs are expensive.

For people with some financial freedom, such as freelancers, I think using a Mac is the best option.
