Define the encoding of source code files

  • # -*- coding: latin-1 -*-
  • # -*- coding: utf-8 -*-
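As a minimal sketch of why the declaration matters: the comment has to sit on the first or second line of the file, and it tells the interpreter how to read non-ASCII characters that appear in the source itself. The variable name `greeting` is made up for illustration.

```python
# -*- coding: utf-8 -*-
# With the declaration above, the non-ASCII literal below is read as UTF-8.
greeting = u"café"

# The literal is four characters long, whatever number of bytes it takes on disk.
length = len(greeting)
```

Without a declaration, Python 2 assumes ASCII source and rejects the file as soon as it meets the accented character.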

Encoding, decoding, oh my

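It is OK to print unicode strings: the print statement takes care of encoding the string to the encoding that stdout uses. This works when stdout is a terminal; when the output is piped or redirected, Python 2 may not know the encoding, falls back to ASCII, and raises UnicodeEncodeError on non-ASCII characters.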

It is not OK to write unicode strings to byte streams; you have to convert your unicode string to a sequence of bytes by encoding it before writing it to the stream.
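A small sketch of encoding before writing, using an in-memory `io.BytesIO` as a stand-in for a binary file or socket. The sample text and the choice of UTF-8 are assumptions for the example; use whatever encoding the consumer of the stream expects.

```python
import io

text = u"na\u00efve caf\u00e9"     # a unicode string with two non-ASCII characters

stream = io.BytesIO()               # stands in for any byte-oriented stream
stream.write(text.encode("utf-8"))  # encode first, then write the resulting bytes

data = stream.getvalue()            # 12 bytes for 10 characters
```

The two accented characters take two bytes each in UTF-8, which is why the byte count differs from the character count.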

There is no such thing as a text file: when somebody gives you a "text file", all you get is a sequence of bytes. If you want to get the text in there, you have to decode that sequence of bytes (I prefer to think of the process as decrypting it) to obtain a sequence of characters, and to do that you need to know which encoding was used.
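To illustrate that the bytes alone do not determine the text, here is a sketch that decodes the same byte sequence with two different encodings. The sample bytes are made up for the example.

```python
raw = b"caf\xc3\xa9"                # what a "text file" really hands you: bytes

as_utf8 = raw.decode("utf-8")       # 4 characters, ending in e-acute
as_latin1 = raw.decode("latin-1")   # 5 characters: each byte becomes one character

same_text = (as_utf8 == as_latin1)  # False: same bytes, different text
```

Decoding with the wrong encoding often does not raise an error; it just silently produces the wrong characters.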

Regular expressions

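Compiled re objects let you look for a pattern in two ways: match and search. match is the surprising one: it only matches at the beginning of the string. search scans the whole string and returns the first place where the pattern matches. What you want most of the time is search.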
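The difference can be sketched as follows; the pattern and the sample strings are arbitrary choices for the example.

```python
import re

pattern = re.compile(r"\d+")
text = "order 66 shipped"

at_start = pattern.match(text)         # None: the string does not start with a digit
anywhere = pattern.search(text)        # match object for "66", found mid-string
leading = pattern.match("66 orders")   # match succeeds when the pattern is at position 0
```

Both methods return None when nothing is found, so check the result before calling `.group()` on it.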

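.findall gets you all the non-overlapping matches in the given string, as a list of strings. If the pattern contains groups, you get the groups instead of the whole match (as tuples when there is more than one group).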
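A short sketch of what findall returns, first without groups and then with two groups. The patterns and input strings are made up for illustration.

```python
import re

# No groups in the pattern: a list of the matched strings.
numbers = re.findall(r"\d+", "3 apples, 12 pears")

# Two groups in the pattern: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w+)=(\d+)", "a=1 b=22")
```

Here `numbers` is `['3', '12']` and `pairs` is `[('a', '1'), ('b', '22')]`.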

pylint

Global ignores: put something like the following at the beginning of the file:

# pylint: disable=R0911,C0302,R0914

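For a local ignore, put the comment at the end of the line causing the issue. A disable comment on a line of its own applies from that point to the end of the enclosing block, not only to the next line.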
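As a rough sketch of both kinds of ignore in one file: the function `first_word` and the choice of messages are hypothetical, picked only to have something for pylint to complain about (C0302 is too-many-lines, W0612 is unused-variable). One common placement for a local ignore is a trailing comment on the offending line, which limits it to that line.

```python
# Module-level ignore: applies to the whole file.
# pylint: disable=C0302


def first_word(line):
    """Return the text before the first space (hypothetical helper)."""
    # 'rest' is never used; the trailing comment silences W0612 for this line only.
    head, rest = line.split(" ", 1)  # pylint: disable=W0612
    return head
```

Keeping ignores as narrow as possible means the same warning still fires elsewhere in the file.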