What you don't learn in school about production Python

Neel Somani - June 29, 2019

I learned Python in my first computer science course at Berkeley. Since then, through my work experience, I've found some differences between the code I wrote in college and the code I write in production. This isn't an exhaustive list by any means, but here are some things off the top of my head that stick out to me.

1. Code style: PEP 8 and more

You can (and should) follow a style guide like PEP 8, which is easy enough with an IDE or tools like autopep8. That's often not sufficient to make your code readable, though. Here are some examples:

What's wrong with the following pseudocode?

def check_xyz():
    if cond1:
        ... # some additional computation
        if cond2:
            ... # some additional computation
            if cond3:
                return True
    return False

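Answer: In general, you should avoid nested if statements like that, because every extra level of indentation makes it harder to see which conditions lead to which return value. The code should be rewritten with early returns, like this: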

def check_xyz():
    if not cond1:
        return False
    ... # some additional computation
    if not cond2:
        return False
    ... # some additional computation
    if not cond3:
        return False
    return True

Or if you're really just checking those three conditions, with no additional computation:

def check_xyz():
    return cond1 and cond2 and cond3
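
To make the early-return pattern concrete, here's a minimal sketch applied to a made-up validation function. The name is_valid_order and the dictionary keys ("items", "price", "quantity", "balance") are hypothetical and exist only for illustration.

```python
from typing import Any, Dict


def is_valid_order(order: Dict[str, Any]) -> bool:
    """Return True if a (hypothetical) order dict passes three checks."""
    # Guard 1: the order must contain at least one item.
    items = order.get("items")
    if not items:
        return False

    # Additional computation that only makes sense once guard 1 has passed.
    total = sum(item["price"] * item["quantity"] for item in items)

    # Guard 2: the total must be positive.
    if total <= 0:
        return False

    # Guard 3: the customer's balance must cover the total.
    if order.get("balance", 0) < total:
        return False

    return True
```

Each check sits at the same indentation level, so a new condition can be added without re-indenting the rest of the function.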

More examples:

# Rule: For a long list of arguments, one argument per line for better readability.
some_function(
    first_arg, second_arg,
    third_arg,
    fourth_arg
)
# becomes
some_function(
    first_arg,
    second_arg,
    third_arg,
    fourth_arg
)

# Rule: Avoid backslashes to break lines.
def example_of_a_long_fn(count: int, another_arg: float) -> \
    str:
    ...

# becomes
def example_of_a_long_fn(
    count: int,
    another_arg: float
) -> str:
    ...

Some other things I've noticed:

Imports should be sorted so you can easily identify whether something has been imported. In my team at Two Sigma, we split imports into three groups: standard library modules, third-party modules, and our own modules. Within each group, we sort the imports by the module name (not by "from" or "import").
You typically don't use assert outside of a testing framework like pytest. Usually there's a more descriptive exception that you can raise, e.g., a ValueError for a bad argument, and unlike an assert, it isn't stripped out when Python runs with the -O flag. (There are definitely some valid use cases for assert, though I don't think example #1 in this answer is a very good one: https://stackoverflow.com/a/18980471/657200)
Static methods are typically not used in Python. You can just move the function out of the class and make it a module-level function.
A personal preference of mine is to use named arguments for better readability.
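
Here's a rough sketch of three of those points together: a descriptive exception instead of an assert, a module-level function instead of a static method, and named arguments at the call site. The functions circle_area and scale are hypothetical examples, not code from any real project.

```python
import math


def circle_area(radius: float) -> float:
    """Module-level function; no class or @staticmethod is needed."""
    # Raise a descriptive exception rather than writing `assert radius >= 0`.
    if radius < 0:
        raise ValueError(f"radius must be non-negative, got {radius}")
    return math.pi * radius ** 2


def scale(value: float, factor: float, offset: float) -> float:
    """Multiply value by factor, then add offset."""
    return value * factor + offset


# Named arguments make it clear which number plays which role.
result = scale(value=2.0, factor=3.0, offset=1.0)
```

Compare the last line with scale(2.0, 3.0, 1.0), where the reader has to look up the signature to know what each number means.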

2. Type checking in Python: mypy

The typing module includes classes like List, Dict, etc. You can use these to define the types that your function expects and returns.

def add_numbers(a, b):
    return a + b

# becomes

def add_numbers(a: float, b: float) -> float:
    return a + b

There are tons of advantages. When you run mypy on your module (install it with pip install mypy), it catches many type errors before the code ever runs. The type annotations also make it obvious to other developers how to use your function.
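
As a small illustration of the kind of error mypy catches, here's a sketch with an Optional return value. The functions mean and as_percentage are hypothetical. The code runs as written; the comment marks the check that mypy would insist on.

```python
from typing import List, Optional


def mean(values: List[float]) -> Optional[float]:
    """Return the mean of values, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


def as_percentage(values: List[float]) -> str:
    """Format the mean of values as a percentage string."""
    average = mean(values)
    # Without this None check, mypy reports an error on `average * 100`
    # below, because None does not support multiplication.
    if average is None:
        return "n/a"
    return f"{average * 100:.1f}%"
```

Without the annotations, the missing None check would only show up at runtime, the first time someone passed an empty list.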

If you want to type check a dictionary, you have a few different options:

                           TypedDict  Dataclass  NamedTuple
Mutable                    Yes        Yes        No
JSON serializable          Yes        No         Yes
Allows default parameters  No         Yes        Yes
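
Here's a minimal sketch of the three options side by side, assuming Python 3.8 or later (where TypedDict is available in the typing module). The class names PointDict, PointData, and PointTuple are made up for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import NamedTuple, TypedDict


class PointDict(TypedDict):
    x: float
    y: float


@dataclass
class PointData:
    x: float
    y: float = 0.0  # default values are allowed


class PointTuple(NamedTuple):
    x: float
    y: float = 0.0  # default values are allowed


# TypedDict: a plain dict at runtime, so it is mutable and json.dumps works.
d: PointDict = {"x": 1.0, "y": 2.0}
d["x"] = 5.0
d_json = json.dumps(d)

# Dataclass: mutable, but json.dumps(p) raises TypeError, so convert it
# to a dict first.
p = PointData(x=1.0)
p.x = 5.0
p_json = json.dumps(asdict(p))

# NamedTuple: immutable. json.dumps treats it as a tuple and emits a JSON
# array, so the field names are not part of the output.
t = PointTuple(x=1.0)
t_json = json.dumps(t)
```

One caveat on the "JSON serializable" row: the NamedTuple does serialize, but as an array rather than an object, which may or may not be what you want.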

3. Docstrings and comments: Follow a convention

At Two Sigma, I'm using the NumPy docstring guide. It doesn't really matter which one you use as long as you stick to it. PyCharm, the IDE that I use, automatically generates the docstring templates in the correct format.

Not all docstrings are created equal. If a function is user-facing (developer-facing), then it should probably have a detailed docstring complete with parameter descriptions and examples. If it's a helper function or a function that external users won't see or use, then you don't necessarily need to describe the parameters or give examples.

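Once you've written your docstrings, you can generate documentation from them with Sphinx. For NumPy-style docstrings, Sphinx needs an extension such as sphinx.ext.napoleon or numpydoc to parse the sections.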

On other types of comments: Comments are typically used if you're doing something unintuitive, to explain the reasoning behind why you did it that way. For example, there might be some error that you want to suppress. You should comment and explain why you're justified in suppressing it, and at what point it should be fixed.

Typically, triple-quoted strings are only used for docstrings. Even if a comment takes a couple of lines, if it's in the middle of a function, our coding convention is to use a hash symbol on each line.
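
Here's a sketch of what such a comment might look like when suppressing an error. The function load_settings and the reasoning in the comment are invented for illustration; the point is that the comment says why the suppression is acceptable and when to revisit it.

```python
import json
from typing import Any, Dict


def load_settings(raw: str) -> Dict[str, Any]:
    """Parse a JSON settings string, falling back to empty settings."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # A malformed settings string is deliberately ignored: in this
        # (hypothetical) application the defaults are always safe to use.
        # Revisit once callers validate their input before calling this
        # function.
        return {}
```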

4. Testing: pytest

This is something that a lot of students are exposed to in school, but I figured I would include it here for completeness just in case. You can read all about using pytest in its official documentation.
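
In case you haven't seen it, here's a minimal sketch of what a pytest test file can look like. The function slugify is a hypothetical function under test. pytest collects functions whose names start with test_, so the file itself doesn't need to import pytest for plain assertions.

```python
def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase and hyphenate a title."""
    return "-".join(title.lower().split())


# pytest discovers these functions by their `test_` prefix and reports
# any failing assert along with the values involved.
def test_slugify_replaces_spaces() -> None:
    assert slugify("Hello World") == "hello-world"


def test_slugify_collapses_whitespace() -> None:
    assert slugify("  Production   Python ") == "production-python"
```

Assuming the file is saved with a name like test_slugify.py, running the pytest command in that directory would pick it up.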

5. Printing output: the logging module

Instead of using print statements, in production you'll typically use the logging module.

print('Example logging message')

# becomes

import logging

logger = logging.getLogger(__name__)
logger.debug('Example logging message')

You would set the output level appropriately (debug, info, etc.). It's best practice to get the logger as shown above rather than calling logging.debug directly.

This way, users who want to see the output can enable it with something like logging.basicConfig(level=logging.DEBUG). Note that a level of logging.INFO would still hide the debug message above.
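
Here's a small sketch of how the two sides fit together: library code that only logs, and application code that decides what to show. The function process is hypothetical.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def process(items: List[int]) -> int:
    """Hypothetical library function that logs instead of printing."""
    logger.debug("Processing %d items", len(items))
    if not items:
        logger.warning("Received an empty list")
    return len(items)


# Application code chooses the level. With level=INFO, the debug message
# is hidden while the warning is still shown.
logging.basicConfig(level=logging.INFO)
process([])
```

The library code never calls basicConfig itself, so whoever runs the application stays in control of how much output they see.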

Conclusion

While this isn't a complete list, I hope that this is helpful to some people!