Trying out mypy

This post is just me trying out various little programs in the mypy playground, to understand the semantics of mypy's type system. Mypy is a type checker for Python. The type system is optional and unsound; reminiscent of TypeScript and other JavaScript type systems. I've been interested in JavaScript type checking for a long time (back in the day I worked on DoctorJS at Mozilla, then on Closure Compiler at Google, and gave this talk at StrangeLoop), so I'm excited to learn about mypy and other Python type systems.

The post is not an intro to mypy. For an intro, see this. Here, we'll try mypy on small examples and see what happens. Note that I am a Python novice, so most programs here are stuff I would have tried with a JS type checker, just translated into Python.

Straight-line code

First, let's try some straight-line code inside a function.

def f(z: int):
    x = 1
    y: str = x # warning

When assigning a number to a string, we get the expected warning. But if z is not annotated, there is no warning. Mypy does not typecheck the bodies of unannotated functions, to help with migrating legacy code.

def f(z):
    x = 1
    y: str = x # no warning

When a variable is declared Any, it keeps that type throughout the body of the function, even if we know a better type later. (Expected behavior.)

def f(z: int):
    x: Any = ''
    x = ''
    y: int = x # no warning

But if x has a union type, you can narrow the type later and mypy tracks that.

def f(z: int):
    x: Union[int, str] = 123
    x = 'asdf'
    y: int = x # warning, x is str

An unannotated formal parameter gets type Any throughout the body of the function. In contrast, an unannotated local variable gets the type of the rhs when it is initialized. The variable keeps the initial type throughout its scope, which is counterintuitive for a dynamic language.

a = 123
a = '' # warning

If the type of the rhs can't be inferred, the variable is typed as Any thereafter.

def id(x):
    return x

a = id(123)
a = 123
y: str = a # no warning

The type of a variable is not inferred using the type of a primitive operator.

def f(u, z: int):
    x = u - 3 # x could be inferred float
    y: str = x # no warning

When initializing a declared variable, the type going forward is the declared type, even if the initialization is narrower. IMO, this is the correct behavior.

def f(z: int):
    x: Union[int,str] = ''
    y: str = x # warning

Flow-sensitive typing / Type specialization

Mypy does not support flow-sensitive typing in the style of TypeScript or Closure Compiler. As we saw earlier, a local variable that is not declared gets a type at the first assignment.

def f(x: int, z):
    if z :
        w = 1
    else:
        w = 'asdf' # warning
    x = w # no warning, w is int

But there is some limited flow sensitivity in mypy. The type of a variable can be narrowed using checks, such as isinstance.

def f():
    return 5

def h(x:int):
    w = f()
    if (isinstance(w, int)):
        z: str = w # warning

An equality check does not narrow the type.

def f():
    return 5
    
def g(x: int):
    w = f()
    if (w == 5):
        z: str = w # no warning

When an IF test narrows the type, it does so in the else branch as well. Notice also the cool utility function provided by mypy, called reveal_type, which prints the type of its argument.

def f(x: Union[int,str]):
    if (isinstance(x, int)):
        y = x
    else:
        reveal_type(x) # x is a string
        z: int = x # warning

Commonly in optional type systems, types form a lattice. When narrowing the type of a variable, we are performing a meet operation on types (taking the greatest lower bound). When narrowing a type with an unrelated type, we go to bottom, as expected. Weirdly, when calling reveal_type on bottom, nothing is printed (a bug or intended behavior?). In the next example, the type of w is bottom in the then branch, so after the IF its type is only influenced by the else branch.

def f(x: int):
    if isinstance(x, str):
        w = x
        reveal_type(x) # no print here
    else:
        w = ''
    reveal_type(w) # w is a string

When types from the branches of an IF flow to a variable after the IF, we are performing a join operation on types (taking the least upper bound). We can see the semantics of join in mypy using the ternary operator. Int and str join to object instead of Union[int, str], which can lead to false positives.

def f(x: int, b: bool):
    if b:
        a = 1
    else:
        c = 'asdf'
    w = a if b else c
    z: Union[int,str] = w # warning

Type checking across scopes

Scopes are typechecked from outer to inner. As in the single-scope case, the initial assignment determines the type of a variable.

def id(x):
    return x

def f(x: int):
    global a
    a = id('')

a = 123
f(5)
reveal_type(a) # typed int, true type is str

def g(x: int):
    y: str = a # warning

Function types

Function subtyping is contravariant in the argument types and covariant in the return type, as expected.

def f(x: Union[int, str]) -> int:
    return 123

x: Callable[[int], Union[int, str]] = f # no warning

If we join a Callable[[int], int] and a Callable[[str], int], the argument type becomes bottom, so the result is an uninhabited type (meaning that there are no function values that have this type). In this case, mypy computes the join as builtins.function (which I'm guessing is the top function type).

def f(x: Callable[[int], int], y: Callable[[str], int], z: bool):
    a = x if z else y
    reveal_type(a)  # Revealed type is 'builtins.function'
    b : Union[Callable[[int], int], Callable[[str], int]] = a  # warning

Classes

When a class Bar extends a class Foo, Bar's attributes are not required to be subtypes of the corresponding attributes of Foo. I'm guessing this is intentional, but I found it surprising.

class Foo:
  def __init__(self, x):
    self.a : int = x

class Bar(Foo):
  def __init__(self):
    self.a : str = 'asdf' # no warning

def f(x: Foo):
    y : int = x.a

f(Bar()) # str will flow to y

reveal_type((Bar()).a) # revealed type is str

When narrowing the type of a class Foo with an unrelated class Bar, mypy creates an object type that is a subtype of both Foo and Bar. This is counterintuitive to me. If there isn't a subclass of Foo and Bar already in the program, I would have expected the meet to go to bottom.

class Foo:
  def __init__(self, x):
    self.a = x
      
class Bar:
  def __init__(self):
      return

def f(x: Foo):
    if (isinstance(x, Bar)):
        reveal_type(x) # x has type 'main.<subclass of "Foo" and "Bar">'

When a class inherits from two classes, the attributes that have common names in the superclasses must have equal types.

class Sup1:
  def __init__(self, x):
    self.a : int = x
    
class Sup2:
  def __init__(self, x):
    self.a : str = x

class Sub1(Sup1, Sup2): # warning
  def __init__(self):
    return

If the common attributes in the superclasses are in the subtype relation (e.g., one is int and the other is Union[int,str]) mypy still warns. But if one of the common attributes is Any, then there is no warning. This causes a subtle issue. In the example below, str flows to the int attribute of Sub1.

class Sup1:
  def __init__(self, x):
    self.a : int = x
    
class Sup2:
  def __init__(self, x):
    self.a : Any = x

class Sub1(Sup1, Sup2):
  def __init__(self):
    return
      
class Sub2(Sup2, Sup1):
  def __init__(self):
    return

def f(x: Sup2):
    x.a = 'asdf'

f(Sub1()) # 'asdf' will be assigned to a.

reveal_type(Sub1().a)  # revealed type is int

Union types

When two elements of a union type are the same, mypy does not warn. This is not a correctness issue, but it would still be good to point out the lint warning to the user. Notice that the revealed type is Union[int, int] instead of int. I don't know if these types are equivalent or if they can be distinguished in some convoluted programs. In general, it is a good idea to have a single canonical representation for each type. 

x: Union[int, int] = 1
reveal_type(x) # x is Union[int, int]

Ideally, a union type is the least upper bound of the types that make up the union. This is not the case in mypy; a union that includes Any is not simplified to just Any.

x: Union[int, Any]
y: str = x  # warning

When taking the join of union types, some elements of the unions may be dropped from the result, which leads to false positives. This is likely a bug, not intended behavior. In the next example, Any is dropped.

def f(x: int, b: bool):
    if b:
        a: Union[int, Any] = 1
    else:
        c: Union[int, str] = 'asdf'
    w = a if b else c
    reveal_type(w) # w is Union[int, str]
    b = w # warning, even though w should be Any

In the next example, bool is dropped. The type of w is a union that includes Any, but does not behave the same as Any.

def f(x: int, b: bool):
    if b:
        a: Union[int, Any] = 1
    else:
        c: Union[bool, str] = 'asdf'
    w = a if b else c
    reveal_type(w) # w is Union[int, Any, str]
    b = w # warning

Some more examples to understand how union types are normalized.

# Nested unions are flattened.
x: Union[int, Union[str, bool]] = 1
reveal_type(x) # x is Union[int, str, bool]

y: Union[int, object]
reveal_type(y) # not normalized to object

class Foo:
  def __init__(self, x):
    self.a : int = x

class Bar(Foo):
  def __init__(self):
    self.a : int = 123
    
z: Union[Foo, Bar]
reveal_type(z) # not normalized to Foo

w : Union[Callable[[int], int], Callable[[int], str]]
reveal_type(w) # not normalized to Callable[int, Union[int, str]]

When a union includes two function types, AFAIK there isn't a type test one can do in the program to tease the two functions apart. So, it is probably better to combine them to a single function type, and enforce the invariant (in the implementation, not as a user-visible warning) that every union contains at most one function type.

Generics

Suppose we create a union with two different instantiations Bar[int] and Bar[str] of the same generic class. As in the previous examples, mypy keeps the two types separate in the union. Impressively, you can use isinstance on the type of the attribute to narrow the type to Bar[int].

T = TypeVar('T')

class Bar(Generic[T]):
    def __init__(self, x: T) -> None:
        self.attr = x

y: Union[Bar[int], Bar[str]]

reveal_type(y) # y is Union[Bar[int], Bar[str]]

if isinstance(y.attr, int):
    reveal_type(y) # y is Bar[int]

reveal_type(y.attr) # Union[int, str]

When a type variable is not instantiated, it defaults to Any. When a bounded type variable is not instantiated, it also defaults to Any. That was surprising to me; I had expected that it would default to the upper bound. Not totally sure what the right thing is here though.

T = TypeVar('T', bound=int)

class Bar(Generic[T]):
    def __init__(self, x: T) -> None:
        self.attr = x

x: Bar
reveal_type(x) # x is Bar[Any]

When calling a generic function with arguments of different types, mypy joins the types to find the right instantiation. For example, if two classes inherit from the same superclass, the type variable is instantiated to the superclass.

class Foo:
  def __init__(self, x):
    self.a : int = x

class Bar(Foo):
    def __init__(self, x) -> None:
        self.attr = x
        

class Baz(Foo):
    def __init__(self):
        self.attr = 123

T = TypeVar('T')

def f(x: T, y: T) -> T:
    return x
    
reveal_type(f(Baz(), Bar(2))) # revealed type is Foo

Misc comments

Mypy performs local type inference, e.g., to find the types of unannotated local variables, but it does not do global type inference; it cannot infer a signature for an unannotated function.

When there is a type mismatch, mypy shows the whole types that don't match. Since types can get large, it would be nice to show which parts of the large types cause the mismatch.

x: Callable[[Union[int, str]], Union[int,str]]
y: Callable[[int], int]
y = x # the warning could point out the mismatch in the return types specifically

You can use protocols (and also define your own) to do structural subtyping for classes. I did not try any examples with protocols.

In JS type systems, the types of the properties of an object literal are usually kept separate. In Python, the most similar thing to an object literal is a dictionary literal. Mypy joins the types of the values of a dictionary literal.

x = {
    'a': 1,
    'b': 'asdf',
}

reveal_type(x) # x is dict[str, object]

On the other hand, the types of the elements of a tuple are kept separate. Subtyping is covariant for tuples; makes sense because tuples are immutable.

Phew, this was a long post :) Thanks for making it this far. I enjoyed playing with mypy and I encourage you to try it if you haven't already. If I find time, I'd like to try similar examples on Pyre, which is another type checker for Python, developed by Facebook.

EDIT(2020/09/07): See also the follow-up discussion on the Typing-sig mailing list.


Comments

  1. I usually avoid writing the type declaration in the first assignment as it should be inferred (habit from TypeScript). You only would need it if your variable is union-typed throughout the function. I tend to avoid this with Python as the union syntax isn't that great right now. The future will be TypeScript-like with a pipe operator.

    There's also the cast() function from Mypy. This is usually how to resolve issues with union types when you try to pass those values to a function that only accepts one of the types in the union.

    The other way to do this is to add assert statements (which can be removed in optimised code to remove the overhead). Instead of using an if statement, you can write `assert x is int` to remove the other types in the union before using the value as the desired type. I like this way as the app will immediately end (raise) if the type is not correct (in not optimised mode).

    There's also the TypedDict class from the type extensions. It allows defining a dict's keys and value types. It assumes string keys. But since this is Python and I prefer dot syntax, I usually use the typed NamedTuple class for random objects I don't want to make a class for.

    ReplyDelete

Post a Comment