This post is just me trying out various little programs in the mypy playground, to understand the semantics of mypy's type system. Mypy is a type checker for Python. The type system is optional and unsound; reminiscent of TypeScript and other JavaScript type systems. I've been interested in JavaScript type checking for a long time (back in the day I worked on DoctorJS at Mozilla, then on Closure Compiler at Google, and gave this talk at StrangeLoop), so I'm excited to learn about mypy and other Python type systems.
The post is not an intro to mypy. For an intro, see this. Here, we'll try mypy on small examples and see what happens. Note that I am a Python novice, so most programs here are stuff I would have tried with a JS type checker, just translated into Python.
Straight-line code
First, let's try some straight-line code inside a function.
def f(z: int):
x = 1
y: str = x # warning
When assigning a number to a string, we get the expected warning. But if z is not annotated, there is no warning. Mypy does not typecheck the bodies of unannotated functions, to help with migrating legacy code.
def f(z):
x = 1
y: str = x # no warning
When a variable is declared Any, it keeps that type throughout the body of the function, even if we know a better type later. (Expected behavior.)
def f(z: int):
x: Any = ''
x = ''
y: int = x # no warning
But if x has a union type, you can narrow the type later and mypy tracks that.
def f(z: int):
x: Union[int, str] = 123
x = 'asdf'
y: int = x # warning, x is str
An unannotated formal parameter gets type Any throughout the body of the function. In contrast, an unannotated local variable gets the type of the rhs when it is initialized. The variable keeps the initial type throughout its scope, which is counterintuitive for a dynamic language.
If the type of the rhs can't be inferred, the variable is typed as Any thereafter.
The type of a variable is not inferred using the type of a primitive operator.
When initializing a declared variable, the type going forward is the declared type, even if the initialization is narrower. IMO, this is the correct behavior.
Flow-sensitive typing / Type specialization
Mypy does not support flow-sensitive typing in the style of TypeScript or Closure Compiler. As we saw earlier, a local variable that is not declared gets a type at the first assignment.
But there is some limited flow sensitivity in mypy. The type of a variable can be narrowed using checks, such as isinstance.
An equality check does not narrow the type.
When an IF test narrows the type, it does so in the else branch as well. Notice also the cool utility function provided by mypy, called reveal_type, which prints the type of its argument.
Commonly in optional type systems, types form a lattice. When narrowing the type of a variable, we are performing a meet operation on types (taking the greatest lower bound). When narrowing a type with an unrelated type, we go to bottom, as expected. Weirdly, when calling reveal_type on bottom, nothing is printed (a bug or intended behavior?). In the next example, the type of w is bottom in the then branch, so after the IF its type is only influenced by the else branch.
When types from the branches of an IF flow to a variable after the IF, we are performing a join operation on types (taking the least upper bound). We can see the semantics of join in mypy using the ternary operator. Int and str join to object instead of Union[int, str], which can lead to false positives.
Type checking across scopes
Scopes are typechecked from outer to inner. As in the single-scope case, the initial assignment determines the type of a variable.
Function types
Function subtyping is contravariant in the argument types and covariant in the return type, as expected.
If we join a Callable[[int], int] and a Callable[[str], int], the argument type becomes bottom, so the result is an uninhabited type (meaning that there are no function values that have this type). In this case, mypy computes the join as builtins.function (which I'm guessing is the top function type).
Classes
When a class Bar extends a class Foo, Bar's attributes are not required to be subtypes of the corresponding attributes of Foo. I'm guessing this is intentional, but I found it surprising.
When narrowing the type of a class Foo with an unrelated class Bar, mypy creates an object type that is a subtype of both Foo and Bar. This is counterintuitive to me. If there isn't a subclass of Foo and Bar already in the program, I would have expected the meet to go to bottom.
When a class inherits from two classes, the attributes that have common names in the superclasses must have equal types.
If the common attributes in the superclasses are in the subtype relation (e.g., one is int and the other is Union[int,str]) mypy still warns. But if one of the common attributes is Any, then there is no warning. This causes a subtle issue. In the example below, str flows to the int attribute of Sub1.
Union types
When two elements of a union type are the same, mypy does not warn. This is not a correctness issue, but it would still be good to point out the lint warning to the user. Notice that the revealed type is Union[int, int] instead of int. I don't know if these types are equivalent or if they can be distinguished in some convoluted programs. In general, it is a good idea to have a single canonical representation for each type.
Ideally, a union type is the least upper bound of the types that make up the union. This is not the case in mypy; a union that includes Any is not simplified to just Any.
When taking the join of union types, some elements of the unions may be dropped from the result, which leads to false positives. This is likely a bug, not intended behavior. In the next example, Any is dropped.
In the next example, bool is dropped. The type of w is a union that includes Any, but does not behave the same as Any.
Some more examples to understand how union types are normalized.
When a union includes two function types, AFAIK there isn't a type test one can do in the program to tease the two functions apart. So, it is probably better to combine them to a single function type, and enforce the invariant (in the implementation, not as a user-visible warning) that every union contains at most one function type.
Generics
Suppose we create a union with two different instantiations Bar[int] and Bar[str] of the same generic class. As in the previous examples, mypy keeps the two types separate in the union. Impressively, you can use isinstance on the type of the attribute to narrow the type to Bar[int].
When a type variable is not instantiated, it defaults to Any. When a bounded type variable is not instantiated, it also defaults to Any. That was surprising to me; I had expected that it would default to the upper bound. Not totally sure what the right thing is here though.
When calling a generic function with arguments of different types, mypy joins the types to find the right instantiation. For example, if two classes inherit from the same superclass, the type variable is instantiated to the superclass.
Misc comments
When there is a type mismatch, mypy shows the whole types that don't match. Since types can get large, it would be nice to show which parts of the large types cause the mismatch.
You can use protocols (and also define your own) to do structural subtyping for classes. I did not try any examples with protocols.
In JS type systems, the types of the properties of an object literal are usually kept separate. In Python, the most similar thing to an object literal is a dictionary literal. Mypy joins the types of the values of a dictionary literal.
On the other hand, the types of the elements of a tuple are kept separate. Subtyping is covariant for tuples; makes sense because tuples are immutable.
Phew, this was a long post :) Thanks for making it this far. I enjoyed playing with mypy and I encourage you to try it if you haven't already. If I find time, I'd like to try similar examples on Pyre, which is another type checker for Python, developed by Facebook.
EDIT(2020/09/07): See also the follow-up discussion on the Typing-sig mailing list.
I usually avoid writing the type declaration in the first assignment as it should be inferred (habit from TypeScript). You only would need it if your variable is union-typed throughout the function. I tend to avoid this with Python as the union syntax isn't that great right now. The future will be TypeScript-like with a pipe operator.
ReplyDeleteThere's also the cast() function from Mypy. This is usually how to resolve issues with union types when you try to pass those values to a function that only accepts one of the types in the union.
The other way to do this is to add assert statements (which can be removed in optimised code to remove the overhead). Instead of using an if statement, you can write `assert x is int` to remove the other types in the union before using the value as the desired type. I like this way as the app will immediately end (raise) if the type is not correct (in not optimised mode).
There's also the TypedDict class from the type extensions. It allows defining a dict's keys and value types. It assumes string keys. But since this is Python and I prefer dot syntax, I usually use the typed NamedTuple class for random objects I don't want to make a class for.
These are great tips. Thanks!
Delete