Introduction
This issue is reserved for substantive work on PEP 526, "Syntax for Variable and Attribute Annotations". For textual nits please comment directly on the latest PR for this PEP in the peps repo.
I sent a strawman proposal to python-ideas. The feedback was mixed but useful -- people tried to poke holes in it from many angles.
In this issue I want to arrive at a more solid specification. I'm out of time right now, but here are some notes:
- Class variables vs. instance variables
- Specify instance variables in class body vs. in __init__ or __new__
- Thinking with your runtime hat on vs. your type checking hat
- Importance of a: <type> vs. how it strikes people the wrong way
- Tuple unpacking is a mess, let's avoid it entirely
- Collecting the types in something similar to __annotations__
- Cost of doing that for locals
- Cost of introducing a new keyword
Work in progress here!
I'm updating the issue description to avoid spamming subscribers to this tracker. I'll keep doing this until we have reasonable discussion.
Basic proposal
My basic observation is that introducing a new keyword has two downsides: (a) choice of a good keyword is hard (e.g. it can't be 'var' because that is way too common a variable name, and it can't be 'local' if we want to use it for class variables or globals), and (b) no matter what we choose, we'll still need a __future__ import.
So I'm proposing something keyword-free:
a: List[int] = []
b: Optional[int] = None
The idea is that this is pretty easy to explain to someone who's already familiar with function annotations.
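A minimal sketch of that parallel, assuming an interpreter that accepts the proposed syntax (Python 3.6 and later do). The function name scale and its parameters are hypothetical, chosen only to put a function annotation and a variable annotation side by side:

```python
from typing import List, Optional


def scale(values: List[int], factor: Optional[int] = None) -> List[int]:
    # The variable annotations below use the same "name: type" shape
    # as the parameter annotations in the signature above.
    result: List[int] = []
    multiplier: int = 1 if factor is None else factor
    for value in values:
        result.append(value * multiplier)
    return result


doubled = scale([1, 2, 3], 2)
unchanged = scale([7])
```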
Multiple types/variables
An obvious question is whether to allow combining type declarations with tuple unpacking (e.g. a, b, c = x). This leads to (real or perceived) ambiguity, and I propose not to support this. If there's a type annotation there can only be one variable to its left, and one value to its right. This still allows tuple packing (just put the tuple in parentheses) but it disallows tuple unpacking. (It's been proposed to allow multiple parenthesized variable names, or types inside parentheses, but none of these look attractive to me.)
There's a similar question about what to do about the type of a = b = c = x. My answer to this is the same: Let's not go there; if you want to add a type you have to split it up.
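A sketch of what "split it up" could look like, assuming an interpreter that implements the syntax with these restrictions (Python 3.6 and later reject the combined forms at compile time). The helper is_rejected is hypothetical and exists only to show that the compiler refuses the combined statements:

```python
from typing import Tuple

# Tuple packing with a single annotated target is fine: parenthesize the value.
point: Tuple[int, int] = (3, 4)

# To unpack, declare the types first and assign in a separate statement.
x: int
y: int
x, y = point


def is_rejected(source: str) -> bool:
    """Return True if the compiler refuses the given statement."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return True
    return False


unpacking_rejected = is_rejected("a, b: int = 1, 2")
chained_rejected = is_rejected("a = b: int = 1")
```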
Omitting the initial value
My next step is to observe that sometimes it's convenient to decouple the type declaration from the initialization. One example is a variable that is initialized in each branch of a big sequence of if/elif/etc. blocks, where you want to declare its type before entering the first if, and there's no convenient initial value (e.g. None is not valid because the type is not Optional[...]). So I propose to allow leaving out the assignment:
log: Logger
if develop_mode():
    log = heavy_logger()
elif production_mode():
    log = fatal_only_logger()
else:
    log = default_logger()
log.info("Server starting up...")
The line log: Logger looks a little odd at first but I believe you can get used to it easily. Also, it is again similar to what you can do in function annotations. (However, don't hyper-generalize. A line containing just log by itself means something different -- it's probably a NameError.)
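A self-contained variant of the logger example, as a sketch only. The function pick_logger, its mode strings, and the logger names are hypothetical stand-ins for develop_mode(), heavy_logger() and friends; only the standard logging module is assumed:

```python
import logging
from logging import Logger


def pick_logger(mode: str) -> Logger:
    log: Logger  # declaration only; every branch below assigns it
    if mode == "develop":
        log = logging.getLogger("heavy")
    elif mode == "production":
        log = logging.getLogger("fatal_only")
    else:
        log = logging.getLogger("default")
    return log


dev_log = pick_logger("develop")
fallback_log = pick_logger("anything else")
```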
Note that this is something that you currently can't do with # type comments -- you currently have to put the type on the (lexically) first assignment, like this:
if develop_mode():
    log = heavy_logger()  # type: Logger
elif production_mode():
    log = fatal_only_logger()  # (No type declaration here!)
# etc.
(In this particular example, a type declaration may be needed because heavy_logger() returns a subclass of Logger, while other branches produce different subclasses; in general the type checker shouldn't just compute the common superclass, because then a type error would simply cause it to infer the type object.)
What about runtime
Suppose we have a: int -- what should this do at runtime? Is it ignored, or does it initialize a to None, or should we perhaps introduce something new like JavaScript's undefined? I feel quite strongly that it should leave a uninitialized, just as if the line was not there at all.
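A small sketch of the "leave it uninitialized" behavior, assuming an interpreter that implements it this way (Python 3.6 and later do). The function name probe is hypothetical:

```python
def probe() -> bool:
    a: int  # declaration only: no value is bound, not even None
    # locals() lists only names that currently have a value.
    return "a" in locals()


was_bound = probe()
```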
Instance variables and class variables
Based on working with mypy since last December I feel strongly that it's very useful to be able to declare the types of instance variables in class bodies. In fact this is one place where I find the value-less notation (a: int) particularly useful, to declare instance variables that should always be initialized by __init__ (or __new__), e.g. variables whose type is mutable or cannot be None.
We still need a way to declare class variables, and here I propose some new syntax, prefixing the type with a class keyword:
class Starship:
    captain: str  # instance variable without default
    damage: int = 0  # instance variable with default (stored in class)
    stats: class Dict[str, int] = {}  # class variable with initialization
I do have to admit that this is entirely unproven. PEP 484 and mypy currently don't have a way to distinguish between instance and class variables, and it hasn't been a big problem (though I think I've seen a few mypy bug reports related to mypy's inability to tell the difference).
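To make the storage behavior concrete, here is a runnable approximation. It is a sketch under a loud assumption: the proposed class prefix is not syntax an interpreter accepts, so typing.ClassVar stands in for it here. The __init__ method and the captain name "Picard" are illustrative only:

```python
from typing import ClassVar, Dict


class Starship:
    captain: str                          # declared only; nothing stored in the class
    damage: int = 0                       # default lives in the class namespace
    stats: ClassVar[Dict[str, int]] = {}  # stand-in for "class Dict[str, int]"

    def __init__(self, captain: str) -> None:
        self.captain = captain


ship = Starship("Picard")
ship.damage += 5  # creates an instance attribute that shadows the class default

captain_in_class = "captain" in vars(Starship)
```

Here the class keeps damage at 0 while the instance sees 5, and stats is the one shared dict reached through the class.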
Capturing the declared types at runtime
For function annotations, the types are captured in the function's __annotations__ object. It would be an obvious extension of this idea to do the same thing for variable declarations. But where exactly would we store this info? A strawman proposal is to introduce __annotations__ dictionaries at various levels. At each level, the types would go into the __annotations__ dict at that same level. Examples:
Global variables
players: Dict[str, Player]
print(__annotations__)
This would print {'players': Dict[str, Player]} (where the value is the runtime representation of the type Dict[str, Player]).
Class and instance variables:
class Starship:
    # Class variables
    hitpoints: class int = 50
    stats: class Dict[str, int] = {}
    # Instance variables
    damage: int = 0
    shield: int = 100
    captain: str  # no initial value
print(Starship.__annotations__)
This would print a dict with five keys, and corresponding values:
{'hitpoints': ClassVar[int],  # I'm making this up as a runtime representation of "class int"
 'stats': ClassVar[Dict[str, int]],
 'damage': int,
 'shield': int,
 'captain': str
}
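A runnable approximation of the class example, again assuming typing.ClassVar as the spelling of the made-up ClassVar[...] representation (the class prefix itself is not accepted by an interpreter):

```python
from typing import ClassVar, Dict


class Starship:
    # Class variables (ClassVar stands in for the proposed "class" prefix)
    hitpoints: ClassVar[int] = 50
    stats: ClassVar[Dict[str, int]] = {}
    # Instance variables
    damage: int = 0
    shield: int = 100
    captain: str  # no initial value


annotations = Starship.__annotations__
annotated_names = set(annotations)
```

Note that captain appears in the annotations even though no value was ever assigned to it.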
Finally, locals. Here I think we should not store the types -- the value of having the annotations available locally is just not enough to offset the cost of creating and populating the dictionary on each function call.
In fact, I don't even think that the type expression should be evaluated during the function execution. So for example:
def side_effect():
    print("Hello world")
def foo():
    a: side_effect()
    a = 12
    return a
foo()
should not print anything. (A type checker would also complain that side_effect() is not a valid type.)
This is inconsistent with the behavior of
def foo(a: side_effect()):
    a = 12
    return a
which does print something (at function definition time). But there's a limit to how much consistency I am prepared to propose. (OTOH for globals and class/instance variables I think that there would be some cool use cases for having the information available.)
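The local-variable half of this can be checked without reading printed output. This is a sketch, assuming an interpreter that skips evaluation of local annotations as proposed (Python 3.6 and later do); the calls list is a hypothetical recorder replacing the print:

```python
calls = []


def side_effect():
    calls.append("evaluated")  # record the call instead of printing
    return int


def foo():
    a: side_effect()  # local annotation: not evaluated, not stored
    a = 12
    return a


result = foo()
stored = "a" in foo.__annotations__
```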
Effect of presence of a: <type>
The presence of a local variable declaration without initialization still has an effect: it ensures that the variable is considered to be a local variable, and it is given a "slot" as if it was assigned to. So, for example:
def foo():
    a: int
    print(a)
    a = 42
foo()
will raise UnboundLocalError, not NameError. It's the same as if the code had read
def foo():
    if False: a = 0
    print(a)
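A sketch that makes the "slot" visible, assuming an interpreter that treats a bare declaration as proposed (Python 3.6 and later do). The module-level a is added here only to show that the declaration inside foo hides it; the outcome variable is hypothetical:

```python
a = "module-level value"


def foo():
    a: int    # makes "a" local to foo even though nothing is assigned yet
    return a  # so this read fails instead of finding the module-level "a"


try:
    foo()
except UnboundLocalError as exc:
    outcome = type(exc).__name__
else:
    outcome = "no error"

is_local = "a" in foo.__code__.co_varnames
```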
Instance variables inside methods
Mypy currently supports # type comments on assignments to instance variables (and other things). At least for __init__ (and __new__, and functions called from either) this seems useful, in case you prefer a style where instance variables are declared in __init__ (etc.) rather than in the class body.
I'd like to support this, at least for cases that obviously refer to instance variables of self. In this case we should probably not update __annotations__.
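A sketch of that style, assuming an interpreter that accepts annotations on attribute targets and leaves the class-level dict alone (Python 3.6 and later behave this way). The class contents are illustrative:

```python
class Starship:
    def __init__(self, captain: str) -> None:
        # Instance variables declared where they are first assigned.
        self.captain: str = captain
        self.damage: int = 0


ship = Starship("Picard")

# Look in the class namespace directly; nothing was declared in the class body.
class_annotations = Starship.__dict__.get("__annotations__", {})
```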
What about global or nonlocal?
We should not change global and nonlocal. The reason is that those don't declare new variables, they declare that an existing variable is write-accessible in the current scope. Their type belongs in the scope where they are defined.
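A sketch of where the type goes in that case, shown with nonlocal (global follows the same pattern). The names make_counter, count and step are hypothetical:

```python
from typing import Callable


def make_counter() -> Callable[[], int]:
    count: int = 0  # the type is declared in the scope that defines the variable

    def step() -> int:
        nonlocal count  # grants write access only; no type is repeated here
        count += 1
        return count

    return step


step = make_counter()
first = step()
second = step()
```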
Redundant declarations
I propose that the Python compiler should ignore duplicate declarations of the same variable in the same scope. It should also not bother to validate the type expression (other than evaluating it when not in a local scope). It's up to the type checker to complain about this. The following nonsensical snippet should be allowed at runtime:
a: 2+2
b: int = 'hello'
if b:
    b: str
    a: str
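To check that the snippet really is accepted at runtime, it can be compiled and executed in a scratch namespace. This is a sketch, assuming an interpreter that follows the proposal (Python 3.6 and later do); the names snippet and namespace are hypothetical:

```python
snippet = """
a: 2+2
b: int = 'hello'
if b:
    b: str
    a: str
"""

namespace = {}
# Neither the odd type expressions nor the duplicate declarations raise.
exec(compile(snippet, "<redundant>", "exec"), namespace)

b_value = namespace["b"]
a_defined = "a" in namespace
```

Only b ends up with a value; a was declared twice but never assigned, so it stays undefined.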