Monday, 15 April 2013

PyCon Update: Python-Compatible Syntax for Mypy?

It's already weeks since PyCon! Phew, I've been busy recently. Anyway, I had an eventful trip to PyCon in Santa Clara, California. PyCon is the biggest Python conference with about 2500 delegates from all around the world (though most seemed to come from the US).

I had chats with Guido, Armin Rigo (PyPy) and many others. After the conference, I stayed around for a few days in the San Francisco Bay Area and gave a talk at Google Mountain View, and also visited Dropbox in San Francisco.

One of my main goals for the trip was trying to gauge whether mypy is going in the right direction in the eyes of the Python community. There was a lot of interest in the project, but some important issues were raised that I need to discuss in more detail.

The ability to do compile-time checking of programs even without a new VM was interesting to many. This would benefit projects and organizations with large existing Python code bases. However, these organizations also manage risks carefully. Currently mypy can be used on top of CPython, but the sources must always be translated to Python before execution. Adding the mypy tool chain to the core build process is something most seem to be reluctant to do. Obviously this is the case now as mypy is still experimental, but I got the impression that even if mypy were considered stable and mature, relying on a third-party tool to be able to run their code would be a pretty daring and unlikely move. Also, mypy has the problem of being not quite compatible with many Python tools such as IDEs. This is a chicken-and-egg problem: tool support would probably fix itself if mypy were widely used, but it's difficult to get wide use without tool support. Library support is similar. However, there may be a way around this dilemma -- just stay with me for a few more paragraphs.

Many organizations using Python are still stuck with 2.x, and find the transition to Python 3 difficult. Even upgrades from 2.x to 2.x+1 have caused a lot of trouble, and the switch to Python 3 is much trickier, in large part due to changes in string representations (str/unicode in Python 2.x versus bytes/str in Python 3.x). Mypy currently only supports Python 3.x syntax, which limits its usefulness to many.

Some also saw the challenge of developing a production-quality mypy VM as too large for our team. I think this is in large part down to how previous projects have succeeded (or not), including PyPy: even after many years, and with several talented developers, its adoption has still been pretty slow in the Python community. Unladen Swallow is another example that showed that speeding up Python is not easy. Of course, mypy has goals different from PyPy and other previous projects, and our approach of targeting ahead-of-time compilation cuts the development effort by a large factor. But I agree that I won't be able to do it alone, and getting funding for continued development is hard.

Based on suggestions from Guido and the above observations, I've been working for some time on a pretty big proposal that would help address all of the above issues in some form or another. This is still at the planning stage, and nothing has been finalized yet. However, here are the main points:

  1. For mypy to really take off, we need users. In order to realistically get users, there needs to be a low-risk way of adopting mypy incrementally in current projects implemented in Python.
  2. There is a good amount of interest in optional typing in the Python community, but the approach should be non-invasive to current development processes, tool chains, etc.
  3. The pragmatic way to resolve the two above issues is to make mypy syntax 100% compatible with Python, both Python 2.x and 3.x. There would be no need for a Python translation phase, and a normal Python interpreter could be used to run mypy programs directly. Also all Python tools would pretty much Just Work. Note that as this would be a syntactic change, it would have no significant impact on planned efficiency of the new VM compared to the current syntax and plans, though this would likely result in semantic changes as well (see below for more about these). Also, mypy already supports translation to Python. This would just remove the need for the translation step.
  4. We should first focus most resources on the optional typing part instead of the new VM and compiler in order to make mypy usable as a static type checker for CPython (and PyPy/Jython).
  5. Now mypy would be much easier to adopt in organizations that would like to use optional typing to get better maintainability and productivity. I think that the above changes could speed up the adoption of mypy a lot. Also, the type checker part of mypy is a fairly straightforward project from an engineering point of view and there is no need for a large team of developers.
  6. If mypy gets significant adoption, there would also be demand for the new VM and the compiler, and it would be easier (but still not exactly easy!) to get contributors, maybe even development funding, etc.

The above plan would imply redesigning the type annotation syntax of mypy. I've given it a lot of thought, and perhaps surprisingly, it seems that there would not be a need for many compromises. Generally readability would be similar to the current syntax, and sometimes it would be even better. I'm not going to cover this in detail now, but the main difference would be the introduction of Python 3 style annotation syntax (obviously for Python 3.x only; Python 2.x needs a different approach):

  NOW:
    str greeting(str name):
        return 'hello, ' + name
  NEW PROPOSAL:
    def greeting(name:str) -> str:
        return 'hello, ' + name
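
To illustrate why the proposed syntax would need no translation step, here is a minimal sketch that runs on a stock Python 3 interpreter. The interpreter only records the annotations in the function's `__annotations__` attribute and does not check them, so any type checking would be left to an external tool. The `double` function is a made-up example, added only to show that nothing is enforced at run time.

```python
def greeting(name: str) -> str:
    return 'hello, ' + name


def double(x: int) -> int:
    # Hypothetical helper: the annotation says int, but Python does not enforce it.
    return x * 2


# The annotated function runs like any other Python 3 function.
print(greeting('world'))         # hello, world

# The annotations are stored on the function object as a plain dict.
print(greeting.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}

# Passing a str where an int is annotated still works at run time;
# only a static checker would flag this call.
print(double('ab'))              # abab
```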

Mypy uses nominal subtyping, even though structural subtyping would help model 'duck typing' in Python. Many people have expressed their interest in structural subtyping, and I discussed this at PyCon as well. Earlier, I thought that this couldn't be implemented efficiently on platforms that I would eventually like to be able to support, including Dalvik (Android). However, now I think I've figured out how to have efficient structural subtyping on basically any VM that could realistically run mypy, so the main objection no longer applies. Also, with the proposed Python-compatible syntax, structural subtyping could be a win for various reasons. In summary, it now seems likely that mypy will get support for structural subtyping in addition to nominal subtyping. I've started to prepare an enhancement proposal.
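
The difference between the two kinds of subtyping can be sketched in plain Python. This is only a run-time illustration, not the planned mypy design (a static checker would make the equivalent decision at compile time), and the names `Duck`, `Robot`, `is_nominal_duck` and `is_structural_duck` are all made up for the example.

```python
class Duck:
    def quack(self):
        return 'Quack'


class Robot:
    # Not related to Duck by inheritance, but offers the same method.
    def quack(self):
        return 'Beep'


def is_nominal_duck(obj):
    # Nominal: obj must be declared as a Duck (or a subclass of Duck).
    return isinstance(obj, Duck)


def is_structural_duck(obj):
    # Structural: obj only needs to have a callable 'quack' attribute.
    return callable(getattr(obj, 'quack', None))


print(is_nominal_duck(Robot()))     # False
print(is_structural_duck(Robot()))  # True
```

Under nominal subtyping `Robot` would have to inherit from `Duck` (or a shared interface) to be accepted; under structural subtyping having the right methods is enough, which is closer to how duck-typed Python code is usually written.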

There are other, less major changes that Python compatibility would require. Mypy should support multiple inheritance without the current limitations, similar to Python. Again, I previously ruled this out due to efficiency concerns, but I think I was wrong and there is really no technical reason why multiple inheritance needs to be restricted to interfaces like it is now. Also, mypy needs to support metaclasses; this one is trickier but I'm optimistic about it as well.
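
As a reminder of what unrestricted multiple inheritance looks like in ordinary Python, here is a small sketch in which both base classes are concrete classes with implementations rather than interfaces. The class names are hypothetical; the point is only that a Python-compatible mypy would need to accept code of this shape.

```python
class Logger:
    def log(self, msg):
        return 'log: ' + msg


class Serializer:
    def dump(self):
        return repr(self.__dict__)


class Record(Logger, Serializer):
    # Inherits implementations from two concrete base classes.
    def __init__(self, value):
        self.value = value


r = Record(1)
print(r.log('saved'))  # log: saved
print(r.dump())        # {'value': 1}

# Python resolves attribute lookups using the method resolution order.
print([cls.__name__ for cls in Record.__mro__])
# ['Record', 'Logger', 'Serializer', 'object']
```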

Let me know if you have any opinions on the proposed changes. Write comments below or send me an email.

13 comments:

  1. I definitely agree: if you can retain syntax compatibility with Python, that's a very, very huge win. But are you sure Python 3's hinting system provides enough flexibility to annotate all inputs and outputs, and that there is no conflict with existing code already using these features?

    Though I am pretty sure that there are very few users of this feature, as it is new in Python 3k. The only library I know of that uses it is plac.

  2. Python 3's annotation syntax seems to have just enough flexibility when combined with other Python features such as function decorators and operator overloading. I'll write another blog post with more details about the syntax.

    The annotations would only have special meaning in Python files that explicitly use static typing. Other files can use the annotations for their own purposes. Also, there could be an escape that lets you mix different kinds of annotations within a file. This could be a function decorator, for example. Of course, this all would only work if the other library does not assume that nobody else uses function annotations.

  3. I like the new ideas that focus on compatibility and static type checking. I found mypy in my search for static type checking tools. I would really like to see an option for const function arguments. Is this a realistic option?

  4. It depends on what kind of const arguments you have in mind. An option to check that functions don't assign to arguments would be easy to add. Checking that functions don't modify argument objects is tricky; using an abstract type such as Sequence that does not allow direct modification goes a long way but doesn't generalize easily to arbitrary objects.

  5. Hi Jukka,

    This project sounds really really interesting! I was wondering if you are looking for any help, I would love to contribute in any way possible - [I am a developer]. Feel free to contact me on G+.

    Keep up the good work!

  6. So, essentially the plan is to turn mypy into a type-checking library/program, and then build a compiler/translator off that? Neat!

  7. The problem with Python 3's annotation syntax is that it allows anything to follow the colon, i.e. this is allowed:

    def greeting(name:"Your name"):
        return 'hello, ' + name

    So even if this syntax is used, compatibility still isn’t satisfying. Personally I don't like things that look the same when they are only partially the same, as this creates confusion.

    Also Python doesn’t allow this:
    a:int = 0

    Personally I would like the C-style type specification together with Python's annotations, like this:

    str greeting(str name:"Your name"):
        return 'hello, ' + name

    This allows us to specify the computer type (which is important as other computer types could create a crash) and human type (which is important as other human types wouldn't make any sense).
    This could also preserve Python compatibility as

    def greeting(name:"Your name"):
        return 'hello, ' + name

    would be implicitly dynamically typed. Thus making Python code fully valid Mypy code. One could then start porting a Python application step by step.

  8. Rasmus:

    An important goal of the new syntax is to allow incremental adaptation of Python applications to static typing without a mypy-to-Python translation step. Thus existing Python tools such as IDEs would work. This rules out *any* extra syntax right now.

    I don't think that many programmers use the Python 3 annotations currently. Also, it would be possible to allow the alternative Python 2 annotation syntax (types in comments) to be used even in Python 3 code. This would let you use Python 3 annotations for other purposes.



  9. The problem with any new language intended to be Python-compatible is that it will eventually be limited by the ultra-dynamic nature of Python.

    You would not be able to do "your thing" if you need to be Python-compatible everywhere. In the end, you just create another half-ass Python accelerator like all the other failed attempts, which were all killed by "Python compatible".

    Python-like syntax gives you much more freedom to do things beyond Python, both feature-wise and performance-wise. Python-compatible syntax just adds dead weight for short-term benefit.

  10. Bob:

    I don't think that Python-compatible syntax really is a significant restriction. The highly dynamic Python semantics, not syntax, make it very hard to run Python code fast. It makes more sense to evolve the semantics separately from the syntax (e.g. by removal of implicit locking).

    The Python syntax is flexible enough to support pretty much all the new mypy features that have been proposed. And the benefits of Python syntax compatibility are huge. Besides, there is still the option of extending the syntax in the future if there is a compelling enough use case. However I can't think of any yet.

  11. I think it very much makes sense what you wrote above - first write the library for static checking which runs on any Python VM and then, later, add the faster, statically-typed, VM.

    I personally also think that Python 2 support is a waste of your time but hey, it's your time.

    As for the way of adding types: if you used comments, how would I annotate lambdas? You might also want to consider this:

    int, int, int
    def adder(a, b):
        return a+b

    PS: there is a mailing list about Python static type checking, maybe you know about it, maybe you don't...:
    https://groups.google.com/forum/?fromgroups#!forum/python-static-type-checking

  12. Tuom:

    Currently you can't directly annotate lambdas, but the type of a lambda can usually be inferred from the context, and when this isn't possible, you can explicitly set the context yourself by using a type declaration, for example like this:

    f = lambda s: 'hi, ' + s # type: Function[[str], str]

    I feel that using just a tuple literal for annotating functions is too "light-weight" syntactically and the return type does not stand out from the argument types.

    And thanks for the link! I didn't know about it.

  13. Jukka, I think Python 2 support is unnecessary, as it would waste too much of your time. Even a project as mature as PyPy has been struggling to get adoption, so you probably should not expect mypy to be widely adopted in the current Python 2 era. Planning for the future and taking this chance to refine your design and implementation may be a wiser choice.
