Winds of Change

I’ve left H2O.  I wish them all the best.  I’ve left a longer farewell here.

I’m a loose cannon, at least for a few months, and am looking for (the right kind of) trouble.

So I’m here, posting on my blog again, to see if I can get some suggestions on what to do with my life.  :-)

Here are a few of my crazy Cliff ideas I’m sorta chasing around:

  • Python Go Fast: Do Unto Python as Thou hast Done Unto Java. I hack the guts of Python; add a high-power JIT, a full-fledged low-pause GC, true multi-threading support – i.e. make Python as fast and as parallel as Java (about the same speed as C).  This blog is really a request for an open discussion on this topic.  Is the Python community interested?  How does this get funded?  (uber Kickstarter?)  I’ll only go here with the full support of the core Python committers, and general “feel goods” from the general Python community – and I’m hoping to start a discussion.  At this point I’m a premier language implementer, and making Python Go Fast is well within my abilities and past experience. Figure about 2 years & $2M for this effort to become self-sustaining (build all the core new tech and hand it off to other contributors).
  • H2O2: I left a lot of unfinished technical work at H2O – and H2O has plenty of technical growing room.  I could continue to contribute to the Open Source side of H2O, with some Big Company footing the dev bill.  Big Company gets kudos for supporting Open Source, H2O gets the next generation of cool new features.
    • Plan B, Big Company funds some new core closed-source innovations to H2O and monetizes that.  H2O still gets some Open Source improvements but not all core tech work is open.
  • Teach: I bow out of the Great Rat Race for a year and go teach Virtual Machines at Stanford or Berkeley.  Fun, and makes for a nice sabbatical.  (As a bonus, I’ll probably have 3 kids in college next year, and the whole Stanford Pays Faculty Kids’ College thing sounds really good.)  Might could do this while hacking Python guts at the same time.
  • Jetsons: I know how to Do the Flying Car Thing Right.  Million people, million flying cars in the air, all at once, and nobody can crash anything.  Feels like you’re flying but the Autopilot-o-Doom holds it all together.  I’ve figured out how to handle bad weather, ground infrastructure lossage (e.g. the Big Quake wipes out ground support in 10sec, how do you land 1 million cars all at once?), integration into the existing transportation fabric, your driveway as a runway, playing nice with the big jets, etc.  Bold statement, needs bold proof.  Lots more stuff here, been thinking on this one for a decade.  Needs 3 or 4 years to put together the eye-popping this-might-actually-work prototype.  Thus needs bigger funding; probably in the $10M range to get serious.
  • Something Random: by which I mean pretty much anything else that’ll pay the bills and is fun to do.  I got some pretty darned broad skillz and interests…

Gonna be a long-needed Summer-o-Crazy-Fun for me.  Heck, maybe I’ll post my 2nd tweet (shudder).  :-)

Cliff

 

32 thoughts on “Winds of Change”

  1. Regarding the Python thing, that’s clearly possible in light of the enormous speedups JavaScript has seen. I always wonder why Python, Ruby and PHP are not pursuing similar approaches. A basic implementation of runtime specialization seems not out of reach.

    Here’s a request: Do to .NET what you did to Java :) They have no clue. Look at the coreclr repo. I wonder why they are not at least introducing multi-tier JIT. They always lament the fact that startup time is slow. So add an interpreted tier one and a high quality tier two.

    Their optimizer seems to be tree-based internally which causes innocuous code changes to produce different code because it breaks pattern matching. Introducing a temp variable can actually change code gen meaningfully.
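A two-tier setup is conceptually simple; here is a toy hot-counter sketch in Python (the “compiled” tier is just a stand-in closure, not a real JIT – this only shows the control flow):

```python
def tiered(fn, threshold=3):
    """Toy two-tier execution: run the 'interpreter' tier until a
    call counter hits a threshold, then switch to the 'compiled'
    tier.  Compilation is faked here: tier two is the same function,
    so only the tier-up mechanics are illustrated."""
    state = {"count": 0, "compiled": None}
    def call(*args):
        if state["compiled"] is not None:
            return state["compiled"](*args)   # tier two: "JIT'd" code
        state["count"] += 1
        if state["count"] >= threshold:
            state["compiled"] = fn            # hot enough: "compile" it
        return fn(*args)                      # tier one: interpreter
    return call
```

A real tier one gets you fast startup; tier two spends compile time only on the code that proves hot.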

    • Wow, painful. Yeah, I didn’t do trees for the Java-to-machine-code; went straight to a low-level IR (that was easy enough to find the loops in, then do the loop opts).

      Cliff

  2. While Python could certainly use the help, Google tried that already with Unladen Swallow, and was never able to get the obvious changes accepted. Pythonistas have essentially worked around that with Cython and PyPy, but I never understood why they had to.

      • You might want to look at the .NET on LLVM work called LILC (I believe). They build a .NET JIT on LLVM right now. GC, EH and low throughput seem to be bugging them.

        Andy Ayers did a talk.

        • Interesting. Common themes here are: using LLVM, using tracing JIT instead of HotSpot’s method-style, language doesn’t have strong types, and having trouble with e.g. GC & EH and not getting desired performance levels.
          – Using LLVM: seems like, as a compiler, it “ought” to be strong enough
          – No strong types – defeats a lot of common compiler optimizations (your alias analysis goes to sh*t in a hurry; HotSpot only ever did basic type-equivalence classes, and that got 99% of what any uber-alias-analysis ever would). And… HotSpot added various kinds of type-specific further analysis; not sure if LLVM has e.g. Class Hierarchy Analysis – or if it matters in an untyped world. Failing good basic pointer analysis immediately fails out tons of follow-on optimizations.
          – No strong types – requires gobs of type-checks. These need to be baked into the compiler in a truly fundamental way. You naively need lots of them for correctness; the compiler must totally integrate them into all its other optimizations directly – to have a chance at removing enough that the optimizer gets some “working room” to make progress.
          – GC: Must have it baked into the compiler all the way through. Especially derived pointers (for good array loop performance) have to be baked into the compiler early on. I suspect there’s a bad division of labor here between LLVM and the GC runtime.
          – EH: HotSpot did this custom all the way, and with enough runtime-grief managed to not have it impact JIT’d code quality (unless you actually took the Exception). Again, I suspect a bad division of labor.

          Definitely I think I can do better here!
          :-)
          Cliff
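The type-guard-plus-deoptimize pattern described above can be sketched in a few lines of Python (a toy illustration only; a real JIT bakes the guards into its IR so later passes can hoist and eliminate them):

```python
def make_add():
    """Toy type-specialization: guard on the hot types, run a fast
    path, and fall back ('deoptimize') when the guard fails."""
    def generic_add(a, b):
        return a + b                       # slow path: full dynamic dispatch
    def add(a, b):
        # Type guard: the compiler must integrate checks like this
        # into all its other optimizations to remove the redundant ones.
        if type(a) is int and type(b) is int:
            return int.__add__(a, b)       # specialized int fast path
        return generic_add(a, b)           # guard failed: deoptimize
    return add
```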

  3. Been using Python as my primary language since around 2001. Production Python code uses A LOT of extensions written in C. The general consensus seems to be that the C extension API would prevent any meaningful speedups unless the API is dropped and all of the C code is rewritten. Which people seem to believe is not doable. (You can look into how PyPy tries to cope with this.)

    Also, from my experience with CPython core devs I think you’ll have a lot of idiotic pushback to your proposals no matter how reasonable.

    So while I think you can definitely do it, I would not count on “general feel goods”, and it’s probably not worth it in that respect.

    • Java went through that at some point – getting the C-vs-Java GC thing figured out required some painful C-wrapper rewriting, and *some* C code had to be redone (but definitely not all) – if you didn’t “play nice” with GC pointers you lost out when Java moved to a better/faster/lower-latency GC.
      Cliff

      • In Python land the incentives seem to work the other way here: If your implementation doesn’t play nice with the existing C API, you won’t get anybody to use it, because everybody’s using a boatload of them.

        • Which brings us to the question of making forward progress in the land of single-threaded Python.
          Java answered this question something like 10 or 15 years ago.
          Suppose 90% of C packages are not multi-thread-safe, and there’s no way to tell if they are or not up front.
          Suppose there exists a true multi-threaded Python, which happily wants to call native C code in parallel.
          Some packages crash (rarely), some are fine.
          How do you move forward?
          Suppose you could declare, on a package-by-package basis, whether or not that package was thread-safe – perhaps on the command line. Then those that were not thread-safe would take the equivalent of the Global Interpreter Lock (one package at a time, not even 2 unrelated packages), and the safe ones could charge ahead in parallel. Over time thread-safe versions of packages might appear, and then be allowed to run in parallel.

          Cliff
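A minimal Python sketch of that per-package locking scheme (the registry and the package names are hypothetical, standing in for a command-line declaration):

```python
import threading

# Hypothetical registry: packages declared thread-unsafe, e.g. via a
# command-line flag such as --unsafe-package=legacy_c_ext
UNSAFE_PACKAGES = {"legacy_c_ext"}
_locks = {}

def call_native(package, fn, *args, **kwargs):
    """Serialize calls into a thread-unsafe package with that
    package's own lock: two unrelated unsafe packages don't block
    each other, and declared-safe packages take no lock at all."""
    if package in UNSAFE_PACKAGES:
        lock = _locks.setdefault(package, threading.Lock())
        with lock:
            return fn(*args, **kwargs)
    return fn(*args, **kwargs)
```

As thread-safe versions of a package appear, it simply drops out of the unsafe set and runs fully in parallel.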

        • There’s the other question then, of having implementation details totally locked down to support the existing C API.
          Java pays a JNI cost: pointers are marshaled, other values expanded to their default primitive values. No objects are passed by structure, and the native code cannot manipulate structures directly, but only through call-backs. Painful – except that nearly all such cases don’t need a high-performance back-n-forth between Java and C – so the clunky indirect calls aren’t a speed issue. Arrays of primitives (not pointers) are special cased for various kinds of bulk operations. Pointers have to obey the rules every time, so that GC implementations can be flipped out from under you.
          Cliff

          • .NET has picked a nice trade-off here. Almost everything is marshaled without any expensive conversion. Primitives are passed as primitives. Value types are passed as pointers. References are passed as the underlying pointer and the object is pinned for the duration.

            Interestingly, strings are also passed by ref. A .NET string is an intrinsic type with a length prefix and a null-terminating char (that char is not observable from managed code).

            You can pass a StringBuilder as a mutable string. Not sure how efficient that is.

            Arrays of value types can also be passed by ref without marshalling work.

            This design was chosen for easy interop with the Win32 API and COM. Works extremely well in those spots.

            OS handles also have a nice marshalling mechanism (Google for SafeHandle; solves finalization problems for handles).

            Mentioning this because a lot of it might apply to Python native bindings.

          • I should mention that I’m an experienced .NET guy. I love .NET. I’m just very envious of the Hotspot JIT :)

            Just to point out how bad the .NET JIT is: when you write a.x + a.x, it actually loads x twice. They need your help, Cliff :)
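For what it’s worth, stock CPython behaves the same way – attribute loads are never CSE’d, since a.x could be a side-effecting property. A quick check with the dis module:

```python
import dis

def f(a):
    return a.x + a.x

# CPython's bytecode compiler does no common-subexpression
# elimination here (correctly, given dynamic semantics: a.x could
# be a property with side effects), so the attribute is loaded
# once per mention.
loads = [ins for ins in dis.Bytecode(f) if ins.opname == "LOAD_ATTR"]
print(len(loads))   # 2
```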

  4. The core Python people are quite attached to their existing approach, but the PyPy people would probably welcome some help

    • I suspect it’s going to have trouble getting traction – even if it gets all the performance of the original compiler. Rewrites that don’t add some significant new value generally have problems. I understand that it’s allowing good cross-language performance, so maybe something comes of it yet.
      Cliff

    • Been there, Done That with Azul Systems. Twas a sweet ISA to be sure, fun and easy to JIT to, lots of small shortcuts important to compiler folks…
      Needs a real business model in the X86 Era.
      Cliff

  5. Have you taken a look at PyPy? They are by far the fastest Python JIT available today, and the concept of a meta-interpreter JIT is something I have always found fascinating.

    • I’ll look again. Last I looked (some time ago) they were far far off the HotSpot mark.
      I stared at meta-interpreter JITs for some time, they look really cool in theory.
      Alas, they don’t come close in performance.
      Cliff

  6. I don’t have a lot of faith in a fast python as long as they cling to their C extensions and reference counting. The most productive efforts have had to abandon both, and as a result nobody uses them. The Ruby world has C ext issues as well, but we’ve managed to build enough of a community around JRuby to get most of them replaced.

    What I really want is a JVM that can do the dynamic language optimizations we’ve wanted for years. We need partial escape analysis, better method specialization (especially in light of closures), more flexible method sizing/loading/lifecycle, and a better optimization curve for invokedynamic and method handles. We already beat CRuby (a bytecode interpreter) but we should be 50x faster, not the 3-5x we can boast today.

    • Re: Python & ref-counting – I think I can keep what is useful out of ref-counting (exact short-lifetime management, and thus exact destructor execution) and keep the speed – needs compiler hacks to watch and mimic the ref-counting lifetimes; seems doable.

      Re: JVM hacks: I did some of this at Azul, now caught behind their paywall. I could do it again.
      Really needs a biz model, or at least some reasonable payout for me.
      i.e., I can do these things, but who will pay?
      Oracle? Kickstarter from the Ruby community?

      Cliff
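The deterministic destruction that ref-counting provides (and a tracing GC does not) is what such compiler hacks would have to preserve; a minimal sketch of the explicit-lifetime workaround:

```python
class Resource:
    """Stand-in for anything whose destructor the program relies on
    (file handles, sockets, native memory)."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
    # Context-manager protocol: an explicit, GC-independent lifetime.
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close()

# Under ref-counting, `r = Resource(); del r` runs the destructor
# immediately.  A tracing GC makes no such promise, so the lifetime
# must be pinned explicitly -- exactly the kind of scope a compiler
# could insert automatically by watching ref-count lifetimes:
with Resource() as r:
    assert not r.closed
assert r.closed   # closed deterministically at block exit
```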

  7. How about Julia lang? [1]

    Even though Python is the lead runner for scientific computation, Julia’s support for parallel operations and better type system should get it some attention as a next-generation language for ML or scientific computing.

    Best of luck after H2O.

    [1] http://julialang.org

  8. Cliff, Been enjoying your posts for years.

    I’ve recently been looking at the “modern language” space, with “go”, “rust”, “D”, etc. My bias is that compilers are good at checking types and both rust and go have shown that you can have an expressive and small language do a lot. You should check out what is happening there.

    With new languages, one also needs better database technology. One that strikes a resonance with me is “cockroachDB”. Highly distributed, reliable. Someday might have SQL (not that I care about that).

    There is much for someone with your skills to dive into.

    • I’ll probably talk to the “go” and “rust” folks soon.
      Never heard of cockroachDB; I’ll check it out.
      Thanks
      Cliff
