Winds of Change

I’ve left H2O.  I wish them all the best.  I’ve left a longer farewell here.

I’m a loose cannon, at least for a few months, and am looking for (the right kind of) trouble.

So I’m here, posting on my blog again, to see if I can get some suggestions on what to do with my life.  :-)

Here are a few of my crazy Cliff ideas I’m sorta chasing around:

  • Python Go Fast: Do Unto Python as Thou hast Done Unto Java. I hack the guts of Python; add a high-power JIT, a full-fledged low-pause GC, true multi-threading support – i.e. make Python as fast and as parallel as Java (about the same speed as C).  This blog is really a request for an open discussion on this topic.  Is the Python community interested?  How does this get funded?  (uber Kickstarter?)  I’ll only go here with the full support of the core Python committers, and general “feel goods” from the general Python community – and I’m hoping to start a discussion.  At this point I’m a premier language implementer, and making Python Go Fast is well within my abilities and past experience. Figure about 2 years & $2M for this effort to become self-sustaining (build all the core new tech and hand it off to other contributors).
  • H2O2: I left a lot of unfinished technical work at H2O – and H2O has plenty of technical growing room.  I could continue to contribute to the Open Source side of H2O, with some Big Company footing the dev bill.  Big Company gets kudos for supporting Open Source, H2O gets the next generation of cool new features.
    • Plan B, Big Company funds some new core closed-source innovations to H2O and monetizes that.  H2O still gets some Open Source improvements but not all core tech work is open.
  • Teach: I bow out of the Great Rat Race for a year and go teach Virtual Machines at Stanford or Berkeley.  Fun, and makes for a nice sabbatical.  (As a bonus, I’ll probably have 3 kids in college next year, and the whole Stanford Pays Faculty Kids’ College thing sounds really good.)  Might could do this while hacking Python guts at the same time.
  • Jetsons: I know how to Do the Flying Car Thing Right.  Million people, million flying cars in the air, all at once, and nobody can crash anything.  Feels like you’re flying but the Autopilot-o-Doom holds it all together.  I’ve figured out how to handle bad weather, ground infrastructure lossage (e.g. the Big Quake wipes out ground support in 10sec, how do you land 1 million cars all at once?), integration into the existing transportation fabric, your driveway as a runway, playing nice with the big jets, etc.  Bold statement, needs bold proof.  Lots more stuff here, been thinking on this one for a decade.  Needs 3 or 4 years to put together the eye-popping this-might-actually-work prototype.  Thus needs bigger funding; probably in the $10M range to get serious.
  • Something Random: by which I mean pretty much anything else that’ll pay the bills and is fun to do.  I got some pretty darned broad skillz and interests…

Gonna be a long-needed Summer-o-Crazy-Fun for me.  Heck, maybe I’ll post my 2nd tweet (shudder).  :-)

Cliff

 

32 thoughts on “Winds of Change”

  1. Regarding the Python thing, that’s clearly possible in light of the enormous speedups JavaScript has seen. I always wonder why Python, Ruby and PHP are not pursuing similar approaches. A basic implementation of runtime specialization seems not out of reach.

    Here’s a request: Do to .NET what you did to Java :) They have no clue. Look at the coreclr repo. I wonder why they are not at least introducing multi-tier JIT. They always lament the fact that startup time is slow. So add an interpreted tier one and a high quality tier two.

    Their optimizer seems to be tree-based internally which causes innocuous code changes to produce different code because it breaks pattern matching. Introducing a temp variable can actually change code gen meaningfully.
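A two-tier setup is conceptually simple; here is a toy hot-counter sketch in Python (the “compiled” tier is just a stand-in closure, not a real JIT – this only shows the control flow):

```python
def tiered(fn, threshold=3):
    """Toy two-tier execution: run the 'interpreter' tier until a
    call counter hits a threshold, then switch to the 'compiled'
    tier.  Compilation is faked here: tier two is the same function,
    so only the tier-up mechanics are illustrated."""
    state = {"count": 0, "compiled": None}
    def call(*args):
        if state["compiled"] is not None:
            return state["compiled"](*args)   # tier two: "JIT'd" code
        state["count"] += 1
        if state["count"] >= threshold:
            state["compiled"] = fn            # hot enough: "compile" it
        return fn(*args)                      # tier one: interpreter
    return call
```

A real tier one gets you fast startup; tier two spends compile time only on the code that proves hot.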

    • Wow, painful. Yeah, I didn’t do trees for the Java-to-machine-code; went straight to a low-level IR (that was easy enough to find the loops in, then do the loop opts).

      Cliff

  2. While Python could certainly use the help, Google tried that already with Unladen Swallow, and was never able to get the obvious changes accepted. Pythonistas have essentially worked around that with Cython and PyPy, but I never understood why they had to.

      • You might want to look at the .NET on LLVM work called LILC (I believe). They build a .NET JIT on LLVM right now. GC, EH and low throughput seem to be bugging them.

        Andy Ayers did a talk.

        • Interesting. Common themes here are: using LLVM, using tracing JIT instead of HotSpot’s method-style, language doesn’t have strong types, and having trouble with e.g. GC & EH and not getting desired performance levels.
          – Using LLVM: seems like, as a compiler, it “ought” to be strong enough
          – No strong types – defeats a lot of common compiler optimizations (your alias analysis goes to sh*t in a hurry; HotSpot only ever did basic type-equivalence classes, and that got 99% of what any uber-alias-analysis ever would). And… HotSpot added various kinds of type-specific further analysis; not sure if LLVM has e.g. Class Hierarchy Analysis – or if it matters in an untyped world. Failing good basic pointer analysis immediately fails out tons of follow-on optimizations.
          – No strong types – requires gobs of type-checks. These need to be baked into the compiler in a truly fundamental way. You naively need lots of them for correctness; the compiler must totally integrate them into all its other optimizations directly – to have a chance at removing enough that the optimizer gets some “working room” to make progress.
          – GC: Must have it baked into the compiler all the way through. Especially derived pointers (for good array loop performance) have to be baked into the compiler early on. I suspect there’s a bad division of labor here between LLVM and the GC runtime.
          – EH: HotSpot did this custom all the way, and with enough runtime-grief managed to not have it impact JIT’d code quality (unless you actually took the Exception). Again, I suspect a bad division of labor.

          Definitely I think I can do better here!
          :-)
          Cliff
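The type-guard-plus-deoptimize pattern described above can be sketched in a few lines of Python (a toy illustration only; a real JIT bakes the guards into its IR so later passes can hoist and eliminate them):

```python
def make_add():
    """Toy type-specialization: guard on the hot types, run a fast
    path, and fall back ('deoptimize') when the guard fails."""
    def generic_add(a, b):
        return a + b                       # slow path: full dynamic dispatch
    def add(a, b):
        # Type guard: the compiler must integrate checks like this
        # into all its other optimizations to remove the redundant ones.
        if type(a) is int and type(b) is int:
            return int.__add__(a, b)       # specialized int fast path
        return generic_add(a, b)           # guard failed: deoptimize
    return add
```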

  3. Been using Python as my primary language since around 2001. Production Python code uses A LOT of extensions written in C. The general consensus seems to be that the C extension API would prevent any meaningful speedups unless the API is dropped and all of the C code is rewritten. Which people seem to believe is not doable. (You can look into how PyPy tries to cope with this.)

    Also, from my experience with CPython core devs I think you’ll have a lot of idiotic pushback to your proposals no matter how reasonable.

    So while I think you can definitely do it, I would not count on “general feel goods”, and it’s probably not worth it in that respect.

    • Java went through that at some point – getting the C-vs-Java GC thing figured out required some painful C-wrapper rewriting, and *some* C code had to be redone (but definitely not all) – if you didn’t “play nice” with GC pointers you lost out when Java moved to a better/faster/lower-latency GC.
      Cliff

      • In Python land the incentives seem to work the other way here: If your implementation doesn’t play nice with the existing C API, you won’t get anybody to use it, because everybody’s using a boatload of them.

        • Which brings us to the question of making forward progress in the land of single-threaded Python.
          Java answered this question something like 10 or 15 years ago.
          Suppose 90% of C packages are not multi-thread-safe, and there’s no way to tell if they are or not up front.
          Suppose there exists a true multi-threaded Python, which happily wants to call native C code in parallel.
          Some packages crash (rarely), some are fine.
          How do you move forward?
          Suppose you could declare, on a package-by-package basis, whether or not that package was thread-safe – perhaps on the command line. Then those that were not thread-safe would take the equivalent of the Global Interpreter Lock (one package at a time, not even 2 unrelated packages), and the safe ones could charge ahead in parallel. Over time thread-safe versions of packages might appear, and then be allowed to run in parallel.

          Cliff
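A minimal Python sketch of that per-package locking scheme (the registry and the package names are hypothetical, standing in for a command-line declaration):

```python
import threading

# Hypothetical registry: packages declared thread-unsafe, e.g. via a
# command-line flag such as --unsafe-package=legacy_c_ext
UNSAFE_PACKAGES = {"legacy_c_ext"}
_locks = {}

def call_native(package, fn, *args, **kwargs):
    """Serialize calls into a thread-unsafe package with that
    package's own lock: two unrelated unsafe packages don't block
    each other, and declared-safe packages take no lock at all."""
    if package in UNSAFE_PACKAGES:
        lock = _locks.setdefault(package, threading.Lock())
        with lock:
            return fn(*args, **kwargs)
    return fn(*args, **kwargs)
```

As thread-safe versions of a package appear, it simply drops out of the unsafe set and runs fully in parallel.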

        • There’s the other question then, of having implementation details totally locked down to support the existing C API.
          Java pays a JNI cost: pointers are marshaled, other values expanded to their default primitive values. No objects are passed by structure, and the native code cannot manipulate structures directly, but only through call-backs. Painful – except that nearly all such cases don’t need a high-performance back-n-forth between Java and C – so the clunky indirect calls aren’t a speed issue. Arrays of primitives (not pointers) are special cased for various kinds of bulk operations. Pointers have to obey the rules every time, so that GC implementations can be flipped out from under you.
          Cliff

          • .NET has picked a nice trade-off here. Almost everything is marshaled without any expensive conversion. Primitives are passed as primitives. Value types are passed as pointers. References are passed as the underlying pointer and the object is pinned for the duration.

            Interestingly, strings are also passed by ref. A .NET string is an intrinsic type with a length prefix and a null-terminating char (that char is not observable from managed code).

            You can pass a StringBuilder as a mutable string. Not sure how efficient that is.

            Arrays of value types can also be passed by ref without marshalling work.

            This design was chosen for easy interop with the Win32 API and COM. Works extremely well in those spots.

            OS handles also have a nice marshalling mechanism (Google for SafeHandle; solves finalization problems for handles).

            Mentioning this because a lot of it might apply to Python native bindings.

          • I should mention that I’m an experienced .NET guy. I love .NET. I’m just very envious of the Hotspot JIT :)

            Just to point out how bad the .NET JIT is: when you write a.x + a.x, it actually loads x twice. They need your help, Cliff :)
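For what it’s worth, stock CPython behaves the same way – attribute loads are never CSE’d, since a.x could be a side-effecting property. A quick check with the dis module:

```python
import dis

def f(a):
    return a.x + a.x

# CPython's bytecode compiler does no common-subexpression
# elimination here (correctly, given dynamic semantics: a.x could
# be a property with side effects), so the attribute is loaded
# once per mention.
loads = [ins for ins in dis.Bytecode(f) if ins.opname == "LOAD_ATTR"]
print(len(loads))   # 2
```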

  4. The core Python people are quite attached to their existing approach, but the PyPy people would probably welcome some help

    • I suspect it’s going to have trouble getting traction – even if it gets all the performance of the original compiler. Rewrites that don’t add some significant new value generally have problems. I understand that it’s allowing good cross-language performance, so maybe something comes of it yet.
      Cliff

    • Been there, Done That with Azul Systems. Twas a sweet ISA to be sure, fun and easy to JIT to, lots of small shortcuts important to compiler folks…
      Needs a real business model in the X86 Era.
      Cliff

  5. Have you taken a look at PyPy? They are by far the fastest Python JIT available today, and the concept of a meta-interpreter JIT is something I have always found fascinating.

    • I’ll look again. Last I looked (some time ago) they were far far off the HotSpot mark.
      I stared at meta-interpreter JITs for some time, they look really cool in theory.
      Alas, they don’t come close in performance.
      Cliff

  6. I don’t have a lot of faith in a fast python as long as they cling to their C extensions and reference counting. The most productive efforts have had to abandon both, and as a result nobody uses them. The Ruby world has C ext issues as well, but we’ve managed to build enough of a community around JRuby to get most of them replaced.

    What I really want is a JVM that can do the dynamic language optimizations we’ve wanted for years. We need partial escape analysis, better method specialization (especially in light of closures), more flexible method sizing/loading/lifecycle, and a better optimization curve for invokedynamic and method handles. We already beat CRuby (a bytecode interpreter) but we should be 50x faster, not the 3-5x we can boast today.

    • Re: Python & ref-counting – I think I can keep what is useful out of ref-counting (exact short-lifetime management, and thus exact destructor execution) and keep the speed – needs compiler hacks to watch and mimic the ref-counting lifetimes; seems doable.

      Re: JVM hacks: I did some of this at Azul, now caught behind their paywall. I could do it again.
      Really needs a biz model, or at least some reasonable payout for me.
      i.e., I can do these things, but who will pay?
      Oracle? Kickstarter from the Ruby community?

      Cliff
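The deterministic destruction that ref-counting provides (and a tracing GC does not) is what such compiler hacks would have to preserve; a minimal sketch of the explicit-lifetime workaround:

```python
class Resource:
    """Stand-in for anything whose destructor the program relies on
    (file handles, sockets, native memory)."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
    # Context-manager protocol: an explicit, GC-independent lifetime.
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close()

# Under ref-counting, `r = Resource(); del r` runs the destructor
# immediately.  A tracing GC makes no such promise, so the lifetime
# must be pinned explicitly -- exactly the kind of scope a compiler
# could insert automatically by watching ref-count lifetimes:
with Resource() as r:
    assert not r.closed
assert r.closed   # closed deterministically at block exit
```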

  7. How about Julia lang? [1]

    Even though Python is the lead runner for scientific computation, Julia’s support for parallel operations and better type system should get it some attention as a next-generation language for ML or scientific computing.

    Best of luck after H2O.

    [1] http://julialang.org

  8. Cliff, Been enjoying your posts for years.

    I’ve recently been looking at the “modern language” space, with “go”, “rust”, “D”, etc. My bias is that compilers are good at checking types and both rust and go have shown that you can have an expressive and small language do a lot. You should check out what is happening there.

    With new languages, one also needs better database technology. One that strikes a resonance with me is “cockroachDB”. Highly distributed, reliable. Someday might have SQL (not that I care about that).

    There is much for someone with your skills to dive into.

    • I’ll probably talk to the “go” and “rust” folks soon.
      Never heard of cockroachDB; I’ll check it out.
      Thanks
      Cliff
