Next steps for unsafe code guidelines

nikomatsakis 2016-08-16 21:34:56 UTC #1

So at long last we have accepted RFC 1643, regarding the formation of an "unsafe code guidelines" strike team. As detailed in the RFC, the idea is that this is a temporary team that will investigate and create recommendations for the rules governing "correct" unsafe code. These rules in turn influence the compiler's ability to optimize.

I wanted to write a post to try and kick off this process. To start, let me announce the initial set of members for the team (I expect we may grow a few more members over time):

As ever, when it comes to making decisions, the goal will be to achieve consensus, not only amongst the subteam (and lang team) but also between the people participating in the discussion. I imagine we'll want to adopt the async decision making protocol that we're working out right now -- in any case, we can hammer it out later, since I don't think we're at the point of making decisions yet anyway.

So how are we to proceed initially? In my view, we're still at an early, exploratory stage in the process. I propose that we center our discussion both here on internals (using this category) and on the repository https://github.com/nikomatsakis/rust-memory-model. It might be nice to rename the repo to rust-lang-nursery/unsafe-code-guidelines.

In this first phase, I think we need to work on collecting a couple of things:

interesting questions that need to be answered (e.g., https://github.com/nikomatsakis/rust-memory-model/issues/2)
- there are already (in the repo) various "litmus tests", summarizing particular situations; some of these issues may be things we can easily resolve; others may turn into litmus tests, if the answer will depend a bit
high-level ideas of different approaches we could take (e.g., the Tootsie Pop or ubariel models)
interesting bits of undefined behavior inherited from C and what its importance is
- collecting academic work and background reading on this topic is also interesting (e.g., Into the Depths of C)
- I'd like to know, for example, a sense of the impact from not assuming all loops terminate (presumably we lose some amount of auto-vectorization)

Here are some ideas for interesting tasks I think we should take on:

examine unsafe code in libstd/libcore and make sure we identify all the dubious/interesting patterns
examine how prominent packages in the wild use unsafe (e.g., rayon
open issues on the repo with interesting questions that come up
- discussion can take place there, and we can move of some of those things into litmus test files
- (or maybe the litmus test files should move to issues?)
try to write-up (possibly as issues?) high-level views of different approaches

Thoughts?

eternaleye 2016-08-16 22:15:04 UTC #2

Sounds like a plan to me!

I think that, in terms of understanding/reasoning about C UB, CompCert's Documentation is likely to be incredibly useful, possibly more so than textual descriptions of the model - for one, it's possible to feed examples through CompCert, and observe how it refines them through its various passes.

It may be viable to weaken various things from "undefined behavior" to "implementation defined behavior", or (on the other axis) from "undefined behavior" to "undefined result" - I suspect CompCert would be very useful ground for mining which cases are amenable to that.

Also, I'd suggest we take a close look at the work that's come out of the VeLLVM Project, as it has significant overlap (and potentially useful tooling), since it specifies a model for exactly what we use as our low-level IR.

RalfJung 2016-08-17 09:56:42 UTC #3

So where would these "resolved issues" be documented? I first thought they would also be "litmus tests", the only different being that we already settled on an answer for their validity.

One thing I am missing here is: What's the right place (in the repo) for "general principles" we'd like to adopt. For example, we may want to decide that removing scopes ({...}) should not affect the validity of a piece of code, so e.g. the following two programs would be equi-valid:

{ let x = ...; let y = ... ; ... ; } let z = ...; ...

 let x = ...; let y = ... ; ... ; let z = ...; ...

There are a whole lot of other source-to-source transformations I can imagine, and it may well turn out that that some of them have surprising effects on what is (not) allowed. I guess we could put them together with the "optimizations", though many of the latter will not even be expressible as source-to-source transformation (they may only by expressible in MIR/LLVM).

We have to be careful here, CompCert makes a whole lot of behavior defined that is undefined in plain C. This is fine for CompCert (the compiler just has less opportunity for optimization), but it makes CompCert unsuited to reason about C UB in general. Maybe a better place for that is this PhD thesis, which makes an actual attempt to define all UB in C: http://robbertkrebbers.nl/thesis.html

Definitely, that's very interesting work! I talked with the authors recently, and it seems there is a new version coming up soon (that's academia-soon ). I also was told that there are efforts underway to change the semantics of LLVM itself (it seems that undef is pretty broken currently; there are optimization which turn one use of a variable into two uses of the same variable assuming it will have the same value, which is just not true if the value is undef). I've been watching LLVM Weekly ever since then, but couldn't find any more details.
However, VeLLVM does not attempt to precisely model LLVM at all cost, so we have to watch out for subtleties here. (I guess typically these will be subtleties which are not covered by the LLVM documentation and which even the LLVM community does not have a coherent view of.) And finally, I was told I shouldn't hold my breath for VeLLVM to support LLVM's aliasing annotations anytime soon, which are unfortunately among the most interesting bits of LLVM semantics when it comes to unsafe code interacting with borrow types -- and again, the documentation doesn't say much either. This is related to C's restrict being poorly understood, on a formal sense.

nikomatsakis 2016-08-17 12:49:11 UTC #4

Great questions. As I was writing I was wondering the same thing.

Perhaps we should set it up like this. In general, files in the repo represent things that have been "decided" in some sense. (Which means the existing files probably should most be removed.) Open issues represent things where discussion is ongoing. Committed files represent "decisions".

Here are some things we might "decide" about at this stage:

Code patterns: This would include things that should or should not be legal.
Optimizations: Things the compiler should be able to do. Closely related to code patterns.
Principles: Things like "results of region inference should not be significant".

One thing I don't quite know how to square is that I think a lot of these answers will ultimately depend on the high-level approach we choose to take. For example, the Tootsie Pop model will make a lot of things legal etc. But maybe that's ok -- perhaps we'll just have a lot of open issues with some tag like "depends"? In particular, from my POV, I feel like we can't make a decision about the set of rules until we have a good idea of the code patterns that get affected.

nikomatsakis 2016-08-17 13:19:58 UTC #5

By the way, I was hoping for some suggestions for good candidates here. I was planning to open an issue with a kind of "hit list", and then encourage people to look at particular projects.

RalfJung 2016-08-17 13:45:33 UTC #6

There are two crates that caught my attention as doing "interesting" things with borrowed data -- "interesting" because they taught me about things you can do with borrows which you couldn't do with safe code:

take_mut lets you temporarily move ownership out of a mutable borrow if you can guarantee that you'll put something back in before we return to the caller. No surprise that this is legal, but its still not possible safely with just libstd as far as I know.

And then there is is (de-)serializer which turns a "shared borrow of a bunch of bytes" into a "shared borrow of a Vec" without actually allocating anything. This is very interesting (at least from my perspective of modelling what borrowing "means") because typically, a Vec will always be backed by a heap-allocated block of memory -- but it seems that actually, if you have a &Vec, you can't make such a statement. In a sense, this shared Vec is not actually a Vec, it's just "enough like a Vec" so that the difference doesn't matter if all you get is shared access. This one has cost me some headache (and still is)

Not sure if that's the kind of example you are looking for, though...

eternaleye 2016-08-17 15:16:58 UTC #7

That's actually exactly what I was getting at when I said:

In particular, if you look at CompCert's information on the performance of generated code, it's very close to that of GCC, even with -O3 - the large averages they quote in the paragraph above, when you look at the graph, can be very clearly attributed to the lack of advanced polyhedral loop optimizations. Benchmarks which don't benefit from those optimizations (which CompCert does not implement at all) are neck-and-neck.

My argument is that CompCert seems to have already made very careful (and successful!) analyses of what UB isn't needed for performance.

EDIT: This may also be of use - kcc is a compiler with the goal that the emitted code errors verbosely if any UB occurs.

arielb1 2016-08-17 16:34:25 UTC #8

Synthetic benchmarks != high-level code, for optimizations. An optimizing compiler is comprised of 2 parts:

A code-generation backend, that transforms "IR" to assembly, scheduling instructions and registers. All decent backends (including CompCert) are within 10% of each-other here.
A middle-end, that transforms code written in an abstract style to code written in an efficient style. UB is important to allow for these transformations.

CompCert basically does not have a middle-end. This means it does fairly well for code written in an efficient style, but when you use it to compile abstract programs it loses a significant amount of performance.

arielb1 2016-08-17 16:38:11 UTC #9

Vec is a library type and Box is almost a library type. From a language/compiler POV, a synthetic &Vec that points to non-heap-allocated-memory is just as valid as a synthetic &Vec that points to heap-allocated memory.

Handling synthetic structs is a bit annoying, but it is something low-level programs do quite a bit, so we have to figure out a way.

jethrogb 2016-08-17 17:17:37 UTC #10

httparse comes to mind.

RalfJung 2016-08-17 17:37:28 UTC #11

I see, good point. I am surprised by that -O3 number, the last I heard was that it is roughly on par with -O1 -- which is way better than the usual compilers used for high-assurance embedded software. My impression also was the the UB in CompCert was mostly driven by a balance of optimization potential and how easy it is to define a formal semantics that one can reason about, where the latter was not a concern at all for plain C. I would be delighted if "formal defineability" would also be a concern in Rust. On the one hand I expect that, in doubt, optimization potential would win, but on the other hand, I believe that formal definition often lead to a clearer design that can also be easier understood even without looking at the formal details, so "programmer understandability" and "formal defineability" strongly correlate. But then, I am heavily biased here.

Well, sure they are (almost) library types, as are Cell and RefCell. But the fact that we consider these types and usage patterns valid still tells us something about the types which are involved. After all, the libstd team (in nailing down the Vec API and guarantees) could also have taken the stanza that such a "synthetic" Vec is invalid, creating and using it is UB (it does take unsafe code to create it, after all).
I don't see why Vec being a library vs. a language type should matter here.

arielb1 2016-08-17 18:04:19 UTC #12

Remember that -O0 rustc is x10 as slow as -O1 rustc (while -O0 gcc is x2 as slow as -O1 gcc on their benchmarks).

Your comment was about a shared Vec not being a real Vec, but syntheticity is orthogonal to standardness. You can create "shared" Vecs using Vec::from_raw_parts and standard Vecs using a transmute.

The Rust line is that if you guess the ABI correctly, synthetic values are equivalent to natural values.

nikomatsakis 2016-08-18 14:21:38 UTC #13

I've been thinking about how to structure the rust-memory-model repository. I am thinking that the best thing is probably to just have a ton of issues for now, particularly in this data-gathering phase, and use labels to categorize them into distinct kinds of questions and proposals.

We can obviously add labels over time, but here is a set that seem like they will be required:

litmus test -- code that may or may not be correct
optimization -- an optimization that may or may not be legal
pre-rfc -- a proposed change to the language or library (example: a modified form of unsafe keyword, or modifying the semantics of offset() as @ubsan has proposed)

There are also some orthogonal attributes:

T-lang -- is this a question of the language itself or...
T-libs -- ...more a question of libraries?
breaking change -- a proposed change that could break existing code

For proposed changes, we should link to the litmus tests and other bits of information that inform why the change may (or may not) make sense.

(We may also want to employ the wiki, but I think I might rather just commit files into the repo instead, e.g. to represent drafts of proposals... not sure about this.)