In this post I'll show you the code path that Rust takes inside its
standard library when you open a file. I wanted to learn how Rust
handles system calls and errno, and all the little subtleties of the
POSIX API. This is what I learned!
The C side of things
When you open a file, or create a socket, or do anything else that
returns an object that can be accessed like a file, you get a file
descriptor in the form of an int.
    /* All of these return an int with a file descriptor, or
     * -1 in case of error.
     */
    int open(const char *pathname, int flags, ...);
    int socket(int domain, int type, int protocol);
You get a nonnegative integer in case of success, or -1 in case of an
error. If there's an error, you look at errno, which gives you an
integer error code.
    int fd;

    retry_open:
    fd = open ("/foo/bar/baz.txt", 0);
    if (fd == -1) {
        if (errno == ENOENT) {
            /* File doesn't exist */
        } else if (errno == ...) {
            ...
        } else if (errno == EINTR) {
            goto retry_open; /* interrupted system call; let's retry */
        }
    }
Many system calls can fail with EINTR, which means "interrupted system
call": something interrupted the kernel while it was doing your system
call, and it returned control to userspace with the syscall
unfinished. For example, your process may have received a Unix signal
(e.g. you send it SIGTSTP by pressing Ctrl-Z on a terminal, or you
resized the terminal and your process got a SIGWINCH). Most of the
time EINTR simply means that you must retry the operation: if you
Ctrl-Z a program to suspend it, and then fg to continue it again, and
the program was in the middle of open()ing a file, you would expect it
to continue at that exact point and to actually open the file.
Software that doesn't check for EINTR can fail in very subtle ways!
Once you have an open file descriptor, you can read from it:
    ssize_t
    read_five_bytes (int fd, void *buf)
    {
        ssize_t result;

    retry:
        result = read (fd, buf, 5);
        if (result == -1) {
            if (errno == EINTR) {
                goto retry;
            } else {
                return -1; /* the caller should check errno */
            }
        } else {
            return result; /* success */
        }
    }
... and one has to remember that if read() returns 0, it means we
were at the end-of-file; if it returns fewer bytes than requested, it
usually means we were close to the end of the file (or, for pipes and
sockets, that less data was available right now); if this is a
nonblocking socket and it fails with EWOULDBLOCK or EAGAIN, then one
must decide to retry the operation or actually wait and try again
later.
There is a lot of buggy software written in C that tries to use the POSIX API directly, and gets these subtleties wrong. Most programs written in high-level languages use the I/O facilities provided by their language, which hopefully make things easier.
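For comparison, here is a minimal sketch of the same read() rules written against Rust's std::io::Read trait. The function name read_until_full is my own invention for this illustration, not something from the standard library; it only shows how end-of-file, short reads and EINTR each get their own match arm.

```rust
use std::io::{self, ErrorKind, Read};

/// Reads until `buf` is full or the reader reports end-of-file.
/// Returns how many bytes were actually stored in `buf`.
/// (Hypothetical helper, written for this post.)
fn read_until_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // Ok(0) with a non-empty buffer means end-of-file: stop here.
            Ok(0) => break,
            // A short read is not an error; keep asking for the rest.
            Ok(n) => filled += n,
            // The EINTR case: nothing was read, so simply try again.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            // Everything else (including WouldBlock) is the caller's decision.
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
```

Calling it on a reader that holds only two bytes with a five-byte buffer should give Ok(2), not an error: the short read and the end-of-file are both ordinary outcomes.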
I/O in Rust
Rust makes error handling convenient and safe. If you decide to
ignore an error, the code looks like it is ignoring the error
(e.g. you can grep for unwrap() and find lazy code). The
code actually looks better if it doesn't ignore the error and
properly propagates it upstream (e.g. you can use the ? shortcut to
propagate errors to the calling function).
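To make the contrast concrete, here is a small sketch of the two styles side by side. Both function names are made up for this example; the only real APIs used are File::open() and Read::read_to_string().

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Propagates any I/O error to the caller with `?`.
fn read_to_string_checked(path: &Path) -> io::Result<String> {
    let mut file = File::open(path)?; // returns early with the Err
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}

/// Same job, but any failure panics. Easy to spot with grep.
fn read_to_string_lazy(path: &Path) -> String {
    let mut file = File::open(path).unwrap();
    let mut contents = String::new();
    file.read_to_string(&mut contents).unwrap();
    contents
}
```

The checked version is barely longer than the lazy one, which is the point: doing the right thing costs one character per call.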
I keep recommending this article on error models to people; it discusses POSIX-like error codes vs. exceptions vs. more modern approaches like Haskell's and Rust's - definitely worth studying over a few days (also, see Miguel's valiant effort to move C# I/O away from exceptions for I/O errors).
So, what happens when one opens a file in Rust, from the toplevel API down to the system calls? Let's go down the rabbit hole.
You can open a file like this:
    use std::fs::File;

    fn main () {
        let f = File::open ("foo.txt");
        ...
    }
This does not give you a raw file descriptor; it gives you an
io::Result<fs::File> (that is, a Result<fs::File, io::Error>), which
you must pick apart to see if you actually got back a File that you
can operate on, or an error.
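Picking the result apart usually means a match. The sketch below is one way to do it; describe_open is a hypothetical helper of mine, and the messages are only there to show which arm was taken.

```rust
use std::fs::File;
use std::io::ErrorKind;

/// Describes the outcome of trying to open `path` for reading.
/// (Hypothetical helper, written for this post.)
fn describe_open(path: &str) -> String {
    match File::open(path) {
        // Success: we own a File, which is closed when it goes out of scope.
        Ok(_file) => format!("opened {}", path),
        // Failure: io::Error::kind() gives a portable classification.
        Err(e) => match e.kind() {
            ErrorKind::NotFound => format!("{} does not exist", path),
            ErrorKind::PermissionDenied => format!("no permission to read {}", path),
            _ => format!("could not open {}: {}", path, e),
        },
    }
}
```

There is no way to get at the File without going through the Ok arm, so forgetting to check for the error is not something you can do by accident.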
Let's look at the implementation of File::open() and File::create().
    impl File {
        pub fn open<P: AsRef<Path>>(path: P) -> io::Result<File> {
            OpenOptions::new().read(true).open(path.as_ref())
        }

        pub fn create<P: AsRef<Path>>(path: P) -> io::Result<File> {
            OpenOptions::new().write(true).create(true).truncate(true).open(path.as_ref())
        }
        ...
    }
Here, OpenOptions is an auxiliary struct that implements a "builder"
pattern. Instead of passing bitflags for the various
O_CREAT/O_APPEND/etc. flags from the open(2) system call, one
builds a struct with the desired options, and finally calls .open()
on it.
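You can use the builder directly when File::open() and File::create() don't fit. A minimal sketch follows; overwrite and append are names I made up, while the OpenOptions methods are the real public API.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Creates or truncates `path` and writes `text` to it.
/// This is the same combination of options that File::create() uses.
fn overwrite(path: &Path, text: &str) -> io::Result<()> {
    let mut f = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    f.write_all(text.as_bytes())
}

/// Adds `text` to the end of `path`, creating the file if needed.
fn append(path: &Path, text: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().append(true).create(true).open(path)?;
    f.write_all(text.as_bytes())
}
```

Each method call reads like the flag it replaces, and combinations that make no sense are rejected with an error when .open() is called.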
So, let's look at the implementation of OpenOptions::open():
    pub fn open<P: AsRef<Path>>(&self, path: P) -> io::Result<File> {
        self._open(path.as_ref())
    }

    fn _open(&self, path: &Path) -> io::Result<File> {
        let inner = fs_imp::File::open(path, &self.0)?;
        Ok(File { inner: inner })
    }
See that fs_imp::File::open()? That's what we want: it's the
platform-specific wrapper for opening files. Let's look
at its implementation for Unix:
    pub fn open(path: &Path, opts: &OpenOptions) -> io::Result<File> {
        let path = cstr(path)?;
        File::open_c(&path, opts)
    }
The first line, let path = cstr(path)?, tries to convert a Path
into a nul-terminated C string. The second line calls the following:
    pub fn open_c(path: &CStr, opts: &OpenOptions) -> io::Result<File> {
        let flags = libc::O_CLOEXEC |
                    opts.get_access_mode()? |
                    opts.get_creation_mode()? |
                    (opts.custom_flags as c_int & !libc::O_ACCMODE);
        let fd = cvt_r(|| unsafe {
            open64(path.as_ptr(), flags, opts.mode as c_int)
        })?;
        let fd = FileDesc::new(fd);
        ...
        Ok(File(fd))
    }
Here, let flags = ... converts the OpenOptions we had in the
beginning to an int with bit flags.
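The two preparatory conversions, Path to C string and options to bit flags, can be sketched outside the standard library. Everything below is a simplified stand-in of mine and not the actual libstd code: path_to_cstring and access_mode are hypothetical names, the sketch is Unix-only, and the O_* constants are hard-coded with the values I know from Linux just so the example has no dependencies (real code takes them from the libc crate).

```rust
use std::ffi::CString;
use std::io;
use std::os::unix::ffi::OsStrExt; // Unix only: exposes the raw bytes of a path
use std::path::Path;

// Illustrative values as found on Linux; an assumption of this sketch.
const O_RDONLY: i32 = 0;
const O_WRONLY: i32 = 1;
const O_RDWR: i32 = 2;
const O_APPEND: i32 = 0o2000;

/// Path -> nul-terminated C string. Fails if the path itself contains
/// a nul byte, since C could not represent it.
fn path_to_cstring(path: &Path) -> io::Result<CString> {
    CString::new(path.as_os_str().as_bytes())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "path contains a nul byte"))
}

/// Simplified stand-in for get_access_mode(): booleans in, bit flags out.
fn access_mode(read: bool, write: bool, append: bool) -> io::Result<i32> {
    match (read, write, append) {
        (true, false, false) => Ok(O_RDONLY),
        (false, true, false) => Ok(O_WRONLY),
        (true, true, false) => Ok(O_RDWR),
        (false, _, true) => Ok(O_WRONLY | O_APPEND),
        (true, _, true) => Ok(O_RDWR | O_APPEND),
        (false, false, false) => Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "neither read, write nor append was requested",
        )),
    }
}
```

Both functions return io::Result, which is why the real code can chain them with ? and bail out before ever reaching the system call.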
Then, it does let fd = cvt_r (LAMBDA), and that lambda function
calls the actual open64() from libc (a Rust wrapper for the system's
libc): it returns a file descriptor, or -1 on error. Why is this
done in a lambda? Let's look at cvt_r():
    pub fn cvt_r<T, F>(mut f: F) -> io::Result<T>
        where T: IsMinusOne,
              F: FnMut() -> T
    {
        loop {
            match cvt(f()) {
                Err(ref e) if e.kind() == ErrorKind::Interrupted => {}
                other => return other,
            }
        }
    }
Okay! Here f is the lambda that calls open64(); cvt_r() calls
it in a loop and translates the POSIX-like result into something
friendly to Rust. This loop is where it handles EINTR, which gets
translated into ErrorKind::Interrupted. I suppose cvt_r() stands
for convert_retry()? Let's look at
the implementation of cvt(), which fetches the error code:
    pub fn cvt<T: IsMinusOne>(t: T) -> io::Result<T> {
        if t.is_minus_one() {
            Err(io::Error::last_os_error())
        } else {
            Ok(t)
        }
    }
(The IsMinusOne shenanigans are just a Rust-ism to help convert
multiple integer types without a lot of as casts.)
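To watch the retry loop from cvt_r() do its job without making real system calls, here is a simplified sketch. It is not the libstd code: my retry_on_interrupt takes a closure that already returns an io::Result (the real cvt_r() gets a raw -1 and consults errno through cvt()), and flaky_open is a made-up stand-in for a syscall that gets interrupted a few times before succeeding.

```rust
use std::io::{self, ErrorKind};

/// Calls `f` again and again until it returns anything other than
/// an "interrupted" error. (Hypothetical simplification of cvt_r().)
fn retry_on_interrupt<T, F>(mut f: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    loop {
        match f() {
            // The EINTR case: drop the error and go around the loop again.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => {}
            // Success or any other error goes straight to the caller.
            other => return other,
        }
    }
}

/// Pretends to be a system call that is interrupted `*interruptions`
/// times and then succeeds with file descriptor 3.
fn flaky_open(interruptions: &mut u32) -> io::Result<i32> {
    if *interruptions > 0 {
        *interruptions -= 1;
        Err(io::Error::from(ErrorKind::Interrupted))
    } else {
        Ok(3)
    }
}
```

The closure has to be FnMut rather than FnOnce precisely because it may be called more than once, which answers the "why a lambda?" question from before.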
In other words, cvt() says: if the POSIX-like result was -1, return
an Err() built from the last error returned by the operating system.
That should surely be errno internally, correct? Let's look at
the implementation for io::Error::last_os_error():
    pub fn last_os_error() -> Error {
        Error::from_raw_os_error(sys::os::errno() as i32)
    }
We don't need to look at Error::from_raw_os_error(); it's just a
conversion function from an errno value into a Rust enum value.
However, let's look at sys::os::errno():
    pub fn errno() -> i32 {
        unsafe {
            (*errno_location()) as i32
        }
    }
Here, errno_location() is an extern function defined in GNU libc
(or whatever C library your Unix uses). It returns a pointer to the
actual int which is the errno thread-local variable. Since non-C
code can't use libc's global variables directly, there needs to be a
way to get their addresses via function calls - that's what
errno_location() is for.
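From safe Rust you never touch errno_location() yourself, but you can still see the errno value that travelled through this chain. The sketch below assumes a Unix system where ENOENT is 2 and EINTR is 4 (true on Linux, as far as I know on the other common Unixes too); both function names are made up for the example.

```rust
use std::fs::File;
use std::io;

/// Tries to open `path`; on failure returns the raw OS error code
/// together with Rust's portable classification of it.
fn open_error_details(path: &str) -> Option<(Option<i32>, io::ErrorKind)> {
    match File::open(path) {
        Ok(_) => None,
        Err(e) => Some((e.raw_os_error(), e.kind())),
    }
}

/// Builds an io::Error from an errno-style number, the same public
/// constructor that last_os_error() calls, and asks for its kind.
fn kind_of_errno(code: i32) -> io::ErrorKind {
    io::Error::from_raw_os_error(code).kind()
}
```

Under those assumptions, opening a path that doesn't exist should report a raw code of 2 classified as ErrorKind::NotFound, and kind_of_errno(4) should come back as ErrorKind::Interrupted, the kind that cvt_r() retries on.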
And on Windows?
Remember the internal fs_imp::File::open()? This is what it looks
like on Windows:
    pub fn open(path: &Path, opts: &OpenOptions) -> io::Result<File> {
        let path = to_u16s(path)?;
        let handle = unsafe {
            c::CreateFileW(path.as_ptr(),
                           opts.get_access_mode()?,
                           opts.share_mode,
                           opts.security_attributes as *mut _,
                           opts.get_creation_mode()?,
                           opts.get_flags_and_attributes(),
                           ptr::null_mut())
        };
        if handle == c::INVALID_HANDLE_VALUE {
            Err(Error::last_os_error())
        } else {
            Ok(File { handle: Handle::new(handle) })
        }
    }
CreateFileW() is the Windows API function to open files. The
conversion of error codes inside Error::last_os_error() happens
analogously - it calls GetLastError() from the Windows API and
converts it.
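How does one File type end up with two different implementations? Conditional compilation. Here is a toy sketch of the idea; the module name imp and both functions are hypothetical, and the strings just name the primitive each platform would call.

```rust
// Each platform gets a module with the same interface;
// the cfg attributes make sure only one of them is compiled.
#[cfg(unix)]
mod imp {
    pub fn open_primitive() -> &'static str {
        "open(2) via libc"
    }
}

#[cfg(windows)]
mod imp {
    pub fn open_primitive() -> &'static str {
        "CreateFileW()"
    }
}

#[cfg(not(any(unix, windows)))]
mod imp {
    pub fn open_primitive() -> &'static str {
        "something else"
    }
}

/// Platform-independent front end, in the role that fs::File
/// plays for fs_imp::File.
pub fn describe_open_primitive() -> String {
    format!("files are opened with {}", imp::open_primitive())
}
```

The caller only ever sees describe_open_primitive(); which imp module sits behind it is decided at compile time, so there is no runtime dispatch to pay for.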
Can we not call C libraries?
The Rust/Unix code above depends on the system's libc for open() and
errno, which are entirely C constructs. Libc is what actually does
the system calls. There are efforts to make the Rust standard library
not use libc and use syscalls directly.
As an example, you can look at the Rust standard library for Redox. Redox is a new operating system kernel entirely written in Rust. Fun times!
Update: If you want to see what a C-less libstd would look like, take a look at steed, an effort to reimplement Rust's libstd without C dependencies.
Conclusion
Rust is very meticulous about error handling, but it succeeds in
making it pleasant to read. I/O functions give you back an
io::Result<>, which you piece apart to see if it succeeded or got an
error.
Internally, and for each platform it supports, the Rust standard
library translates errno from libc into an io::ErrorKind Rust
enum. The standard library also automatically handles Unix-isms like
retrying operations on EINTR.
I've been enjoying reading the Rust standard library code: it
has taught me many Rust-isms, and it's nice to see how the
hairy/historical libc constructs are translated into clean Rust
idioms. I hope this little trip down the rabbit hole for the
open(2) system call lets you look in other interesting places, too.