
Why You Should Start Using Pathlib as an Alternative to the OS Module

First reason: object-oriented programming


As a data scientist, I manipulate paths and files on a daily basis to read and write data.

To do this, I typically use the os.path Python module to perform operations such as joining paths, checking the content of a directory, or creating folders.

In fact, using the os.path module seems like a natural choice to access the filesystem.

In this post, I’m challenging this practice by introducing another path management library called Pathlib.
We’ll see how this library works, how it differs from the os.path module, what features and advantages it provides, and when you should (or shouldn’t) use it.

Without further ado, let’s have a look 🔍

The “issue” with the os module

The os module is popular: it’s been around for a while. However, I’ve always thought that it handled paths in an unnatural way.

These are the reasons that made me question its use:

  • os is a large module: it does have a path submodule to manage and join paths, but once you need to perform system operations on these paths (creating a folder, listing its content, or renaming and deleting a file), you’ll have to use functions that either live elsewhere in the package hierarchy (os.makedirs, os.listdir, os.rename, etc.) or are imported from other modules like shutil or glob.
    You can still find these functions after some digging, but this seems like an unnecessary effort.
  • os represents paths in their rawest format: string values. This is quite limiting: it doesn’t give you direct access to information such as the file properties or its metadata nor does it allow you to perform operations on the filesystem by calling some special methods.
    For example, to check if a path exists, you’d do something like os.path.exists(some_path), which is fine. But wouldn’t it be easier if this information were accessed more directly from the path object via a method or attribute?
  • The os module doesn’t natively allow you to find paths that match a given pattern inside a hierarchy. Let’s say that you want, for example, to recursively find all the __init__.py files inside a deeply nested folder structure. To do that, you’d have to combine os with another module called glob. You can certainly get used to that, but do you really need two modules to perform such a task?
  • This is more of a personal preference, but I’ve always found the os syntax a bit clunky. You can read it, you can write it, but for some reason, I always felt that some improvements could be made to make it lighter.
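To make these points concrete, here is a minimal sketch of the os-based workflow described above. The folder layout is hypothetical and is built inside a temporary directory purely for illustration; the point is only to show how many different places the functions come from.

```python
import glob
import os
import tempfile

# Hypothetical layout built in a temporary directory, for illustration only.
root = tempfile.mkdtemp()

# Joining paths lives in os.path ...
package_dir = os.path.join(root, "project", "package")

# ... creating folders lives in os ...
os.makedirs(package_dir, exist_ok=True)

# ... creating an empty file goes through the built-in open() ...
init_file = os.path.join(package_dir, "__init__.py")
with open(init_file, "w"):
    pass

# ... and recursive pattern matching needs the separate glob module.
pattern = os.path.join(root, "**", "__init__.py")
init_files = glob.glob(pattern, recursive=True)

print(os.path.exists(init_file))  # True
print(len(init_files))  # 1
```

Every path in this sketch is a plain string, so each operation is a function call that takes the string as an argument.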

What is Pathlib?

Pathlib is part of the Python standard library and was introduced in Python 3.4 (see PEP 428) with the goal of representing paths not as simple strings but as supercharged Python objects with many useful methods and attributes under the hood.

As the official documentation states:

“The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.”

Pathlib is meant to alleviate the aforementioned frustrations faced when using the os module. Let’s have a look at some of its features.

This post is not an in-depth overview of Pathlib. To learn more about this library, I recommend you check the official documentation or the resources I listed at the end.

👉 Pathlib has a more intuitive syntax

To construct a path with Pathlib, you essentially need to import the Path class and pass it a string. This string points to a path on the filesystem that doesn’t necessarily have to exist.

from pathlib import Path

path = Path("/Users/ahmed.besbes/projects/posts")
path
# PosixPath('/Users/ahmed.besbes/projects/posts')

print(path)
# /Users/ahmed.besbes/projects/posts

Now that you have a Path object, how would you perform simple operations?

  • Join paths

Pathlib uses the / operator to join paths. This may look funny at first, but it really makes the code easier to read.

Let’s make a comparison.

To join paths with the os module, you’d do something like this:
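A minimal sketch of an os-based join, assuming hypothetical folder and file names chosen only for illustration:

```python
import os

# Hypothetical folder and file names, used only for illustration.
base_dir = os.path.join("projects", "posts")
file_path = os.path.join(base_dir, "pathlib", "draft.md")

print(file_path)  # projects/posts/pathlib/draft.md on POSIX systems
```

The result is a plain string, with the separator chosen by the operating system.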

Using Pathlib, the same code translates into:
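A minimal sketch of a Pathlib join, assuming hypothetical folder and file names chosen only for illustration:

```python
from pathlib import Path

# Hypothetical folder and file names, used only for illustration.
base_dir = Path("projects") / "posts"
file_path = base_dir / "pathlib" / "draft.md"

print(file_path)  # projects/posts/pathlib/draft.md on POSIX systems
```

Here the result is a Path object rather than a string, so it can be joined again or queried directly.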

Essentially, Pathlib has supercharged the / operator to perform path joins.

  • Get the current working directory / the home directory

Methods are already implemented to do that.

from pathlib import Path

cwd = Path.cwd()
home = Path.home()

  • Read a file

You can either use open with a context manager, as you’d do with a typical path, or use read_text or read_bytes.

>>> import pathlib
>>> path = pathlib.Path.home() / 'file.txt'
>>> path.read_text()
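The reading options mentioned above can be sketched side by side. The file name and its content are hypothetical, and the file is written to a temporary directory first so that the example runs on its own.

```python
import tempfile
from pathlib import Path

# Hypothetical file created in a temporary directory, for illustration only.
path = Path(tempfile.mkdtemp()) / "file.txt"
path.write_text("hello pathlib", encoding="utf-8")

# Option 1: open the Path with a context manager, as with a plain string path.
with path.open(encoding="utf-8") as f:
    content_from_open = f.read()

# Option 2: let the Path object read the whole file in a single call.
content_as_text = path.read_text(encoding="utf-8")
content_as_bytes = path.read_bytes()

print(content_from_open == content_as_text)  # True
print(content_as_bytes)  # b'hello pathlib'
```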

There are obviously many more features to cover. Let’s look at the most interesting ones in the next sections.

👉 It easily creates files and directories

Once a Path object is created, it can perform filesystem operations on its own by calling its own methods. For example, it can create a folder or an empty file just by calling the mkdir and touch methods.

Here’s how a Path object can create a folder:
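A minimal sketch of mkdir, assuming hypothetical folder names created inside a temporary directory so that nothing is written to your real project folders:

```python
import tempfile
from pathlib import Path

# Hypothetical folder names inside a temporary directory, for illustration only.
root = Path(tempfile.mkdtemp())
folder = root / "data" / "raw"

# parents=True also creates the missing intermediate folder ("data");
# exist_ok=True avoids an error if the folder already exists.
folder.mkdir(parents=True, exist_ok=True)

print(folder.is_dir())  # True
```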

Create a folder

The same goes for files:
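A minimal sketch of touch, assuming a hypothetical file name inside a temporary directory:

```python
import tempfile
from pathlib import Path

# Hypothetical file name inside a temporary directory, for illustration only.
root = Path(tempfile.mkdtemp())
new_file = root / "notes.txt"

# touch() creates an empty file if it does not exist yet.
new_file.touch()

print(new_file.is_file())  # True
print(new_file.stat().st_size)  # 0
```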

Create a file

Of course, you can still perform these operations using the os module, but this requires calling a separate function such as makedirs.

👉 It navigates the filesystem hierarchy by accessing the parents

Each Path object has a property called parent that returns a Path object for the parent folder. This makes it easier to manipulate large folder hierarchies. In fact, since parent returns another Path object, we can chain it to reach the desired ancestor.
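Chaining parent can be sketched as follows, reusing the example path from earlier in the post. PurePosixPath is used here only so that the output is identical on every operating system and no filesystem access is needed; a regular Path exposes the same parent property.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/Users/ahmed.besbes/projects/posts")

# Each access to .parent moves one level up and returns a new path object.
print(path.parent)  # /Users/ahmed.besbes/projects
print(path.parent.parent)  # /Users/ahmed.besbes
```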

If you want to avoid chaining the parent property to access the n-th ancestor, you can use the parents property, which returns a sequence of all the ancestors of the current path, starting with the closest one.
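A minimal sketch of parents, again reusing the example path from earlier in the post and using PurePosixPath only to keep the output platform-independent:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/Users/ahmed.besbes/projects/posts")

# parents[0] is the direct parent, parents[1] the grandparent, and so on.
print(path.parents[0])  # /Users/ahmed.besbes/projects
print(path.parents[1])  # /Users/ahmed.besbes

# The sequence ends at the filesystem root.
print(len(path.parents))  # 4
```

Note that parents[1] is equivalent to parent.parent.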

👉 It allows you to iterate on directories and perform pattern matching

Let’s assume you have a Path object that points to a directory.

Pathlib allows you to easily iterate over that directory’s content and also get files and folders that match a specific pattern.

Remember the glob module that you used to import along with the os module to get paths that match a pattern?

Well, Path objects have a glob method and a recursive version (called rglob) to perform similar tasks, but with a much lighter syntax.

Let’s say that you want to count the number of Python files in a given folder. Here’s how you can do it:
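A minimal sketch of the counting, assuming a hypothetical folder layout built inside a temporary directory so that the result is predictable:

```python
import tempfile
from pathlib import Path

# Hypothetical folder layout in a temporary directory, for illustration only.
root = Path(tempfile.mkdtemp())
(root / "pkg").mkdir()
for name in ("main.py", "utils.py", "README.md", "pkg/__init__.py"):
    (root / name).touch()

# glob only looks at the folder itself ...
top_level_count = len(list(root.glob("*.py")))

# ... while rglob also descends into subfolders.
recursive_count = len(list(root.rglob("*.py")))

print(top_level_count)  # 2
print(recursive_count)  # 3
```

Both methods return generators of Path objects, which is why the results are wrapped in list before counting.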

👉 Each Path object has multiple useful attributes

Each Path object has multiple useful methods and attributes that perform operations previously handled by libraries other than os (think glob or shutil):

  • .exists() : To check if the path really exists on the filesystem
  • .is_dir() : To check if the path corresponds to a directory
  • .is_file() : To check if the path corresponds to a file
  • .is_absolute() : To check if the path is absolute
  • .chmod() : To change the file mode and permissions
  • .is_mount() : To check if the path is a mount point
  • .suffix : Get the extension of a file
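A few of the members listed above can be tried out with a short sketch. The file name is hypothetical and lives in a temporary directory; chmod and is_mount are left out because their behavior depends on the platform.

```python
import tempfile
from pathlib import Path

# Hypothetical file in a temporary directory, for illustration only.
root = Path(tempfile.mkdtemp())
data_file = root / "data.csv"
data_file.touch()

print(data_file.exists())  # True
print(data_file.is_file())  # True
print(root.is_dir())  # True
print(data_file.is_absolute())  # True
print(data_file.suffix)  # .csv

# A path that was never created simply reports that it does not exist.
print((root / "missing.txt").exists())  # False
```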

There are more methods. You can find the full list in the official documentation.

Resources

If you wish to learn more about Pathlib and the differences with the native os.path module, you can have a look at this list of curated resources: it’s 100% free of charge, and it’s good quality. I promise.

Thanks for reading 🙏

This was a quick post covering some of Pathlib's features.

If you’re thinking of transitioning to this library, and assuming you’re using Python 3.4+, go ahead and do it: migrating from os to Pathlib is quite easy.

That’ll be all for me today. Until next time for more programming tips and tutorials. 👋

Photo by Karsten Winegeart on Unsplash


🇫🇷 🇹🇳 Machine Learning engineer with a taste for software and automation | Creative problem solver | Subscribe @ https://ahmedbesbes.medium.com/member


