Why You Should Start Using Pathlib as an Alternative to the OS Module
First reason: object-oriented programming
As a data scientist, I manipulate paths and files on a daily basis to read and write data.
To do this, I typically use Python's os module and its os.path submodule to perform operations such as joining paths, checking the content of a directory, or creating folders.
In fact, using the os.path module seems like a natural choice to access the filesystem.
In this post, I’m challenging this practice by introducing another path management library called Pathlib.
We’ll see how this library works, how it differs from the os.path module, what features and advantages it provides, and when you should (or shouldn’t) use it.
Without much further ado, let’s have a look 🔍
The “issue” with the os module
The os module is popular: it’s been around for a while. However, I’ve always thought that it handled paths in an unnatural way.
These are the reasons that made me question its use:
- os is a large module: it sure has a path submodule to manage paths and join them, but once you need to perform system operations over these paths (creating a folder, listing the content inside it, or renaming and deleting a file) you’ll have to use other functions that are either present somewhere else in the package hierarchy (os.makedirs, os.listdir, os.rename, etc.) or imported from other modules like shutil or glob. You can still find these functions after some digging, but this seems like an unnecessary effort.
- os represents paths in their rawest format: string values. This is quite limiting: it doesn’t give you direct access to information such as the file properties or its metadata, nor does it allow you to perform operations on the filesystem by calling some special methods. For example, to check if a path exists, you’d do something like os.path.exists(some_path), which is fine. But wouldn’t it be easier if this information was accessed more directly from the path object via a method or attribute?
- The os module doesn’t natively allow you to find paths that match a given pattern inside a hierarchy. Let’s say that you want, for example, to recursively find all the __init__.py files inside a very nested folder structure. To do that, you’d have to combine os with another module called glob. You can sure get used to that, but do you really need two modules to perform such a task?
- This is more of a personal preference, but I’ve always found the os syntax a bit clunky. You can read it, you can write it, but for some reason, I always felt that some improvements could be made to make it lighter.
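To make these points concrete, here is a minimal sketch of a typical os-based workflow. The folder layout (a temporary directory containing a hypothetical pkg/sub package) is an assumption made purely for illustration.

```python
import glob
import os
import shutil
import tempfile

# Work inside a throwaway directory so the example is self-contained.
root = tempfile.mkdtemp()

# Joining paths lives in os.path ...
package_dir = os.path.join(root, "pkg", "sub")

# ... but creating folders lives in os itself.
os.makedirs(package_dir)

# Create two empty __init__.py files (hypothetical layout).
for folder in (os.path.join(root, "pkg"), package_dir):
    with open(os.path.join(folder, "__init__.py"), "w"):
        pass

# Checking existence goes back through os.path, on a plain string.
init_exists = os.path.exists(os.path.join(package_dir, "__init__.py"))

# Recursive pattern matching needs a second module: glob.
pattern = os.path.join(root, "**", "__init__.py")
init_files = sorted(glob.glob(pattern, recursive=True))

# Removing a non-empty folder tree needs a third module: shutil.
shutil.rmtree(root)
```

Three modules and plain strings everywhere: it works, but the logic is scattered.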
What is Pathlib?
Pathlib is part of the Python standard library and was introduced in Python 3.4 (see PEP 428) with the goal of representing paths not as simple strings but as supercharged Python objects with many useful methods and attributes under the hood.
As the official documentation states:
“The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.”
Pathlib is meant to alleviate the aforementioned frustrations faced when using the os module. Let’s have a look at some of its features.
This post is not an in-depth overview of Pathlib. To learn more about this library, I recommend you check the official documentation or the resources I listed at the end.
👉 Pathlib has a more intuitive syntax
To construct a path with Pathlib, you essentially need to import the Path class and pass it a string. This string points to a path on the filesystem that doesn’t necessarily have to exist.
from pathlib import Path
path = Path("/Users/ahmed.besbes/projects/posts")
path
# PosixPath('/Users/ahmed.besbes/projects/posts')
print(path)
# /Users/ahmed.besbes/projects/posts
Now that you have a Path object, how would you perform simple operations?
- Join paths
Pathlib uses the / operator to join paths. This may look funny at first, but it really makes the code easier to read.
Let’s make a comparison.
To join paths with the os module, you’d do something like this:
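A minimal sketch of such a join. The folder and file names are hypothetical and only serve the illustration.

```python
import os

# Hypothetical folder and file names, used only for illustration.
base_dir = os.path.join("projects", "posts")
data_file = os.path.join(base_dir, "data", "train.csv")

# The result is a plain string built with the platform's separator.
print(data_file)
```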
Using Pathlib, the same code translates into:
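A minimal sketch of a join with the / operator. Again, the folder and file names are hypothetical.

```python
from pathlib import Path

# Hypothetical folder and file names, used only for illustration.
base_dir = Path("projects") / "posts"
data_file = base_dir / "data" / "train.csv"

# The result is a Path object, not a string.
print(data_file)

# joinpath() is the method equivalent of the / operator.
same_file = base_dir.joinpath("data", "train.csv")
```

The right-hand side of / can be a string or another Path, so joins can be mixed freely.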
Essentially, Pathlib has supercharged the / operator to perform path joins.
- Get the current working directory / the home directory
Methods are already implemented to do that.
from pathlib import Path
cwd = Path.cwd()
home = Path.home()
- Read a file
You can either use open with a context manager like you’d do with a typical path, or use read_text or read_bytes.
>>> import pathlib
>>> path = pathlib.Path.home() / 'file.txt'
>>> path.read_text()
There are obviously many more features to cover. Let’s look at the most interesting ones in the next sections.
👉 It easily creates files and directories
Once a Path object is created, it can perform filesystem operations on its own by calling its methods. For example, it can create a folder or an empty file just by calling the mkdir and touch methods.
Here’s how a Path object can create a folder:
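A minimal sketch, run inside a temporary directory so it doesn’t touch your real files. The folder names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested folder that doesn't exist yet.
    new_dir = Path(tmp) / "reports" / "drafts"

    # parents=True also creates the missing intermediate folder ("reports").
    # exist_ok=True avoids an error if the folder is already there.
    new_dir.mkdir(parents=True, exist_ok=True)

    created = new_dir.is_dir()
    print(created)
```

Without parents=True, mkdir raises FileNotFoundError when an intermediate folder is missing.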
The same goes for files:
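A minimal sketch of touch, again inside a temporary directory and with a hypothetical file name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for illustration.
    new_file = Path(tmp) / "notes.txt"

    # touch() creates an empty file (or updates its timestamp if it exists).
    new_file.touch()

    is_file = new_file.is_file()
    size = new_file.stat().st_size
    print(is_file, size)
```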
Of course, you can still perform these operations using the os module, but this requires calling another function like makedirs.
👉 It navigates the filesystem hierarchy by accessing the parents
Each Path object has a property called parent that returns a Path object of the parent folder. This makes it easier to manipulate large folder hierarchies. In fact, since the parent is itself a Path object, we can chain this property to reach the desired ancestor.
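A short sketch of that chaining, reusing the example path from earlier in this post. Nothing is read from disk here, so the path doesn’t need to exist.

```python
from pathlib import Path

path = Path("/Users/ahmed.besbes/projects/posts")

# One level up.
parent = path.parent

# Two levels up, by chaining the property.
grandparent = path.parent.parent

print(parent)
print(grandparent)
```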
If you want to avoid chaining the parent property to access the n-th ancestor, you can use the parents property, which returns a sequence of all the ancestors of the current path, starting from the closest one.
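A short sketch with the same example path. Indexing starts at 0 with the direct parent.

```python
from pathlib import Path

path = Path("/Users/ahmed.besbes/projects/posts")

# parents[0] is the direct parent, parents[1] the grandparent, and so on.
second = path.parents[1]

# The sequence can be turned into a list to see every ancestor up to the root.
all_parents = list(path.parents)

for ancestor in all_parents:
    print(ancestor)
```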
👉 It allows you to iterate on directories and perform pattern matching
Let’s assume you have a Path object that points to a directory.
Pathlib allows you to easily iterate over that directory’s content and also get files and folders that match a specific pattern.
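For the iteration part, a minimal sketch using iterdir. The directory content is a hypothetical one created in a temporary folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Hypothetical content: one subfolder and two files.
    (folder / "data").mkdir()
    (folder / "main.py").touch()
    (folder / "README.md").touch()

    # iterdir() yields a Path object for each entry, in arbitrary order.
    entries = sorted(p.name for p in folder.iterdir())

    # Since entries are Path objects, they can be filtered with their own methods.
    subfolders = [p.name for p in folder.iterdir() if p.is_dir()]

    print(entries)
    print(subfolders)
```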
Remember the glob module that you used to import along with the os module to get paths that match a pattern?
Well, Path objects have a glob method and a recursive version (called rglob) to perform similar tasks, but with a much lighter syntax.
Let’s say that you want to count the number of Python files in a given folder. Here’s how you can do it:
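A minimal sketch on a hypothetical folder created in a temporary directory. glob only looks at the folder itself, while rglob also walks the subfolders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Hypothetical content: two Python files and a text file at the top level,
    # plus one Python file in a subfolder.
    (folder / "pkg").mkdir()
    for name in ("main.py", "utils.py", "notes.txt"):
        (folder / name).touch()
    (folder / "pkg" / "__init__.py").touch()

    # glob() matches only inside the folder itself.
    top_level_count = len(list(folder.glob("*.py")))

    # rglob() matches inside the folder and all of its subfolders.
    recursive_count = len(list(folder.rglob("*.py")))

    print(top_level_count, recursive_count)
```

Both methods return generators, hence the list() call before counting.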
👉 Each Path object has multiple useful attributes
Each Path object has multiple useful methods and attributes that perform operations previously handled by libraries other than os (think glob or shutil):
- .exists(): To check if the path really exists on the filesystem
- .is_dir(): To check if the path corresponds to a directory
- .is_file(): To check if the path corresponds to a file
- .is_absolute(): To check if the path is absolute
- .chmod(): To change the file mode and permissions
- .is_mount(): To check if the path is a mount point
- .suffix: To get the extension of a file
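Here is a quick sketch of most of these in action, on a hypothetical file created in a temporary directory. I left is_mount out because its result depends on your system.

```python
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Hypothetical file name, used only for illustration.
    file_path = folder / "report.csv"
    file_path.touch()

    checks = {
        "exists": file_path.exists(),
        "is_dir": folder.is_dir(),
        "is_file": file_path.is_file(),
        "is_absolute": file_path.is_absolute(),
        "relative_is_absolute": Path("report.csv").is_absolute(),
        "suffix": file_path.suffix,
    }

    # chmod() takes the same numeric mode as os.chmod.
    file_path.chmod(0o644)
    mode = stat.S_IMODE(file_path.stat().st_mode)

    print(checks)
    print(oct(mode))
```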
There are more methods. You can check the full list in the official documentation.
Resources
If you wish to learn more about Pathlib and the differences with the native os.path module, you can have a look at this list of curated resources: it’s 100% free of charge, and it’s good quality. I promise.
- https://docs.python.org/3/library/pathlib.html
- https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/
- https://treyhunner.com/2019/01/no-really-pathlib-is-great/
- https://www.python.org/dev/peps/pep-0519/#standard-library-changes
- https://youtu.be/YwhOUyTxXVE
- https://rednafi.github.io/digressions/python/2020/04/13/python-pathlib.html
- https://www.docstring.fr/blog/gerer-des-chemins-de-fichiers-avec-pathlib/ (French 🥐 post)
- https://realpython.com/python-pathlib/
Thanks for reading 🙏
This was a quick post covering some of Pathlib's features.
If you’re thinking of switching to this library, and assuming you’re using Python 3.4 or later, go ahead and do it: migrating from os to Pathlib is quite easy.
That’ll be all for me today. Until next time for more programming tips and tutorials. 👋