Why You Should Start Using Pathlib as an Alternative to the OS Module
First reason: object-oriented programming
As a data scientist, I manipulate paths and files on a daily basis to read and write data.
To do this, I typically use the os.path Python module to perform operations such as joining paths, checking the content of a directory, or creating folders.
In fact, using the os.path module seems like a natural choice to access the filesystem.
In this post, I’m challenging this practice by introducing another path management library called Pathlib.
We’ll see how this library works, how it differs from the os.path module, what features and advantages it provides, and when you should (or shouldn’t) use it.
Without much further ado, let’s have a look 🔍
The “issue” with the os module
The os module is popular: it’s been around for a while. However, I’ve always thought that it handled paths in an unnatural way.
These are the reasons that made me question its use:
- os is a large module: it does have a path submodule to manage and join paths, but once you need to perform system operations on these paths (creating a folder, listing its content, or renaming and deleting a file), you have to use functions that either live elsewhere in the package hierarchy (os.makedirs, os.listdir, os.rename, etc.) or are imported from other modules such as shutil or glob. You can still find these functions after some digging, but this seems like an unnecessary effort.
- os represents paths in their rawest format: string values. This is quite limiting: it doesn’t give you direct access to information such as the file properties or metadata, nor does it allow you to perform operations on the filesystem by calling methods on the path itself. For example, to check if a path exists, you’d do something like os.path.exists(some_path), which is fine. But wouldn’t it be easier if this information were accessible directly from the path object via a method or attribute?
- The os module doesn’t natively allow you to find paths that match a given pattern inside a hierarchy. Let’s say that you want, for example, to recursively find all the __init__.py files inside a deeply nested folder structure. To do that, you’d have to combine os with another module called glob. You can certainly get used to that, but do you really need two modules to perform such a task?
- This is more of a personal preference, but I’ve always found the os syntax a bit clunky. You can read it, you can write it, but for some reason, I always felt that some improvements could be made to make it lighter.
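To make these points concrete, here is a minimal sketch of a typical os-based workflow. The folder and file names are hypothetical, and everything happens inside a temporary directory. Notice how the code pulls functions from os, os.path, glob, and shutil, and how every path is a plain string:

```python
import glob
import os
import shutil
import tempfile

# Work inside a throwaway directory so the example is self-contained.
root = tempfile.mkdtemp()

# Creating nested folders lives in os, not in os.path.
package_dir = os.path.join(root, "project", "package")
os.makedirs(package_dir)

# Create an empty file by opening it in write mode.
init_file = os.path.join(package_dir, "__init__.py")
with open(init_file, "w"):
    pass

# Checking existence goes through os.path and a plain string.
init_exists = os.path.exists(init_file)

# Recursive pattern matching requires the separate glob module.
pattern = os.path.join(root, "**", "__init__.py")
init_files = glob.glob(pattern, recursive=True)

# Removing a whole tree requires yet another module, shutil.
shutil.rmtree(root)
```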
What is Pathlib?
Pathlib is part of the Python standard library. It was introduced in Python 3.4 (see PEP 428) with the goal of representing paths not as simple strings but as supercharged Python objects with many useful methods and attributes.
As PEP 428 states:
“The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.”
Pathlib is meant to alleviate the aforementioned frustrations faced when using the os module. Let’s have a look at some of its features.
This post is not an in-depth overview of Pathlib. To learn more about this library, I recommend you check the official documentation or the resources I listed at the end.
👉 Pathlib has a more intuitive syntax
To construct a path with Pathlib, you essentially need to import the Path class and pass it a string. This string points to a path on the filesystem that doesn’t necessarily have to exist.
from pathlib import Path
path = Path("/Users/ahmed.besbes/projects/posts")
path  # PosixPath('/Users/ahmed.besbes/projects/posts')
print(path)  # /Users/ahmed.besbes/projects/posts
Now that you have a Path object, how would you perform simple operations?
- Join paths
Pathlib uses the / operator to join paths. This may look funny at first, but it really makes the code easier to read.
Let’s make a comparison.
To join paths using the os module, you’d do something like this:
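As a minimal sketch, with hypothetical folder and file names chosen only for illustration:

```python
import os

# Hypothetical folder and file names, used only for illustration.
base_dir = "projects"

# The pieces are passed as arguments to a function.
file_path = os.path.join(base_dir, "posts", "pathlib", "draft.md")

# The result is a plain string.
is_string = isinstance(file_path, str)
```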
Using Pathlib, the same code translates into:
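Again as a sketch, with the same hypothetical names:

```python
from pathlib import Path

# Hypothetical folder and file names, used only for illustration.
base_dir = Path("projects")

# The / operator chains the pieces together.
file_path = base_dir / "posts" / "pathlib" / "draft.md"

# The result is a Path object, which exposes its components directly.
file_name = file_path.name    # 'draft.md'
components = file_path.parts  # ('projects', 'posts', 'pathlib', 'draft.md')
```

Only the left-most operand needs to be a Path: the strings that follow are absorbed into the resulting object.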
Essentially, Pathlib has supercharged the / operator to perform path joins.
- Get the current working directory / the home directory
Methods are already implemented to do that.
from pathlib import Path
cwd = Path.cwd()
home = Path.home()
- Read a file
You can either use open with a context manager like you’d do with a typical path, or use read_text or read_bytes.
>>> import pathlib
>>> path = pathlib.Path.home() / 'file.txt'
>>> path.read_text()
There are obviously many more features to cover. Let’s look at the most interesting ones in the next sections.
👉 It easily creates files and directories
Once a Path object is created, it can perform filesystem operations on its own by calling its methods. For example, it can create a folder or an empty file just by calling the mkdir and touch methods.
Here’s how a Path object can create a folder:
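A minimal sketch, assuming a hypothetical data/raw folder created inside a temporary directory so that nothing is written elsewhere:

```python
import tempfile
from pathlib import Path

# A temporary directory keeps the example from touching real folders.
root = Path(tempfile.mkdtemp())

# Hypothetical nested folder that doesn't exist yet.
new_folder = root / "data" / "raw"

# parents=True creates missing intermediate folders;
# exist_ok=True avoids an error if the folder is already there.
new_folder.mkdir(parents=True, exist_ok=True)

folder_created = new_folder.is_dir()
```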
The same goes for files:
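A similar sketch, with a hypothetical file name and a temporary directory:

```python
import tempfile
from pathlib import Path

# A temporary directory keeps the example from touching real folders.
root = Path(tempfile.mkdtemp())

# Hypothetical file name.
new_file = root / "notes.txt"

# touch() creates an empty file (or updates its timestamp if it exists).
new_file.touch()

file_created = new_file.is_file()
file_size = new_file.stat().st_size
```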
Of course, you can still perform these operations using the os module, but this requires calling separate functions such as makedirs.
👉 It navigates the filesystem hierarchy by accessing the parents
Each Path object has a property called parent that returns a Path object of the parent folder. This makes it easier to manipulate large folder hierarchies. In fact, since the parent is itself a Path object, we can chain the property to reach the desired ancestor.
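A short sketch, reusing the example path from earlier (it doesn’t need to exist on disk):

```python
from pathlib import Path

# Same example path as earlier; it doesn't need to exist.
path = Path("/Users/ahmed.besbes/projects/posts")

# One level up: the "projects" folder.
parent = path.parent

# Chaining the property climbs further up: the "ahmed.besbes" folder.
grandparent = path.parent.parent
```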
If you want to avoid chaining the parent property to access the n-th ancestor, you can use the parents property, which returns a sequence of all the ancestors of the current path, starting from the closest one.
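As a sketch, again with the example path used earlier:

```python
from pathlib import Path

path = Path("/Users/ahmed.besbes/projects/posts")

# parents[0] is the direct parent, parents[1] the one above it, and so on.
direct_parent = path.parents[0]
two_levels_up = path.parents[1]

# The sequence can be iterated to list every ancestor up to the root.
# The root itself has an empty name.
ancestor_names = [p.name for p in path.parents]
```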
👉 It allows you to iterate on directories and perform pattern matching
Let’s assume you have a Path object that points to a directory.
Pathlib allows you to easily iterate over that directory’s content and also get files and folders that match a specific pattern.
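For plain iteration, a minimal sketch could rely on the iterdir method. The directory content below is hypothetical and built inside a temporary directory:

```python
import tempfile
from pathlib import Path

# Build a small throwaway directory with hypothetical content.
root = Path(tempfile.mkdtemp())
(root / "data").mkdir()
(root / "main.py").touch()
(root / "README.md").touch()

# iterdir() yields one Path object per entry (it is not recursive).
entries = sorted(child.name for child in root.iterdir())

# Each entry is a Path, so it can be filtered with its own methods.
subfolders = [child.name for child in root.iterdir() if child.is_dir()]
```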
Remember the glob module that you used to import along with the os module to get paths that match a pattern?
Well, Path objects have a glob method and a recursive version (called rglob) to perform similar tasks, but with a much lighter syntax.
Let’s say that you want to count the number of Python files in a given folder. Here’s how you can do it:
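A minimal sketch, assuming a hypothetical folder structure created in a temporary directory:

```python
import tempfile
from pathlib import Path

# Hypothetical folder structure with three Python files and one text file.
root = Path(tempfile.mkdtemp())
(root / "package" / "sub").mkdir(parents=True)
(root / "main.py").touch()
(root / "package" / "__init__.py").touch()
(root / "package" / "sub" / "utils.py").touch()
(root / "notes.txt").touch()

# glob() only looks at the top level of the folder.
top_level_count = len(list(root.glob("*.py")))

# rglob() walks the whole tree.
recursive_count = len(list(root.rglob("*.py")))
```

Both methods return generators, hence the call to list before counting.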
👉 Each Path object has multiple useful attributes
Each Path object has multiple useful methods and attributes that perform operations previously handled by os or by other libraries (think glob or shutil):
- .exists(): to check if the path really exists on the filesystem
- .is_dir(): to check if the path corresponds to a directory
- .is_file(): to check if the path corresponds to a file
- .is_absolute(): to check if the path is absolute
- .chmod(): to change the file mode and permissions
- .is_mount(): to check if the path is a mount point
- .suffix: to get the extension of a file
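A few of these can be tried out in a quick sketch. The report.csv file is hypothetical and lives in a temporary directory:

```python
import tempfile
from pathlib import Path

# Hypothetical file created in a temporary directory.
root = Path(tempfile.mkdtemp())
report = root / "report.csv"
report.touch()

# Everything is queried directly on the Path object.
checks = {
    "exists": report.exists(),
    "is_file": report.is_file(),
    "is_dir": report.is_dir(),
    "is_absolute": report.is_absolute(),
    "suffix": report.suffix,
}
```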
There are more methods; the official documentation lists them all.
Resources
If you wish to learn more about Pathlib and the differences with the native os.path module, you can have a look at this list of curated resources: it’s 100% free of charge, and it’s good quality. I promise.
- https://docs.python.org/3/library/pathlib.html
- https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/
- https://treyhunner.com/2019/01/no-really-pathlib-is-great/
- https://www.python.org/dev/peps/pep-0519/#standard-library-changes
- https://youtu.be/YwhOUyTxXVE
- https://rednafi.github.io/digressions/python/2020/04/13/python-pathlib.html
- https://www.docstring.fr/blog/gerer-des-chemins-de-fichiers-avec-pathlib/ (French 🥐 post)
- https://realpython.com/python-pathlib/
Thanks for reading 🙏
This was a quick post covering some of Pathlib's features.
If you’re thinking of transitioning to this library, and assuming you’re using Python 3.4 or later, go ahead and do it: migrating from os to Pathlib is quite easy.
That’ll be all for me today. Until next time for more programming tips and tutorials. 👋