Hacks
This is the first of what will likely become many hacks over time. This one came about because we had an HFS+ filesystem at work that had become corrupted to the point that fsck and DiskWarrior wouldn’t fix it. The problem was that the journal had become corrupted, and fsck.hfs doesn’t know how to actually fix the journal. DiskWarrior probably would’ve worked just fine, but due to the large size of the volume (1.4TB) and the number of files (more than 11 million), we repeatedly ran out of memory when trying to use it.
Now, of course, we had a backup of the data. But copying 1.4TB of data is sloooow. Plus, our backup was located in another state, so we had to make a copy of the backup to send up here. Did I say slow?
So, while waiting for the backup to copy, I investigated a bit further. After looking at the system.log, I found this:
jnl: open: journal magic is bad (0x1fd17 != 0x4a4e4c78)
hfs: late jnl init: failed to open/create the journal (retval 0).
hfs(3): Journal replay fail. Writing lastMountVersion as FSK!
jnl: is_clean: journal magic is bad (0x1fd17 != 0x4a4e4c78)
hfs: late journal init: volume on disk3s3 is read-only and journal is dirty. Can not mount volume.
That’s when I realized that the journal was corrupt. A quick Google search showed that you can turn off the journal on an HFS+ filesystem, but only if it is already mounted. Ouch.
So, I turned to the filesystem spec. It looks like journaling uses two fields in the volume header: one holds a bit that marks the volume as journaled, and the other records the block where the journal information lives. Ah-ha!
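As an aside, the expected magic number in those log lines is readable text. The small Python sketch below is only an illustration and not part of the original hack; it decodes the two values copied from the log above. The helper name `magic_as_text` is made up for this example.

```python
import struct

# Both values are copied from the system.log lines quoted above.
EXPECTED_MAGIC = 0x4A4E4C78  # what the kernel wanted to find
FOUND_MAGIC = 0x1FD17        # what was actually on disk


def magic_as_text(value: int) -> str:
    """Render a 32-bit magic number as four big-endian ASCII characters."""
    return struct.pack(">I", value).decode("ascii", errors="replace")


if __name__ == "__main__":
    print(repr(magic_as_text(EXPECTED_MAGIC)))
    print(repr(magic_as_text(FOUND_MAGIC)))
```

The expected value spells out "JNLx", while the value found on disk is not text at all, which fits the picture of a journal header that had been overwritten with something else.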
I did some experimentation on some HFS+ disk images, and found that I could successfully turn off journaling by setting that bit to zero and resetting the journal location to zero as well. I wrapped that into a quick little hack, crossed my fingers, and ran it on the corrupted filesystem. And it mounted. Just for safety, I also ran fsck on it again, and then later re-enabled journaling.
So, while this worked fine in this instance, I don’t know if there were any other side effects. One potential side effect is that the old journal is still reserved, so that space might be gone forever. I haven’t looked into it, but really, the journal shouldn’t be taking too much space out of 1.4TB.
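To make the idea concrete, here is a minimal Python sketch of that header edit. It is not the original hack, just an illustration under stated assumptions: the field offsets and the journaled bit (bit 13 of `attributes`) are taken from my reading of Apple's TN1150, all fields are big-endian, and the function works on an in-memory copy of the 512-byte volume header rather than on a real device. The names `disable_journal`, `ATTRIBUTES_OFFSET`, and so on are invented for this example.

```python
import struct

# Assumed layout (per TN1150): the volume header starts 1024 bytes into the
# volume. Offsets below are relative to the start of that header.
VOLUME_HEADER_OFFSET = 1024
ATTRIBUTES_OFFSET = 4            # UInt32 attributes
JOURNAL_INFO_BLOCK_OFFSET = 12   # UInt32 journalInfoBlock
JOURNALED_MASK = 1 << 13         # journaled bit within attributes


def disable_journal(header: bytearray) -> bool:
    """Clear the journaled bit and zero the journal block field in place.

    Returns True if the header was changed, False if journaling was
    already off. Raises ValueError if this does not look like HFS+.
    """
    signature = bytes(header[0:2])
    if signature not in (b"H+", b"HX"):
        raise ValueError("not an HFS+ volume header: %r" % signature)

    (attributes,) = struct.unpack_from(">I", header, ATTRIBUTES_OFFSET)
    (journal_block,) = struct.unpack_from(">I", header, JOURNAL_INFO_BLOCK_OFFSET)

    if not (attributes & JOURNALED_MASK) and journal_block == 0:
        return False  # nothing to do

    # Leave every other attribute bit untouched.
    struct.pack_into(">I", header, ATTRIBUTES_OFFSET, attributes & ~JOURNALED_MASK)
    struct.pack_into(">I", header, JOURNAL_INFO_BLOCK_OFFSET, 0)
    return True
```

A real tool would read 512 bytes at `VOLUME_HEADER_OFFSET` from a disk image, pass them through `disable_journal`, and write them back. Try it on a scratch image first, never on the only copy of your data.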
Here is the code to that hack — feel free to use it, but please don’t come after me if it kills the filesystem. This should really only be used if you either have a good backup or you have no backup and you’ve tried everything else to get your data back. Also, I should point out that if your journal isn’t corrupt, this isn’t likely to do anything for you.
Update Dec 8 2007:
I’ve made some changes to make this compatible with Leopard. I had two problems. The first was that I wasn’t checking the return value from mmap() correctly, and that was masking the second: my offset to mmap() wasn’t a multiple of the page size. Instead of trying to work out a suitable multiple, I just set the offset to zero and adjusted where I was reading and writing. I tested with an HFS+ disk image and not a real disk, but it should still work in theory. Same warnings as above: I hope it works for you, but really only use this if it’s your last hope or you like living dangerously with your data.
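The offset workaround can be sketched in a few lines of Python. This is an illustration of the same idea, not the updated hack itself: map from offset zero, which is always page-aligned, and index into the mapping to reach the bytes you want. The function name `read_at` is made up for this example. In C, mmap() reports failure by returning MAP_FAILED rather than NULL; Python's `mmap` module raises an exception instead, so there is no return value to get wrong here.

```python
import mmap


def read_at(path: str, position: int, length: int) -> bytes:
    """Read `length` bytes at `position` through a mapping that starts at 0.

    `position` does not need to be a multiple of the page size, because
    the mapping offset is always zero and we slice into the mapping.
    """
    with open(path, "rb") as f:
        # Map only as far as we need; this must not exceed the file size.
        size = position + length
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ, offset=0) as m:
            return m[position:position + length]
```

For example, `read_at("test.dmg", 1024, 2)` on a hypothetical raw HFS+ image named `test.dmg` would return the two signature bytes, assuming the volume header sits 1024 bytes into the file.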