
I am trying to decompress a WARC ZST file that I downloaded from here: https://archive.org/details/archiveteam_yahooanswers_20210422220546_c4fac540

I tried the command

zstd -d yahooanswers_20210422220546_c4fac540.1619026173.megawarc.warc.zst

but got this error:

73.megawarc.warc.zst : 0 MB... 73.megawarc.warc.zst : Decoding error (36) : Dictionary mismatch

How can I find this dictionary, or is there an alternative way to decompress the file?


The dictionary can be found inside the first skippable frame of the warc.

OrIdow6 wrote this script to extract the dictionary: https://transfer.notkiska.pw/inline/TXlRo/xtract.py

You'll need python3, the zstd command-line tool, and the zstandard Python package.

python ./xtract.py /path/to/megawarc.warc.zst > dict

Then you can decompress the archive using the extracted dictionary:

zstd -d /path/to/megawarc.warc.zst -D dict

You should then be able to view the megawarc with your standard WARC viewing tools.
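In case the linked script is not reachable, here is a minimal sketch of the extraction step. It is not OrIdow6's script, only an illustration of reading the first skippable frame. It assumes the file begins with a Zstandard skippable frame (a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, then a 4-byte little-endian payload size, then the payload), and that the payload is the dictionary, which may itself be zstd-compressed. The function names are made up for this example.

```python
import struct
import sys

# Skippable frame magic numbers span 0x184D2A50..0x184D2A5F.
SKIPPABLE_MAGIC_MASK = 0xFFFFFFF0
SKIPPABLE_MAGIC_BASE = 0x184D2A50
# Magic number of a regular zstd frame (0xFD2FB528, little-endian on disk).
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"


def read_first_skippable_frame(stream):
    """Return the payload of the skippable frame at the start of `stream`."""
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("file too short to hold a skippable frame header")
    magic, size = struct.unpack("<II", header)
    if (magic & SKIPPABLE_MAGIC_MASK) != SKIPPABLE_MAGIC_BASE:
        raise ValueError("file does not start with a skippable frame")
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("skippable frame is truncated")
    return payload


def maybe_decompress(payload):
    """Decompress the payload if it looks like a zstd frame (assumption:
    the embedded dictionary may be stored compressed)."""
    if not payload.startswith(ZSTD_FRAME_MAGIC):
        return payload
    import zstandard  # third-party package, only needed in this case
    return zstandard.ZstdDecompressor().decompressobj().decompress(payload)


if __name__ == "__main__":
    # Usage: python extract_dict.py /path/to/megawarc.warc.zst > dict
    with open(sys.argv[1], "rb") as f:
        sys.stdout.buffer.write(maybe_decompress(read_first_skippable_frame(f)))
```

Saved under a name such as extract_dict.py (hypothetical), it would be run the same way as the script above, with the output redirected to dict and then passed to zstd with -D.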
