7-Zip options for ultimate compression
For large text files:
7z a -mx=9 -mfb=273 -ms=on archive.7z <input files>
Warning: Compression will be very slow, and files created using these options may require a lot of memory to decompress.
For small text files:
7z a -m0=PPMd -mx=9 -ms=on archive.7z <input files>
If you need compatibility with other tools, these options give the maximum ZIP compression (note that -mpass can be made as large as you want, but compression rapidly becomes very slow):
7z a -tzip -mx=9 -mfb=258 -mpass=20 archive.zip <input files>
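To keep the three recipes straight, they can be wrapped in a small shell function. This is only a sketch: the profile names (large-text, small-text, zip) and the function name are made up for this example, and the function prints the command line for review rather than running it. File names containing spaces are not quoted in the printed line.

```shell
#!/bin/sh
# Sketch: print the 7z command line for one of the three recipes above.
# The profile names are hypothetical labels, not 7-Zip features.
sevenzip_command() {
    if [ "$#" -lt 2 ]; then
        echo "usage: sevenzip_command PROFILE ARCHIVE [FILES...]" >&2
        return 2
    fi
    profile=$1
    archive=$2
    shift 2
    case "$profile" in
        large-text) opts="-mx=9 -mfb=273 -ms=on" ;;
        small-text) opts="-m0=PPMd -mx=9 -ms=on" ;;
        zip)        opts="-tzip -mx=9 -mfb=258 -mpass=20" ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
    # Remaining arguments are the input files.
    echo "7z a $opts $archive $*"
}
```

For example, "sevenzip_command zip archive.zip notes.txt" prints the ZIP recipe with those file names filled in.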
With the default options (just “7z a”), a 916 MB test log file compressed to 1656 KB. With the large-file options above it compressed to 1210 KB, or 26.9% smaller. No other combination of options I tried improved on this.
The small files option compressed the HTML version of War and Peace (3.64 MB) to 725 KB, which is 22.4% smaller than the 934 KB with the default options.
The ZIP options compressed War and Peace from 3.64 MB to 1169 KB, which is 4.4% smaller than the 1223 KB achieved with “zip -9”.
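The "percent smaller" figures are just (before - after) / before. A minimal sketch for reproducing them, assuming awk is available; pct_smaller is a name invented for this example:

```shell
#!/bin/sh
# Percentage by which `after` is smaller than `before`.
# Both sizes must be in the same unit (KB here).
pct_smaller() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_smaller 934 725     # War and Peace: PPMd options vs. default 7z
pct_smaller 1223 1169   # War and Peace: 7z ZIP options vs. zip -9
```

The two calls print 22.4 and 4.4, matching the War and Peace figures.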
Background
7-Zip is a modern tool for lossless compression built around the LZMA algorithm (Lempel–Ziv–Markov chain algorithm), a variant of the classic LZ77 algorithm used in ZIP and gzip. Although LZMA requires 7-Zip's dedicated 7z file format, 7-Zip can also produce ZIP, gzip, and bzip2 files that are compatible with standard decompressors but slightly smaller than those produced by the standard compressors. It also supports methods other than LZMA, such as PPMd (a fast compressor for text files) and BCJ2 (a branch converter that improves compression of x86 executables). Documentation is scarce, but a full summary of its compression options as of version 9.23 alpha is here:
http://sevenzip.sourceforge.jp/chm/cmdline/switches/method.htm
Although the documentation describes limits on the range of the options, experimentation and examination of the source code show that the situation is actually more complex.
Keep in mind that although these options worked best in my tests, in general there may be inputs that will cause them to yield inferior results. In particular they’re less suitable for already-compressed data, for which “-m0=lzma2” tends to perform better. I also ran only one test of the large-file options, compressing a single text file (a server log file); a more thorough investigation would look at many more examples.
Explanation of options
- -mx=9 : Sets the compression level to maximum. This option had little effect in my tests (the archive was only 14 bytes smaller), but it is good to include in case of improvements in future versions.
- -mfb=273 : Sets the number of fast bytes to its maximum value. Without this option the test archive is 22% larger. The documentation suggests it is only helpful for input files containing long runs of identical characters.
- -ms=on : Solid mode. Instructs 7-Zip to group small files together into blocks which are compressed together. Improves compression, but makes it slow to update the archive or to extract a subset of its files.
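The effect of each option can be checked by compressing the same input with one more switch at a time and comparing archive sizes. The following is a rough sketch of that procedure, not the exact script behind the numbers above. The function names and the SEVENZIP variable (an override for the binary name, defaulting to 7z) are assumptions made for this example. Note that -ms=on cannot make a difference when the input is a single file.

```shell
#!/bin/sh
# Sketch: print the size in bytes of the archive produced for one input
# file with the given switches. SEVENZIP is a hypothetical override for
# the 7z binary name.
archive_size() {
    input=$1
    shift
    workdir=$(mktemp -d) || return 1
    # "$@" now holds only the switches under test, e.g. -mx=9 -mfb=273
    if "${SEVENZIP:-7z}" a "$@" "$workdir/test.7z" "$input" >/dev/null; then
        size=$(wc -c < "$workdir/test.7z")
        echo $((size))   # arithmetic expansion strips wc's padding
        rc=0
    else
        rc=1
    fi
    rm -rf "$workdir"
    return "$rc"
}

# Compare the defaults against each option added in turn.
compare_options() {
    for opts in "" "-mx=9" "-mx=9 -mfb=273" "-mx=9 -mfb=273 -ms=on"; do
        # $opts is left unquoted on purpose so it splits into switches.
        printf '%-24s %s bytes\n' "${opts:-(defaults)}" \
            "$(archive_size "$1" $opts)"
    done
}
```

Running "compare_options server.log" (with a log file of your own) prints one line per option set, so the contribution of each switch can be read off directly.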
Options that didn’t help
- -t7z, -m0=lzma : These are the defaults.
- -m0=lzma2, -m0=PPMd : These compressors did not do as well as LZMA on the test log file.
- -md=2147483647 : Maximizing the dictionary size only made the file a few bytes larger. It might be of benefit for very large archives.
- -mmt=off : No effect.
- -mmc=2147483647 : No effect, not sure why.
- -mlc=4 : Made file larger.