Command Line Interface for Zstandard library
============================================

The Command Line Interface (CLI) can be created using the `make` command without any additional parameters.
There are, however, other Makefile targets that create different variations of the CLI:
- `zstd` : default CLI supporting gzip-like arguments; includes dictionary builder, benchmark, and support for decompression of legacy zstd formats
- `zstd_nolegacy` : same as `zstd` but without support for legacy zstd formats
- `zstd-small` : CLI optimized for minimal size; no dictionary builder, no benchmark, and no support for legacy zstd formats
- `zstd-compress` : version of the CLI which can only compress into zstd format
- `zstd-decompress` : version of the CLI which can only decompress zstd format


#### Compilation variables
The scope of `zstd` can be altered by modifying the following `make` variables :

- __HAVE_THREAD__ : multithreading is automatically enabled when `pthread` is detected.
  It's possible to disable multithread support, by setting `HAVE_THREAD=0`.
  Example : `make zstd HAVE_THREAD=0`
  It's also possible to force multithread support, using `HAVE_THREAD=1`.
  In which case, the linking stage will fail if neither `pthread` nor `windows.h` library can be found.
  This is useful to ensure this feature is not silently disabled.

- __ZSTD_LEGACY_SUPPORT__ : `zstd` can decompress files compressed by older versions of `zstd`.
  Starting with v0.8.0, all versions of `zstd` produce frames compliant with the [specification](../doc/zstd_compression_format.md), and are therefore compatible.
  But older versions (< v0.8.0) produced different, incompatible, frames.
  By default, `zstd` supports decoding legacy formats >= v0.4.0 (`ZSTD_LEGACY_SUPPORT=4`).
  This can be altered by modifying this compilation variable.
  `ZSTD_LEGACY_SUPPORT=1` means "support all formats >= v0.1.0".
  `ZSTD_LEGACY_SUPPORT=2` means "support all formats >= v0.2.0", and so on.
  `ZSTD_LEGACY_SUPPORT=0` means _DO NOT_ support any legacy format.
  If `ZSTD_LEGACY_SUPPORT >= 8`, it's the same as `0`, since there is no legacy format after `7`.
  Note : `zstd` only supports decoding older formats, and cannot generate any legacy format.

- __HAVE_ZLIB__ : `zstd` can compress and decompress files in `.gz` format.
  This is ordered through command `--format=gzip`.
  Alternatively, symlinks named `gzip` or `gunzip` will mimic intended behavior.
  `.gz` support is automatically enabled when `zlib` library is detected at build time.
  It's possible to disable `.gz` support, by setting `HAVE_ZLIB=0`.
  Example : `make zstd HAVE_ZLIB=0`
  It's also possible to force compilation with zlib support, using `HAVE_ZLIB=1`.
  In which case, the linking stage will fail if `zlib` library cannot be found.
  This is useful to prevent silent feature disabling.

- __HAVE_LZMA__ : `zstd` can compress and decompress files in `.xz` and `.lzma` formats.
  This is ordered through commands `--format=xz` and `--format=lzma` respectively.
  Alternatively, symlinks named `xz`, `unxz`, `lzma`, or `unlzma` will mimic intended behavior.
  `.xz` and `.lzma` support is automatically enabled when `lzma` library is detected at build time.
  It's possible to disable `.xz` and `.lzma` support, by setting `HAVE_LZMA=0`.
  Example : `make zstd HAVE_LZMA=0`
  It's also possible to force compilation with lzma support, using `HAVE_LZMA=1`.
  In which case, the linking stage will fail if `lzma` library cannot be found.
  This is useful to prevent silent feature disabling.

- __HAVE_LZ4__ : `zstd` can compress and decompress files in `.lz4` format.
  This is ordered through command `--format=lz4`.
  Alternatively, symlinks named `lz4` or `unlz4` will mimic intended behavior.
  `.lz4` support is automatically enabled when `lz4` library is detected at build time.
  It's possible to disable `.lz4` support, by setting `HAVE_LZ4=0`.
  Example : `make zstd HAVE_LZ4=0`
  It's also possible to force compilation with lz4 support, using `HAVE_LZ4=1`.
  In which case, the linking stage will fail if `lz4` library cannot be found.
  This is useful to prevent silent feature disabling.


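The meaning of the `ZSTD_LEGACY_SUPPORT` values described above can be restated as a small shell helper. This is only an illustrative sketch: `legacy_support_desc` is a hypothetical function name, not part of the zstd build system, and it assumes its argument is a non-negative integer.

```shell
# Hypothetical helper: describe which legacy formats a given
# ZSTD_LEGACY_SUPPORT value enables, following the rules listed above.
legacy_support_desc() {
  level=$1
  # 0 disables legacy decoding; 8 and above behave like 0,
  # since there is no legacy format after 7.
  if [ "$level" -eq 0 ] || [ "$level" -ge 8 ]; then
    echo "no legacy format"
  else
    echo "legacy formats >= v0.$level.0"
  fi
}

for level in 0 1 4 8; do
  printf 'ZSTD_LEGACY_SUPPORT=%s : %s\n' "$level" "$(legacy_support_desc "$level")"
done
```

Following the pattern of the `make zstd HAVE_THREAD=0` example, a build without legacy decoding should be obtainable with `make zstd ZSTD_LEGACY_SUPPORT=0`.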
#### Aggregation of parameters
The CLI supports aggregation of parameters, i.e. `-b1`, `-e18`, and `-i1` can be joined into `-b1e18i1`.


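As a rough illustration of what aggregation means, the sketch below joins separate short options into one argument. `aggregate_flags` is a hypothetical helper written for this example; it assumes every argument is a single-dash short option, and it does not reflect how the CLI itself parses its arguments.

```shell
# Hypothetical helper: join single-dash short options into one aggregated option.
aggregate_flags() {
  joined=""
  for flag in "$@"; do
    # Drop the leading dash of each option and append the remainder.
    joined="$joined${flag#-}"
  done
  printf '%s\n' "-$joined"
}

aggregate_flags -b1 -e18 -i1
```

Both spellings are then expected to be accepted interchangeably, for instance `zstd -b1 -e18 -i1 FILE` and `zstd -b1e18i1 FILE`.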
#### Symlink shortcuts
It's possible to invoke `zstd` through a symlink.
When the name of the symlink has a specific value, it triggers an associated behavior.
- `zstdmt` : compress using all cores available on local system.
- `zcat` : will decompress and output target file using any of the supported formats. `gzcat` and `zstdcat` are also equivalent.
- `gzip` : if zlib support is enabled, will mimic `gzip` by compressing file using `.gz` format, removing source file by default (use `--keep` to preserve). If zlib is not supported, triggers an error.
- `xz` : if lzma support is enabled, will mimic `xz` by compressing file using `.xz` format, removing source file by default (use `--keep` to preserve). If lzma is not supported, triggers an error.
- `lzma` : if lzma support is enabled, will mimic `lzma` by compressing file using `.lzma` format, removing source file by default (use `--keep` to preserve). If lzma is not supported, triggers an error.
- `lz4` : if lz4 support is enabled, will mimic `lz4` by compressing file using `.lz4` format. If lz4 is not supported, triggers an error.
- `unzstd` and `unlz4` will decompress any of the supported formats.
- `gunzip`, `unxz` and `unlzma` will do the same, and will also remove source file by default (use `--keep` to preserve).


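A minimal sketch of setting up a few of these shortcuts is shown below. It creates the symlinks in a scratch directory; the `/usr/bin/zstd` fallback path is an assumption used only when no `zstd` is found on `PATH`, and should be adjusted to the actual install location.

```shell
# Create symlink shortcuts to zstd in a scratch directory.
bindir=$(mktemp -d)

# Locate the zstd binary; the fallback path is an assumption for illustration.
zstd_path=$(command -v zstd || echo /usr/bin/zstd)

for name in zstdmt zstdcat unzstd; do
  ln -s "$zstd_path" "$bindir/$name"
done

ls -l "$bindir"
```

With a real `zstd` behind the links, invoking `"$bindir/zstdcat" file.zst` or `"$bindir/zstdmt" FILE` should then trigger the behavior associated with each name.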
#### Dictionary builder in Command Line Interface
Zstd offers a training mode, which can be used to tune the algorithm for a selected
type of data, by providing it with a few samples. The result of the training is stored
in a file selected with the `-o` option (default name is `dictionary`),
which can be loaded before compression and decompression.

Using a dictionary, the compression ratio achievable on small data improves dramatically.
These compression gains are achieved while simultaneously providing faster compression and decompression speeds.
Dictionaries work if there is some correlation in a family of small data (there is no universal dictionary).
Hence, deploying one dictionary per type of data will provide the greatest benefits.
Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm
will rely more and more on previously decoded content to compress the rest of the file.

Usage of the dictionary builder and created dictionaries with CLI:

1. Create the dictionary : `zstd --train PathToTrainingSet/* -o dictionaryName`
2. Compress with the dictionary: `zstd FILE -D dictionaryName`
3. Decompress with the dictionary: `zstd --decompress FILE.zst -D dictionaryName`


#### Benchmark in Command Line Interface
The CLI includes an in-memory compression benchmark module for zstd.
The benchmark is conducted using given filenames. The files are read into memory and joined together.
This makes the benchmark more precise, as it eliminates I/O overhead.
Multiple filenames can be supplied, as multiple parameters, with wildcards,
or names of directories can be used as parameters with the `-r` option.

The benchmark measures ratio, compressed size, compression and decompression speed.
One can select compression levels starting from `-b` and ending with `-e`.
The `-i` parameter selects the minimal time used for each of the tested levels.


#### Usage of Command Line Interface
The full list of options can be obtained with the `-h` or `-H` parameter:
```
Usage :
      zstd [args] [FILE(s)] [-o file]

FILE    : a filename
          with no FILE, or when FILE is - , read standard input
Arguments :
 -#     : # compression level (1-19, default: 3)
 -d     : decompression
 -D file: use `file` as Dictionary
 -o file: result stored into `file` (only if 1 input file)
 -f     : overwrite output without prompting and (de)compress links
--rm    : remove source file(s) after successful de/compression
 -k     : preserve source file(s) (default)
 -h/-H  : display help/long help and exit

Advanced arguments :
 -V     : display Version number and exit
 -v     : verbose mode; specify multiple times to increase verbosity
 -q     : suppress warnings; specify twice to suppress errors too
 -c     : force write to standard output, even if it is the console
 -l     : print information about zstd compressed files
--ultra : enable levels beyond 19, up to 22 (requires more memory)
--long  : enable long distance matching (requires more memory)
--no-dictID : don't write dictID into header (dictionary compression)
--[no-]check : integrity check (default: enabled)
 -r     : operate recursively on directories
--format=gzip : compress files to the .gz format
--format=xz : compress files to the .xz format
--format=lzma : compress files to the .lzma format
--test  : test compressed file integrity
--[no-]sparse : sparse mode (default: disabled)
 -M#    : Set a memory usage limit for decompression
--      : All arguments after "--" are treated as files

Dictionary builder :
--train ## : create a dictionary from a training set of files
--train-cover[=k=#,d=#,steps=#,split=#] : use the cover algorithm with optional args
--train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
 -o file : `file` is dictionary name (default: dictionary)
--maxdict=# : limit dictionary to specified size (default: 112640)
--dictID=# : force dictionary ID to specified value (default: random)

Benchmark arguments :
 -b#    : benchmark file(s), using # compression level (default: 3)
 -e#    : test all compression levels from -bX to # (default: 1)
 -i#    : minimum evaluation time in seconds (default: 3s)
 -B#    : cut file into independent blocks of size # (default: no block)
--priority=rt : set process priority to real-time
```


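A few of the basic arguments listed above can be combined into a simple round trip, sketched below. The file names and the scratch directory are made up for the example, and the `zstd` calls are skipped when no `zstd` binary is available on `PATH`.

```shell
# Round trip: compress a sample file, decompress it, and compare with the original.
workdir=$(mktemp -d)
seq 1 10000 > "$workdir/data.txt"    # sample input, made up for this example

roundtrip="skipped"
if command -v zstd >/dev/null 2>&1; then
  # -q: suppress warnings, -3: compression level, -o: output file
  zstd -q -3 "$workdir/data.txt" -o "$workdir/data.txt.zst"
  # -d: decompression
  zstd -q -d "$workdir/data.txt.zst" -o "$workdir/restored.txt"
  if cmp -s "$workdir/data.txt" "$workdir/restored.txt"; then
    roundtrip="ok"
  else
    roundtrip="mismatch"
  fi
fi
echo "round trip: $roundtrip"
```

Since `-k` is the default, the source file is preserved; adding `--rm` to the compression command would remove it after successful compression.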
#### Long distance matching mode
The long distance matching mode, enabled with `--long`, is designed to improve
the compression ratio for files with long matches at a large distance (up to the
maximum window size, `128 MiB`) while still maintaining compression speed.

Enabling this mode sets the window size to `128 MiB` and thus increases the memory
usage for both the compressor and decompressor. Performance in terms of speed is
dependent on long matches being found. Compression speed may degrade if few long
matches are found. Decompression speed usually improves when there are many long
distance matches.

Below are graphs comparing the compression speed, compression ratio, and
decompression speed with and without long distance matching on an ideal use
case: a tar of four versions of clang (versions `3.4.1`, `3.4.2`, `3.5.0`,
`3.5.1`) with a total size of `244889600 B`. This is an ideal use case as there
are many long distance matches within the maximum window size of `128 MiB` (each
version is less than `128 MiB`).

Compression Speed vs Ratio | Decompression Speed
---------------------------|---------------------
 | 

| Method            | Compression ratio | Compression speed | Decompression speed |
|:------------------|------------------:|------------------:|--------------------:|
| `zstd -1`         | `5.065`           | `284.8 MB/s`      | `759.3 MB/s`        |
| `zstd -5`         | `5.826`           | `124.9 MB/s`      | `674.0 MB/s`        |
| `zstd -10`        | `6.504`           | `29.5 MB/s`       | `771.3 MB/s`        |
| `zstd -1 --long`  | `17.426`          | `220.6 MB/s`      | `1638.4 MB/s`       |
| `zstd -5 --long`  | `19.661`          | `165.5 MB/s`      | `1530.6 MB/s`       |
| `zstd -10 --long` | `21.949`          | `75.6 MB/s`       | `1632.6 MB/s`       |

On this file, the compression ratio improves significantly with minimal impact
on compression speed, and the decompression speed doubles.

At the other extreme, compressing a file with few long distance matches (such as
the [Silesia compression corpus]) will likely lead to a deterioration in
compression speed (for lower levels) with minimal change in compression ratio.

The table below illustrates this on the [Silesia compression corpus].

[Silesia compression corpus]: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia

| Method            | Compression ratio | Compression speed | Decompression speed |
|:------------------|------------------:|------------------:|--------------------:|
| `zstd -1`         | `2.878`           | `231.7 MB/s`      | `594.4 MB/s`        |
| `zstd -1 --long`  | `2.929`           | `106.5 MB/s`      | `517.9 MB/s`        |
| `zstd -5`         | `3.274`           | `77.1 MB/s`       | `464.2 MB/s`        |
| `zstd -5 --long`  | `3.319`           | `51.7 MB/s`       | `371.9 MB/s`        |
| `zstd -10`        | `3.523`           | `16.4 MB/s`       | `489.2 MB/s`        |
| `zstd -10 --long` | `3.566`           | `16.2 MB/s`       | `415.7 MB/s`        |


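To put the two tables side by side, the sketch below computes the relative change brought by `--long` at level 1, using only figures copied from the tables above. `ratio_change` is a hypothetical helper name introduced for this example.

```shell
# Hypothetical helper: relative change of a metric when --long is enabled.
# $1 = value without --long, $2 = value with --long (figures from the tables above).
ratio_change() {
  LC_ALL=C awk -v base="$1" -v with_ldm="$2" 'BEGIN { printf "%.2f\n", with_ldm / base }'
}

echo "clang tar, zstd -1 vs zstd -1 --long"
echo "  compression ratio   : x$(ratio_change 5.065 17.426)"
echo "  compression speed   : x$(ratio_change 284.8 220.6)"
echo "  decompression speed : x$(ratio_change 759.3 1638.4)"

echo "Silesia corpus, zstd -1 vs zstd -1 --long"
echo "  compression ratio   : x$(ratio_change 2.878 2.929)"
echo "  compression speed   : x$(ratio_change 231.7 106.5)"
echo "  decompression speed : x$(ratio_change 594.4 517.9)"
```

On the clang tar this gives about 3.44 times the compression ratio and 2.16 times the decompression speed at 0.77 times the compression speed, while on Silesia the ratio is nearly unchanged (1.02) and compression speed falls to 0.46 of the baseline.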
#### zstdgrep

`zstdgrep` is a utility which makes it possible to `grep` a `.zst` compressed file directly.
It's used the same way as normal `grep`, for example :
`zstdgrep pattern file.zst`

`zstdgrep` is _not_ compatible with dictionary compression.

To search a file compressed with a dictionary,
it's necessary to decompress it using `zstd` or `zstdcat`,
and then pipe the result to `grep`. For example :
`zstdcat -D dictionary -qc -- file.zst | grep pattern`