Blame - zstd_compression_format.md - external_zstd

blob: 13c4ace18d5cd3c73552d5dc605fa15c5cf47479 [file] [log] [blame] [view]

Yann Collet	5cc1882	2016-07-03 19:03:13 +0200	[diff] [blame]	1	Zstandard Compression Format
				2	============================
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	3
				4	### Notices
				5
				6	Copyright (c) 2016 Yann Collet
				7
				8	Permission is granted to copy and distribute this document
				9	for any purpose and without charge,
				10	including translations into other languages
				11	and incorporation into compilations,
				12	provided that the copyright notice and this notice are preserved,
				13	and that any substantive changes or deletions from the original
				14	are clearly marked.
				15	Distribution of this document is unlimited.
				16
				17	### Version
				18
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	19	0.1.1 (15/07/16)
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	20
				21
				22	Introduction
				23	------------
				24
				25	The purpose of this document is to define a lossless compressed data format,
				26	that is independent of CPU type, operating system,
				27	file system and character set, suitable for
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	28	file compression, pipe and streaming compression,
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	29	using the [Zstandard algorithm](http://www.zstandard.org).
				30
				31	The data can be produced or consumed,
				32	even for an arbitrarily long sequentially presented input data stream,
				33	using only an a priori bounded amount of intermediate storage,
				34	and hence can be used in data communications.
				35	The format uses the Zstandard compression method,
				36	and optional [xxHash-64 checksum method](http://www.xxhash.org),
				37	for detection of data corruption.
				38
				39	The data format defined by this specification
				40	does not attempt to allow random access to compressed data.
				41
				42	This specification is intended for use by implementers of software
				43	to compress data into Zstandard format and/or decompress data from Zstandard format.
				44	The text of the specification assumes a basic background in programming
				45	at the level of bits and other primitive data representations.
				46
				47	Unless otherwise indicated below,
				48	a compliant compressor must produce data sets
				49	that conform to the specifications presented here.
				50	It doesn’t need to support all options though.
				51
				52	A compliant decompressor must be able to decompress
				53	at least one working set of parameters
				54	that conforms to the specifications presented here.
				55	It may also ignore informative fields, such as checksum.
				56	Whenever it does not support a parameter defined in the compressed stream,
				57	it must produce a non-ambiguous error code and associated error message
				58	explaining which parameter is unsupported.
				59
				60
				61	Definitions
				62	-----------
				63	A content compressed by Zstandard is transformed into a Zstandard __frame__.
				64	Multiple frames can be appended into a single file or stream.
				65	A frame is totally independent, has a defined beginning and end,
				66	and a set of parameters which tells the decoder how to decompress it.
				67
				68	A frame encapsulates one or multiple __blocks__.
				69	Each block can be compressed or not,
				70	and has a guaranteed maximum content size, which depends on frame parameters.
				71	Unlike frames, each block depends on previous blocks for proper decoding.
				72	However, each block can be decompressed without waiting for its successor,
				73	allowing streaming operations.
				74
				75
				76	General Structure of Zstandard Frame format
				77	-------------------------------------------
				78
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	79	\| MagicNb \| Frame Header \| Block \| (More blocks) \| EndMark \|
				80	\|:-------:\|:-------------:\| ----- \| ------------- \| ------- \|
				81	\| 4 bytes \| 2-14 bytes \| \| \| 3 bytes \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	82
				83	__Magic Number__
				84
				85	4 Bytes, Little endian format.
				86	Value : 0xFD2FB527
				87
				88	__Frame Header__
				89
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	90	2 to 14 Bytes, detailed in [next part](#frame-header).
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	91
				92	__Data Blocks__
				93
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	94	Detailed in [next chapter](#data-blocks).
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	95	That’s where compressed data is stored.
				96
				97	__EndMark__
				98
				99	The flow of blocks ends when the last block header brings an _end signal_ .
				100	This last block header may optionally host a __Content Checksum__ .
				101
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	102	##### __Content Checksum__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	103
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	104	Content Checksum verify that frame content has been regenerated correctly.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	105	The content checksum is the result
				106	of [xxh64() hash function](https://www.xxHash.com)
				107	digesting the original (decoded) data as input, and a seed of zero.
				108	Bits from 11 to 32 (included) are extracted to form a 22 bits checksum
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	109	stored into the endmark body.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	110	```
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	111	mask22bits = (1<<22)-1;
				112	contentChecksum = (XXH64(content, size, 0) >> 11) & mask22bits;
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	113	```
				114	Content checksum is only present when its associated flag
				115	is set in the frame descriptor.
				116	Its usage is optional.
				117
				118	__Frame Concatenation__
				119
				120	In some circumstances, it may be required to append multiple frames,
				121	for example in order to add new data to an existing compressed file
				122	without re-framing it.
				123
				124	In such case, each frame brings its own set of descriptor flags.
				125	Each frame is considered independent.
				126	The only relation between frames is their sequential order.
				127
				128	The ability to decode multiple concatenated frames
				129	within a single stream or file is left outside of this specification.
				130	As an example, the reference `zstd` command line utility is able
				131	to decode all concatenated frames in their sequential order,
				132	delivering the final decompressed result as if it was a single content.
				133
				134
				135	Frame Header
				136	-------------
				137
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	138	\| FHD \| (WD) \| (dictID) \| (Content Size) \|
				139	\| ------- \| --------- \| --------- \|:--------------:\|
				140	\| 1 byte \| 0-1 byte \| 0-4 bytes \| 0 - 8 bytes \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	141
				142	Frame header has a variable size, which uses a minimum of 2 bytes,
				143	and up to 14 bytes depending on optional parameters.
				144
				145	__FHD byte__ (Frame Header Descriptor)
				146
				147	The first Header's byte is called the Frame Header Descriptor.
				148	It tells which other fields are present.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	149	Decoding this byte is enough to tell the size of Frame Header.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	150
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	151	\| BitNb \| 7-6 \| 5 \| 4 \| 3 \| 2 \| 1-0 \|
				152	\| ------- \| ------ \| ------- \| ------ \| -------- \| -------- \| ------ \|
				153	\|FieldName\| FCSize \| Segment \| Unused \| Reserved \| Checksum \| dictID \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	154
				155	In this table, bit 7 is highest bit, while bit 0 is lowest.
				156
				157	__Frame Content Size flag__
				158
				159	This is a 2-bits flag (`= FHD >> 6`),
				160	specifying if decompressed data size is provided within the header.
				161
				162	\| Value \| 0 \| 1 \| 2 \| 3 \|
				163	\| ------- \| --- \| --- \| --- \| --- \|
				164	\|FieldSize\| 0-1 \| 2 \| 4 \| 8 \|
				165
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	166	Value 0 meaning depends on _single segment_ mode :
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	167	it either means `0` (size not provided) _if_ the `WD` byte is present,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	168	or `1` (frame content size <= 255 bytes) otherwise.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	169
				170	__Single Segment__
				171
				172	If this flag is set,
				173	data shall be regenerated within a single continuous memory segment.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	174
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	175	In which case, `WD` byte __is not present__,
				176	but `Frame Content Size` field necessarily is.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	177	As a consequence, the decoder must allocate a memory segment
				178	of size `>= Frame Content Size`.
				179
				180	In order to preserve the decoder from unreasonable memory requirement,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	181	a decoder can reject a compressed frame
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	182	which requests a memory size beyond decoder's authorized range.
				183
				184	For broader compatibility, decoders are recommended to support
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	185	memory sizes of at least 8 MB.
				186	This is just a recommendation,
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	187	each decoder is free to support higher or lower limits,
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	188	depending on local limitations.
				189
				190	__Unused bit__
				191
Yann Collet	f0bc673	2016-07-13 17:30:21 +0200	[diff] [blame]	192	The value of this bit should be set to zero.
				193	A decoder compliant with this specification version should not interpret it.
				194	It might be used in a future version,
				195	to signal a property which is not mandatory to properly decode the frame.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	196
				197	__Reserved bit__
				198
				199	This bit is reserved for some future feature.
				200	Its value _must be zero_.
				201	A decoder compliant with this specification version must ensure it is not set.
				202	This bit may be used in a future revision,
				203	to signal a feature that must be interpreted in order to decode the frame.
				204
				205	__Content checksum flag__
				206
				207	If this flag is set, a content checksum will be present into the EndMark.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	208	The checksum is a 22 bits value extracted from the XXH64() of data,
				209	and stored into endMark. See [__Content Checksum__](#content-checksum) .
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	210
				211	__Dictionary ID flag__
				212
				213	This is a 2-bits flag (`= FHD & 3`),
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	214	telling if a dictionary ID is provided within the header.
				215	It also specifies the size of this field.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	216
				217	\| Value \| 0 \| 1 \| 2 \| 3 \|
				218	\| ------- \| --- \| --- \| --- \| --- \|
				219	\|FieldSize\| 0 \| 1 \| 2 \| 4 \|
				220
				221	__WD byte__ (Window Descriptor)
				222
				223	Provides guarantees on maximum back-reference distance
				224	that will be present within compressed data.
				225	This information is useful for decoders to allocate enough memory.
				226
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	227	`WD` byte is optional. It's not present in `single segment` mode.
				228	In which case, the maximum back-reference distance is the content size itself,
				229	which can be any value from 1 to 2^64-1 bytes (16 EB).
				230
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	231	\| BitNb \| 7-3 \| 0-2 \|
				232	\| --------- \| -------- \| -------- \|
				233	\| FieldName \| Exponent \| Mantissa \|
				234
				235	Maximum distance is given by the following formulae :
				236	```
				237	windowLog = 10 + Exponent;
				238	windowBase = 1 << windowLog;
				239	windowAdd = (windowBase / 8) * Mantissa;
				240	windowSize = windowBase + windowAdd;
				241	```
				242	The minimum window size is 1 KB.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	243	The maximum size is `15*(1<<38)` bytes, which is 1.875 TB.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	244
				245	To properly decode compressed data,
				246	a decoder will need to allocate a buffer of at least `windowSize` bytes.
				247
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	248	In order to preserve decoder from unreasonable memory requirements,
				249	a decoder can refuse a compressed frame
				250	which requests a memory size beyond decoder's authorized range.
				251
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	252	For improved interoperability,
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	253	decoders are recommended to be compatible with window sizes of 8 MB.
				254	Encoders are recommended to not request more than 8 MB.
				255	It's merely a recommendation though,
				256	decoders are free to support larger or lower limits,
				257	depending on local limitations.
				258
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	259	__Dictionary ID__
				260
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	261	This is a variable size field, which contains
				262	the ID of the dictionary required to properly decode the frame.
				263	Note that this field is optional. When it's not present,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	264	it's up to the caller to make sure it uses the correct dictionary.
				265
				266	Field size depends on __Dictionary ID flag__.
				267	1 byte can represent an ID 0-255.
				268	2 bytes can represent an ID 0-65535.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	269	4 bytes can represent an ID 0-4294967295.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	270
				271	It's allowed to represent a small ID (for example `13`)
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	272	with a large 4-bytes dictionary ID, losing some compacity in the process.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	273
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	274	_Reserved ranges :_
				275	If the frame is going to be distributed in a private environment,
				276	any dictionary ID can be used.
				277	However, for public distribution of compressed frames using a dictionary,
				278	some ranges are reserved for future use :
				279	- low : 1 - 32767 : reserved
				280	- high : >= (2^31) : reserved
				281
				282
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	283	__Frame Content Size__
				284
				285	This is the original (uncompressed) size.
				286	This information is optional, and only present if associated flag is set.
				287	Content size is provided using 1, 2, 4 or 8 Bytes.
				288	Format is Little endian.
				289
				290	\| Field Size \| Range \|
				291	\| ---------- \| ---------- \|
				292	\| 0 \| 0 \|
				293	\| 1 \| 0 - 255 \|
				294	\| 2 \| 256 - 65791\|
				295	\| 4 \| 0 - 2^32-1 \|
				296	\| 8 \| 0 - 2^64-1 \|
				297
				298	When field size is 1, 4 or 8 bytes, the value is read directly.
				299	When field size is 2, _an offset of 256 is added_.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	300	It's allowed to represent a small size (ex: `18`) using any compatible variant.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	301	A size of `0` means `content size is unknown`.
				302	In which case, the `WD` byte will necessarily be present,
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	303	and becomes the only hint to guide memory allocation.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	304
				305	In order to preserve decoder from unreasonable memory requirement,
				306	a decoder can refuse a compressed frame
				307	which requests a memory size beyond decoder's authorized range.
				308
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	309
				310	Data Blocks
				311	-----------
				312
				313	\| B. Header \| data \|
				314	\|:---------:\| ------ \|
				315	\| 3 bytes \| \|
				316
				317
				318	__Block Header__
				319
				320	This field uses 3-bytes, format is __big-endian__.
				321
				322	The 2 highest bits represent the `block type`,
				323	while the remaining 22 bits represent the (compressed) block size.
				324
				325	There are 4 block types :
				326
				327	\| Value \| 0 \| 1 \| 2 \| 3 \|
				328	\| ---------- \| ---------- \| --- \| --- \| ------- \|
				329	\| Block Type \| Compressed \| Raw \| RLE \| EndMark \|
				330
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	331	- Compressed : this is a [Zstandard compressed block](#compressed-block-format),
				332	detailed in another section of this specification.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	333	"block size" is the compressed size.
				334	Decompressed size is unknown,
				335	but its maximum possible value is guaranteed (see below)
				336	- Raw : this is an uncompressed block.
				337	"block size" is the number of bytes to read and copy.
				338	- RLE : this is a single byte, repeated N times.
				339	In which case, "block size" is the size to regenerate,
				340	while the "compressed" block is just 1 byte (the byte to repeat).
				341	- EndMark : this is not a block. Signal the end of the frame.
				342	The rest of the field may be optionally filled by a checksum
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	343	(see [Content Checksum](#content-checksum)).
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	344
				345	Block sizes must respect a few rules :
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	346	- In compressed mode, compressed size if always strictly `< decompressed size`.
				347	- Block decompressed size is always <= maximum back-reference distance .
				348	- Block decompressed size is always <= 128 KB
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	349
				350
				351	__Data__
				352
				353	Where the actual data to decode stands.
				354	It might be compressed or not, depending on previous field indications.
				355	A data block is not necessarily "full" :
				356	since an arbitrary “flush” may happen anytime,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	357	block decompressed content can be any size,
				358	up to Block Maximum Decompressed Size, which is the smallest of :
				359	- Maximum back-reference distance
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	360	- 128 KB
				361
				362
				363	Skippable Frames
				364	----------------
				365
				366	\| Magic Number \| Frame Size \| User Data \|
				367	\|:------------:\|:----------:\| --------- \|
				368	\| 4 bytes \| 4 bytes \| \|
				369
				370	Skippable frames allow the insertion of user-defined data
				371	into a flow of concatenated frames.
				372	Its design is pretty straightforward,
				373	with the sole objective to allow the decoder to quickly skip
				374	over user-defined data and continue decoding.
				375
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	376	Skippable frames defined in this specification are compatible with [LZ4] ones.
				377
				378	[LZ4]:http://www.lz4.org
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	379
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	380	__Magic Number__ :
				381
				382	4 Bytes, Little endian format.
				383	Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
				384	All 16 values are valid to identify a skippable frame.
				385
				386	__Frame Size__ :
				387
				388	This is the size, in bytes, of the following User Data
				389	(without including the magic number nor the size field itself).
				390	4 Bytes, Little endian format, unsigned 32-bits.
				391	This means User Data can’t be bigger than (2^32-1) Bytes.
				392
				393	__User Data__ :
				394
				395	User Data can be anything. Data will just be skipped by the decoder.
				396
				397
				398	Compressed block format
				399	-----------------------
				400	This specification details the content of a _compressed block_.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	401	A compressed block has a size, which must be known.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	402	It also has a guaranteed maximum regenerated size,
				403	in order to properly allocate destination buffer.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	404	See [Data Blocks](#data-blocks) for more details.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	405
				406	A compressed block consists of 2 sections :
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	407	- [Literals section](#literals-section)
				408	- [Sequences section](#sequences-section)
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	409
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	410	### Prerequisites
				411	To decode a compressed block, the following elements are necessary :
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	412	- Previous decoded blocks, up to a distance of `windowSize`,
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	413	or all previous blocks in "single segment" mode.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	414	- List of "recent offsets" from previous compressed block.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	415	- Decoding tables of previous compressed block for each symbol type
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	416	(literals, litLength, matchLength, offset).
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	417
				418
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	419	### Literals section
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	420
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	421	Literals are compressed using Huffman prefix codes.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	422	During sequence phase, literals will be entangled with match copy operations.
				423	All literals are regrouped in the first part of the block.
				424	They can be decoded first, and then copied during sequence operations,
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	425	or they can be decoded on the flow, as needed by sequence commands.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	426
				427	\| Header \| (Tree Description) \| Stream1 \| (Stream2) \| (Stream3) \| (Stream4) \|
				428	\| ------ \| ------------------ \| ------- \| --------- \| --------- \| --------- \|
				429
				430	Literals can be compressed, or uncompressed.
				431	When compressed, an optional tree description can be present,
				432	followed by 1 or 4 streams.
				433
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	434	#### Literals section header
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	435
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	436	Header is in charge of describing how literals are packed.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	437	It's a byte-aligned variable-size bitfield, ranging from 1 to 5 bytes,
				438	using big-endian convention.
				439
				440	\| BlockType \| sizes format \| (compressed size) \| regenerated size \|
				441	\| --------- \| ------------ \| ----------------- \| ---------------- \|
				442	\| 2 bits \| 1 - 2 bits \| 0 - 18 bits \| 5 - 20 bits \|
				443
				444	__Block Type__ :
				445
				446	This is a 2-bits field, describing 4 different block types :
				447
				448	\| Value \| 0 \| 1 \| 2 \| 3 \|
				449	\| ---------- \| ---------- \| ------ \| --- \| ------- \|
				450	\| Block Type \| Compressed \| Repeat \| Raw \| RLE \|
				451
				452	- Compressed : This is a standard huffman-compressed block,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	453	starting with a huffman tree description.
				454	See details below.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	455	- Repeat Stats : This is a huffman-compressed block,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	456	using huffman tree _from previous huffman-compressed literals block_.
				457	Huffman tree description will be skipped.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	458	- Raw : Literals are stored uncompressed.
				459	- RLE : Literals consist of a single byte value repeated N times.
				460
				461	__Sizes format__ :
				462
				463	Sizes format are divided into 2 families :
				464
				465	- For compressed block, it requires to decode both the compressed size
				466	and the decompressed size. It will also decode the number of streams.
				467	- For Raw or RLE blocks, it's enough to decode the size to regenerate.
				468
				469	For values spanning several bytes, convention is Big-endian.
				470
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	471	__Sizes format for Raw or RLE literals block__ :
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	472
				473	- Value : 0x : Regenerated size uses 5 bits (0-31).
				474	Total literal header size is 1 byte.
				475	`size = h[0] & 31;`
				476	- Value : 10 : Regenerated size uses 12 bits (0-4095).
				477	Total literal header size is 2 bytes.
				478	`size = ((h[0] & 15) << 8) + h[1];`
				479	- Value : 11 : Regenerated size uses 20 bits (0-1048575).
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	480	Total literal header size is 3 bytes.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	481	`size = ((h[0] & 15) << 16) + (h[1]<<8) + h[2];`
				482
				483	Note : it's allowed to represent a short value (ex : `13`)
				484	using a long format, accepting the reduced compacity.
				485
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	486	__Sizes format for Compressed literals block__ :
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	487
				488	Note : also applicable to "repeat-stats" blocks.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	489	- Value : 00 : 4 streams.
				490	Compressed and regenerated sizes use 10 bits (0-1023).
				491	Total literal header size is 3 bytes.
				492	- Value : 01 : _Single stream_.
				493	Compressed and regenerated sizes use 10 bits (0-1023).
				494	Total literal header size is 3 bytes.
				495	- Value : 10 : 4 streams.
				496	Compressed and regenerated sizes use 14 bits (0-16383).
				497	Total literal header size is 4 bytes.
				498	- Value : 10 : 4 streams.
				499	Compressed and regenerated sizes use 18 bits (0-262143).
				500	Total literal header size is 5 bytes.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	501
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	502	Compressed and regenerated size fields follow big endian convention.
				503
				504	#### Huffman Tree description
				505
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	506	This section is only present when literals block type is `Compressed` (`0`).
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	507
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	508	Prefix coding represents symbols from an a priori known alphabet
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	509	by bit sequences (codewords), one codeword for each symbol,
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	510	in a manner such that different symbols may be represented
				511	by bit sequences of different lengths,
				512	but a parser can always parse an encoded string
				513	unambiguously symbol-by-symbol.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	514
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	515	Given an alphabet with known symbol frequencies,
				516	the Huffman algorithm allows the construction of an optimal prefix code
				517	using the fewest bits of any possible prefix codes for that alphabet.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	518
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	519	Prefix code must not exceed a maximum code length.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	520	More bits improve accuracy but cost more header size,
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	521	and require more memory for decoding operations.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	522
				523	The current format limits the maximum depth to 15 bits.
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	524	The reference decoder goes further, by limiting it to 12 bits.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	525	It is recommended to remain compatible with reference decoder.
				526
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	527
				528	##### Representation
				529
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	530	All literal values from zero (included) to last present one (excluded)
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	531	are represented by `weight` values, from 0 to `maxBits`.
				532	Transformation from `weight` to `nbBits` follows this formulae :
				533	`nbBits = weight ? maxBits + 1 - weight : 0;` .
				534	The last symbol's weight is deduced from previously decoded ones,
				535	by completing to the nearest power of 2.
				536	This power of 2 gives `maxBits`, the depth of the current tree.
				537
				538	__Example__ :
				539	Let's presume the following huffman tree must be described :
				540
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	541	\| literal \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 \|
				542	\| ------- \| --- \| --- \| --- \| --- \| --- \| --- \|
				543	\| nbBits \| 1 \| 2 \| 3 \| 0 \| 4 \| 4 \|
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	544
				545	The tree depth is 4, since its smallest element uses 4 bits.
				546	Value `5` will not be listed, nor will values above `5`.
				547	Values from `0` to `4` will be listed using `weight` instead of `nbBits`.
				548	Weight formula is : `weight = nbBits ? maxBits + 1 - nbBits : 0;`
				549	It gives the following serie of weights :
				550
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	551	\| weights \| 4 \| 3 \| 2 \| 0 \| 1 \|
				552	\| ------- \| --- \| --- \| --- \| --- \| --- \|
				553	\| literal \| 0 \| 1 \| 2 \| 3 \| 4 \|
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	554
				555	The decoder will do the inverse operation :
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	556	having collected weights of literals from `0` to `4`,
				557	it knows the last literal, `5`, is present with a non-zero weight.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	558	The weight of `5` can be deducted by joining to the nearest power of 2.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	559	Sum of 2^(weight-1) (excluding 0) is :
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	560	`8 + 4 + 2 + 0 + 1 = 15`
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	561	Nearest power of 2 is 16.
				562	Therefore, `maxBits = 4` and `weight[5] = 1`.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	563
				564	##### Huffman Tree header
				565
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	566	This is a single byte value (0-255),
				567	which tells how to decode the list of weights.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	568
				569	- if headerByte >= 242 : this is one of 14 pre-defined weight distributions :
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	570
				571	\| value \|242\|243\|244\|245\|246\|247\|248\|249\|250\|251\|252\|253\|254\|255\|
Yann Collet	e0ce5b0	2016-07-06 01:50:44 +0200	[diff] [blame]	572	\| -------- \|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|---\|
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	573	\| Nb of 1s \| 1 \| 2 \| 3 \| 4 \| 7 \| 8 \| 15\| 16\| 31\| 32\| 63\| 64\|127\|128\|
				574	\|Complement\| 1 \| 2 \| 1 \| 4 \| 1 \| 8 \| 1 \| 16\| 1 \| 32\| 1 \| 64\| 1 \|128\|
				575
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	576	_Note_ : complement is found by using "join to nearest power of 2" rule.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	577
				578	- if headerByte >= 128 : this is a direct representation,
				579	where each weight is written directly as a 4 bits field (0-15).
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	580	The full representation occupies `((nbSymbols+1)/2)` bytes,
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	581	meaning it uses a last full byte even if nbSymbols is odd.
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	582	`nbSymbols = headerByte - 127;`.
				583	Note that maximum nbSymbols is 241-127 = 114.
				584	A larger serie must necessarily use FSE compression.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	585
				586	- if headerByte < 128 :
				587	the serie of weights is compressed by FSE.
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	588	The length of the FSE-compressed serie is `headerByte` (0-127).
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	589
				590	##### FSE (Finite State Entropy) compression of huffman weights
				591
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	592	The serie of weights is compressed using FSE compression.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	593	It's a single bitstream with 2 interleaved states,
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	594	sharing a single distribution table.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	595
				596	To decode an FSE bitstream, it is necessary to know its compressed size.
				597	Compressed size is provided by `headerByte`.
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	598	It's also necessary to know its maximum decompressed size,
				599	which is `255`, since literal values span from `0` to `255`,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	600	and last symbol value is not represented.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	601
				602	An FSE bitstream starts by a header, describing probabilities distribution.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	603	It will create a Decoding Table.
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	604	Table must be pre-allocated, which requires to support a maximum accuracy.
				605	For a list of huffman weights, recommended maximum is 7 bits.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	606
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	607	FSE header is [described in relevant chapter](#fse-distribution-table--condensed-format),
				608	and so is [FSE bitstream](#bitstream).
				609	The main difference is that Huffman header compression uses 2 states,
				610	which share the same FSE distribution table.
				611	Bitstream contains only FSE symbols, there are no interleaved "raw bitfields".
				612	The number of symbols to decode is discovered
				613	by tracking bitStream overflow condition.
				614	When both states have overflowed the bitstream, end is reached.
				615
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	616
				617	##### Conversion from weights to huffman prefix codes
				618
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	619	All present symbols shall now have a `weight` value.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	620	Symbols are sorted by weight.
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	621	Symbols with a weight of zero are removed.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	622	Within same weight, symbols keep natural order.
				623	Starting from lowest weight,
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	624	symbols are being allocated to a `range`.
				625	A `weight` directly represents a `range`,
				626	following the formulae : `range = weight ? 1 << (weight-1) : 0 ;`
				627	Similarly, it is possible to transform weights into nbBits :
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	628	`nbBits = nbBits ? maxBits + 1 - weight : 0;` .
				629
				630
				631	__Example__ :
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	632	Let's presume the following list of weights has been decoded :
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	633
				634	\| Literal \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 \|
				635	\| ------- \| --- \| --- \| --- \| --- \| --- \| --- \|
				636	\| weight \| 4 \| 3 \| 2 \| 0 \| 1 \| 1 \|
				637
				638	Sorted by weight and then natural order,
				639	it gives the following distribution :
				640
				641	\| Literal \| 3 \| 4 \| 5 \| 2 \| 1 \| 0 \|
				642	\| ------------ \| --- \| --- \| --- \| --- \| --- \| ---- \|
				643	\| weight \| 0 \| 1 \| 1 \| 2 \| 3 \| 4 \|
				644	\| range \| 0 \| 1 \| 1 \| 2 \| 4 \| 8 \|
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	645	\| table entries\| N/A \| 0 \| 1 \| 2-3 \| 4-7 \| 8-15 \|
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	646	\| nb bits \| 0 \| 4 \| 4 \| 3 \| 2 \| 1 \|
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	647	\| prefix codes \| N/A \| 0000\| 0001\| 001 \| 01 \| 1 \|
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	648
				649
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	650	#### Literals bitstreams
				651
				652	##### Bitstreams sizes
				653
				654	As seen in a previous paragraph,
				655	there are 2 flavors of huffman-compressed literals :
				656	single stream, and 4-streams.
				657
				658	4-streams is useful for CPU with multiple execution units and OoO operations.
				659	Since each stream can be decoded independently,
				660	it's possible to decode them up to 4x faster than a single stream,
				661	presuming the CPU has enough parallelism available.
				662
				663	For single stream, header provides both the compressed and regenerated size.
				664	For 4-streams though,
				665	header only provides compressed and regenerated size of all 4 streams combined.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	666	In order to properly decode the 4 streams,
				667	it's necessary to know the compressed and regenerated size of each stream.
				668
				669	Regenerated size is easiest :
				670	each stream has a size of `(totalSize+3)/4`,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	671	except the last one, which is up to 3 bytes smaller, to reach `totalSize`.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	672
				673	Compressed size must be provided explicitly : in the 4-streams variant,
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	674	bitstreams are preceded by 3 unsigned Little Endian 16-bits values.
				675	Each value represents the compressed size of one stream, in order.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	676	The last stream size is deducted from total compressed size
				677	and from already known stream sizes :
				678	`stream4CSize = totalCSize - 6 - stream1CSize - stream2CSize - stream3CSize;`
				679
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	680	##### Bitstreams read and decode
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	681
				682	Each bitstream must be read _backward_,
				683	that is starting from the end down to the beginning.
				684	Therefore it's necessary to know the size of each bitstream.
				685
				686	It's also necessary to know exactly which _bit_ is the latest.
				687	This is detected by a final bit flag :
				688	the highest bit of latest byte is a final-bit-flag.
				689	Consequently, a last byte of `0` is not possible.
				690	And the final-bit-flag itself is not part of the useful bitstream.
				691	Hence, the last byte contain between 0 and 7 useful bits.
				692
				693	Starting from the end,
				694	it's possible to read the bitstream in a little-endian fashion,
				695	keeping track of already used bits.
				696
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	697	Reading the last `maxBits` bits,
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	698	it's then possible to compare extracted value to decoding table,
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	699	determining the symbol to decode and number of bits to discard.
				700
				701	The process continues up to reading the required number of symbols per stream.
				702	If a bitstream is not entirely and exactly consumed,
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	703	hence reaching exactly its beginning position with _all_ bits consumed,
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	704	the decoding process is considered faulty.
				705
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	706
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	707	### Sequences section
				708
				709	A compressed block is a succession of _sequences_ .
				710	A sequence is a literal copy command, followed by a match copy command.
				711	A literal copy command specifies a length.
				712	It is the number of bytes to be copied (or extracted) from the literal section.
				713	A match copy command specifies an offset and a length.
				714	The offset gives the position to copy from,
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	715	which can be within a previous block.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	716
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	717	There are 3 symbol types, `literalLength`, `matchLength` and `offset`,
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	718	which are encoded together, interleaved in a single _bitstream_.
				719
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	720	Each symbol is a _code_ in its own context,
				721	which specifies a baseline and a number of bits to add.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	722	_Codes_ are FSE compressed,
				723	and interleaved with raw additional bits in the same bitstream.
				724
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	725	The Sequences section starts by a header,
				726	followed by optional Probability tables for each symbol type,
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	727	followed by the bitstream.
				728
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	729	\| Header \| (LitLengthTable) \| (OffsetTable) \| (MatchLengthTable) \| bitStream \|
				730	\| ------ \| ---------------- \| ------------- \| ------------------ \| --------- \|
				731
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	732	To decode the Sequence section, it's required to know its size.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	733	This size is deducted from `blockSize - literalSectionSize`.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	734
				735
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	736	#### Sequences section header
				737
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	738	Consists in 2 items :
				739	- Nb of Sequences
				740	- Flags providing Symbol compression types
				741
				742	__Nb of Sequences__
				743
				744	This is a variable size field, `nbSeqs`, using between 1 and 3 bytes.
				745	Let's call its first byte `byte0`.
				746	- `if (byte0 == 0)` : there are no sequences.
				747	The sequence section stops there.
				748	Regenerated content is defined entirely by literals section.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	749	- `if (byte0 < 128)` : `nbSeqs = byte0;` . Uses 1 byte.
				750	- `if (byte0 < 255)` : `nbSeqs = ((byte0-128) << 8) + byte1;` . Uses 2 bytes.
				751	- `if (byte0 == 255)`: `nbSeqs = byte1 + (byte2<<8) + 0x7F00;` . Uses 3 bytes.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	752
				753	__Symbol compression modes__
				754
				755	This is a single byte, defining the compression mode of each symbol type.
				756
				757	\| BitNb \| 7-6 \| 5-4 \| 3-2 \| 1-0 \|
				758	\| ------- \| ------ \| ------ \| ------ \| -------- \|
				759	\|FieldName\| LLtype \| OFType \| MLType \| Reserved \|
				760
				761	The last field, `Reserved`, must be all-zeroes.
				762
				763	`LLtype`, `OFType` and `MLType` define the compression mode of
				764	Literal Lengths, Offsets and Match Lengths respectively.
				765
				766	They follow the same enumeration :
				767
				768	\| Value \| 0 \| 1 \| 2 \| 3 \|
				769	\| ---------------- \| ------ \| --- \| ------ \| --- \|
				770	\| Compression Mode \| predef \| RLE \| Repeat \| FSE \|
				771
				772	- "predef" : uses a pre-defined distribution table.
				773	- "RLE" : it's a single code, repeated `nbSeqs` times.
				774	- "Repeat" : re-use distribution table from previous compressed block.
				775	- "FSE" : standard FSE compression.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	776	A distribution table will be present.
				777	It will be described in [next part](#distribution-tables).
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	778
				779	#### Symbols decoding
				780
				781	##### Literal Lengths codes
				782
				783	Literal lengths codes are values ranging from `0` to `35` included.
				784	They define lengths from 0 to 131071 bytes.
				785
				786	\| Code \| 0-15 \|
				787	\| ------ \| ---- \|
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	788	\| length \| Code \|
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	789	\| nbBits \| 0 \|
				790
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	791
				792	\| Code \| 16 \| 17 \| 18 \| 19 \| 20 \| 21 \| 22 \| 23 \|
				793	\| -------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				794	\| Baseline \| 16 \| 18 \| 20 \| 22 \| 24 \| 28 \| 32 \| 40 \|
				795	\| nb Bits \| 1 \| 1 \| 1 \| 1 \| 2 \| 2 \| 3 \| 3 \|
				796
				797	\| Code \| 24 \| 25 \| 26 \| 27 \| 28 \| 29 \| 30 \| 31 \|
				798	\| -------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				799	\| Baseline \| 48 \| 64 \| 128 \| 256 \| 512 \| 1024 \| 2048 \| 4096 \|
				800	\| nb Bits \| 4 \| 6 \| 7 \| 8 \| 9 \| 10 \| 11 \| 12 \|
				801
				802	\| Code \| 32 \| 33 \| 34 \| 35 \|
				803	\| -------- \| ---- \| ---- \| ---- \| ---- \|
				804	\| Baseline \| 8192 \|16384 \|32768 \|65536 \|
				805	\| nb Bits \| 13 \| 14 \| 15 \| 16 \|
				806
				807	__Default distribution__
				808
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	809	When "compression mode" is "predef"",
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	810	a pre-defined distribution is used for FSE compression.
				811
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	812	Below is its definition. It uses an accuracy of 6 bits (64 states).
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	813	```
				814	short literalLengths_defaultDistribution[36] =
				815	{ 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1,
				816	2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1,
				817	-1,-1,-1,-1 };
				818	```
				819
				820	##### Match Lengths codes
				821
				822	Match lengths codes are values ranging from `0` to `52` included.
				823	They define lengths from 3 to 131074 bytes.
				824
				825	\| Code \| 0-31 \|
				826	\| ------ \| -------- \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	827	\| value \| Code + 3 \|
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	828	\| nbBits \| 0 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	829
				830	\| Code \| 32 \| 33 \| 34 \| 35 \| 36 \| 37 \| 38 \| 39 \|
				831	\| -------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				832	\| Baseline \| 35 \| 37 \| 39 \| 41 \| 43 \| 47 \| 51 \| 59 \|
				833	\| nb Bits \| 1 \| 1 \| 1 \| 1 \| 2 \| 2 \| 3 \| 3 \|
				834
				835	\| Code \| 40 \| 41 \| 42 \| 43 \| 44 \| 45 \| 46 \| 47 \|
				836	\| -------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				837	\| Baseline \| 67 \| 83 \| 99 \| 131 \| 258 \| 514 \| 1026 \| 2050 \|
				838	\| nb Bits \| 4 \| 4 \| 5 \| 7 \| 8 \| 9 \| 10 \| 11 \|
				839
				840	\| Code \| 48 \| 49 \| 50 \| 51 \| 52 \|
				841	\| -------- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				842	\| Baseline \| 4098 \| 8194 \|16486 \|32770 \|65538 \|
				843	\| nb Bits \| 12 \| 13 \| 14 \| 15 \| 16 \|
				844
				845	__Default distribution__
				846
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	847	When "compression mode" is defined as "predef",
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	848	a pre-defined distribution is used for FSE compression.
				849
				850	Here is its definition. It uses an accuracy of 6 bits (64 states).
				851	```
				852	short matchLengths_defaultDistribution[53] =
				853	{ 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
				854	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
				855	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,
				856	-1,-1,-1,-1,-1 };
				857	```
				858
				859	##### Offset codes
				860
				861	Offset codes are values ranging from `0` to `N`,
				862	with `N` being limited by maximum backreference distance.
				863
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	864	A decoder is free to limit its maximum `N` supported.
				865	Recommendation is to support at least up to `22`.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	866	For information, at the time of this writing.
				867	the reference decoder supports a maximum `N` value of `28` in 64-bits mode.
				868
				869	An offset code is also the nb of additional bits to read,
				870	and can be translated into an `OFValue` using the following formulae :
				871
				872	```
				873	OFValue = (1 << offsetCode) + readNBits(offsetCode);
				874	if (OFValue > 3) offset = OFValue - 3;
				875	```
				876
				877	OFValue from 1 to 3 are special : they define "repeat codes",
				878	which means one of the previous offsets will be repeated.
				879	They are sorted in recency order, with 1 meaning the most recent one.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	880	See [Repeat offsets](#repeat-offsets) paragraph.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	881
				882	__Default distribution__
				883
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	884	When "compression mode" is defined as "predef",
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	885	a pre-defined distribution is used for FSE compression.
				886
				887	Here is its definition. It uses an accuracy of 5 bits (32 states),
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	888	and supports a maximum `N` of 28, allowing offset values up to 536,870,908 .
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	889
				890	If any sequence in the compressed block requires an offset larger than this,
				891	it's not possible to use the default distribution to represent it.
				892
				893	```
				894	short offsetCodes_defaultDistribution[53] =
				895	{ 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
				896	1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 };
				897	```
				898
				899	#### Distribution tables
				900
				901	Following the header, up to 3 distribution tables can be described.
				902	They are, in order :
				903	- Literal lengthes
				904	- Offsets
				905	- Match Lengthes
				906
				907	The content to decode depends on their respective compression mode :
				908	- Repeat mode : no content. Re-use distribution from previous compressed block.
				909	- Predef : no content. Use pre-defined distribution table.
				910	- RLE : 1 byte. This is the only code to use across the whole compressed block.
				911	- FSE : A distribution table is present.
				912
				913	##### FSE distribution table : condensed format
				914
				915	An FSE distribution table describes the probabilities of all symbols
				916	from `0` to the last present one (included)
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	917	on a normalized scale of `1 << AccuracyLog` .
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	918
				919	It's a bitstream which is read forward, in little-endian fashion.
				920	It's not necessary to know its exact size,
				921	since it will be discovered and reported by the decoding process.
				922
				923	The bitstream starts by reporting on which scale it operates.
				924	`AccuracyLog = low4bits + 5;`
				925	In theory, it can define a scale from 5 to 20.
				926	In practice, decoders are allowed to limit the maximum supported `AccuracyLog`.
				927	Recommended maximum are `9` for literal and match lengthes, and `8` for offsets.
				928	The reference decoder uses these limits.
				929
				930	Then follow each symbol value, from `0` to last present one.
				931	The nb of bits used by each field is variable.
				932	It depends on :
				933
				934	- Remaining probabilities + 1 :
				935	__example__ :
				936	Presuming an AccuracyLog of 8,
				937	and presuming 100 probabilities points have already been distributed,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	938	the decoder may read any value from `0` to `255 - 100 + 1 == 156` (included).
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	939	Therefore, it must read `log2sup(156) == 8` bits.
				940
				941	- Value decoded : small values use 1 less bit :
				942	__example__ :
				943	Presuming values from 0 to 156 (included) are possible,
				944	255-156 = 99 values are remaining in an 8-bits field.
				945	They are used this way :
				946	first 99 values (hence from 0 to 98) use only 7 bits,
				947	values from 99 to 156 use 8 bits.
				948	This is achieved through this scheme :
				949
				950	\| Value read \| Value decoded \| nb Bits used \|
				951	\| ---------- \| ------------- \| ------------ \|
				952	\| 0 - 98 \| 0 - 98 \| 7 \|
				953	\| 99 - 127 \| 99 - 127 \| 8 \|
				954	\| 128 - 226 \| 0 - 98 \| 7 \|
				955	\| 227 - 255 \| 128 - 156 \| 8 \|
				956
				957	Symbols probabilities are read one by one, in order.
				958
				959	Probability is obtained from Value decoded by following formulae :
				960	`Proba = value - 1;`
				961
				962	It means value `0` becomes negative probability `-1`.
				963	`-1` is a special probability, which means `less than 1`.
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	964	Its effect on distribution table is described in [next paragraph].
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	965	For the purpose of calculating cumulated distribution, it counts as one.
				966
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	967	[next paragraph]:#fse-decoding--from-normalized-distribution-to-decoding-tables
				968
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	969	When a symbol has a probability of `zero`,
				970	it is followed by a 2-bits repeat flag.
				971	This repeat flag tells how many probabilities of zeroes follow the current one.
				972	It provides a number ranging from 0 to 3.
				973	If it is a 3, another 2-bits repeat flag follows, and so on.
				974
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	975	When last symbol reaches cumulated total of `1 << AccuracyLog`,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	976	decoding is complete.
				977	Then the decoder can tell how many bytes were used in this process,
				978	and how many symbols are present.
				979
				980	The bitstream consumes a round number of bytes.
				981	Any remaining bit within the last byte is just unused.
				982
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	983	If the last symbol makes cumulated total go above `1 << AccuracyLog`,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	984	distribution is considered corrupted.
				985
				986	##### FSE decoding : from normalized distribution to decoding tables
				987
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	988	The distribution of normalized probabilities is enough
				989	to create a unique decoding table.
				990
				991	It follows the following build rule :
				992
				993	The table has a size of `tableSize = 1 << AccuracyLog;`.
				994	Each cell describes the symbol decoded,
				995	and instructions to get the next state.
				996
				997	Symbols are scanned in their natural order for `less than 1` probabilities.
				998	Symbols with this probability are being attributed a single cell,
				999	starting from the end of the table.
				1000	These symbols define a full state reset, reading `AccuracyLog` bits.
				1001
				1002	All remaining symbols are sorted in their natural order.
				1003	Starting from symbol `0` and table position `0`,
				1004	each symbol gets attributed as many cells as its probability.
				1005	Cell allocation is spreaded, not linear :
				1006	each successor position follow this rule :
				1007
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	1008	```
				1009	position += (tableSize>>1) + (tableSize>>3) + 3;
				1010	position &= tableSize-1;
				1011	```
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1012
				1013	A position is skipped if already occupied,
				1014	typically by a "less than 1" probability symbol.
				1015
				1016	The result is a list of state values.
				1017	Each state will decode the current symbol.
				1018
				1019	To get the Number of bits and baseline required for next state,
				1020	it's first necessary to sort all states in their natural order.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	1021	The lower states will need 1 more bit than higher ones.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1022
				1023	__Example__ :
				1024	Presuming a symbol has a probability of 5.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	1025	It receives 5 state values. States are sorted in natural order.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1026
				1027	Next power of 2 is 8.
				1028	Space of probabilities is divided into 8 equal parts.
				1029	Presuming the AccuracyLog is 7, it defines 128 states.
				1030	Divided by 8, each share is 16 large.
				1031
				1032	In order to reach 8, 8-5=3 lowest states will count "double",
				1033	taking shares twice larger,
				1034	requiring one more bit in the process.
				1035
				1036	Numbering starts from higher states using less bits.
				1037
				1038	\| state order \| 0 \| 1 \| 2 \| 3 \| 4 \|
				1039	\| ----------- \| ----- \| ----- \| ------ \| ---- \| ----- \|
				1040	\| width \| 32 \| 32 \| 32 \| 16 \| 16 \|
				1041	\| nb Bits \| 5 \| 5 \| 5 \| 4 \| 4 \|
				1042	\| range nb \| 2 \| 4 \| 6 \| 0 \| 1 \|
				1043	\| baseline \| 32 \| 64 \| 96 \| 0 \| 16 \|
				1044	\| range \| 32-63 \| 64-95 \| 96-127 \| 0-15 \| 16-31 \|
				1045
				1046	Next state is determined from current state
				1047	by reading the required number of bits, and adding the specified baseline.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	1048
				1049
				1050	#### Bitstream
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	1051
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1052	All sequences are stored in a single bitstream, read _backward_.
				1053	It is therefore necessary to know the bitstream size,
				1054	which is deducted from compressed block size.
				1055
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	1056	The last useful bit of the stream is followed by an end-bit-flag.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	1057	Highest bit of last byte is this flag.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1058	It does not belong to the useful part of the bitstream.
				1059	Therefore, last byte has 0-7 useful bits.
				1060	Note that it also means that last byte cannot be `0`.
				1061
				1062	##### Starting states
				1063
				1064	The bitstream starts with initial state values,
				1065	each using the required number of bits in their respective _accuracy_,
				1066	decoded previously from their normalized distribution.
				1067
				1068	It starts by `Literal Length State`,
				1069	followed by `Offset State`,
				1070	and finally `Match Length State`.
				1071
				1072	Reminder : always keep in mind that all values are read _backward_.
				1073
				1074	##### Decoding a sequence
				1075
				1076	A state gives a code.
				1077	A code provides a baseline and number of bits to add.
				1078	See [Symbol Decoding] section for details on each symbol.
				1079
				1080	Decoding starts by reading the nb of bits required to decode offset.
				1081	It then does the same for match length,
				1082	and then for literal length.
				1083
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	1084	Offset / matchLength / litLength define a sequence.
				1085	It starts by inserting the number of literals defined by `litLength`,
				1086	then continue by copying `matchLength` bytes from `currentPos - offset`.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1087
				1088	The next operation is to update states.
				1089	Using rules pre-calculated in the decoding tables,
				1090	`Literal Length State` is updated,
				1091	followed by `Match Length State`,
				1092	and then `Offset State`.
				1093
				1094	This operation will be repeated `NbSeqs` times.
				1095	At the end, the bitstream shall be entirely consumed,
				1096	otherwise bitstream is considered corrupted.
				1097
				1098	[Symbol Decoding]:#symbols-decoding
				1099
				1100	##### Repeat offsets
				1101
				1102	As seen in [Offset Codes], the first 3 values define a repeated offset.
				1103	They are sorted in recency order, with 1 meaning "most recent one".
				1104
				1105	There is an exception though, when current sequence's literal length is `0`.
				1106	In which case, 1 would just make previous match longer.
				1107	Therefore, in such case, 1 means in fact 2, and 2 is impossible.
				1108	Meaning of 3 is unmodified.
				1109
				1110	Repeat offsets start with the following values : 1, 4 and 8 (in order).
				1111
				1112	Then each block receives its start value from previous compressed block.
				1113	Note that non-compressed blocks are skipped,
				1114	they do not contribute to offset history.
				1115
				1116	[Offset Codes]: #offset-codes
				1117
				1118	###### Offset updates rules
				1119
				1120	When the new offset is a normal one,
				1121	offset history is simply translated by one position,
				1122	with the new offset taking first spot.
				1123
				1124	- When repeat offset 1 (most recent) is used, history is unmodified.
				1125	- When repeat offset 2 is used, it's swapped with offset 1.
				1126	- When repeat offset 3 is used, it takes first spot,
				1127	pushing the other ones by one position.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	1128
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	1129
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1130	Dictionary format
				1131	-----------------
				1132
				1133	`zstd` is compatible with "pure content" dictionaries, free of any format restriction.
				1134	But dictionaries created by `zstd --train` follow a format, described here.
				1135
				1136	__Pre-requisites__ : a dictionary has a known length,
				1137	defined either by a buffer limit, or a file size.
				1138
				1139	\| Header \| DictID \| Stats \| Content \|
				1140	\| ------ \| ------ \| ----- \| ------- \|
				1141
				1142	__Header__ : 4 bytes ID, value 0xEC30A437, Little Endian format
				1143
				1144	__Dict_ID__ : 4 bytes, stored in Little Endian format.
				1145	DictID can be any value, except 0 (which means no DictID).
Yann Collet	722e14b	2016-07-08 19:22:16 +0200	[diff] [blame]	1146	It's used by decoders to check if they use the correct dictionary.
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	1147	_Reserved ranges :_
				1148	If the frame is going to be distributed in a private environment,
				1149	any dictionary ID can be used.
				1150	However, for public distribution of compressed frames,
				1151	some ranges are reserved for future use :
Yann Collet	6cacd34	2016-07-15 17:58:13 +0200	[diff] [blame^]	1152
				1153	- low range : 1 - 32767 : reserved
				1154	- high range : >= (2^31) : reserved
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1155
				1156	__Stats__ : Entropy tables, following the same format as a [compressed blocks].
				1157	They are stored in following order :
				1158	Huffman tables for literals, FSE table for offset,
Yann Collet	722e14b	2016-07-08 19:22:16 +0200	[diff] [blame]	1159	FSE table for matchLenth, and FSE table for litLength.
				1160	It's finally followed by 3 offset values, populating recent offsets,
				1161	stored in order, 4-bytes little endian each, for a total of 12 bytes.
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1162
				1163	__Content__ : Where the actual dictionary content is.
Yann Collet	722e14b	2016-07-08 19:22:16 +0200	[diff] [blame]	1164	Content size depends on Dictionary size.
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1165
				1166	[compressed blocks]: #compressed-block-format
				1167
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	1168
				1169	Version changes
				1170	---------------
Yann Collet	6cacd34	2016-07-15 17:58:13 +0200	[diff] [blame^]	1171	- 0.1.1 reserved dictID ranges
				1172	- 0.1.0 initial release