Blame - zstd_compression_format.md - external_zstd

blob: 867a9b03302d0506339ae5da0541f6638a07737f [file] [log] [blame] [view]

Yann Collet	5cc1882	2016-07-03 19:03:13 +0200	[diff] [blame]	1	Zstandard Compression Format
				2	============================
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	3
				4	### Notices
				5
				6	Copyright (c) 2016 Yann Collet
				7
				8	Permission is granted to copy and distribute this document
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	9	for any purpose and without charge,
				10	including translations into other languages
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	11	and incorporation into compilations,
				12	provided that the copyright notice and this notice are preserved,
				13	and that any substantive changes or deletions from the original
				14	are clearly marked.
				15	Distribution of this document is unlimited.
				16
				17	### Version
				18
Yann Collet	6fa05a2	2016-07-20 14:58:49 +0200	[diff] [blame]	19	0.2.0 (22/07/16)
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	20
				21
				22	Introduction
				23	------------
				24
				25	The purpose of this document is to define a lossless compressed data format,
				26	that is independent of CPU type, operating system,
				27	file system and character set, suitable for
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	28	file compression, pipe and streaming compression,
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	29	using the [Zstandard algorithm](http://www.zstandard.org).
				30
				31	The data can be produced or consumed,
				32	even for an arbitrarily long sequentially presented input data stream,
				33	using only an a priori bounded amount of intermediate storage,
				34	and hence can be used in data communications.
				35	The format uses the Zstandard compression method,
				36	and optional [xxHash-64 checksum method](http://www.xxhash.org),
				37	for detection of data corruption.
				38
				39	The data format defined by this specification
				40	does not attempt to allow random access to compressed data.
				41
				42	This specification is intended for use by implementers of software
				43	to compress data into Zstandard format and/or decompress data from Zstandard format.
				44	The text of the specification assumes a basic background in programming
				45	at the level of bits and other primitive data representations.
				46
				47	Unless otherwise indicated below,
				48	a compliant compressor must produce data sets
				49	that conform to the specifications presented here.
				50	It doesn’t need to support all options though.
				51
				52	A compliant decompressor must be able to decompress
				53	at least one working set of parameters
				54	that conforms to the specifications presented here.
				55	It may also ignore informative fields, such as checksum.
				56	Whenever it does not support a parameter defined in the compressed stream,
				57	it must produce a non-ambiguous error code and associated error message
				58	explaining which parameter is unsupported.
				59
				60
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	61	Overall conventions
				62	-----------
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	63	In this document:
				64	- square brackets i.e. `[` and `]` are used to indicate optional fields or parameters.
				65	- a naming convention for identifiers is `Mixed_Case_With_Underscores`
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	66
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	67	Definitions
				68	-----------
				69	A content compressed by Zstandard is transformed into a Zstandard __frame__.
				70	Multiple frames can be appended into a single file or stream.
				71	A frame is totally independent, has a defined beginning and end,
				72	and a set of parameters which tells the decoder how to decompress it.
				73
				74	A frame encapsulates one or multiple __blocks__.
				75	Each block can be compressed or not,
				76	and has a guaranteed maximum content size, which depends on frame parameters.
				77	Unlike frames, each block depends on previous blocks for proper decoding.
				78	However, each block can be decompressed without waiting for its successor,
				79	allowing streaming operations.
				80
				81
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	82	Frame Concatenation
				83	-------------------
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	84
				85	In some circumstances, it may be required to append multiple frames,
				86	for example in order to add new data to an existing compressed file
				87	without re-framing it.
				88
				89	In such case, each frame brings its own set of descriptor flags.
				90	Each frame is considered independent.
				91	The only relation between frames is their sequential order.
				92
				93	The ability to decode multiple concatenated frames
				94	within a single stream or file is left outside of this specification.
				95	As an example, the reference `zstd` command line utility is able
				96	to decode all concatenated frames in their sequential order,
				97	delivering the final decompressed result as if it was a single content.
				98
				99
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	100	Skippable Frames
				101	----------------
				102
				103	\| `Magic_Number` \| `Frame_Size` \| `User_Data` \|
				104	\|:--------------:\|:------------:\|:-----------:\|
				105	\| 4 bytes \| 4 bytes \| n bytes \|
				106
				107	Skippable frames allow the insertion of user-defined data
				108	into a flow of concatenated frames.
				109	Its design is pretty straightforward,
				110	with the sole objective to allow the decoder to quickly skip
				111	over user-defined data and continue decoding.
				112
				113	Skippable frames defined in this specification are compatible with [LZ4] ones.
				114
				115	[LZ4]:http://www.lz4.org
				116
				117	__`Magic_Number`__
				118
				119	4 Bytes, little-endian format.
				120	Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
				121	All 16 values are valid to identify a skippable frame.
				122
				123	__`Frame_Size`__
				124
				125	This is the size, in bytes, of the following `User_Data`
				126	(without including the magic number nor the size field itself).
				127	This field is represented using 4 Bytes, little-endian format, unsigned 32-bits.
				128	This means `User_Data` can’t be bigger than (2^32-1) bytes.
				129
				130	__`User_Data`__
				131
				132	The `User_Data` can be anything. Data will just be skipped by the decoder.
				133
				134
				135
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	136	General Structure of Zstandard Frame format
				137	-------------------------------------------
				138	The structure of a single Zstandard frame is following:
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	139
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	140	\| `Magic_Number` \| `Frame_Header` \|`Data_Block`\| [More data blocks] \| [`Content_Checksum`] \|
				141	\|:--------------:\|:--------------:\|:----------:\| ------------------ \|:--------------------:\|
				142	\| 4 bytes \| 2-14 bytes \| n bytes \| \| 0-4 bytes \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	143
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	144	__`Magic_Number`__
				145
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	146	4 Bytes, little-endian format.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	147	Value : 0xFD2FB527
				148
				149	__`Frame_Header`__
				150
				151	2 to 14 Bytes, detailed in [next part](#the-structure-of-frame_header).
				152
				153	__`Data_Block`__
				154
				155	Detailed in [next chapter](#the-structure-of-data_block).
				156	That’s where compressed data is stored.
				157
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	158	__`Content_Checksum`__
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	159
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	160	An optional 32-bit checksum, only present if `Content_Checksum_flag` is set.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	161	The content checksum is the result
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	162	of [xxh64() hash function](http://www.xxhash.org)
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	163	digesting the original (decoded) data as input, and a seed of zero.
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	164	The low 4 bytes of the checksum are stored in little endian format.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	165
				166
				167	The structure of `Frame_Header`
				168	-------------------------------
				169	The `Frame_Header` has a variable size, which uses a minimum of 2 bytes,
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	170	and up to 14 bytes depending on optional parameters.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	171	The structure of `Frame_Header` is following:
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	172
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	173	\| `Frame_Header_Descriptor` \| [`Window_Descriptor`] \| [`Dictionary_ID`] \| [`Frame_Content_Size`] \|
				174	\| ------------------------- \| --------------------- \| ----------------- \| ---------------------- \|
				175	\| 1 byte \| 0-1 byte \| 0-4 bytes \| 0-8 bytes \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	176
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	177	### `Frame_Header_Descriptor`
				178
				179	The first header's byte is called the `Frame_Header_Descriptor`.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	180	It tells which other fields are present.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	181	Decoding this byte is enough to tell the size of `Frame_Header`.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	182
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	183	\| Bit number \| Field name \|
				184	\| ---------- \| ---------- \|
				185	\| 7-6 \| `Frame_Content_Size_flag` \|
				186	\| 5 \| `Single_Segment_flag` \|
				187	\| 4 \| `Unused_bit` \|
				188	\| 3 \| `Reserved_bit` \|
				189	\| 2 \| `Content_Checksum_flag` \|
				190	\| 1-0 \| `Dictionary_ID_flag` \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	191
				192	In this table, bit 7 is highest bit, while bit 0 is lowest.
				193
inikep	49ec6d1	2016-07-25 12:26:39 +0200	[diff] [blame]	194	__`Frame_Content_Size_flag`__
				195
				196	This is a 2-bits flag (`= Frame_Header_Descriptor >> 6`),
				197	specifying if decompressed data size is provided within the header.
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	198	The `Flag_Value` can be converted into `Field_Size`,
				199	which is the number of bytes used by `Frame_Content_Size`
				200	according to the following table:
inikep	49ec6d1	2016-07-25 12:26:39 +0200	[diff] [blame]	201
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	202	\|`Flag_Value`\| 0 \| 1 \| 2 \| 3 \|
				203	\| ---------- \| ------ \| --- \| --- \| --- \|
				204	\|`Field_Size`\| 0 or 1 \| 2 \| 4 \| 8 \|
inikep	49ec6d1	2016-07-25 12:26:39 +0200	[diff] [blame]	205
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	206	When `Flag_Value` is `0`, `Field_Size` depends on `Single_Segment_flag` :
				207	if `Single_Segment_flag` is set, `Field_Size` is 1.
				208	Otherwise, `Field_Size` is 0 (content size not provided).
inikep	49ec6d1	2016-07-25 12:26:39 +0200	[diff] [blame]	209
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	210	__`Single_Segment_flag`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	211
inikep	49ec6d1	2016-07-25 12:26:39 +0200	[diff] [blame]	212	If this flag is set,
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	213	data must be regenerated within a single continuous memory segment.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	214
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	215	In this case, `Frame_Content_Size` is necessarily present,
				216	but `Window_Descriptor` byte is skipped.
inikep	49ec6d1	2016-07-25 12:26:39 +0200	[diff] [blame]	217	As a consequence, the decoder must allocate a memory segment
				218	of size equal or bigger than `Frame_Content_Size`.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	219
				220	In order to preserve the decoder from unreasonable memory requirement,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	221	a decoder can reject a compressed frame
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	222	which requests a memory size beyond decoder's authorized range.
				223
				224	For broader compatibility, decoders are recommended to support
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	225	memory sizes of at least 8 MB.
				226	This is just a recommendation,
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	227	each decoder is free to support higher or lower limits,
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	228	depending on local limitations.
				229
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	230	__`Unused_bit`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	231
Yann Collet	f0bc673	2016-07-13 17:30:21 +0200	[diff] [blame]	232	The value of this bit should be set to zero.
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	233	A decoder compliant with this specification version shall not interpret it.
Yann Collet	f0bc673	2016-07-13 17:30:21 +0200	[diff] [blame]	234	It might be used in a future version,
				235	to signal a property which is not mandatory to properly decode the frame.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	236
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	237	__`Reserved_bit`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	238
				239	This bit is reserved for some future feature.
				240	Its value _must be zero_.
				241	A decoder compliant with this specification version must ensure it is not set.
				242	This bit may be used in a future revision,
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	243	to signal a feature that must be interpreted to decode the frame correctly.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	244
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	245	__`Content_Checksum_flag`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	246
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	247	If this flag is set, a 32-bits `Content_Checksum` will be present at frame's end.
				248	See `Content_Checksum` paragraph.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	249
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	250	__`Dictionary_ID_flag`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	251
				252	This is a 2-bits flag (`= FHD & 3`),
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	253	telling if a dictionary ID is provided within the header.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	254	It also specifies the size of this field as `Field_Size`.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	255
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	256	\|`Flag_Value`\| 0 \| 1 \| 2 \| 3 \|
				257	\| ---------- \| --- \| --- \| --- \| --- \|
				258	\|`Field_Size`\| 0 \| 1 \| 2 \| 4 \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	259
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	260	### `Window_Descriptor`
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	261
				262	Provides guarantees on maximum back-reference distance
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	263	that will be used within compressed data.
				264	This information is important for decoders to allocate enough memory.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	265
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	266	The `Window_Descriptor` byte is optional. It is absent when `Single_Segment_flag` is set.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	267	In this case, the maximum back-reference distance is the content size itself,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	268	which can be any value from 1 to 2^64-1 bytes (16 EB).
				269
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	270	\| Bit numbers \| 7-3 \| 0-2 \|
				271	\| ----------- \| ---------- \| ---------- \|
				272	\| Field name \| `Exponent` \| `Mantissa` \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	273
				274	Maximum distance is given by the following formulae :
				275	```
				276	windowLog = 10 + Exponent;
				277	windowBase = 1 << windowLog;
				278	windowAdd = (windowBase / 8) * Mantissa;
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	279	Window_Size = windowBase + windowAdd;
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	280	```
				281	The minimum window size is 1 KB.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	282	The maximum size is `15*(1<<38)` bytes, which is 1.875 TB.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	283
				284	To properly decode compressed data,
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	285	a decoder will need to allocate a buffer of at least `Window_Size` bytes.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	286
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	287	In order to preserve decoder from unreasonable memory requirements,
				288	a decoder can refuse a compressed frame
				289	which requests a memory size beyond decoder's authorized range.
				290
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	291	For improved interoperability,
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	292	decoders are recommended to be compatible with window sizes of 8 MB,
				293	and encoders are recommended to not request more than 8 MB.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	294	It's merely a recommendation though,
				295	decoders are free to support larger or lower limits,
				296	depending on local limitations.
				297
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	298	### `Dictionary_ID`
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	299
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	300	This is a variable size field, which contains
				301	the ID of the dictionary required to properly decode the frame.
				302	Note that this field is optional. When it's not present,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	303	it's up to the caller to make sure it uses the correct dictionary.
				304
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	305	Field size depends on `Dictionary_ID_flag`.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	306	1 byte can represent an ID 0-255.
				307	2 bytes can represent an ID 0-65535.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	308	4 bytes can represent an ID 0-4294967295.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	309
				310	It's allowed to represent a small ID (for example `13`)
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	311	with a large 4-bytes dictionary ID, losing some compacity in the process.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	312
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	313	_Reserved ranges :_
				314	If the frame is going to be distributed in a private environment,
				315	any dictionary ID can be used.
				316	However, for public distribution of compressed frames using a dictionary,
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	317	the following ranges are reserved for future use and should not be used :
				318	- low range : 1 - 32767
				319	- high range : >= (2^31)
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	320
				321
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	322	### `Frame_Content_Size`
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	323
inikep	2fc3752	2016-07-25 12:47:02 +0200	[diff] [blame]	324	This is the original (uncompressed) size. This information is optional.
				325	The `Field_Size` is provided according to value of `Frame_Content_Size_flag`.
				326	The `Field_Size` can be equal to 0 (not present), 1, 2, 4 or 8 bytes.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	327	Format is little-endian.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	328
inikep	2fc3752	2016-07-25 12:47:02 +0200	[diff] [blame]	329	\| `Field_Size` \| Range \|
				330	\| ------------ \| ---------- \|
				331	\| 1 \| 0 - 255 \|
				332	\| 2 \| 256 - 65791\|
				333	\| 4 \| 0 - 2^32-1 \|
				334	\| 8 \| 0 - 2^64-1 \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	335
inikep	2fc3752	2016-07-25 12:47:02 +0200	[diff] [blame]	336	When `Field_Size` is 1, 4 or 8 bytes, the value is read directly.
				337	When `Field_Size` is 2, _the offset of 256 is added_.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	338	It's allowed to represent a small size (for example `18`) using any compatible variant.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	339
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	340
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	341	The structure of `Data_Block`
				342	-----------------------------
				343	The structure of `Data_Block` is following:
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	344
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	345	\| `Last_Block` \| `Block_Type` \| `Block_Size` \| `Block_Content` \|
				346	\|:------------:\|:------------:\|:------------:\|:---------------:\|
				347	\| 1 bit \| 2 bits \| 21 bits \| n bytes \|
				348
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	349	The block header (`Last_Block`, `Block_Type`, and `Block_Size`) uses 3-bytes.
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	350
				351	__`Last_Block`__
				352
				353	The lowest bit signals if this block is the last one.
				354	Frame ends right after this block.
				355	It may be followed by an optional `Content_Checksum` .
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	356
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	357	__`Block_Type` and `Block_Size`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	358
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	359	The next 2 bits represent the `Block_Type`,
				360	while the remaining 21 bits represent the `Block_Size`.
				361	Format is __little-endian__.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	362
				363	There are 4 block types :
				364
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	365	\| Value \| 0 \| 1 \| 2 \| 3 \|
				366	\| ------------ \| ----------- \| ----------- \| ------------------ \| --------- \|
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	367	\| `Block_Type` \| `Raw_Block` \| `RLE_Block` \| `Compressed_Block` \| `Reserved`\|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	368
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	369	- `Raw_Block` - this is an uncompressed block.
				370	`Block_Size` is the number of bytes to read and copy.
				371	- `RLE_Block` - this is a single byte, repeated N times.
				372	In which case, `Block_Size` is the size to regenerate,
				373	while the "compressed" block is just 1 byte (the byte to repeat).
				374	- `Compressed_Block` - this is a [Zstandard compressed block](#the-format-of-compressed_block),
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	375	detailed in another section of this specification.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	376	`Block_Size` is the compressed size.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	377	Decompressed size is unknown,
				378	but its maximum possible value is guaranteed (see below)
Yann Collet	c991cc1	2016-07-28 00:55:43 +0200	[diff] [blame]	379	- `Reserved` - this is not a block.
				380	This value cannot be used with current version of this specification.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	381
				382	Block sizes must respect a few rules :
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	383	- In compressed mode, compressed size if always strictly `< decompressed size`.
				384	- Block decompressed size is always <= maximum back-reference distance .
				385	- Block decompressed size is always <= 128 KB
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	386
				387
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	388	__`Block_Content`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	389
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	390	The `Block_Content` is where the actual data to decode stands.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	391	It might be compressed or not, depending on previous field indications.
				392	A data block is not necessarily "full" :
				393	since an arbitrary “flush” may happen anytime,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	394	block decompressed content can be any size,
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	395	up to `Block_Maximum_Decompressed_Size`, which is the smallest of :
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	396	- Maximum back-reference distance
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	397	- 128 KB
				398
				399
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	400
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	401	The format of `Compressed_Block`
				402	--------------------------------
				403	The size of `Compressed_Block` must be provided using `Block_Size` field from `Data_Block`.
				404	The `Compressed_Block` has a guaranteed maximum regenerated size,
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	405	in order to properly allocate destination buffer.
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	406	See [`Data_Block`](#the-structure-of-data_block) for more details.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	407
				408	A compressed block consists of 2 sections :
inikep	0079425	2016-08-04 14:43:21 +0200	[diff] [blame]	409	- [`Literals_Section`](#literals_section)
				410	- [`Sequences_Section`](#sequences_section)
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	411
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	412	### Prerequisites
				413	To decode a compressed block, the following elements are necessary :
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	414	- Previous decoded blocks, up to a distance of `Window_Size`,
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	415	or all previous blocks when `Single_Segment_flag` is set.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	416	- List of "recent offsets" from previous compressed block.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	417	- Decoding tables of previous compressed block for each symbol type
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	418	(literals, litLength, matchLength, offset).
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	419
				420
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	421	### `Literals_Section`
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	422
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	423	During sequence phase, literals will be entangled with match copy operations.
				424	All literals are regrouped in the first part of the block.
				425	They can be decoded first, and then copied during sequence operations,
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	426	or they can be decoded on the flow, as needed by sequence commands.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	427
inikep	4f270ac	2016-08-04 11:25:52 +0200	[diff] [blame]	428	\| `Literals_Section_Header` \| [`Huffman_Tree_Description`] \| Stream1 \| [Stream2] \| [Stream3] \| [Stream4] \|
				429	\| ------------------------- \| ---------------------------- \| ------- \| --------- \| --------- \| --------- \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	430
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	431	Literals can be stored uncompressed or compressed using Huffman prefix codes.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	432	When compressed, an optional tree description can be present,
				433	followed by 1 or 4 streams.
				434
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	435
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	436	#### `Literals_Section_Header`
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	437
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	438	Header is in charge of describing how literals are packed.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	439	It's a byte-aligned variable-size bitfield, ranging from 1 to 5 bytes,
Yann Collet	198e6aa	2016-07-20 20:12:24 +0200	[diff] [blame]	440	using little-endian convention.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	441
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	442	\| `Literals_Block_Type` \| `Size_Format` \| `Regenerated_Size` \| [`Compressed_Size`] \|
				443	\| --------------------- \| ------------- \| ------------------ \| ----------------- \|
				444	\| 2 bits \| 1 - 2 bits \| 5 - 20 bits \| 0 - 18 bits \|
Yann Collet	198e6aa	2016-07-20 20:12:24 +0200	[diff] [blame]	445
				446	In this representation, bits on the left are smallest bits.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	447
Yann Collet	70c2326	2016-08-21 00:24:18 +0200	[diff] [blame]	448	__`Literals_Block_Type`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	449
Yann Collet	198e6aa	2016-07-20 20:12:24 +0200	[diff] [blame]	450	This field uses 2 lowest bits of first byte, describing 4 different block types :
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	451
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	452	\| `Literals_Block_Type` \| Value \|
				453	\| ----------------------------- \| ----- \|
				454	\| `Raw_Literals_Block` \| 0 \|
				455	\| `RLE_Literals_Block` \| 1 \|
				456	\| `Compressed_Literals_Block` \| 2 \|
				457	\| `Repeat_Stats_Literals_Block` \| 3 \|
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	458
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	459	- `Raw_Literals_Block` - Literals are stored uncompressed.
				460	- `RLE_Literals_Block` - Literals consist of a single byte value repeated N times.
				461	- `Compressed_Literals_Block` - This is a standard Huffman-compressed block,
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	462	starting with a Huffman tree description.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	463	See details below.
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	464	- `Repeat_Stats_Literals_Block` - This is a Huffman-compressed block,
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	465	using Huffman tree _from previous Huffman-compressed literals block_.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	466	Huffman tree description will be skipped.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	467
Yann Collet	70c2326	2016-08-21 00:24:18 +0200	[diff] [blame]	468	__`Size_Format`__
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	469
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	470	`Size_Format` is divided into 2 families :
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	471
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	472	- For `Compressed_Block`, it requires to decode both `Compressed_Size`
				473	and `Regenerated_Size` (the decompressed size). It will also decode the number of streams.
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	474	- For `Raw_Literals_Block` and `RLE_Literals_Block` it's enough to decode `Regenerated_Size`.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	475
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	476	For values spanning several bytes, convention is little-endian.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	477
inikep	9d003c1	2016-08-04 10:41:49 +0200	[diff] [blame]	478	__`Size_Format` for `Raw_Literals_Block` and `RLE_Literals_Block`__ :
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	479
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	480	- Value x0 : `Regenerated_Size` uses 5 bits (0-31).
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	481	`Literals_Section_Header` has 1 byte.
				482	`Regenerated_Size = Header[0]>>3`
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	483	- Value 01 : `Regenerated_Size` uses 12 bits (0-4095).
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	484	`Literals_Section_Header` has 2 bytes.
				485	`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4)`
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	486	- Value 11 : `Regenerated_Size` uses 20 bits (0-1048575).
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	487	`Literals_Section_Header` has 3 bytes.
				488	`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4) + (Header[2]<<12)`
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	489
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	490	Note : it's allowed to represent a short value (for example `13`)
				491	using a long format, accepting the increased compressed data size.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	492
inikep	9d003c1	2016-08-04 10:41:49 +0200	[diff] [blame]	493	__`Size_Format` for `Compressed_Literals_Block` and `Repeat_Stats_Literals_Block`__ :
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	494
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	495	- Value 00 : _A single stream_.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	496	Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
				497	`Literals_Section_Header` has 3 bytes.
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	498	- Value 01 : 4 streams.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	499	Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
				500	`Literals_Section_Header` has 3 bytes.
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	501	- Value 10 : 4 streams.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	502	Both `Compressed_Size` and `Regenerated_Size` use 14 bits (0-16383).
				503	`Literals_Section_Header` has 4 bytes.
inikep	0132375	2016-08-25 12:20:38 +0200	[diff] [blame^]	504	- Value 11 : 4 streams.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	505	Both `Compressed_Size` and `Regenerated_Size` use 18 bits (0-262143).
				506	`Literals_Section_Header` has 5 bytes.
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	507
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	508	Both `Compressed_Size` and `Regenerated_Size` fields follow little-endian convention.
inikep	f896c1d	2016-08-03 16:37:42 +0200	[diff] [blame]	509
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	510
inikep	4f270ac	2016-08-04 11:25:52 +0200	[diff] [blame]	511	#### `Huffman_Tree_Description`
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	512
inikep	0079425	2016-08-04 14:43:21 +0200	[diff] [blame]	513	This section is only present when `Literals_Block_Type` type is `Compressed_Block` (`2`).
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	514
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	515	Prefix coding represents symbols from an a priori known alphabet
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	516	by bit sequences (codewords), one codeword for each symbol,
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	517	in a manner such that different symbols may be represented
				518	by bit sequences of different lengths,
				519	but a parser can always parse an encoded string
				520	unambiguously symbol-by-symbol.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	521
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	522	Given an alphabet with known symbol frequencies,
				523	the Huffman algorithm allows the construction of an optimal prefix code
				524	using the fewest bits of any possible prefix codes for that alphabet.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	525
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	526	Prefix code must not exceed a maximum code length.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	527	More bits improve accuracy but cost more header size,
Yann Collet	e557fd5	2016-07-17 16:21:37 +0200	[diff] [blame]	528	and require more memory or more complex decoding operations.
				529	This specification limits maximum code length to 11 bits.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	530
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	531
				532	##### Representation
				533
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	534	All literal values from zero (included) to last present one (excluded)
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	535	are represented by `Weight` values, from 0 to `Max_Number_of_Bits`.
				536	Transformation from `Weight` to `Number_of_Bits` follows this formulae :
				537	`Number_of_Bits = Weight ? Max_Number_of_Bits + 1 - Weight : 0` .
				538	The last symbol's `Weight` is deduced from previously decoded ones,
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	539	by completing to the nearest power of 2.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	540	This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	541
				542	__Example__ :
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	543	Let's presume the following Huffman tree must be described :
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	544
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	545	\| literal \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 \|
				546	\| ---------------- \| --- \| --- \| --- \| --- \| --- \| --- \|
				547	\| `Number_of_Bits` \| 1 \| 2 \| 3 \| 0 \| 4 \| 4 \|
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	548
				549	The tree depth is 4, since its smallest element uses 4 bits.
				550	Value `5` will not be listed, nor will values above `5`.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	551	Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
				552	Weight formula is : `Weight = Number_of_Bits ? Max_Number_of_Bits + 1 - Number_of_Bits : 0`.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	553	It gives the following serie of weights :
				554
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	555	\| `Weight` \| 4 \| 3 \| 2 \| 0 \| 1 \|
				556	\| -------- \| --- \| --- \| --- \| --- \| --- \|
				557	\| literal \| 0 \| 1 \| 2 \| 3 \| 4 \|
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	558
				559	The decoder will do the inverse operation :
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	560	having collected weights of literals from `0` to `4`,
				561	it knows the last literal, `5`, is present with a non-zero weight.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	562	The weight of `5` can be deducted by joining to the nearest power of 2.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	563	Sum of `2^(Weight-1)` (excluding 0) is :
				564	`8 + 4 + 2 + 0 + 1 = 15`.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	565	Nearest power of 2 is 16.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	566	Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = 1`.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	567
				568	##### Huffman Tree header
				569
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	570	This is a single byte value (0-255),
				571	which tells how to decode the list of weights.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	572
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	573	- if `headerByte` >= 128 : this is a direct representation,
				574	where each `Weight` is written directly as a 4 bits field (0-15).
				575	The full representation occupies `((Number_of_Symbols+1)/2)` bytes,
				576	meaning it uses a last full byte even if `Number_of_Symbols` is odd.
				577	`Number_of_Symbols = headerByte - 127`.
				578	Note that maximum `Number_of_Symbols` is 255-127 = 128.
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	579	A larger serie must necessarily use FSE compression.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	580
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	581	- if `headerByte` < 128 :
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	582	the serie of weights is compressed by FSE.
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	583	The length of the FSE-compressed serie is `headerByte` (0-127).
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	584
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	585	##### FSE (Finite State Entropy) compression of Huffman weights
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	586
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	587	The serie of weights is compressed using FSE compression.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	588	It's a single bitstream with 2 interleaved states,
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	589	sharing a single distribution table.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	590
				591	To decode an FSE bitstream, it is necessary to know its compressed size.
				592	Compressed size is provided by `headerByte`.
Yann Collet	38b75dd	2016-07-24 15:35:59 +0200	[diff] [blame]	593	It's also necessary to know its _maximum possible_ decompressed size,
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	594	which is `255`, since literal values span from `0` to `255`,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	595	and last symbol value is not represented.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	596
				597	An FSE bitstream starts by a header, describing probabilities distribution.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	598	It will create a Decoding Table.
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	599	Table must be pre-allocated, which requires to support a maximum accuracy.
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	600	For a list of Huffman weights, maximum accuracy is 7 bits.
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	601
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	602	FSE header is [described in relevant chapter](#fse-distribution-table--condensed-format),
				603	and so is [FSE bitstream](#bitstream).
				604	The main difference is that Huffman header compression uses 2 states,
				605	which share the same FSE distribution table.
Yann Collet	38b75dd	2016-07-24 15:35:59 +0200	[diff] [blame]	606	Bitstream contains only FSE symbols (no interleaved "raw bitfields").
Yann Collet	26f6814	2016-07-08 10:42:59 +0200	[diff] [blame]	607	The number of symbols to decode is discovered
				608	by tracking bitStream overflow condition.
				609	When both states have overflowed the bitstream, end is reached.
				610
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	611
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	612	##### Conversion from weights to Huffman prefix codes
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	613
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	614	All present symbols shall now have a `Weight` value.
				615	It is possible to transform weights into Number_of_Bits, using this formula :
				616	`Number_of_Bits = Number_of_Bits ? Max_Number_of_Bits + 1 - Weight : 0` .
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	617
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	618	Symbols are sorted by `Weight`. Within same `Weight`, symbols keep natural order.
				619	Symbols with a `Weight` of zero are removed.
Yann Collet	38b75dd	2016-07-24 15:35:59 +0200	[diff] [blame]	620	Then, starting from lowest weight, prefix codes are distributed in order.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	621
				622	__Example__ :
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	623	Let's presume the following list of weights has been decoded :
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	624
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	625	\| Literal \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 \|
				626	\| -------- \| --- \| --- \| --- \| --- \| --- \| --- \|
				627	\| `Weight` \| 4 \| 3 \| 2 \| 0 \| 1 \| 1 \|
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	628
				629	Sorted by weight and then natural order,
				630	it gives the following distribution :
				631
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	632	\| Literal \| 3 \| 4 \| 5 \| 2 \| 1 \| 0 \|
				633	\| ---------------- \| --- \| --- \| --- \| --- \| --- \| ---- \|
				634	\| `Weight` \| 0 \| 1 \| 1 \| 2 \| 3 \| 4 \|
				635	\| `Number_of_Bits` \| 0 \| 4 \| 4 \| 3 \| 2 \| 1 \|
				636	\| prefix codes \| N/A \| 0000\| 0001\| 001 \| 01 \| 1 \|
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	637
				638
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	639	#### Literals bitstreams
				640
				641	##### Bitstreams sizes
				642
				643	As seen in a previous paragraph,
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	644	there are 2 flavors of Huffman-compressed literals :
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	645	single stream, and 4-streams.
				646
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	647	4-streams is useful for CPU with multiple execution units and out-of-order operations.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	648	Since each stream can be decoded independently,
				649	it's possible to decode them up to 4x faster than a single stream,
				650	presuming the CPU has enough parallelism available.
				651
				652	For single stream, header provides both the compressed and regenerated size.
				653	For 4-streams though,
				654	header only provides compressed and regenerated size of all 4 streams combined.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	655	In order to properly decode the 4 streams,
				656	it's necessary to know the compressed and regenerated size of each stream.
				657
Yann Collet	38b75dd	2016-07-24 15:35:59 +0200	[diff] [blame]	658	Regenerated size of each stream can be calculated by `(totalSize+3)/4`,
				659	except for last one, which can be up to 3 bytes smaller, to reach `totalSize`.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	660
Yann Collet	38b75dd	2016-07-24 15:35:59 +0200	[diff] [blame]	661	Compressed size is provided explicitly : in the 4-streams variant,
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	662	bitstreams are preceded by 3 unsigned little-endian 16-bits values.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	663	Each value represents the compressed size of one stream, in order.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	664	The last stream size is deducted from total compressed size
Yann Collet	38b75dd	2016-07-24 15:35:59 +0200	[diff] [blame]	665	and from previously decoded stream sizes :
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	666	`stream4CSize = totalCSize - 6 - stream1CSize - stream2CSize - stream3CSize`.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	667
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	668	##### Bitstreams read and decode
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	669
				670	Each bitstream must be read _backward_,
				671	that is starting from the end down to the beginning.
				672	Therefore it's necessary to know the size of each bitstream.
				673
				674	It's also necessary to know exactly which _bit_ is the latest.
				675	This is detected by a final bit flag :
				676	the highest bit of latest byte is a final-bit-flag.
				677	Consequently, a last byte of `0` is not possible.
				678	And the final-bit-flag itself is not part of the useful bitstream.
Yann Collet	38b75dd	2016-07-24 15:35:59 +0200	[diff] [blame]	679	Hence, the last byte contains between 0 and 7 useful bits.
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	680
				681	Starting from the end,
				682	it's possible to read the bitstream in a little-endian fashion,
				683	keeping track of already used bits.
				684
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	685	Reading the last `Max_Number_of_Bits` bits,
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	686	it's then possible to compare extracted value to decoding table,
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	687	determining the symbol to decode and number of bits to discard.
				688
				689	The process continues up to reading the required number of symbols per stream.
				690	If a bitstream is not entirely and exactly consumed,
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	691	hence reaching exactly its beginning position with _all_ bits consumed,
Yann Collet	d916c90	2016-07-04 00:42:58 +0200	[diff] [blame]	692	the decoding process is considered faulty.
				693
Yann Collet	698cb63	2016-07-03 18:49:35 +0200	[diff] [blame]	694
inikep	4f270ac	2016-08-04 11:25:52 +0200	[diff] [blame]	695	### `Sequences_Section`
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	696
				697	A compressed block is a succession of _sequences_ .
				698	A sequence is a literal copy command, followed by a match copy command.
				699	A literal copy command specifies a length.
				700	It is the number of bytes to be copied (or extracted) from the literal section.
				701	A match copy command specifies an offset and a length.
				702	The offset gives the position to copy from,
Yann Collet	b21e9cb	2016-07-15 17:31:13 +0200	[diff] [blame]	703	which can be within a previous block.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	704
Yann Collet	70c2326	2016-08-21 00:24:18 +0200	[diff] [blame]	705	When all _sequences_ are decoded,
				706	if there is any literal left in the _literal section_,
				707	these bytes are added at the end of the block.
				708
				709	The _Sequences_Section_ regroup all symbols required to decode commands.
				710	There are 3 symbol types : literals lengths, offsets and match lengths.
				711	They are encoded together, interleaved, in a single _bitstream_.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	712
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	713	Each symbol is a _code_ in its own context,
				714	which specifies a baseline and a number of bits to add.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	715	_Codes_ are FSE compressed,
				716	and interleaved with raw additional bits in the same bitstream.
				717
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	718	The Sequences section starts by a header,
				719	followed by optional Probability tables for each symbol type,
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	720	followed by the bitstream.
				721
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	722	\| `Sequences_Section_Header` \| [`Literals_Length_Table`] \| [`Offset_Table`] \| [`Match_Length_Table`] \| bitStream \|
				723	\| -------------------------- \| ------------------------- \| ---------------- \| ---------------------- \| --------- \|
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	724
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	725	To decode the Sequence section, it's required to know its size.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	726	This size is deducted from `blockSize - literalSectionSize`.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	727
				728
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	729	#### `Sequences_Section_Header`
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	730
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	731	Consists in 2 items :
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	732	- `Number_of_Sequences`
				733	- Symbol compression modes
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	734
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	735	__`Number_of_Sequences`__
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	736
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	737	This is a variable size field using between 1 and 3 bytes.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	738	Let's call its first byte `byte0`.
				739	- `if (byte0 == 0)` : there are no sequences.
				740	The sequence section stops there.
				741	Regenerated content is defined entirely by literals section.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	742	- `if (byte0 < 128)` : `Number_of_Sequences = byte0` . Uses 1 byte.
				743	- `if (byte0 < 255)` : `Number_of_Sequences = ((byte0-128) << 8) + byte1` . Uses 2 bytes.
				744	- `if (byte0 == 255)`: `Number_of_Sequences = byte1 + (byte2<<8) + 0x7F00` . Uses 3 bytes.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	745
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	746	__Symbol compression modes__
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	747
				748	This is a single byte, defining the compression mode of each symbol type.
				749
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	750	\|Bit number\| 7-6 \| 5-4 \| 3-2 \| 1-0 \|
				751	\| -------- \| ----------------------- \| -------------- \| -------------------- \| ---------- \|
				752	\|Field name\| `Literals_Lengths_Mode` \| `Offsets_Mode` \| `Match_Lengths_Mode` \| `Reserved` \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	753
				754	The last field, `Reserved`, must be all-zeroes.
				755
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	756	`Literals_Lengths_Mode`, `Offsets_Mode` and `Match_Lengths_Mode` define the compression mode of
				757	literals lengths, offsets and match lengths respectively.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	758
				759	They follow the same enumeration :
				760
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	761	\| Value \| 0 \| 1 \| 2 \| 3 \|
				762	\| ------------------ \| ----------------- \| ---------- \| --------------------- \| ------------- \|
				763	\| `Compression_Mode` \| `Predefined_Mode` \| `RLE_Mode` \| `FSE_Compressed_Mode` \| `Repeat_Mode` \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	764
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	765	- `Predefined_Mode` : uses a predefined distribution table.
				766	- `RLE_Mode` : it's a single code, repeated `Number_of_Sequences` times.
				767	- `Repeat_Mode` : re-use distribution table from previous compressed block.
				768	- `FSE_Compressed_Mode` : standard FSE compression.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	769	A distribution table will be present.
				770	It will be described in [next part](#distribution-tables).
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	771
				772	#### Symbols decoding
				773
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	774	##### Literals Length codes
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	775
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	776	Literals length codes are values ranging from `0` to `35` included.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	777	They define lengths from 0 to 131071 bytes.
				778
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	779	\| `Literals_Length_Code` \| 0-15 \|
				780	\| ---------------------- \| ---------------------- \|
				781	\| length \| `Literals_Length_Code` \|
				782	\| `Number_of_Bits` \| 0 \|
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	783
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	784	\| `Literals_Length_Code` \| 16 \| 17 \| 18 \| 19 \| 20 \| 21 \| 22 \| 23 \|
				785	\| ---------------------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				786	\| Baseline \| 16 \| 18 \| 20 \| 22 \| 24 \| 28 \| 32 \| 40 \|
				787	\| `Number_of_Bits` \| 1 \| 1 \| 1 \| 1 \| 2 \| 2 \| 3 \| 3 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	788
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	789	\| `Literals_Length_Code` \| 24 \| 25 \| 26 \| 27 \| 28 \| 29 \| 30 \| 31 \|
				790	\| ---------------------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				791	\| Baseline \| 48 \| 64 \| 128 \| 256 \| 512 \| 1024 \| 2048 \| 4096 \|
				792	\| `Number_of_Bits` \| 4 \| 6 \| 7 \| 8 \| 9 \| 10 \| 11 \| 12 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	793
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	794	\| `Literals_Length_Code` \| 32 \| 33 \| 34 \| 35 \|
				795	\| ---------------------- \| ---- \| ---- \| ---- \| ---- \|
				796	\| Baseline \| 8192 \|16384 \|32768 \|65536 \|
				797	\| `Number_of_Bits` \| 13 \| 14 \| 15 \| 16 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	798
				799	__Default distribution__
				800
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	801	When `Compression_Mode` is `Predefined_Mode`,
				802	a predefined distribution is used for FSE compression.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	803
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	804	Below is its definition. It uses an accuracy of 6 bits (64 states).
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	805	```
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	806	short literalsLength_defaultDistribution[36] =
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	807	{ 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1,
				808	2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1,
				809	-1,-1,-1,-1 };
				810	```
				811
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	812	##### Match Length codes
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	813
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	814	Match length codes are values ranging from `0` to `52` included.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	815	They define lengths from 3 to 131074 bytes.
				816
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	817	\| `Match_Length_Code` \| 0-31 \|
				818	\| ------------------- \| ----------------------- \|
				819	\| value \| `Match_Length_Code` + 3 \|
				820	\| `Number_of_Bits` \| 0 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	821
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	822	\| `Match_Length_Code` \| 32 \| 33 \| 34 \| 35 \| 36 \| 37 \| 38 \| 39 \|
				823	\| ------------------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				824	\| Baseline \| 35 \| 37 \| 39 \| 41 \| 43 \| 47 \| 51 \| 59 \|
				825	\| `Number_of_Bits` \| 1 \| 1 \| 1 \| 1 \| 2 \| 2 \| 3 \| 3 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	826
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	827	\| `Match_Length_Code` \| 40 \| 41 \| 42 \| 43 \| 44 \| 45 \| 46 \| 47 \|
				828	\| ------------------- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				829	\| Baseline \| 67 \| 83 \| 99 \| 131 \| 258 \| 514 \| 1026 \| 2050 \|
				830	\| `Number_of_Bits` \| 4 \| 4 \| 5 \| 7 \| 8 \| 9 \| 10 \| 11 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	831
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	832	\| `Match_Length_Code` \| 48 \| 49 \| 50 \| 51 \| 52 \|
				833	\| ------------------- \| ---- \| ---- \| ---- \| ---- \| ---- \|
				834	\| Baseline \| 4098 \| 8194 \|16486 \|32770 \|65538 \|
				835	\| `Number_of_Bits` \| 12 \| 13 \| 14 \| 15 \| 16 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	836
				837	__Default distribution__
				838
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	839	When `Compression_Mode` is defined as `Predefined_Mode`,
				840	a predefined distribution is used for FSE compression.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	841
				842	Here is its definition. It uses an accuracy of 6 bits (64 states).
				843	```
				844	short matchLengths_defaultDistribution[53] =
				845	{ 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
				846	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
				847	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,
				848	-1,-1,-1,-1,-1 };
				849	```
				850
				851	##### Offset codes
				852
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	853	Offset codes are values ranging from `0` to `N`.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	854
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	855	A decoder is free to limit its maximum `N` supported.
				856	Recommendation is to support at least up to `22`.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	857	For information, at the time of this writing.
				858	the reference decoder supports a maximum `N` value of `28` in 64-bits mode.
				859
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	860	An offset code is also the number of additional bits to read,
				861	and can be translated into an `Offset_Value` using the following formulae :
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	862
				863	```
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	864	Offset_Value = (1 << offsetCode) + readNBits(offsetCode);
				865	if (Offset_Value > 3) offset = Offset_Value - 3;
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	866	```
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	867	It means that maximum `Offset_Value` is `2^(N+1))-1` and it supports back-reference distance up to 2^(N+1))-4
				868	but is limited by [maximum back-reference distance](#window_descriptor).
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	869
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	870	Offset_Value from 1 to 3 are special : they define "repeat codes",
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	871	which means one of the previous offsets will be repeated.
				872	They are sorted in recency order, with 1 meaning the most recent one.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	873	See [Repeat offsets](#repeat-offsets) paragraph.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	874
				875	__Default distribution__
				876
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	877	When `Compression_Mode` is defined as `Predefined_Mode`,
				878	a predefined distribution is used for FSE compression.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	879
				880	Here is its definition. It uses an accuracy of 5 bits (32 states),
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	881	and supports a maximum `N` of 28, allowing offset values up to 536,870,908 .
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	882
				883	If any sequence in the compressed block requires an offset larger than this,
				884	it's not possible to use the default distribution to represent it.
				885
				886	```
				887	short offsetCodes_defaultDistribution[53] =
				888	{ 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
				889	1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 };
				890	```
				891
				892	#### Distribution tables
				893
				894	Following the header, up to 3 distribution tables can be described.
Yann Collet	10b9c13	2016-07-24 01:21:53 +0200	[diff] [blame]	895	When present, they are in this order :
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	896	- Literals lengths
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	897	- Offsets
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	898	- Match Lengths
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	899
Yann Collet	10b9c13	2016-07-24 01:21:53 +0200	[diff] [blame]	900	The content to decode depends on their respective encoding mode :
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	901	- `Predefined_Mode` : no content. Use predefined distribution table.
				902	- `RLE_Mode` : 1 byte. This is the only code to use across the whole compressed block.
				903	- `FSE_Compressed_Mode` : A distribution table is present.
				904	- `Repeat_Mode` : no content. Re-use distribution from previous compressed block.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	905
				906	##### FSE distribution table : condensed format
				907
				908	An FSE distribution table describes the probabilities of all symbols
				909	from `0` to the last present one (included)
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	910	on a normalized scale of `1 << Accuracy_Log` .
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	911
				912	It's a bitstream which is read forward, in little-endian fashion.
				913	It's not necessary to know its exact size,
				914	since it will be discovered and reported by the decoding process.
				915
				916	The bitstream starts by reporting on which scale it operates.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	917	`Accuracy_Log = low4bits + 5`.
Yann Collet	70c2326	2016-08-21 00:24:18 +0200	[diff] [blame]	918	Note that maximum `Accuracy_Log` for literal and match lengths is `9`,
				919	and for offsets is `8`. Higher values are considered errors.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	920
				921	Then follow each symbol value, from `0` to last present one.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	922	The number of bits used by each field is variable.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	923	It depends on :
				924
				925	- Remaining probabilities + 1 :
				926	__example__ :
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	927	Presuming an `Accuracy_Log` of 8,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	928	and presuming 100 probabilities points have already been distributed,
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	929	the decoder may read any value from `0` to `255 - 100 + 1 == 156` (included).
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	930	Therefore, it must read `log2sup(156) == 8` bits.
				931
				932	- Value decoded : small values use 1 less bit :
				933	__example__ :
				934	Presuming values from 0 to 156 (included) are possible,
				935	255-156 = 99 values are remaining in an 8-bits field.
				936	They are used this way :
				937	first 99 values (hence from 0 to 98) use only 7 bits,
				938	values from 99 to 156 use 8 bits.
				939	This is achieved through this scheme :
				940
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	941	\| Value read \| Value decoded \| Number of bits used \|
				942	\| ---------- \| ------------- \| ------------------- \|
				943	\| 0 - 98 \| 0 - 98 \| 7 \|
				944	\| 99 - 127 \| 99 - 127 \| 8 \|
				945	\| 128 - 226 \| 0 - 98 \| 7 \|
				946	\| 227 - 255 \| 128 - 156 \| 8 \|
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	947
				948	Symbols probabilities are read one by one, in order.
				949
				950	Probability is obtained from Value decoded by following formulae :
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	951	`Proba = value - 1`
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	952
				953	It means value `0` becomes negative probability `-1`.
				954	`-1` is a special probability, which means `less than 1`.
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	955	Its effect on distribution table is described in [next paragraph].
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	956	For the purpose of calculating cumulated distribution, it counts as one.
				957
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	958	[next paragraph]:#fse-decoding--from-normalized-distribution-to-decoding-tables
				959
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	960	When a symbol has a probability of `zero`,
				961	it is followed by a 2-bits repeat flag.
				962	This repeat flag tells how many probabilities of zeroes follow the current one.
				963	It provides a number ranging from 0 to 3.
				964	If it is a 3, another 2-bits repeat flag follows, and so on.
				965
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	966	When last symbol reaches cumulated total of `1 << Accuracy_Log`,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	967	decoding is complete.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	968	If the last symbol makes cumulated total go above `1 << Accuracy_Log`,
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	969	distribution is considered corrupted.
				970
Yann Collet	10b9c13	2016-07-24 01:21:53 +0200	[diff] [blame]	971	Then the decoder can tell how many bytes were used in this process,
				972	and how many symbols are present.
				973	The bitstream consumes a round number of bytes.
				974	Any remaining bit within the last byte is just unused.
				975
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	976	##### FSE decoding : from normalized distribution to decoding tables
				977
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	978	The distribution of normalized probabilities is enough
				979	to create a unique decoding table.
				980
				981	It follows the following build rule :
				982
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	983	The table has a size of `tableSize = 1 << Accuracy_Log`.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	984	Each cell describes the symbol decoded,
				985	and instructions to get the next state.
				986
				987	Symbols are scanned in their natural order for `less than 1` probabilities.
				988	Symbols with this probability are being attributed a single cell,
				989	starting from the end of the table.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	990	These symbols define a full state reset, reading `Accuracy_Log` bits.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	991
				992	All remaining symbols are sorted in their natural order.
				993	Starting from symbol `0` and table position `0`,
				994	each symbol gets attributed as many cells as its probability.
				995	Cell allocation is spreaded, not linear :
				996	each successor position follow this rule :
				997
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	998	```
				999	position += (tableSize>>1) + (tableSize>>3) + 3;
				1000	position &= tableSize-1;
				1001	```
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1002
				1003	A position is skipped if already occupied,
				1004	typically by a "less than 1" probability symbol.
				1005
				1006	The result is a list of state values.
				1007	Each state will decode the current symbol.
				1008
				1009	To get the Number of bits and baseline required for next state,
				1010	it's first necessary to sort all states in their natural order.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	1011	The lower states will need 1 more bit than higher ones.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1012
				1013	__Example__ :
				1014	Presuming a symbol has a probability of 5.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	1015	It receives 5 state values. States are sorted in natural order.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1016
				1017	Next power of 2 is 8.
				1018	Space of probabilities is divided into 8 equal parts.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1019	Presuming the `Accuracy_Log` is 7, it defines 128 states.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1020	Divided by 8, each share is 16 large.
				1021
				1022	In order to reach 8, 8-5=3 lowest states will count "double",
				1023	taking shares twice larger,
				1024	requiring one more bit in the process.
				1025
				1026	Numbering starts from higher states using less bits.
				1027
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1028	\| state order \| 0 \| 1 \| 2 \| 3 \| 4 \|
				1029	\| ---------------- \| ----- \| ----- \| ------ \| ---- \| ----- \|
				1030	\| width \| 32 \| 32 \| 32 \| 16 \| 16 \|
				1031	\| `Number_of_Bits` \| 5 \| 5 \| 5 \| 4 \| 4 \|
				1032	\| range number \| 2 \| 4 \| 6 \| 0 \| 1 \|
				1033	\| baseline \| 32 \| 64 \| 96 \| 0 \| 16 \|
				1034	\| range \| 32-63 \| 64-95 \| 96-127 \| 0-15 \| 16-31 \|
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1035
				1036	Next state is determined from current state
				1037	by reading the required number of bits, and adding the specified baseline.
Yann Collet	23f05cc	2016-07-04 16:13:11 +0200	[diff] [blame]	1038
				1039
				1040	#### Bitstream
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	1041
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1042	All sequences are stored in a single bitstream, read _backward_.
				1043	It is therefore necessary to know the bitstream size,
				1044	which is deducted from compressed block size.
				1045
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	1046	The last useful bit of the stream is followed by an end-bit-flag.
Yann Collet	cd25a91	2016-07-05 11:50:37 +0200	[diff] [blame]	1047	Highest bit of last byte is this flag.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1048	It does not belong to the useful part of the bitstream.
				1049	Therefore, last byte has 0-7 useful bits.
				1050	Note that it also means that last byte cannot be `0`.
				1051
				1052	##### Starting states
				1053
				1054	The bitstream starts with initial state values,
				1055	each using the required number of bits in their respective _accuracy_,
				1056	decoded previously from their normalized distribution.
				1057
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1058	It starts by `Literals_Length_State`,
				1059	followed by `Offset_State`,
				1060	and finally `Match_Length_State`.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1061
				1062	Reminder : always keep in mind that all values are read _backward_.
				1063
				1064	##### Decoding a sequence
				1065
				1066	A state gives a code.
				1067	A code provides a baseline and number of bits to add.
				1068	See [Symbol Decoding] section for details on each symbol.
				1069
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1070	Decoding starts by reading the number of bits required to decode offset.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1071	It then does the same for match length,
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1072	and then for literals length.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1073
Yann Collet	c40ba71	2016-07-08 15:39:02 +0200	[diff] [blame]	1074	Offset / matchLength / litLength define a sequence.
				1075	It starts by inserting the number of literals defined by `litLength`,
				1076	then continue by copying `matchLength` bytes from `currentPos - offset`.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1077
				1078	The next operation is to update states.
				1079	Using rules pre-calculated in the decoding tables,
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1080	`Literals_Length_State` is updated,
				1081	followed by `Match_Length_State`,
				1082	and then `Offset_State`.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1083
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1084	This operation will be repeated `Number_of_Sequences` times.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1085	At the end, the bitstream shall be entirely consumed,
				1086	otherwise bitstream is considered corrupted.
				1087
				1088	[Symbol Decoding]:#symbols-decoding
				1089
				1090	##### Repeat offsets
				1091
				1092	As seen in [Offset Codes], the first 3 values define a repeated offset.
				1093	They are sorted in recency order, with 1 meaning "most recent one".
				1094
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1095	There is an exception though, when current sequence's literals length is `0`.
Yann Collet	917fe18	2016-07-31 04:01:57 +0200	[diff] [blame]	1096	In which case, repcodes are "pushed by one",
				1097	so 1 becomes 2, 2 becomes 3,
				1098	and 3 becomes "offset_1 - 1_byte".
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1099
Yann Collet	917fe18	2016-07-31 04:01:57 +0200	[diff] [blame]	1100	On first block, offset history is populated by the following values : 1, 4 and 8 (in order).
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1101
				1102	Then each block receives its start value from previous compressed block.
				1103	Note that non-compressed blocks are skipped,
				1104	they do not contribute to offset history.
				1105
				1106	[Offset Codes]: #offset-codes
				1107
				1108	###### Offset updates rules
				1109
Yann Collet	917fe18	2016-07-31 04:01:57 +0200	[diff] [blame]	1110	New offset take the lead in offset history,
				1111	up to its previous place if it was already present.
Yann Collet	9ca7336	2016-07-05 10:53:38 +0200	[diff] [blame]	1112
Yann Collet	917fe18	2016-07-31 04:01:57 +0200	[diff] [blame]	1113	It means that when repeat offset 1 (most recent) is used, history is unmodified.
				1114	When repeat offset 2 is used, it's swapped with offset 1.
Yann Collet	00d44ab	2016-07-04 01:29:47 +0200	[diff] [blame]	1115
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	1116
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1117	Dictionary format
				1118	-----------------
				1119
				1120	`zstd` is compatible with "pure content" dictionaries, free of any format restriction.
				1121	But dictionaries created by `zstd --train` follow a format, described here.
				1122
				1123	__Pre-requisites__ : a dictionary has a known length,
				1124	defined either by a buffer limit, or a file size.
				1125
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1126	\| `Magic_Number` \| `Dictionary_ID` \| `Entropy_Tables` \| `Content` \|
				1127	\| -------------- \| --------------- \| ---------------- \| --------- \|
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1128
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1129	__`Magic_Number`__ : 4 bytes ID, value 0xEC30A437, little-endian format
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1130
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1131	__`Dictionary_ID`__ : 4 bytes, stored in little-endian format.
				1132	`Dictionary_ID` can be any value, except 0 (which means no `Dictionary_ID`).
Yann Collet	722e14b	2016-07-08 19:22:16 +0200	[diff] [blame]	1133	It's used by decoders to check if they use the correct dictionary.
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1134
				1135	_Reserved ranges :_
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	1136	If the frame is going to be distributed in a private environment,
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1137	any `Dictionary_ID` can be used.
Yann Collet	f6ff53c	2016-07-15 17:03:38 +0200	[diff] [blame]	1138	However, for public distribution of compressed frames,
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1139	the following ranges are reserved for future use and should not be used :
Yann Collet	6cacd34	2016-07-15 17:58:13 +0200	[diff] [blame]	1140
Yann Collet	70c2326	2016-08-21 00:24:18 +0200	[diff] [blame]	1141	- low range : 1 - 32767
				1142	- high range : >= (2^31)
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1143
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1144	__`Entropy_Tables`__ : following the same format as a [compressed blocks].
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1145	They are stored in following order :
				1146	Huffman tables for literals, FSE table for offset,
Yann Collet	722e14b	2016-07-08 19:22:16 +0200	[diff] [blame]	1147	FSE table for matchLenth, and FSE table for litLength.
				1148	It's finally followed by 3 offset values, populating recent offsets,
inikep	2fc3752	2016-07-25 12:47:02 +0200	[diff] [blame]	1149	stored in order, 4-bytes little-endian each, for a total of 12 bytes.
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1150
inikep	e81f2cb	2016-08-13 09:36:24 +0200	[diff] [blame]	1151	__`Content`__ : Where the actual dictionary content is.
Yann Collet	722e14b	2016-07-08 19:22:16 +0200	[diff] [blame]	1152	Content size depends on Dictionary size.
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1153
inikep	f9c3cce	2016-07-25 11:04:56 +0200	[diff] [blame]	1154	[compressed blocks]: #the-format-of-compressed_block
Yann Collet	bd10607	2016-07-08 19:16:57 +0200	[diff] [blame]	1155
Yann Collet	2fa9904	2016-07-01 20:55:28 +0200	[diff] [blame]	1156
				1157	Version changes
				1158	---------------
Yann Collet	6fa05a2	2016-07-20 14:58:49 +0200	[diff] [blame]	1159	- 0.2.0 : numerous format adjustments for zstd v0.8
inikep	586a055	2016-08-03 16:16:38 +0200	[diff] [blame]	1160	- 0.1.2 : limit Huffman tree depth to 11 bits
Yann Collet	e557fd5	2016-07-17 16:21:37 +0200	[diff] [blame]	1161	- 0.1.1 : reserved dictID ranges
				1162	- 0.1.0 : initial release