The Go programming language
Go to file
Nigel Tao d8b7bd6a1f compress/flate: replace "Best Speed" with specialized version
This encoding algorithm, which prioritizes speed over output size, is
based on Snappy's LZ77-style encoder: github.com/golang/snappy

This commit keeps the diff between this package's encodeBestSpeed
function and and Snappy's encodeBlock function as small as possible (see
the diff below). Follow-up commits will improve this package's
performance and output size.

This package's speed benchmarks:

name                    old speed      new speed      delta
EncodeDigitsSpeed1e4-8  40.7MB/s ± 0%  73.0MB/s ± 0%   +79.18%  (p=0.008 n=5+5)
EncodeDigitsSpeed1e5-8  33.0MB/s ± 0%  77.3MB/s ± 1%  +134.04%  (p=0.008 n=5+5)
EncodeDigitsSpeed1e6-8  32.1MB/s ± 0%  82.1MB/s ± 0%  +156.18%  (p=0.008 n=5+5)
EncodeTwainSpeed1e4-8   42.1MB/s ± 0%  65.0MB/s ± 0%   +54.61%  (p=0.008 n=5+5)
EncodeTwainSpeed1e5-8   46.3MB/s ± 0%  80.0MB/s ± 0%   +72.81%  (p=0.008 n=5+5)
EncodeTwainSpeed1e6-8   47.3MB/s ± 0%  81.7MB/s ± 0%   +72.86%  (p=0.008 n=5+5)

Here's the milliseconds taken, before and after this commit, to compress
a number of test files:

Go's src/compress/testdata files:

     4          1 e.txt
     8          4 Mark.Twain-Tom.Sawyer.txt

github.com/golang/snappy's benchmark files:

     3          1 alice29.txt
    12          3 asyoulik.txt
     6          1 fireworks.jpeg
     1          1 geo.protodata
     1          0 html
     2          2 html_x_4
     6          3 kppkn.gtb
    11          4 lcet10.txt
     5          1 paper-100k.pdf
    14          6 plrabn12.txt
    17          6 urls.10K

Larger files linked to from
https://docs.google.com/spreadsheets/d/1VLxi-ac0BAtf735HyH3c1xRulbkYYUkFecKdLPH7NIQ/edit#gid=166102500

  2409       3182 adresser.001
 16757      11027 enwik9
 13764      12946 gob-stream
153978      74317 rawstudio-mint14.tar
  4371        770 sharnd.out

Output size is larger. In the table below, the first column is the input
size, the second column is the output size prior to this commit, the
third column is the output size after this commit.

    100003      47707      50006 e.txt
    387851     172707     182930 Mark.Twain-Tom.Sawyer.txt
    152089      62457      66705 alice29.txt
    125179      54503      57274 asyoulik.txt
    123093     122827     123108 fireworks.jpeg
    118588      18574      20558 geo.protodata
    102400      16601      17305 html
    409600      65506      70313 html_x_4
    184320      49007      50944 kppkn.gtb
    426754     166957     179355 lcet10.txt
    102400      82126      84937 paper-100k.pdf
    481861     218617     231988 plrabn12.txt
    702087     241774     258020 urls.10K
1073741824   43074110   57269781 adresser.001
1000000000  365772256  391052000 enwik9
1911399616  340364558  378679516 gob-stream
8558382592 3807229562 3972329193 rawstudio-mint14.tar
 200000000  200061040  200015265 sharnd.out

The diff between github.com/golang/snappy's encodeBlock function and
this commit's encodeBestSpeed function:

1c1,7
< func encodeBlock(dst, src []byte) (d int) {
---
> func encodeBestSpeed(dst []token, src []byte) []token {
> 	// This check isn't in the Snappy implementation, but there, the caller
> 	// instead of the callee handles this case.
> 	if len(src) < minNonLiteralBlockSize {
> 		return emitLiteral(dst, src)
> 	}
>
4c10
< 	// and len(src) <= maxBlockSize and maxBlockSize == 65536.
---
> 	// and len(src) <= maxStoreBlockSize and maxStoreBlockSize == 65535.
65c71
< 			if load32(src, s) == load32(src, candidate) {
---
> 			if s-candidate < maxOffset && load32(src, s) == load32(src, candidate) {
73c79
< 		d += emitLiteral(dst[d:], src[nextEmit:s])
---
> 		dst = emitLiteral(dst, src[nextEmit:s])
90c96
< 			// This is an inlined version of:
---
> 			// This is an inlined version of Snappy's:
93c99,103
< 			for i := candidate + 4; s < len(src) && src[i] == src[s]; i, s = i+1, s+1 {
---
> 			s1 := base + maxMatchLength
> 			if s1 > len(src) {
> 				s1 = len(src)
> 			}
> 			for i := candidate + 4; s < s1 && src[i] == src[s]; i, s = i+1, s+1 {
96c106,107
< 			d += emitCopy(dst[d:], base-candidate, s-base)
---
> 			// matchToken is flate's equivalent of Snappy's emitCopy.
> 			dst = append(dst, matchToken(uint32(s-base-3), uint32(base-candidate-minOffsetSize)))
114c125
< 			if uint32(x>>8) != load32(src, candidate) {
---
> 			if s-candidate >= maxOffset || uint32(x>>8) != load32(src, candidate) {
124c135
< 		d += emitLiteral(dst[d:], src[nextEmit:])
---
> 		dst = emitLiteral(dst, src[nextEmit:])
126c137
< 	return d
---
> 	return dst

This change is based on https://go-review.googlesource.com/#/c/21021/ by
Klaus Post, but it is a separate changelist as cl/21021 seems to have
stalled in code review, and the Go 1.7 feature freeze approaches.

Golang-dev discussion:
https://groups.google.com/d/topic/golang-dev/XYgHX9p8IOk/discussion and
of course cl/21021.

Change-Id: Ib662439417b3bd0b61c2977c12c658db3e44d164
Reviewed-on: https://go-review.googlesource.com/22370
Reviewed-by: Russ Cox <rsc@golang.org>
2016-04-29 06:58:42 +00:00
.github doc: update issue template 2016-03-09 00:52:16 +00:00
api api: update next.txt 2016-04-22 23:09:00 +00:00
doc spec: be more explicit about equivalence of empty string and absent field tags 2016-04-25 19:21:40 +00:00
lib/time lib/time: update to IANA release 2016d (Apr 2016). 2016-04-28 23:12:27 +00:00
misc misc/cgo/testcarchive: fix path of libgo.a for darwin/arm 2016-04-27 20:07:12 +00:00
src compress/flate: replace "Best Speed" with specialized version 2016-04-29 06:58:42 +00:00
test test: added test case for (fixed) issue 15470 2016-04-29 00:42:13 +00:00
.gitattributes
.gitignore .gitignore: don't ignore y.output 2016-04-17 22:41:13 +00:00
AUTHORS A+C: manual updates 2016-01-08 06:02:39 +00:00
CONTRIBUTING.md doc: use new Gerrit URL and mention our instance in CONTRIBUTING.md 2016-03-09 00:52:42 +00:00
CONTRIBUTORS C: fix jbd's identity 2016-04-07 18:25:13 +00:00
LICENSE
PATENTS
README.md
favicon.ico
robots.txt

README.md

The Go Programming Language

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.

Gopher image

For documentation about how to install and use Go, visit https://golang.org/ or load doc/install-source.html in your web browser.

Our canonical Git repository is located at https://go.googlesource.com/go. There is a mirror of the repository at https://github.com/golang/go.

Go is the work of hundreds of contributors. We appreciate your help!

To contribute, please read the contribution guidelines: https://golang.org/doc/contribute.html

Note that we do not accept pull requests and that we use the issue tracker for bug reports and proposals only. Please ask questions on https://forum.golangbridge.org or https://groups.google.com/forum/#!forum/golang-nuts.

Unless otherwise noted, the Go source files are distributed under the BSD-style license found in the LICENSE file.

--

Binary Distribution Notes

If you have just untarred a binary Go distribution, you need to set the environment variable $GOROOT to the full path of the go directory (the one containing this file). You can omit the variable if you unpack it into /usr/local/go, or if you rebuild from sources by running all.bash (see doc/install-source.html). You should also add the Go binary directory $GOROOT/bin to your shell's path.

For example, if you extracted the tar file into $HOME/go, you might put the following in your .profile:

export GOROOT=$HOME/go
export PATH=$PATH:$GOROOT/bin

See https://golang.org/doc/install or doc/install.html for more details.