Previously, the read method checked whether the current block
was fully consumed or not based on whether the buffer could be filled
with a non-zero number of bytes. This check is problematic because
zero bytes could be read if the provided buffer is empty.
We fix this case by checking whether the buffer provided by the
user is empty. If it is, we assume that no bytes could be read
because the buffer was too small, rather than because the current
block was fully exhausted.
This check causes bzip2.Reader to be unable to make progress
on the next block unless a non-empty buffer is provided.
However, that is an entirely reasonable expectation since a
non-empty buffer needs to be provided eventually anyway to
read the actual contents of subsequent blocks.
Fixes #22028
Change-Id: I2bb1b2d54e78567baf2bf7b490a272c0853d7bfe
Reviewed-on: https://go-review.googlesource.com/66110
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The logic performs a series of shifts, which are useless given
that they are followed by an assignment that overrides the
value of the previous computation.
I suspect (but cannot prove) that this is leftover logic from an
original approach that attempted to store both the Huffman code
and the length within the same variable instead of using two
different variables as it does now.
Fixes #17949
Change-Id: Ibf6c807c6cef3b28bfdaf2b68d9bc13503ac21b2
Reviewed-on: https://go-review.googlesource.com/44091
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This reverts commit 467109bf56.
Replaced by an improved strategy later in the CL relation chain.
Change-Id: Ib90813b5a6c4716b563c8496013d2d57f9c022b8
Reviewed-on: https://go-review.googlesource.com/36066
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The working directory is now adjusted in main to match the typical
Go test working directory, as the old trick for adjusting it earlier
stopped working because of bugs in the latest version of LLDB.
That means the small number of places where testdata files are
read before main is called no longer work. This CL adjusts those
reads to happen after main is called. (This has the bonus effect of
not reading some benchmark testdata files in all.bash.)
Fixes compress/bzip2, go/doc, go/parser, os, and time package
tests on the iOS builder.
Change-Id: If60f026aa7848b37511c36ac5e3985469ec25209
Reviewed-on: https://go-review.googlesource.com/35255
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
cmd and runtime were handled separately, and I intentionally skipped
syscall. This is the rest of the standard library.
CL generated mechanically with github.com/mdempsky/unconvert.
Change-Id: I9e0eff886974dedc37adb93f602064b83e469122
Reviewed-on: https://go-review.googlesource.com/22104
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change removes a lot of dead code. Some of the code has never been
used, not even when it was first committed. The rest shouldn't have
survived refactors.
This change doesn't remove unused routines helpful for debugging, nor
does it remove code that's used in commented out blocks of code that are
only unused temporarily. Furthermore, unused constants weren't removed
when they were part of a set of constants from specifications.
One noteworthy omission from this CL is about 1000 lines of unused code
in cmd/fix, 700 lines of which are the typechecker, which hasn't been
used since the pre-Go 1 fixes were removed. I wasn't sure if this code
should stick around for future uses of cmd/fix or be culled as well.
Change-Id: Ib714bc7e487edc11ad23ba1c3222d1fd02e4a549
Reviewed-on: https://go-review.googlesource.com/20926
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Over the years, as more bugs were discovered in the bzip2 library,
new tests were appended to the unit tests and the tests became gnarly.
Clean up the tests to be more consistent with modern Go style in
addition to coalescing common tests into a general version that
iterates over a list of input/output pairs. This has the advantage that
the input, output, and test code are all in the same area, rather than
being sprawled around the test file.
There is no loss of test coverage.
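To illustrate the table-driven shape described above, here is a rough
sketch rather than the actual test file. The testCase type, the cases
table, and the check helper are names invented for this example, and it
runs as a plain program instead of under "go test". The first input is
the 14-byte empty bzip2 stream (header, end-of-stream marker, zero CRC).

```go
package main

import (
	"bytes"
	"compress/bzip2"
	"encoding/hex"
	"fmt"
	"io"
)

// testCase keeps the input, the expected output, and the expected
// outcome side by side.
type testCase struct {
	desc   string
	input  string // hex-encoded compressed stream
	output string // expected decompressed data
	fail   bool   // whether decoding is expected to fail
}

// cases is an illustrative table; real tests would list many more pairs.
var cases = []testCase{
	{desc: "empty stream", input: "425a683917724538509000000000", output: ""},
	{desc: "bad magic", input: "ffff", fail: true},
}

// check decodes one case and compares the result with the expectation.
func check(tc testCase) error {
	in, err := hex.DecodeString(tc.input)
	if err != nil {
		return fmt.Errorf("%s: bad hex input: %v", tc.desc, err)
	}
	got, err := io.ReadAll(bzip2.NewReader(bytes.NewReader(in)))
	if (err != nil) != tc.fail {
		return fmt.Errorf("%s: error = %v, want failure = %v", tc.desc, err, tc.fail)
	}
	if !tc.fail && string(got) != tc.output {
		return fmt.Errorf("%s: output = %q, want %q", tc.desc, got, tc.output)
	}
	return nil
}

func main() {
	for _, tc := range cases {
		if err := check(tc); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", tc.desc)
	}
}
```

Adding coverage for a new bug then means appending one entry to the
table rather than writing another test function.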
Change-Id: I377ed89378f0b89763d4a56ffc37b22d9c2a369e
Reviewed-on: https://go-review.googlesource.com/20133
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Unlike RFC 1951 (DEFLATE), bzip2 does not use zero-length Huffman codes
to indicate that a symbol is missing. Instead, bzip2 uses a sparse
bitmap to indicate which symbols are present, so it is undefined what
happens when a length of zero is used. Fix the parsing logic so that
the length can never go below 1 bit, similar to how the C logic behaves.
To confirm that the C bzip2 utility chokes on this data:
$ echo "425a6836314159265359b1f7404b000000400040002000217d184682ee48
a70a12163ee80960" | xxd -r -p | bzip2 -d
bzip2: Data integrity error when decompressing
For reference see:
bzip2-1.0.6/decompress.c:320
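The length parsing described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in this
CL: bitSource and readCodeLengths are hypothetical names, and the
encoding shown (a 5-bit starting length, then pairs of bits that
increment or decrement it) follows the delta scheme used by the C
decoder. The range check runs on every step, so a length of zero is
rejected as soon as it appears.

```go
package main

import (
	"errors"
	"fmt"
)

// bitSource is a hypothetical bit reader over a slice of 0/1 values.
type bitSource struct {
	bits []byte
	pos  int
}

// readBit returns the next bit, or false once the input is exhausted.
func (s *bitSource) readBit() bool {
	if s.pos >= len(s.bits) {
		return false
	}
	b := s.bits[s.pos]
	s.pos++
	return b == 1
}

// readBits reads n bits, most significant bit first.
func (s *bitSource) readBits(n int) int {
	v := 0
	for i := 0; i < n; i++ {
		v <<= 1
		if s.readBit() {
			v |= 1
		}
	}
	return v
}

var errLength = errors.New("Huffman length out of range")

// readCodeLengths decodes delta-encoded code lengths for numSymbols
// symbols, rejecting any length outside [1, 20].
func readCodeLengths(s *bitSource, numSymbols int) ([]uint8, error) {
	lengths := make([]uint8, numSymbols)
	length := s.readBits(5)
	for i := range lengths {
		for {
			if length < 1 || length > 20 {
				return nil, errLength
			}
			if !s.readBit() {
				break // 0: keep the current length for this symbol
			}
			if s.readBit() {
				length-- // 11: decrement
			} else {
				length++ // 10: increment
			}
		}
		lengths[i] = uint8(length)
	}
	return lengths, nil
}

func main() {
	// Start at 2; symbol 0 keeps it; symbol 1 increments once.
	ok := &bitSource{bits: []byte{0, 0, 0, 1, 0, 0, 1, 0, 0}}
	fmt.Println(readCodeLengths(ok, 2))

	// Start at 1, then decrement to 0, which must be rejected.
	bad := &bitSource{bits: []byte{0, 0, 0, 0, 1, 1, 1}}
	fmt.Println(readCodeLengths(bad, 1))
}
```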
Change-Id: Ic1568f8e7f80cdea51d887b4d712cc239c2fe85e
Reviewed-on: https://go-review.googlesource.com/20119
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Motivation:
* Previously, the size of the compressed data was used for metrics,
  rather than the uncompressed size. This caused the library to appear
  to perform poorly relative to C or other implementations. Switch it
  to use the uncompressed size so that it matches how decompression
  benchmarks are usually done (like in compress/flate). This also makes
  it easier to compare bzip2 rates to other algorithms since they
  measure performance in this way.
* Also, reset the timer after doing initialization work.
Change-Id: I32112c2ee8e7391e658c9cf31039f70a689d9b9d
Reviewed-on: https://go-review.googlesource.com/17611
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The bzip2 block size is a multiple of 100*1000 not 100*1024.
Thus, the bzip2 decoder would incorrectly decode files with larger
block sizes when it should have otherwise failed.
Fortunately, we can correct this in a backwards compatible way since
Go has no implementation of a bzip2 encoder to produce bad blocks :)
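As a small worked example of the arithmetic above, the sketch below
derives the block size from the level byte in the stream header ('1'
through '9'). The blockSize function is a hypothetical helper written
for illustration, not the decoder's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// blockSize returns the maximum block size for a header level byte.
// The unit is 100*1000 bytes, not 100*1024.
func blockSize(level byte) (int, error) {
	if level < '1' || level > '9' {
		return 0, errors.New("invalid compression level")
	}
	return 100 * 1000 * int(level-'0'), nil
}

func main() {
	for _, level := range []byte{'1', '9'} {
		size, err := blockSize(level)
		fmt.Printf("level %c: %d bytes (err: %v)\n", level, size, err)
	}
	// With the 1024 multiplier, level 9 would allow 921600 bytes,
	// 21600 more than the format permits.
	fmt.Println(100*1024*9 - 100*1000*9)
}
```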
To confirm that the C bzip2 utility chokes on this data:
$ echo "425a683131415926535936dc55330063ffc0006000200020a40830008b00
08b8bb9229c28481b6e2a998" | xxd -r -p | bzip2 -d
bzip2: Data integrity error when decompressing.
Fixes #13941
Change-Id: I2402e8829a8027ef94dd4fac050b200440a3d4e4
Reviewed-on: https://go-review.googlesource.com/20011
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Commit 7a1fb95d50 stripped the non-free license
from Mark.Twain-Tom.Sawyer.txt, but forgot to remove it from the compressed
version of the file.
Update #13216
Change-Id: I60f53275d56ba5baa6898db47b1d41f85e985c00
Reviewed-on: https://go-review.googlesource.com/17264
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change strips the non-free license from Mark.Twain-Tom.Sawyer.txt,
along with all references to Project Gutenberg in the file and the
whole source tree, making the file public domain again.
Fixes #13216
Change-Id: I2f41b0de225f627dde152efe93c006a4c24be668
Reviewed-on: https://go-review.googlesource.com/17196
Reviewed-by: Andrew Gerrand <adg@golang.org>
Issue 6754 reports that Go bzip2 Decode function is much slower
(about 2.5x in go1.5) than the Python equivalent (which is
actually just a wrapper around the usual C library) on random data.
Profiling the code shows that half a dozen CMP instructions in a
tight loop are responsible for most of the execution time.
This patch reduces the number of branches of the loop, greatly
improving performance on random data and speeding up decoding of
real data.
name            old time/op    new time/op    delta
DecodeDigits-4    9.28ms ± 1%    8.05ms ± 1%  -13.18%  (p=0.000 n=15+14)
DecodeTwain-4     28.9ms ± 2%    26.4ms ± 1%   -8.57%  (p=0.000 n=15+14)
DecodeRand-4      3.94ms ± 1%    3.06ms ± 1%  -22.45%  (p=0.000 n=15+14)

name            old speed      new speed      delta
DecodeDigits-4  4.65MB/s ± 1%  5.36MB/s ± 1%  +15.21%  (p=0.000 n=13+14)
DecodeTwain-4   4.32MB/s ± 2%  4.72MB/s ± 1%   +9.36%  (p=0.000 n=15+14)
DecodeRand-4    4.27MB/s ± 1%  5.51MB/s ± 1%  +28.86%  (p=0.000 n=15+14)
I've run some benchmarks comparing the Go bzip2 implementation with the
usual Linux bzip2 command (which is written in C). On my machine
this patch brings go1.5
    from ~2.26x to ~1.50x of bzip2 time (on 64MB random data)
    from ~1.70x to ~1.50x of bzip2 time (on 100MB English text)
    from ~2.00x to ~1.88x of bzip2 time (on 64MB /dev/zero data)
Fixes #6754
Change-Id: I3cb12d2c0c2243c1617edef1edc88f05f91d26d1
Reviewed-on: https://go-review.googlesource.com/13853
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Only documentation / comment changes. Update references to
point to golang.org permalinks or go.googlesource.com/go.
References in historical release notes under doc are left as is.
Change-Id: Icfc14e4998723e2c2d48f9877a91c5abef6794ea
Reviewed-on: https://go-review.googlesource.com/4060
Reviewed-by: Ian Lance Taylor <iant@golang.org>