Commit Graph

164 Commits

Author SHA1 Message Date
Diogo Pinela 8cd00a5262 archive/zip: prevent writing data for a directory
When creating a directory, Writer.Create now returns a dummy
io.Writer that always returns an error on Write.

Fixes #24043

Change-Id: I7792f54440d45d22d0aa174cba5015ed5fab1c5c
Reviewed-on: https://go-review.googlesource.com/108615
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-26 15:57:06 +00:00
Robert Griesemer 542ea5ad91 go/printer, gofmt: tuned table alignment for better results
The go/printer (and thus gofmt) uses a heuristic to determine
whether to break alignment between elements of an expression
list which is spread across multiple lines. The heuristic only
kicked in if the entry sizes (character length) was above a
certain threshold (20) and the ratio between the previous and
current entry size was above a certain value (4).

This heuristic worked reasonably most of the time, but also
led to unfortunate breaks in many cases where a single entry
was suddenly much smaller (or larger) then the previous one.

The behavior of gofmt was sufficiently mysterious in some of
these situations that many issues were filed against it.

The simplest solution to address this problem is to remove
the heuristic altogether and have a programmer introduce
empty lines to force different alignments if it improves
readability. The problem with that approach is that the
places where it really matters, very long tables with many
(hundreds, or more) entries, may be machine-generated and
not "post-processed" by a human (e.g., unicode/utf8/tables.go).

If a single one of those entries is overlong, the result
would be that the alignment would force all comments or
values in key:value pairs to be adjusted to that overlong
value, making the table hard to read (e.g., that entry may
not even be visible on screen and all other entries seem
spaced out too wide).

Instead, we opted for a slightly improved heuristic that
behaves much better for "normal", human-written code.

1) The threshold is increased from 20 to 40. This disables
the heuristic for many common cases yet even if the alignment
is not "ideal", 40 is not that many characters per line with
todays screens, making it very likely that the entire line
remains "visible" in an editor.

2) Changed the heuristic to not simply look at the size ratio
between current and previous line, but instead considering the
geometric mean of the sizes of the previous (aligned) lines.
This emphasizes the "overall picture" of the previous lines,
rather than a single one (which might be an outlier).

3) Changed the ratio from 4 to 2.5. Now that we ignore sizes
below 40, a ratio of 4 would mean that a new entry would have
to be 4 times bigger (160) or smaller (10) before alignment
would be broken. A ratio of 2.5 seems more sensible.

Applied updated gofmt to all of src and misc. Also tested
against several former issues that complained about this
and verified that the output for the given examples is
satisfactory (added respective test cases).

Some of the files changed because they were not gofmt-ed
in the first place.

For #644.
For #7335.
For #10392.
(and probably more related issues)

Fixes #22852.

Change-Id: I5e48b3d3b157a5cf2d649833b7297b33f43a6f6e
2018-04-04 13:39:34 -07:00
Brad Fitzpatrick 48db2c01b4 all: use strings.Builder instead of bytes.Buffer where appropriate
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test
files. I skipped a few on purpose and probably missed a few others,
but otherwise I think this should be most of them.

Updates #18990

Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774
Reviewed-on: https://go-review.googlesource.com/102479
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-26 23:05:53 +00:00
Yury Smolsky 3b7ad1680f archive/zip: improve Writer.Create documentation on how to add directories
FileHeader.Name also reflects this fact.

Fixes #24018

Change-Id: Id0860a9b23c264ac4c6ddd65ba20e0f1f36e4865
Reviewed-on: https://go-review.googlesource.com/97057
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-02-26 19:58:48 +00:00
Agniva De Sarker 39852bf4cc archive/tar: remove loop label from reader
CL 14624 introduced this label. At that time,
the switch-case had a break to label statement which made this necessary.
But now, the code no longer has a break statement and it directly returns.

Hence, it is no longer necessary to have a label.

Change-Id: Idde0fcc4d2db2d76424679f5acfe33ab8573bce4
Reviewed-on: https://go-review.googlesource.com/96935
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-02-25 21:50:17 +00:00
Joe Tsai 9697a119e6 archive/zip: fix handling of Info-ZIP Unix extended timestamps
The Info-ZIP Unix1 extra field is specified as such:
>>>
Value    Size   Description
-----    ----   -----------
0x5855   Short  tag for this extra block type ("UX")
TSize    Short  total data size for this block
AcTime   Long   time of last access (GMT/UTC)
ModTime  Long   time of last modification (GMT/UTC)
<<<

The previous handling was incorrect in that it read the AcTime field
instead of the ModTime field.

The test-osx.zip test unfortunately locked in the wrong behavior.
Manually parsing that ZIP file shows that the encoded MS-DOS
date and time are 0x4b5f and 0xa97d, which corresponds with a
date of 2017-10-31 21:11:58, which matches the correct mod time
(off by 1 second due to MS-DOS timestamp resolution).

Fixes #23901

Change-Id: I567824c66e8316b9acd103dbecde366874a4b7ef
Reviewed-on: https://go-review.googlesource.com/96895
Run-TryBot: Joe Tsai <joetsai@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-23 23:48:32 +00:00
Ilya Tocar 2629703a5c archive/zip: make benchmarks more representative
Currently zip benchmarks spend 60% in the rleBuffer code,
which is used only to test zip archive/zip itself:
    17.48s 37.02% 37.02%     18.12s 38.37%  archive/zip.(*rleBuffer).ReadAt
     9.51s 20.14% 57.16%     10.43s 22.09%  archive/zip.(*rleBuffer).Write
     9.15s 19.38% 76.54%     10.85s 22.98%  compress/flate.(*compressor).deflate

This means that benchmarks currently test performance of test helper.
Updating ReadAt/Write methods to be more performant makes benchmarks closer to real world.

name                       old time/op    new time/op    delta
CompressedZipGarbage-8       2.34ms ± 0%    2.34ms ± 1%     ~     (p=0.684 n=10+10)
Zip64Test-8                  58.1ms ± 2%    10.7ms ± 1%  -81.54%  (p=0.000 n=10+10)
Zip64TestSizes/4096-8        4.05µs ± 2%    3.65µs ± 5%   -9.96%  (p=0.000 n=9+10)
Zip64TestSizes/1048576-8      238µs ± 0%      43µs ± 0%  -82.06%  (p=0.000 n=10+10)
Zip64TestSizes/67108864-8    15.3ms ± 1%     2.6ms ± 0%  -83.12%  (p=0.000 n=10+9)

name                       old alloc/op   new alloc/op   delta
CompressedZipGarbage-8       17.9kB ±14%    16.0kB ±24%  -10.48%  (p=0.026 n=9+10)

name                       old allocs/op  new allocs/op  delta
CompressedZipGarbage-8         44.0 ± 0%      44.0 ± 0%     ~     (all equal)

Change-Id: Idfd920d0e4bed4aec2f5be84dc7e3919d9f1dd2d
Reviewed-on: https://go-review.googlesource.com/83857
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-21 00:32:33 +00:00
Ryuma Yoshida 8fc25b531b all: remove duplicate word "the"
Change-Id: Ia5908e94a6bd362099ca3c63f6ffb7e94457131d
GitHub-Last-Rev: 545a40571a
GitHub-Pull-Request: golang/go#23942
Reviewed-on: https://go-review.googlesource.com/95435
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-20 16:45:55 +00:00
Kunpei Sakai f356e83e2e all: remove "the" duplications
Change-Id: I1f25b11fb9b7cd3c09968ed99913dc85db2025ef
Reviewed-on: https://go-review.googlesource.com/94976
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-18 17:54:20 +00:00
Caio Marcelo de Oliveira Filho e4bde05104 archive/tar: automatically promote TypeRegA
Change Reader to promote TypeRegA to TypeReg in headers, unless their
name have a trailing slash which is already promoted to TypeDir. This
will allow client code to handle just TypeReg instead both TypeReg and
TypeRegA.

Change Writer to promote TypeRegA to TypeReg or TypeDir in the headers
depending on whether the name has a trailing slash. This normalization
is motivated by the specification (in pax(1)):

   0 represents a regular file. For backwards-compatibility, a
   typeflag value of binary zero ( '\0' ) should be recognized as
   meaning a regular file when extracting files from the
   archive. Archives written with this version of the archive file
   format create regular files with a typeflag value of the
   ISO/IEC 646:1991 standard IRV '0'.

Fixes #22768.

Change-Id: I149ec55824580d446cdde5a0d7a0457ad7b03466
Reviewed-on: https://go-review.googlesource.com/85656
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 18:36:49 +00:00
Russ Cox 9cc9f10855 archive/zip: add test for Modified vs ModTime behavior
Lock in fix for #22738, submitted in CL 78031.

Fixes #22738.

Change-Id: I6896feb158569e3f12fa7055387cbd7caad29ef4
Reviewed-on: https://go-review.googlesource.com/80635
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2017-12-01 00:22:21 +00:00
Joe Tsai 9ec0c7abe1 archive/tar: use placeholder name for global PAX records
Several usages of tar (reasonably) just use the Header.FileInfo
to determine the type of the header. However, the os.FileMode type
is not expressive enough to represent "files" that are not files
at all, but some form of metadata.

Thus, Header{Typeflag: TypeXGlobalHeader}.FileInfo().Mode().IsRegular()
reports true, even though the expected result may have been false.

To reduce (not eliminate) the possibility of failure for such usages,
use the placeholder filename from the global PAX headers.
Thus, in the event the user did not handle special "meta" headers
specifically, they will just be written to disk as a regular file.

As an example use case, the "git archive --format=tgz" command produces
an archive where the first "file" is a global PAX header with the
name "global_pax_header". For users that do not explicitly check
the Header.Typeflag field to ignore such headers, they may end up
extracting a file named "global_pax_header". While it is a bogus file,
it at least does not stop the extraction process.

Updates #22748

Change-Id: I28448b528dcfacb4e92311824c33c71b482f49c9
Reviewed-on: https://go-review.googlesource.com/78355
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-29 19:04:57 +00:00
Joe Tsai d85a3535fe archive/zip: preserve old FileHeader.ModTime behavior
In order to avoid a regression where the date of the ModTime method
changed behavior, simply preserve the old behavior of determining
the date based on the legacy fields.

This ensures that anyone relying on ModTime before Go1.10 will have
the exact same behavior as before.
New users should use FileHeader.Modified instead.

We keep the UTC coersion logic in SetModTime since some users
manually compute timezone offsets in order to have precise control
over the MS-DOS time field.

Fixes #22738

Change-Id: Ib18b6ebd863bcf645748e083357dce9bc788cdba
Reviewed-on: https://go-review.googlesource.com/78031
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-29 16:28:13 +00:00
Russ Cox be67e269b4 archive/zip: replace Writer.Comment field with SetComment method
A method is more in keeping with the rest of the Writer API and
incidentally allows the comment error to be reported earlier.

Fixes #22737.

Change-Id: I1eee2103a0720c76d0c394ccd6541e6219996dc0
Reviewed-on: https://go-review.googlesource.com/79415
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2017-11-29 16:27:42 +00:00
Joe Tsai ba2835db6c archive/tar: partially revert sparse file support
This CL removes the following APIs:
	type SparseEntry struct{ ... }
	type Header struct{ SparseHoles []SparseEntry; ... }
	func (*Header) DetectSparseHoles(f *os.File) error
	func (*Header) PunchSparseHoles(f *os.File) error
	func (*Reader) WriteTo(io.Writer) (int, error)
	func (*Writer) ReadFrom(io.Reader) (int, error)

This API was added during the Go1.10 dev cycle, and are safe to remove.

The rationale for reverting is because Header.DetectSparseHoles and
Header.PunchSparseHoles are functionality that probably better belongs in
the os package itself.

The other API like Header.SparseHoles, Reader.WriteTo, and Writer.ReadFrom
perform no OS specific logic and only perform the actual business logic of
reading and writing sparse archives. Since we do know know what the API added to
package os may look like, we preemptively revert these non-OS specific changes
as well by simply commenting them out.

Updates #13548
Updates #22735

Change-Id: I77842acd39a43de63e5c754bfa1c26cc24687b70
Reviewed-on: https://go-review.googlesource.com/78030
Reviewed-by: Russ Cox <rsc@golang.org>
2017-11-16 16:54:08 +00:00
Joe Kyo a8474c799f archive/zip: add documentation about compression methods
Change-Id: I491c5ddd1a5d8e55f8e6bb9377bc3811e42773f8
Reviewed-on: https://go-review.googlesource.com/77870
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-11-16 01:31:52 +00:00
Russ Cox 3a181dc7bc archive/zip: fix handling of replacement rune in UTF8 check
The replacement rune is a valid rune and can appear as itself in valid UTF8
(it encodes as three bytes). To check for invalid UTF8 it is necessary to
look for utf8.DecodeRune returning the replacement rune and size==1.

Change-Id: I169be8d1fe61605c921ac13cc2fde94f80f3463c
Reviewed-on: https://go-review.googlesource.com/78126
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2017-11-15 21:30:30 +00:00
Joe Tsai d9fb9e7cf5 archive/tar: change error prefix
Change error message prefix from "tar:" to "archive/tar:" to maintain
backwards compatibility with Go1.9 and earlier in the unfortunate event
that someone is relying on string parsing of errors.

Fixes #22740

Change-Id: I59039c59818a0599e9d3b06bb5a531aa22a389b8
Reviewed-on: https://go-review.googlesource.com/77933
Reviewed-by: roger peppe <rogpeppe@gmail.com>
2017-11-15 18:56:32 +00:00
Awn 23c9db657e archive/tar: remove useless type conversions
Change-Id: I259a6ed6a1abc63d2dc39eca7e85f94cf38001cc
Reviewed-on: https://go-review.googlesource.com/47342
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-15 15:11:14 +00:00
Joe Tsai bdf30565e2 archive/zip: use Time.UTC instead of Time.In(time.UTC)
The former is more succinct and readable.

Change-Id: Ic249d1261a705ad715aeb611c70c7fa91db98254
Reviewed-on: https://go-review.googlesource.com/76830
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-10 00:27:33 +00:00
Stanislav Afanasev a4aa5c3181 archive/tar: a cosmetic fix after checking by golint
Existing methods regFileReader.LogicalRemaining and regFileReader.PhysicalRemaining have inconsistent reciever names with the previous name

Change-Id: Ief2024716737eaf482c4311f3fdf77d92801c36e
Reviewed-on: https://go-review.googlesource.com/76430
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-11-07 20:11:28 +00:00
Joe Tsai 4fcc835971 archive/zip: add FileHeader.NonUTF8 field
The NonUTF8 field provides users with a way to explictly tell the
ZIP writer to avoid setting the UTF-8 flag.
This is necessary because many readers:
	1) (Still) do not support UTF-8
	2) And use the local system encoding instead

Thus, even though character encodings other than CP-437 and UTF-8
are not officially supported by the ZIP specification, pragmatically
the world has permitted use of them.

When a non-standard encoding is used, it is the user's responsibility
to ensure that the target system is expecting the encoding used
(e.g., producing a ZIP file you know is used on a Chinese version of Windows).

We adjust the detectUTF8 function to account for Shift-JIS and EUC-KR
not being identical to ASCII for two characters.

We don't need an API for users to explicitly specify that they are encoding
with UTF-8 since all single byte characters are compatible with all other
common encodings (Windows-1256, Windows-1252, Windows-1251, Windows-1250,
IEC-8859, EUC-KR, KOI8-R, Latin-1, Shift-JIS, GB-2312, GBK) except for
the non-printable characters and the backslash character (all of which
are invalid characters in a path name anyways).

Fixes #10741

Change-Id: I9004542d1d522c9137973f1b6e2b623fa54dfd66
Reviewed-on: https://go-review.googlesource.com/75592
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-11-06 21:35:59 +00:00
Joe Tsai 6e8894d5ff archive/zip: add FileHeader.Modified field
The ModifiedTime and ModifiedDate fields are not expressive enough
for many of the time extensions that have since been added to ZIP,
nor are they easy to access since they in a legacy MS-DOS format,
and must be set and retrieved via the SetModTime and ModTime methods.

Instead, we add new field Modified of time.Time type that contains
all of the previous information and more.

Support for extended timestamps have been attempted before, but the
change was reverted because it provided no ability for the user to
specify the timezone of the legacy MS-DOS fields.
Technically the old API did not either, but users were manually offsetting
the timestamp to achieve the same effect.

The Writer now writes the legacy timestamps according to the timezone
of the FileHeader.Modified field. When the Modified field is set via
the SetModTime method, it is in UTC, which preserves the old behavior.

The Reader attempts to determine the timezone if both the legacy
and extended timestamps are present since it can compute the delta
between the two values.

Since Modified is a superset of the information in ModifiedTime and ModifiedDate,
we mark ModifiedTime, ModifiedDate, ModTime, and SetModTime as deprecated.

Fixes #18359

Change-Id: I29c6bc0a62908095d02740df3e6902f50d3152f1
Reviewed-on: https://go-review.googlesource.com/74970
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-11-06 19:50:28 +00:00
Carl Mastrangelo f265f5db5d archive/zip, crypto/tls: use rand.Read instead of casting ints to bytes
Makes tests run ~1ms faster.

Change-Id: Ida509952469540280996d2bd9266724829e53c91
Reviewed-on: https://go-review.googlesource.com/47359
Reviewed-by: Filippo Valsorda <hi@filippo.io>
Run-TryBot: Filippo Valsorda <hi@filippo.io>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-01 05:51:30 +00:00
Joe Tsai 78805c07f4 archive/zip: restrict UTF-8 detection for comment and name fields
CL 39570 added support for automatically setting flag bit 11 to
indicate that the filename and comment fields are encoded in UTF-8,
which is (conventionally) the encoding using for most Go strings.

However, the detection added is too lose for two reasons:
* We need to ensure both fields are at least possibly UTF-8.
That is, if any field is definitely not UTF-8, then we can't set the bit.
* The utf8.ValidRune returns true for utf8.RuneError, which iterating
over a Go string automatically returns for invalid UTF-8.
Thus, we manually check for that value.

Updates #22367
Updates #10741

Change-Id: Ie8aae388432e546e44c6bebd06a00434373ca99e
Reviewed-on: https://go-review.googlesource.com/72791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-25 22:16:46 +00:00
Joe Tsai 577aab0c59 archive/tar: ignore ChangeTime and AccessTime unless Format is specified
CL 59230 changed Writer.WriteHeader to ignore the ChangeTime and AccessTime
fields when considering using the USTAR format when the format is unspecified.
This policy is confusing and leads to unexpected behavior where some files
have ModTime only, while others have ModTime+AccessTime+ChangeTime if the
format became PAX for some unrelated reason (e.g., long pathname).

Change the policy to simply always ignore ChangeTime, AccessTime, and
sub-second time resolutions unless the user explicitly specifies a format.
This is a safe policy change since WriteHeader had no support for the
above features in any Go release.

Support for ChangeTime and AccessTime was added in CL 55570.
Support for sub-second times was added in CL 55552.
Both CLs landed after the latest Go release (i.e., Go1.9), which was
cut from the master branch around August 6th, 2017.

Change-Id: Ib82baa1bf9dd4573ed4f674b7d55d15f733a4843
Reviewed-on: https://go-review.googlesource.com/69296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10 20:13:27 +00:00
Joe Tsai 4cd58c2f26 archive/tar: improve handling of directory paths
The USTAR format says:
<<<
Implementors should be aware that the previous file format did not include
a mechanism to archive directory type files.
For this reason, the convention of using a filename ending with
<slash> was adopted to specify a directory on the archive.
>>>

In light of this suggestion, make the following changes:
* Writer.WriteHeader refuses to encode a header where a file that
is obviously a file-type has a trailing slash in the name.
* formatter.formatString avoids encoding a trailing slash in the event
that the string is truncated (the full string will be encoded elsewhere,
so stripping the slash is safe).
* Reader.Next treats a TypeRegA (which is the zero value of Typeflag)
as a TypeDir if the name has a trailing slash.

Change-Id: Ibf27aa8234cce2032d92e5e5b28546c2f2ae5ef6
Reviewed-on: https://go-review.googlesource.com/69293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10 20:11:26 +00:00
Marvin Stenger d153df8e4b all: revert "all: prefer strings.LastIndexByte over strings.LastIndex"
This reverts https://golang.org/cl/66372.

Updates #22148

Change-Id: I3e94af3dfc11a2883bf28e1d5e1f32f98760b3ee
Reviewed-on: https://go-review.googlesource.com/68431
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-05 23:19:42 +00:00
Joe Tsai e04ff3d133 archive/tar: fix typo in documentation
s/TypeSymLink/TypeSymlink/g

Change-Id: I2550843248eb27d90684d0036fe2add0b247ae5a
Reviewed-on: https://go-review.googlesource.com/67810
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-03 16:24:53 +00:00
Marvin Stenger d2826d3e06 all: prefer strings.LastIndexByte over strings.LastIndex
strings.LastIndexByte was introduced in go1.5 and it can be used
effectively wherever the second argument to strings.LastIndex is
exactly one byte long.

This avoids generating unnecessary string symbols and saves
a few calls to strings.LastIndex.

Change-Id: I7b5679d616197b055cffe6882a8675d24a98b574
Reviewed-on: https://go-review.googlesource.com/66372
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-27 00:54:24 +00:00
Joe Tsai 7246585f8c archive/tar: avoid empty IO operations
The interfaces for io.Reader and io.Writer permit calling Read/Write
with an empty buffer. However, this condition is often not well tested
and can lead to bugs in various implementations of io.Reader and io.Writer.
For example, see #22028 for buggy io.Reader in the bzip2 package.

We reduce the likelihood of hitting these bugs by adjusting
regFileReader.Read and regFileWriter.Write to avoid performing
Read and Write calls when the buffer is known to be empty.

Fixes #22029

Change-Id: Ie4a26be53cf87bc4d2abd951fa005db5871cc75c
Reviewed-on: https://go-review.googlesource.com/66111
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-25 23:06:04 +00:00
Giovanni Bajo 2f8b555de2 archive/tar: fix sparse files support on Darwin
Apple defined the SEEK_HOLE/SEEK_DATA constants in unistd.h
with swapped values, compared to all other UNIX systems.

Fixes #21970

Change-Id: I84a33e0741f0f33a2e04898e96b788b87aa9890f
Reviewed-on: https://go-review.googlesource.com/65570
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-23 16:53:28 +00:00
David du Colombier d83b23fd4f archive/tar: skip TestSparseFiles on Plan 9
CL 60871 added TestSparseFiles. This test is succeeding
on Plan 9 when executed on the ramfs file system, but
is failing when executed on the Fossil file system.

This may be due to an issue in the handling of sparse
files in the Fossil file system on Plan 9 that should
be investigated.

Updates #21977.

Change-Id: I177afff519b862a5c548e094203c219504852006
Reviewed-on: https://go-review.googlesource.com/65352
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-22 13:50:50 +00:00
Joe Tsai 718d9de60f archive/tar: perform test for hole-detection on specific builders
The test for hole-detection is heavily dependent on whether the
OS and underlying FS provides support for it.
Even on Linux, which has support for SEEK_HOLE and SEEK_DATA,
the underlying filesystem may not have support for it.
In order to avoid an ever-changing game of whack-a-mole,
we whitelist the specific builders that we expect the test to pass on.

Updates #21964

Change-Id: I7334e8532c96cc346ea83aabbb81b719685ad7e5
Reviewed-on: https://go-review.googlesource.com/65270
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-21 20:42:11 +00:00
Joe Tsai fdecab6ef0 archive/tar: make check for hole detection support more liberal
On most Unix OSes, lseek reports EINVAL when lacking SEEK_HOLE support.
However, there are reports that ENOTTY is reported instead.
Rather than tracking down every possible errno that may be used to
represent "not supported", just treat any non-nil error as meaning
that there is no support. This is the same strategy taken by the
GNU and BSD tar tools.

Fixes #21958

Change-Id: Iae68afdc934042f52fa914fca45f0ca89220c383
Reviewed-on: https://go-review.googlesource.com/65191
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-21 17:49:35 +00:00
Joe Tsai 1eacf78858 archive/tar: add Header.DetectSparseHoles and Header.PunchSparseHoles
To support the detection and creation of sparse files,
add two new methods:
	func Header.DetectSparseHoles(*os.File) error
	func Header.PunchSparseHoles(*os.File) error

DetectSparseHoles is intended to be used after FileInfoHeader
prior to serializing the Header with WriteHeader.
For each OS, it uses specialized logic to detect
the location of sparse holes. On most Unix systems, it uses
SEEK_HOLE and SEEK_DATA to query for the holes.
On Windows, it uses a specialized the FSCTL_QUERY_ALLOCATED_RANGES
syscall to query for all the holes.

PunchSparseHoles is intended to be used after Reader.Next
prior to populating the file with Reader.WriteTo.
On Windows, this uses the FSCTL_SET_ZERO_DATA syscall.
On other operating systems it simply truncates the file
to the end-offset of SparseHoles.

DetectSparseHoles and PunchSparseHoles are added as methods on
Header because they are heavily tied to the operating system,
for which there is already an existing precedence for
(since FileInfoHeader makes uses of OS-specific details).

Fixes #13548

Change-Id: I98a321dd1ce0165f3d143d4edadfda5e7db67746
Reviewed-on: https://go-review.googlesource.com/60871
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-20 22:12:38 +00:00
Joe Tsai 57c79febda archive/tar: add Reader.WriteTo and Writer.ReadFrom
To support the efficient packing and extracting of sparse files,
add two new methods:
	func Reader.WriteTo(io.Writer) (int64, error)
	func Writer.ReadFrom(io.Reader) (int64, error)

If the current archive entry is sparse and the provided io.{Reader,Writer}
is also an io.Seeker, then use Seek to skip past the holes.
If the last region in a file entry is a hole, then we seek to 1 byte
before the EOF:
	* for Reader.WriteTo to write a single byte
	to ensure that the resulting filesize is correct.
	* for Writer.ReadFrom to read a single byte
	to verify that the input filesize is correct.

The downside of this approach is when the last region in the sparse file
is a hole. In the case of Reader.WriteTo, the 1-byte write will cause
the last fragment to have a single chunk allocated.
However, the goal of ReadFrom/WriteTo is *not* the ability to
exactly reproduce sparse files (in terms of the location of sparse holes),
but rather to provide an efficient way to create them.

File systems already impose their own restrictions on how the sparse file
will be created. Some filesystems (e.g., HFS+) don't support sparseness and
seeking forward simply causes the FS to write zeros. Other filesystems
have different chunk sizes, which will cause chunk allocations at boundaries
different from what was in the original sparse file. In either case,
it should not be a normal expectation of users that the location of holes
in sparse files exactly matches the source.

For users that really desire to have exact reproduction of sparse holes,
they can wrap os.File with their own io.WriteSeeker that discards the
final 1-byte write and uses File.Truncate to resize the file to the
correct size.

Other reasons we choose this approach over special-casing *os.File because:
	* The Reader already has special-case logic for io.Seeker
	* As much as possible, we want to decouple OS-specific logic from
	Reader and Writer.
	* This allows other abstractions over *os.File to also benefit from
	the "skip past holes" logic.
	* It is easier to test, since it is harder to mock an *os.File.

Updates #13548

Change-Id: I0a4f293bd53d13d154a946bc4a2ade28a6646f6a
Reviewed-on: https://go-review.googlesource.com/60872
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-18 16:18:17 +00:00
Kunpei Sakai 5a986eca86 all: fix article typos
a -> an

Change-Id: I7362bdc199e83073a712be657f5d9ba16df3077e
Reviewed-on: https://go-review.googlesource.com/63850
Reviewed-by: Rob Pike <r@golang.org>
2017-09-15 02:39:16 +00:00
Tobias Klauser 3098cf0175 archive/tar: populate Devmajor and Devminor in FileInfoHeader on *BSD
Extract device major/minor number on all the BSDs and set Devmajor and
Devminor in FileInfoHeader. Code based on the corresponding Major/Minor
implementations in golang.org/x/sys/unix.

Change-Id: Ieffa7ce0cdbe6481950de666b2f5f88407a32382
Reviewed-on: https://go-review.googlesource.com/63470
Reviewed-by: Joe Tsai <joetsai@google.com>
2017-09-13 21:02:11 +00:00
Tobias Klauser ec359643a1 archive/tar: populate Devmajor and Devminor in FileInfoHeader on Darwin
Extract device major/minor number on Darwin and set Devmajor and
Devminor in FileInfoHeader. Code based on the Major/Minor functions for
Darwin in golang.org/x/sys/unix.

Change-Id: I51b65f607bfa2e6b177b8b66e2b246b771367b84
Reviewed-on: https://go-review.googlesource.com/60850
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-01 10:25:54 +00:00
Kenji Yano fda8269cc6 archive/zip: support "end of central directory record comment"
This change added support "end of central directory record comemnt" to the Writer.

There is a new exported field Writer.Comment in this change.
If invalid size of comment was set, Close returns error without closing resources.

Fixes #21634

Change-Id: Ifb62bc6c7f81b9257ac83eb570ad9915de727f8c
Reviewed-on: https://go-review.googlesource.com/59310
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-31 02:48:46 +00:00
Joe Tsai c1679286c3 archive/tar: minor doc fixes
Use "file" consistently instead of "entry".

Change-Id: Ia81c9665d0d956adb78f7fa49de40cdb87fba000
Reviewed-on: https://go-review.googlesource.com/60150
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 18:01:08 +00:00
Joe Tsai f85dc050ba archive/tar: require opt-in to PAX or GNU format for time features
Nearly every Header obtained from FileInfoHeader via the FS has
timestamps with sub-second resolution and the AccessTime
and ChangeTime fields populated. This forces the PAX format
to almost always be used, which has the following problems:
* PAX is still not as widely supported compared to USTAR
* The PAX headers will occupy at minimum 1KiB for every entry

The old behavior of tar Writer had no support for sub-second resolution
nor any support for AccessTime or ChangeTime, so had neither problem.
Instead the Writer would just truncate sub-second information and
ignore the AccessTime and ChangeTime fields.

In this CL, we preserve the behavior such that the *default* behavior
would output a USTAR header for most cases by truncating sub-second
time measurements and ignoring AccessTime and ChangeTime.
To use either of the features, users will need to explicitly specify
that the format is PAX or GNU.

The exact policy chosen is this:
* USTAR and GNU may still be chosen even if sub-second measurements
are present; they simply truncate the timestamp to the nearest second.
As before, PAX uses sub-second resolutions.
* If the Format is unspecified, then WriteHeader ignores AccessTime
and ChangeTime when using the USTAR format.

This ensures that USTAR may still be chosen for a vast majority of
file entries obtained through FileInfoHeader.

Updates #11171
Updates #17876

Change-Id: Icc5274d4245922924498fd79b8d3ae94d5717271
Reviewed-on: https://go-review.googlesource.com/59230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 18:00:59 +00:00
Joe Tsai 0564e304a6 archive/tar: populate uname/gname/devmajor/devminor in FileInfoHeader
We take a best-effort approach since information for these fields
are not well supported on all platforms.

user.LookupId+user.LookupGroupId is currently 15x slower than os.Stat.
For performance reasons, we perpetually cache username and groupname
with a sync.Map. As a result, this function will not be updated whenever
the user or group names are renamed in the OS. However, this is a better
situation than before, where those fields were not populated at all.

Change-Id: I3cec8291aed7675dea89ee1cbda92bd493c8831f
Reviewed-on: https://go-review.googlesource.com/59531
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 00:52:31 +00:00
Joe Tsai bad6b6fa91 archive/tar: improve package documentation
Many aspects of the package is woefully undocumented.
With the recent flurry of improvements, the package is now at feature
parity with the GNU and TAR tools. Thoroughly all of the public API
and perform some minor stylistic cleanup in some code segments.

Change-Id: Ic892fd72c587f30dfe91d1b25b88c9c8048cc389
Reviewed-on: https://go-review.googlesource.com/59210
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 23:29:55 +00:00
Joe Tsai 19a995945f archive/tar: add raw support for global PAX records
The PAX specification says the following:
<<<
'g' represents global extended header records for the following files in the archive.
The format of these extended header records shall be as described in pax Extended Header.
Each value shall affect all subsequent files that do not override that value
in their own extended header record and until another global extended header record
is reached that provides another value for the same field.
>>>

This CL adds support for parsing and composing global PAX records,
but intentionally does not provide support for automatically
persisting the global state across files.

Changes made:
* When Reader encounters a TypeXGlobalRecord header, it parses the
PAX records and returns them to the user ad-verbatim. Reader does not
store them in its state, ensuring it has no effect on future Next calls.
* When Writer receives a TypeXGlobalRecord header, it writes the
PAX records to the archive ad-verbatim. It does not store them in
its state, ensuring it has no effect on future WriteHeader calls.
* The restriction regarding empty record values is lifted since this
value is used to represent deletion in global headers.

Why provide raw support only:
* Some archives in the wild have a global header section (often empty)
and it is the user's responsibility to manually read and discard it's body.
The logic added here allows users to more easily skip over these sections.
* For users that do care about global headers, having access to the raw
records allows them to implement the functionality of global headers themselves
and manually persist the global state across files.
* We can still upgrade to a full implementation in the future.

Why we don't provide full support:
* Even though the PAX specification describes their operation in detail,
both the GNU and BSD tar tools (which are the most common implementations)
do not have a consistent interpretation of many details.
* Global headers were a controversial feature in PAX, by admission of the
specification itself:
  <<<
  The concept of a global extended header (typeflag g) was controversial.

  The typeflag g global headers should not be used with interchange media that
  could suffer partial data loss in transporting the archive.
  >>>
* Having state persist from entry-to-entry complicates the implementation
for a feature that is not widely used and not well supported.

Change-Id: I1d904cacc2623ddcaa91525a5470b7dbe226c7e8
Reviewed-on: https://go-review.googlesource.com/59190
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-08-25 23:03:52 +00:00
Joe Tsai a795ca51db archive/tar: support arbitrary PAX records
This CL adds the following new publicly visible API:
	type Header struct { ...; PAXRecords map[string]string }

The new Header.PAXRecords field is a map of all PAX extended header records.

We suggest (but do not enforce) that users use VENDOR-prefixed keys
according to the following in the PAX specification:
<<<
The standard developers have reserved keyword name space for vendor extensions.
It is suggested that the format to be used is:
	VENDOR.keyword
where VENDOR is the name of the vendor or organization in all uppercase letters.
>>>

When reading, the Header.PAXRecords is populated with all PAX records
encountered so far, including basic ones (e.g., "path", "mtime", etc).
When writing, the fields of Header will be merged into PAXRecords,
overwriting any records that may conflict.

Since PAXRecords is a more expressive feature than Xattrs and
is entirely a superset of Xattrs, we mark Xattrs as deprecated,
and steer users towards the new PAXRecords API.

The issue has a discussion about adding a Header.SetPAXRecord method
to help validate records and keep the Header fields in sync.
However, we do not include that in this CL since that helper
method can always be added in the future.

There is no support for global records.

Fixes #14472

Change-Id: If285a52749acc733476cf75a2c7ad15bc1542071
Reviewed-on: https://go-review.googlesource.com/58390
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 21:57:32 +00:00
Joe Tsai 3d62000adc archive/tar: return better WriteHeader errors
WriteHeader may fail to encode a header for any number of reasons,
which can be frustrating for the user when trying to create a tar archive.
As we validate the Header, we generate an informative error message
intended for human consumption and return that if and only if no
format can be selected.

This allows WriteHeader to return informative errors like:
    tar: cannot encode header: invalid PAX record: "linkpath = \x00hello"
    tar: cannot encode header: invalid PAX record: "SCHILY.xattr.foo=bar = baz"
    tar: cannot encode header: Format specifies GNU; and only PAX supports Xattrs
    tar: cannot encode header: Format specifies GNU; and GNU cannot encode ModTime=1969-12-31 15:59:59.0000005 -0800 PST
    tar: cannot encode header: Format specifies GNU; and GNU supports sparse files only with TypeGNUSparse
    tar: cannot encode header: Format specifies USTAR; and USTAR cannot encode ModTime=292277026596-12-04 07:30:07 -0800 PST
    tar: cannot encode header: Format specifies USTAR; and USTAR does not support sparse files
    tar: cannot encode header: Format specifies PAX; and only GNU supports TypeGNUSparse

Updates #18710

Change-Id: I82a498d6f29d02c4e73bce47b768eb578da8499c
Reviewed-on: https://go-review.googlesource.com/58310
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 05:21:00 +00:00
Joe Tsai 9d3d370632 archive/tar: support reporting and selecting the format
The Reader and Writer are now at feature parity,
meaning that everything that can be parsed by the Reader,
can also be composed by the Writer.

This position enables us to support selection of the format
in a backwards compatible way, since it ensures that everything
that can be read can also be round-trip written.

As such, we add the following new API:
    type Format int
            const FormatUnknown Format = 0 ...
    type Header struct { ...; Format Format }

The new Header.Format field is populated by the Reader on the
best guess on what the format is. Note that the Reader is very liberal
in what it permits, so a hybrid TAR file using aspects of multiple
formats can still be decoded, but will be reported as FormatUnknown.

Even though Reader has full support for V7 and basic support for STAR,
it will still report those formats as unknown (and the constants for
those formats are not even exported). The reasons for this is because
the Writer has no support for V7 or STAR. Leaving it as unknown allows
the Writer to choose a format usually USTAR or GNU that can encode
the equivalent Header.

When writing, the Header.allowedFormats will take the Format field
into consideration if it is a known format.

Fixes #18710

Change-Id: I00980c475d067c6969d3414e1ff0224fdd89cd49
Reviewed-on: https://go-review.googlesource.com/58230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-24 01:35:39 +00:00
Joe Tsai e0ab505a97 archive/tar: implement Writer support for sparse files
This CL is the second step (of two; part1 is CL/56771) for adding
sparse file support to the Writer.

There are no new identifiers exported in this CL, but this does make
use of Header.SparseHoles added in part1. If the Typeflag is set to
TypeGNUSparse or len(SparseHoles) > 0, then the Writer will emit an
sparse file, where the holes must be written by the user as zeros.

If TypeGNUSparse is set, then the output file must use the GNU format.
Otherwise, it must use the PAX format (with GNU-defined PAX keys).

A future CL may export Reader.Discard and Writer.FillZeros,
but those methods are currently unexported, and only used by the
tests for efficiency reasons.
Calling Discard or FillZeros on a hole 10GiB in size does take
time, even if it is essentially a memcopy.

Updates #13548

Change-Id: Id586d9178c227c0577f796f731ae2cbb72355601
Reviewed-on: https://go-review.googlesource.com/57212
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-23 22:38:45 +00:00