Commit Graph

16 Commits

Author SHA1 Message Date
Emmanuel T Odeke e0ac989cf3 archive/tar: detect out of bounds accesses in PAX records resulting from padded lengths
Handles the case in which padding of a PAX record's length field
violates invariants about the formatting of record, whereby it no
longer matches the prescribed format:

    "%d %s=%s\n", <length>, <keyword>, <value>

as per:

    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_03

0-padding, and paddings of other sorts weren't handled and we assumed
that only non-padded decimal lengths would be passed in.
Added test cases to ensure that the parsing still proceeds as expected.

The prior crashing repro:

    0000000000000000000000000000000030 mtime=1432668921.098285006\n30 ctime=2147483649.15163319

exposed the fallacy in the code, that assumed that the length would ALWAYS be a
non-padded decimal length string.

This bug has existed since Go1.1 as per CL 6700047.

Thanks to Josh Bleecher Snyder for fuzzing this package, and thanks to Tom
Thorogood for advocacy, raising parity with GNU Tar, but for providing more test cases.

Fixes #40196

Change-Id: I32e0af4887bc9221481bd9e8a5120a79f177f08c
Reviewed-on: https://go-review.googlesource.com/c/go/+/289629
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Trust: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2021-02-09 05:28:50 +00:00
yangwenmai d92f8add32 archive/tar: fix typo in comment
Change-Id: Ifcc565b34b3c3bb7ee62bb0525648a5d2895bf0b
Reviewed-on: https://go-review.googlesource.com/c/go/+/282013
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
2021-01-08 02:03:24 +00:00
Robert Griesemer fae44a2be3 src, misc: apply gofmt
This applies the new gofmt literal normalizations to the library.

Change-Id: I8c1e8ef62eb556fc568872c9f77a31ef236348e7
Reviewed-on: https://go-review.googlesource.com/c/162539
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-02-19 20:38:28 +00:00
Awn 23c9db657e archive/tar: remove useless type conversions
Change-Id: I259a6ed6a1abc63d2dc39eca7e85f94cf38001cc
Reviewed-on: https://go-review.googlesource.com/47342
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-15 15:11:14 +00:00
Joe Tsai 4cd58c2f26 archive/tar: improve handling of directory paths
The USTAR format says:
<<<
Implementors should be aware that the previous file format did not include
a mechanism to archive directory type files.
For this reason, the convention of using a filename ending with
<slash> was adopted to specify a directory on the archive.
>>>

In light of this suggestion, make the following changes:
* Writer.WriteHeader refuses to encode a header where a file that
is obviously a file-type has a trailing slash in the name.
* formatter.formatString avoids encoding a trailing slash in the event
that the string is truncated (the full string will be encoded elsewhere,
so stripping the slash is safe).
* Reader.Next treats a TypeRegA (which is the zero value of Typeflag)
as a TypeDir if the name has a trailing slash.

Change-Id: Ibf27aa8234cce2032d92e5e5b28546c2f2ae5ef6
Reviewed-on: https://go-review.googlesource.com/69293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10 20:11:26 +00:00
Joe Tsai 5c20ffbb2f archive/tar: add support for long binary strings in GNU format
The GNU tar format defines the following type flags:
	TypeGNULongName = 'L' // Next file has a long name
	TypeGNULongLink = 'K' // Next file symlinks to a file w/ a long name

Anytime a string exceeds the field dedicated to store it, the GNU format
permits a fake "file" to be prepended where that file entry has a Typeflag
of 'L' or 'K' and the contents of the file is a NUL-terminated string.

Contrary to previous TODO comments,
the GNU format supports arbitrary strings (without NUL) rather UTF-8 strings.
The manual says the following:
<<<
The name, linkname, magic, uname, and gname are
null-terminated character strings
>>>
<<<
All characters in header blocks are represented
by using 8-bit characters in the local variant of ASCII.
>>>

From this description, we gather the following:
* We must forbid NULs in any GNU strings
* Any 8-bit value (other than NUL) is permitted

Since the modern world has moved to UTF-8, it is really difficult to
determine what a "local variant of ASCII" means. For this reason,
we treat strings as just an arbitrary binary string (without NUL)
and leave it to the user to determine the encoding of this string.
(Practically, it seems that UTF-8 is the typical encoding used
in GNU archives seen in the wild).

The implementation of GNU tar seems to confirm this interpretation
of the manual where it permits any arbitrary binary string to exist
within these fields so long as they do not contain the NUL character.

 $ touch `echo -e "not\x80\x81\x82\x83utf8"`
 $ gnutar -H gnu --tar -cvf gnu-not-utf8.tar $(echo -e "not\x80\x81\x82\x83utf8")

The fact that we permit arbitrary binary in GNU strings goes
hand-in-hand with the fact that GNU also permits a "base-256" encoding
of numeric fields, which is effectively two-complement binary.

Change-Id: Ic037ec6bed306d07d1312f0058594bd9b64d9880
Reviewed-on: https://go-review.googlesource.com/55573
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-16 00:39:32 +00:00
Joe Tsai 1da0e7e28e archive/tar: reject bad key-value pairs for PAX records
We forbid empty keys or keys with '=' because it leads to ambiguous parsing.
Relevent PAX specification:
<<<
A keyword shall not include an <equals-sign>.
>>>

Also, we forbid the writer from encoding records with an empty value.
While, this is a valid record syntactically, the semantics of an empty
value is that previous records with that key should be deleted.
Since we have no support (and probably never will) for global PAX records,
deletion is a non-sensible operation.
<<<
If the <value> field is zero length,
it shall delete any header block field,
previously entered extended header value,
or global extended header value of the same name.
>>>

Fixes #20698
Fixes #15567

Change-Id: Ia29c5c6ef2e36cd9e6d7f6cff10e92b96a62f0d1
Reviewed-on: https://go-review.googlesource.com/55571
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 02:29:29 +00:00
Joe Tsai 2bcc24e977 archive/tar: support PAX subsecond resolution times
Add support for PAX subsecond resolution times. Since the parser
supports negative timestamps, the formatter also handles negative
timestamps.

The relevant PAX specification is:
<<<
Portable file timestamps cannot be negative. If pax encounters a
file with a negative timestamp in copy or write mode, it can reject
the file, substitute a non-negative timestamp, or generate a
non-portable timestamp with a leading '-'.
>>>

<<<
All of these time records shall be formatted as a decimal
representation of the time in seconds since the Epoch.
If a <period> ( '.' ) decimal point character is present,
the digits to the right of the point shall represent the units of
a subsecond timing granularity, where the first digit is tenths of
a second and each subsequent digit is a tenth of the previous digit.
>>>

Fixes #11171

Change-Id: Ied108f3d2654390bc1b0ddd66a4081c2b83e490b
Reviewed-on: https://go-review.googlesource.com/55552
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 02:20:22 +00:00
Joe Tsai 7ae9561610 archive/tar: implement specialized logic for PAX format
Rather than going through writeHeader, which attempts to handle all formats,
implement writePAXHeader, which only has an understanding of the PAX format.

In PAX, the USTAR header is filled out in a best-effort manner.
Thus, we change logic of formatString and formatOctal to try their best to
output something (possibly truncated) in the event of an error.

The new implementation of PAX headers causes several tests to fail.
An investigation into the new output reveals that the new behavior is correct,
while the tests had actually locked in incorrect behavior before.

A dump of the differences is listed below (-before, +after):

<< writer-big.tar >>

This change is due to fact that we changed the Header.Devminor to force the
tar.Writer to choose the GNU format over the PAX one.
The ability to control the output is an open issue (see #18710).
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000150  00 ff ff ff ff ff ff ff  ff 00 00 00 00 00 00 00  |................|

<< writer-big-long.tar>>

The previous logic generated the GNU magic values for a PAX file.
The new logic correctly uses the USTAR magic values.
- 00000100  00 75 73 74 61 72 20 20  00 00 00 00 00 00 00 00  |.ustar  ........|
- 00000500  00 75 73 74 61 72 20 20  00 67 75 69 6c 6c 61 75  |.ustar  .guillau|
+ 00000100  00 75 73 74 61 72 00 30  30 00 00 00 00 00 00 00  |.ustar.00.......|
+ 00000500  00 75 73 74 61 72 00 30  30 67 75 69 6c 6c 61 75  |.ustar.00guillau|

The previous logic tried to use the specified timestmap in the PAX headers file,
but this is problematic as this timestamp can overflow, defeating the point
of using PAX, which is intended to extend tar.
The new logic uses the zero timestamp similar to what GNU and BSD tar do.
- 00000080  30 30 30 30 32 33 32 00  31 32 33 33 32 37 37 30  |0000232.12332770|
+ 00000080  30 30 30 30 32 35 36 00  30 30 30 30 30 30 30 30  |0000256.00000000|

The previous logic populated the devminor and devmajor fields.
The new logic leaves them zeroed just like what GNU and BSD tar do.
- 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 30  |.........0000000|
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The previous logic uses PAX headers, but fails to add a record for the size.
The new logic does properly add a record for the size.
- 00000290  31 36 67 69 67 2e 74 78  74 0a 00 00 00 00 00 00  |16gig.txt.......|
- 000002a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000290  31 36 67 69 67 2e 74 78  74 0a 32 30 20 73 69 7a  |16gig.txt.20 siz|
+ 000002a0  65 3d 31 37 31 37 39 38  36 39 31 38 34 0a 00 00  |e=17179869184...|

The previous logic encoded the size as a base-256 field,
which is only valid in GNU, but the previous PAX headers implies this should
be a PAX file. This result in a strange hybrid that is neither GNU nor PAX.
The new logic uses PAX headers to store the size.
- 00000470  37 35 30 00 30 30 30 31  37 35 30 00 80 00 00 00  |750.0001750.....|
- 00000480  00 00 00 04 00 00 00 00  31 32 33 33 32 37 37 30  |........12332770|
+ 00000470  37 35 30 00 30 30 30 31  37 35 30 00 30 30 30 30  |750.0001750.0000|
+ 00000480  30 30 30 30 30 30 30 00  31 32 33 33 32 37 37 30  |0000000.12332770|

<< ustar.issue12594.tar >>

The previous logic used the specified timestamp for the PAX headers file.
The new logic just uses the zero timestmap.
- 00000080  30 30 30 30 32 33 31 00  31 32 31 30 34 34 30 32  |0000231.12104402|
+ 00000080  30 30 30 30 32 33 31 00  30 30 30 30 30 30 30 30  |0000231.00000000|

The previous logic populated the devminor and devmajor fields.
The new logic leaves them zeroed just like what GNU and BSD tar do.
- 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 30  |.........0000000|
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Change-Id: I33419eb1124951968e9d5a10d50027e03133c811
Reviewed-on: https://go-review.googlesource.com/55231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-14 06:26:35 +00:00
Joe Tsai 1d81251599 archive/tar: simplify toASCII and parseString
Use a simple []byte instead of bytes.Buffer to create a string.
Use bytes.IndexByte instead of our own for loop.

Change-Id: Ic4a1161d79017fd3af086a05c53d5f20a5f09326
Reviewed-on: https://go-review.googlesource.com/54752
Reviewed-by: Avelino <t@avelino.xxx>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-08-13 02:32:28 +00:00
Agniva De Sarker 23cd87eb0a archive/tar: optimize formatPAXRecord() call
By replacing fmt.Sprintf with a simple string concat, we see
pretty good improvements across the board on time and memory.

name             old time/op    new time/op    delta
FormatPAXRecord     683ns ± 2%     210ns ± 5%  -69.22%  (p=0.000 n=10+10)

name             old alloc/op   new alloc/op   delta
FormatPAXRecord      112B ± 0%       32B ± 0%  -71.43%  (p=0.000 n=10+10)

name             old allocs/op  new allocs/op  delta
FormatPAXRecord      8.00 ± 0%      2.00 ± 0%  -75.00%  (p=0.000 n=10+10)

Ran with - -cpu=1 -count=10 on an AMD64 i5-5200U CPU @ 2.20GHz

Using the following benchmark:
func BenchmarkFormatPAXRecord(b *testing.B) {
  for n := 0; n < b.N; n++ {
    formatPAXRecord("foo", "bar")
  }
}

Change-Id: I828ddbafad2e5d937f0cf5f777b512638344acfc
Reviewed-on: https://go-review.googlesource.com/55210
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2017-08-12 04:52:27 +00:00
Joe Tsai 310ba82828 archive/tar: ensure input fits in octal field
The prior logic would over-write the NUL-terminator if the octal value
was long enough. In order to prevent this, we add a fitsInOctal function
that does the proper check.

The relevant USTAR specification about NUL-terminator is:
<<<
Each numeric field is terminated by one or more <space> or NUL characters.
>>>

Change-Id: I6fbc6e8fe71168727eea201925d0fe08d43116ac
Reviewed-on: https://go-review.googlesource.com/54432
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 03:25:17 +00:00
Joe Tsai 019d8a07e1 archive/tar: forbid NUL character in string fields
USTAR and GNU strings are NUL-terminated. Thus, we should never
allow the NUL terminator, otherwise we will lose data round-trip.

Relevant specification text:
<<<
The fields magic, uname, and gname are character strings each terminated by a NUL character.
>>>

Technically, PAX keys and values should be UTF-8, but the observance
of invalid files in the wild causes us to be more liberal.
<<<
The <length> field, <blank>, <equals-sign>, and <newline> shown shall
be limited to the portable character set, as encoded in UTF-8.
>>>

Thus, we only reject NULs in PAX keys, and NULs for PAX values
representing the USTAR string fields (i.e., path, linkpath, uname, gname).
These are treated more strictly because they represent strings that
are typically represented as C-strings on POSIX systems.

Change-Id: I305b794d9d966faad852ff660bd0b3b0964e52bf
Reviewed-on: https://go-review.googlesource.com/14724
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 03:12:47 +00:00
Joe Tsai ef8e85e8b8 archive/tar: fix parsePAXTime
Issues fixed:
* Could not handle quantity of seconds greater than 1<<31 on
32bit machines since strconv.ParseInt did not treat integers as 64b.
* Did not handle negative timestamps properly if nanoseconds were used.
Note that "-123.456" should result in a call to time.Unix(-123, -456000000).
* Incorrectly allowed a '-' right after the '.' (e.g., -123.-456)
* Did not detect invalid input after the truncation point (e.g., 123.123456789badbadbad).

Note that negative timestamps are allowed by PAX, but are not guaranteed
to be portable. See the relevant specification:
<<<
If pax encounters a file with a negative timestamp in copy or write mode,
it can reject the file, substitute a non-negative timestamp, or generate
a non-portable timestamp with a leading '-'.
>>>

Since the previous behavior already partially supported negative timestamps,
we are bound by Go's compatibility rules to keep support for them.
However, we should at least make sure we handle them properly.

Change-Id: I5686997708bfb59110ea7981175427290be737d1
Reviewed-on: https://go-review.googlesource.com/31441
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-20 01:06:09 +00:00
Joe Tsai 14e545b60a archive/tar: reduce allocations in formatOctal
Change-Id: I9ddb7d2a97d28aba7a107b65f278993daf7807fa
Reviewed-on: https://go-review.googlesource.com/30960
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-12 21:27:51 +00:00
Joe Tsai 6fea452e38 archive/tar: move parse/format functionality into strconv.go
Move all parse/format related functionality into strconv.go
and thoroughly test them. This also reduces the amount of noise
inside reader.go and writer.go.

There was zero functionality change other than moving code around.

Change-Id: I3bc288d10c20ebb3814b30b75d8acd7be62b85d7
Reviewed-on: https://go-review.googlesource.com/28470
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-29 18:38:28 +00:00