mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Justin Nuß	9fbc06e6aa	encoding/csv: preserve \r\n in quoted fields The parser mistakenly assumed it could always fold \r\n into \n, which is not true since a \r\n inside a quoted fields has no special meaning and should be kept as is. Fix this by not folding \r\n to \n inside quotes fields. Fixes #21201 Change-Id: Ifebc302e49cf63e0a027ee90f088dbc050a2b7a6 Reviewed-on: https://go-review.googlesource.com/52810 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-14 18:42:20 +00:00
Justin Nuß	5d14ac74f6	encoding/csv: report line start line in errors Errors returned by Reader contain the line where the Reader originally encountered the error. This can be suboptimal since that line does not always correspond with the line the current record/field started at. This can easily happen with LazyQuotes as seen in #19019, but also happens for example when a quoted fields has no closing quote and the parser hits EOF before it finds another quote. When this happens finding the erroneous field can be somewhat complicated and time consuming, and in most cases it would be better to report the line where the record started. This change updates Reader to keep track of the line on which a record begins and uses it for errors instead of the current line, making it easier to find errors. Although a user-visible change, this should have no impact on existing code, since most users don't explicitly work with the line in the error and probably already expect the new behaviour. Updates #19019 Change-Id: Ic9bc70fad2651c69435d614d537e7a9266819b05 Reviewed-on: https://go-review.googlesource.com/52830 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-14 04:45:38 +00:00
Justin Nuß	2181653be6	encoding/csv: add option to reuse slices returned by Read In many cases the records returned by Reader.Read will only be used between calls to Read and become garbage once a new record is read. In this case, instead of allocating a new slice on each call to Read, we can reuse the last allocated slice for successive calls to avoid unnecessary allocations. This change adds a new field ReuseRecord to the Reader struct to enable this reuse. ReuseRecord is false by default to avoid breaking existing code which dependss on the current behaviour. I also added 4 new benchmarks, corresponding to the existing Read benchmarks, which set ReuseRecord to true. Benchstat on my local machine (old is ReuseRecord = false, new is ReuseRecord = true) name old time/op new time/op delta Read-8 2.75µs ± 2% 1.88µs ± 1% -31.52% (p=0.000 n=14+15) ReadWithFieldsPerRecord-8 2.75µs ± 0% 1.89µs ± 1% -31.43% (p=0.000 n=13+13) ReadWithoutFieldsPerRecord-8 2.77µs ± 1% 1.88µs ± 1% -32.06% (p=0.000 n=15+15) ReadLargeFields-8 55.4µs ± 1% 54.2µs ± 0% -2.07% (p=0.000 n=15+14) name old alloc/op new alloc/op delta Read-8 664B ± 0% 24B ± 0% -96.39% (p=0.000 n=15+15) ReadWithFieldsPerRecord-8 664B ± 0% 24B ± 0% -96.39% (p=0.000 n=15+15) ReadWithoutFieldsPerRecord-8 664B ± 0% 24B ± 0% -96.39% (p=0.000 n=15+15) ReadLargeFields-8 3.94kB ± 0% 2.98kB ± 0% -24.39% (p=0.000 n=15+15) name old allocs/op new allocs/op delta Read-8 18.0 ± 0% 8.0 ± 0% -55.56% (p=0.000 n=15+15) ReadWithFieldsPerRecord-8 18.0 ± 0% 8.0 ± 0% -55.56% (p=0.000 n=15+15) ReadWithoutFieldsPerRecord-8 18.0 ± 0% 8.0 ± 0% -55.56% (p=0.000 n=15+15) ReadLargeFields-8 24.0 ± 0% 12.0 ± 0% -50.00% (p=0.000 n=15+15) Fixes #19721 Change-Id: I79b14128bb9bb3465f53f40f93b1b528a9da6f58 Reviewed-on: https://go-review.googlesource.com/41730 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-04-26 15:55:56 +00:00
Russ Cox	30651b3bbb	encoding/csv: document Read error behavior Fixes #17342. Change-Id: I76af756d7aff464554c5564d444962a468d0eccc Reviewed-on: https://go-review.googlesource.com/32172 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Quentin Smith <quentin@golang.org>	2016-10-28 19:38:12 +00:00
Justin Nuß	bd06d4827a	encoding/csv: avoid allocations when reading records This commit changes parseRecord to allocate a single string per record, instead of per field, by using indexes into the raw record. Benchstat (done with `f69991c17`) name old time/op new time/op delta Read-8 3.17µs ± 0% 2.78µs ± 1% -12.35% (p=0.016 n=4+5) ReadWithFieldsPerRecord-8 3.18µs ± 1% 2.79µs ± 1% -12.23% (p=0.008 n=5+5) ReadWithoutFieldsPerRecord-8 4.59µs ± 0% 2.77µs ± 0% -39.58% (p=0.016 n=4+5) ReadLargeFields-8 57.0µs ± 0% 55.7µs ± 0% -2.18% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Read-8 660B ± 0% 664B ± 0% +0.61% (p=0.008 n=5+5) ReadWithFieldsPerRecord-8 660B ± 0% 664B ± 0% +0.61% (p=0.008 n=5+5) ReadWithoutFieldsPerRecord-8 1.14kB ± 0% 0.66kB ± 0% -41.75% (p=0.008 n=5+5) ReadLargeFields-8 3.86kB ± 0% 3.94kB ± 0% +1.86% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Read-8 30.0 ± 0% 18.0 ± 0% -40.00% (p=0.008 n=5+5) ReadWithFieldsPerRecord-8 30.0 ± 0% 18.0 ± 0% -40.00% (p=0.008 n=5+5) ReadWithoutFieldsPerRecord-8 50.0 ± 0% 18.0 ± 0% -64.00% (p=0.008 n=5+5) ReadLargeFields-8 66.0 ± 0% 24.0 ± 0% -63.64% (p=0.008 n=5+5) For a simple application that I wrote, which reads in a CSV file (via ReadAll) and outputs the number of rows read (15857625 rows), this change reduces the total time on my notebook from ~58 seconds to ~48 seconds. This reduces time and allocations (bytes) each by ~6% for a real world CSV file at work (~230000 rows, 13 colums). Updates #16791 Change-Id: Ia07177c94624e55cdd3064a7d2751fb69322d3e4 Reviewed-on: https://go-review.googlesource.com/24723 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-05 16:57:44 +00:00
Brad Fitzpatrick	efaa36017e	encoding/csv: update and add CSV reading benchmarks Benchmarks broken off from https://golang.org/cl/24723 and modified to allocate less in the places we're not trying to measure. Updates #16791 Change-Id: I508e4cfeac60322d56f1d71ff1912f6a6f183a63 Reviewed-on: https://go-review.googlesource.com/30357 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-05 04:29:07 +00:00
Jess Frazelle	8042bfe347	encoding/csv: update doc about comments whitespace This patch updates the doc about comments whitespace for the encoding/csv package to reflect that leading whitespace before the hash will treat the line as not a comment. Fixes #13775. Change-Id: Ia468c75b242a487b4b2b4cd3d342bfb8e07720ba Reviewed-on: https://go-review.googlesource.com/23302 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2016-06-10 01:00:09 +00:00
Ian Lance Taylor	fa3f484800	encoding/csv: clarify that this package supports RFC 4180 The intent of this comment is to reduce the number of issues opened against the package to add support for new kinds of CSV formats, such as issues #3150, #8458, #12372, #12755. Change-Id: I452c0b748e4ca9ebde3e6cea188bf7774372148e Reviewed-on: https://go-review.googlesource.com/23401 Reviewed-by: Andrew Gerrand <adg@golang.org>	2016-05-25 01:47:53 +00:00
Robert Griesemer	93e8e70499	all: fixed a handful of typos Change-Id: Ib0683f27b44e2f107cca7a8dcc01d230cbcd5700 Reviewed-on: https://go-review.googlesource.com/23404 Reviewed-by: Alan Donovan <adonovan@google.com>	2016-05-24 21:18:03 +00:00
Brad Fitzpatrick	5fea2ccc77	all: single space after period. The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-03-02 00:13:47 +00:00
Brad Fitzpatrick	351c15f1ce	all: remove public named return values when useless Named returned values should only be used on public funcs and methods when it contributes to the documentation. Named return values should not be used if they're only saving the programmer a few lines of code inside the body of the function, especially if that means there's stutter in the documentation or it was only there so the programmer could use a naked return statement. (Naked returns should not be used except in very small functions) This change is a manual audit & cleanup of public func signatures. Signatures were not changed if: * the func was private (wouldn't be in public godoc) * the documentation referenced it * the named return value was an interesting name. (i.e. it wasn't simply stutter, repeating the name of the type) There should be no changes in behavior. (At least: none intended) Change-Id: I3472ef49619678fe786e5e0994bdf2d9de76d109 Reviewed-on: https://go-review.googlesource.com/20024 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Andrew Gerrand <adg@golang.org>	2016-02-29 03:31:19 +00:00
Eric Lagergren	4feb47bc76	encoding/csv: clarify that TrimLeadingSpace can trim the delimiter Fixes #14464 Change-Id: Iafc21641cca7d35b7a5631cfc94742ee8e7d5042 Reviewed-on: https://go-review.googlesource.com/19861 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-02-24 18:42:09 +00:00
Nathan VanBenschoten	b04f3b06ec	all: replace strings.Index with strings.Contains where possible Change-Id: Ia613f1c37bfce800ece0533a5326fca91d99a66a Reviewed-on: https://go-review.googlesource.com/18120 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org>	2016-02-19 01:06:05 +00:00
Katrina Owen	a6f69b31e0	encoding/csv: indicate package of EOF in docs The documentation listing err == EOF can be confusing to newcomers to the language who are looking for the relevant documentation for that error. Change-Id: I301885950d0e1d0fbdf3a1892fca86eac7a0c616 Reviewed-on: https://go-review.googlesource.com/15806 Reviewed-by: Andrew Gerrand <adg@golang.org>	2015-10-14 00:46:21 +00:00
Carlos C	1be335b608	encoding/csv: add examples for package Change-Id: I3463826aa760aa5984dec4fc043b95fd2a5120ac Reviewed-on: https://go-review.googlesource.com/11240 Reviewed-by: Andrew Gerrand <adg@golang.org>	2015-06-22 11:11:37 +00:00
Damien Neil	ab89378cb7	encoding/csv: skip blank lines when FieldsPerRecord >= 0 Fixes #11050. Change-Id: Ie5d16960a1f829af947d82a63fe414924cd02ff6 Reviewed-on: https://go-review.googlesource.com/10666 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-12 18:21:12 +00:00
Ainar Garipov	7f9f70e5b6	all: fix misprints in comments These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-11 14:18:57 +00:00
Ainar Garipov	4da658cf96	encoding/csv: fix comment in parseRecord Change-Id: I82edd9364e1b4634006f5e043202a69f383dcdbe Reviewed-on: https://go-review.googlesource.com/10826 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-10 14:26:10 +00:00
Justin Nuß	2db58f8f2d	encoding/csv: Preallocate records slice Currently parseRecord will always start with a nil slice and then resize the slice on append. For input with a fixed number of fields per record we can preallocate the slice to avoid having to resize the slice. This change implements this optimization by using FieldsPerRecord as capacity if it's > 0 and also adds a benchmark to better show the differences. benchmark old ns/op new ns/op delta BenchmarkRead 19741 17909 -9.28% benchmark old allocs new allocs delta BenchmarkRead 59 41 -30.51% benchmark old bytes new bytes delta BenchmarkRead 6276 5844 -6.88% Change-Id: I7c2abc9c80a23571369bcfcc99a8ffc474eae7ab Reviewed-on: https://go-review.googlesource.com/8880 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-04-26 16:28:51 +00:00
Josh Bleecher Snyder	2adc4e8927	all: use "reports whether" in place of "returns true if(f)" Comment changes only. Change-Id: I56848814564c4aa0988b451df18bebdfc88d6d94 Reviewed-on: https://go-review.googlesource.com/7721 Reviewed-by: Rob Pike <r@golang.org>	2015-03-18 15:14:06 +00:00
Russ Cox	6ad2749dcd	encoding/csv: for Postgres, unquote empty strings, quote \. In theory both of these lines encode the same three fields: a,,c a,"",c However, Postgres defines that when importing CSV, the unquoted version is treated as NULL (missing), while the quoted version is treated as a string value (empty string). If the middle field is supposed to be an integer value, the first line can be imported (NULL is okay), but the second line cannot (empty string is not). Postgres's import command (COPY FROM) has an option to force the unquoted empty to be interpreted as a string but it does not have an option to force the quoted empty to be interpreted as a NULL. From http://www.postgresql.org/docs/9.0/static/sql-copy.html: The CSV format has no standard way to distinguish a NULL value from an empty string. PostgreSQL's COPY handles this by quoting. A NULL is output as the NULL parameter string and is not quoted, while a non-NULL value matching the NULL parameter string is quoted. For example, with the default settings, a NULL is written as an unquoted empty string, while an empty string data value is written with double quotes (""). Reading values follows similar rules. You can use FORCE_NOT_NULL to prevent NULL input comparisons for specific columns. Therefore printing the unquoted empty is more flexible for imports into Postgres than printing the quoted empty. In addition to making the output more useful with Postgres, not quoting empty strings makes the output smaller and easier to read. It also matches the behavior of Microsoft Excel and Google Drive. Since we are here and making concessions for Postgres, handle this case too (again quoting the Postgres docs): Because backslash is not a special character in the CSV format, \., the end-of-data marker, could also appear as a data value. To avoid any misinterpretation, a \. data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker. If you are loading a file created by another application that has a single unquoted column and might have a value of \., you might need to quote that value in the input file. Fixes #7586. LGTM=bradfitz R=bradfitz CC=golang-codereviews https://golang.org/cl/164760043	2014-10-23 23:44:47 -04:00
Russ Cox	c007ce824d	build: move package sources from src/pkg to src Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.	2014-09-08 00:08:51 -04:00

22 Commits