Bug 802842 - Beaker CSV Parser does not accept standard escaped double quotes
Beaker CSV Parser does not accept standard escaped double quotes
Status: CLOSED CURRENTRELEASE
Product: Beaker
Classification: Community
Component: web UI (Show other bugs)
0.8
Unspecified Unspecified
unspecified Severity unspecified (vote)
: 0.9.1
: ---
Assigned To: Dan Callaghan
:
: 787519 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-13 11:43 EDT by Sean Waite
Modified: 2013-01-10 23:54 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-19 20:39:15 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Sean Waite 2012-03-13 11:43:07 EDT
Description of problem:
A standard method of escaping a double quote in a CSV is to lead it with another double quote, for instance

""this is some text""

This doesn't work in beaker, and imports the data into all of the wrong fields.


Version-Release number of selected component (if applicable):
0.8
Comment 1 Matt Brodeur 2012-03-13 11:50:38 EDT
(In reply to comment #0)
> Description of problem:
> A standard method of escaping a double quote in a CSV is to lead it with
> another double quote, for instance
> 
> ""this is some text""
> 
> This doesn't work in beaker, and imports the data into all of the wrong fields.

The problem stems from csv.Sniffer() getting dialect.doublequote wrong.  The default, if no dialect is sniffed, is True.  For whatever reason Sniffer is setting doublequote to False, even when given the whole input file as a sample.

FWIW, the default dialect seems to do the right thing, at least with our tests.
Comment 3 Matt Brodeur 2012-03-13 16:43:27 EDT
(In reply to comment #1)
> 
> The problem stems from csv.Sniffer() getting dialect.doublequote wrong.  The
> default, if no dialect is sniffed, is True.  For whatever reason Sniffer is
> setting doublequote to False, even when given the whole input file as a sample.

I should point out that the *actual* problem happens when you have commas in a record along with escaped double quotes.  The csv reader views the double-double quotes as the end of a record and starts splitting on the remaining commas.

With that in mind, I wonder if the importer could check the number of fields read on each line.  If it's ever different from the number of header fields, it should be considered an error.
Comment 4 Dan Callaghan 2012-03-15 01:41:38 EDT
Could we just skip the sniffing altogether, and require that the input CSV is always in the same dialect as what Beaker outputs -- that is, the default Python dialect?

(In reply to comment #3)
> With that in mind, I wonder if the importer could check the number of fields
> read on each line.  If it's ever different from the number of header fields, it
> should be considered an error.

That sounds like a useful sanity check.
Comment 5 Sean Waite 2012-03-15 10:14:40 EDT
Since the default dialect seems to behave, that probably would work. That would give us a known standard to work with, rather than a known standard with some random broken bits, as we have now. 

Will it check to make sure the CSV is in the default python dialect prior to importing? I'd like to see a meaningful error if we import a different standard.

A field count would be most useful, and would have stopped a bad import I had a few days ago.
Comment 6 Matt Brodeur 2012-03-15 13:25:45 EDT
(In reply to comment #4)
> Could we just skip the sniffing altogether, and require that the input CSV is
> always in the same dialect as what Beaker outputs -- that is, the default
> Python dialect?

I think that will work.  The default dialect generates CSV we can work with, and accepts the variations of quoting I've thrown at it.  Unless someone tries something crazy like changing the delimiter or quote character, the sniffing shouldn't be needed.  IIRC, the importer will already complain if the header line doesn't parse into recognizable fields.
Comment 7 Dan Callaghan 2012-06-05 01:32:12 EDT
*** Bug 787519 has been marked as a duplicate of this bug. ***
Comment 8 Dan Callaghan 2012-06-05 04:11:28 EDT
On Gerrit: http://gerrit.beaker-project.org/1114
Comment 11 Dan Callaghan 2012-07-19 20:39:15 EDT
Beaker 0.9.1 has been released.

Note You need to log in before you can comment on or make changes to this bug.