Bug 802842 - Beaker CSV Parser does not accept standard escaped double quotes
Summary: Beaker CSV Parser does not accept standard escaped double quotes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: web UI
Version: 0.8
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Dan Callaghan
QA Contact:
URL:
Whiteboard:
: 787519 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-03-13 15:43 UTC by Sean Waite
Modified: 2019-05-22 13:42 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-20 00:39:15 UTC
Embargoed:


Attachments (Terms of Use)

Description Sean Waite 2012-03-13 15:43:07 UTC
Description of problem:
A standard method of escaping a double quote in a CSV is to lead it with another double quote, for instance

""this is some text""

This doesn't work in beaker, and imports the data into all of the wrong fields.


Version-Release number of selected component (if applicable):
0.8

Comment 1 Matt Brodeur 2012-03-13 15:50:38 UTC
(In reply to comment #0)
> Description of problem:
> A standard method of escaping a double quote in a CSV is to lead it with
> another double quote, for instance
> 
> ""this is some text""
> 
> This doesn't work in beaker, and imports the data into all of the wrong fields.

The problem stems from csv.Sniffer() getting dialect.doublequote wrong.  The default, if no dialect is sniffed, is True.  For whatever reason Sniffer is setting doublequote to False, even when given the whole input file as a sample.

FWIW, the default dialect seems to do the right thing, at least with our tests.

Comment 3 Matt Brodeur 2012-03-13 20:43:27 UTC
(In reply to comment #1)
> 
> The problem stems from csv.Sniffer() getting dialect.doublequote wrong.  The
> default, if no dialect is sniffed, is True.  For whatever reason Sniffer is
> setting doublequote to False, even when given the whole input file as a sample.

I should point out that the *actual* problem happens when you have commas in a record along with escaped double quotes.  The csv reader views the double-double quotes as the end of a record and starts splitting on the remaining commas.

With that in mind, I wonder if the importer could check the number of fields read on each line.  If it's ever different from the number of header fields, it should be considered an error.

Comment 4 Dan Callaghan 2012-03-15 05:41:38 UTC
Could we just skip the sniffing altogether, and require that the input CSV is always in the same dialect as what Beaker outputs -- that is, the default Python dialect?

(In reply to comment #3)
> With that in mind, I wonder if the importer could check the number of fields
> read on each line.  If it's ever different from the number of header fields, it
> should be considered an error.

That sounds like a useful sanity check.

Comment 5 Sean Waite 2012-03-15 14:14:40 UTC
Since the default dialect seems to behave, that probably would work. That would give us a known standard to work with, rather than a known standard with some random broken bits, as we have now. 

Will it check to make sure the CSV is in the default python dialect prior to importing? I'd like to see a meaningful error if we import a different standard.

A field count would be most useful, and would have stopped a bad import I had a few days ago.

Comment 6 Matt Brodeur 2012-03-15 17:25:45 UTC
(In reply to comment #4)
> Could we just skip the sniffing altogether, and require that the input CSV is
> always in the same dialect as what Beaker outputs -- that is, the default
> Python dialect?

I think that will work.  The default dialect generates CSV we can work with, and accepts the variations of quoting I've thrown at it.  Unless someone tries something crazy like changing the delimiter or quote character, the sniffing shouldn't be needed.  IIRC, the importer will already complain if the header line doesn't parse into recognizable fields.

Comment 7 Dan Callaghan 2012-06-05 05:32:12 UTC
*** Bug 787519 has been marked as a duplicate of this bug. ***

Comment 8 Dan Callaghan 2012-06-05 08:11:28 UTC
On Gerrit: http://gerrit.beaker-project.org/1114

Comment 11 Dan Callaghan 2012-07-20 00:39:15 UTC
Beaker 0.9.1 has been released.


Note You need to log in before you can comment on or make changes to this bug.