Description of problem:
A standard method of escaping a double quote in a CSV is to lead it with another double quote, for instance:

""this is some text""

This doesn't work in Beaker, and imports the data into all of the wrong fields.

Version-Release number of selected component (if applicable): 0.8
(In reply to comment #0)
> Description of problem:
> A standard method of escaping a double quote in a CSV is to lead it with
> another double quote, for instance
>
> ""this is some text""
>
> This doesn't work in beaker, and imports the data into all of the wrong fields.

The problem stems from csv.Sniffer() getting dialect.doublequote wrong. The default, if no dialect is sniffed, is True. For whatever reason Sniffer is setting doublequote to False, even when given the whole input file as a sample. FWIW, the default dialect seems to do the right thing, at least with our tests.
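The difference the doublequote flag makes can be shown with a small sketch (the host name and field values below are made up for illustration, not taken from an actual Beaker CSV):

```python
import csv
import io

# One data row with an embedded comma and an escaped (doubled) double quote.
row = 'host1.example.com,"she said ""hi"", twice",x86_64\n'

# Default dialect: doublequote=True, so the row parses into 3 fields and the
# doubled quote becomes a literal " inside the second field.
good = next(csv.reader(io.StringIO(row)))
print(good)  # ['host1.example.com', 'she said "hi", twice', 'x86_64']

# A dialect with doublequote=False (what Sniffer was reporting): the second
# quote ends the quoted section early, so the embedded comma splits the
# field and the row no longer matches the header's column count.
bad = next(csv.reader(io.StringIO(row), doublequote=False))
print(len(good), len(bad))
```

This is why the misbehavior only shows up when a field contains both a comma and escaped quotes: without the comma, the field count still comes out right even though the quotes are mangled.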
(In reply to comment #1)
> The problem stems from csv.Sniffer() getting dialect.doublequote wrong. The
> default, if no dialect is sniffed, is True. For whatever reason Sniffer is
> setting doublequote to False, even when given the whole input file as a sample.

I should point out that the *actual* problem happens when you have commas in a record along with escaped double quotes. With doublequote disabled, the csv reader views the doubled quote as the end of the quoted field and starts splitting on the remaining commas.

With that in mind, I wonder if the importer could check the number of fields read on each line. If it's ever different from the number of header fields, it should be considered an error.
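The suggested sanity check could be sketched roughly as below. Beaker's actual importer code isn't shown in this report, so the function name and error message here are hypothetical:

```python
import csv
import io

def read_checked(csvfile):
    """Yield CSV rows, raising ValueError if any row's field count
    differs from the header's. (Hypothetical sketch of the proposed
    check, not Beaker's real importer.)"""
    reader = csv.reader(csvfile)
    header = next(reader)
    yield header
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                "line %d: expected %d fields, got %d"
                % (lineno, len(header), len(row)))
        yield row

# A well-formed file passes through unchanged...
rows = list(read_checked(io.StringIO('fqdn,arch\nhost1.example.com,x86_64\n')))
print(rows)

# ...while a row with the wrong number of fields aborts the import.
try:
    list(read_checked(io.StringIO('fqdn,arch\nhost1.example.com\n')))
except ValueError as e:
    print(e)
```

This would have turned the silent mis-import described above into an immediate, meaningful error.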
Could we just skip the sniffing altogether, and require that the input CSV is always in the same dialect as what Beaker outputs -- that is, the default Python dialect?

(In reply to comment #3)
> With that in mind, I wonder if the importer could check the number of fields
> read on each line. If it's ever different from the number of header fields, it
> should be considered an error.

That sounds like a useful sanity check.
Since the default dialect seems to behave, that would probably work. It would give us a known standard to work with, rather than a known standard with some random broken bits, as we have now. Will the importer check that the CSV is in the default Python dialect before importing? I'd like to see a meaningful error if we import a different dialect. A field count would be most useful, and would have stopped a bad import I had a few days ago.
(In reply to comment #4)
> Could we just skip the sniffing altogether, and require that the input CSV is
> always in the same dialect as what Beaker outputs -- that is, the default
> Python dialect?

I think that will work. The default dialect generates CSV we can work with, and accepts the variations of quoting I've thrown at it. Unless someone tries something crazy like changing the delimiter or quote character, the sniffing shouldn't be needed. IIRC, the importer will already complain if the header line doesn't parse into recognizable fields.
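A quick illustration of why requiring the default dialect is self-consistent: what csv.writer emits, csv.reader parses back unchanged, with no sniffing involved. The record below is made up for the example:

```python
import csv
import io

# Hypothetical record containing both a comma and literal double quotes.
record = ['host1.example.com', 'needs "special" cables, maybe']

# The default dialect escapes the quotes by doubling them on output...
buf = io.StringIO()
csv.writer(buf).writerow(record)
print(buf.getvalue())  # host1.example.com,"needs ""special"" cables, maybe"

# ...and reading with the same default dialect restores the original fields.
buf.seek(0)
print(next(csv.reader(buf)))
```

So exports from Beaker would always re-import cleanly, which is the common round-trip use case for the CSV feature.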
*** Bug 787519 has been marked as a duplicate of this bug. ***
On Gerrit: http://gerrit.beaker-project.org/1114
Beaker 0.9.1 has been released.