Red Hat Bugzilla – Bug 997939
raw output format not reliably parseable -- newlines not encoded in "description" field
Last modified: 2013-10-28 13:51:54 EDT
Created attachment 787302 [details]
Description of problem:
The raw output format has bug fields tagged with a string beginning "ATTRIBUTE":
Such a string could appear at the beginning of a bug description line, which would then be incorrectly interpreted as a bug field tag.
Also, the first line has a bug id string: "Bugzilla 741734:" that is:
1. Inconsistent with the format of the other lines.
2. Redundant with:
Disclaimer: I did not actually submit a bug that would exhibit this problem.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. $ bugzilla query -b 741734 --raw > raw-not-reliably-parseable-1.txt
Incorrect parsing if a bug description line begins with a string:
Raw output format can be reliably parsed regardless of the content of the bug description.
The attached example is from:
$ bugzilla query -b 741734
#741734 NEW - Red Hat Kernel Manager - FEAT: /sys should provide more accurate current cpu frequency
Or, with the output manually modified to enable BZ autolinkification:
Bug 741734 NEW - Red Hat Kernel Manager - FEAT: /sys should provide more accurate current cpu frequency
Created attachment 787325 [details]
The problem appears to be that newlines in the "description" field are not encoded as "\n". That is in contrast with the "comments" field. The attached raw output illustrates the problem.
Attachment generated with:
$ bugzilla query -b 997939 --raw > bz997939-raw-not-reliably-parseable-2.txt
This example explicitly illustrates the problem:
$ cat bz997939-raw-not-reliably-parseable-2.txt | egrep '^ATTRIBUTE\[(id|foo)\]'
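The failure mode can be sketched independently of the attachment. The snippet below is only an illustration: the sample text and the naive_parse helper are hypothetical, not taken from python-bugzilla. It assumes a consumer that treats every line starting with "ATTRIBUTE[" as a field tag.

```python
import re

# Hypothetical raw output: the description value contains a literal newline
# followed by text that happens to look like a field tag.
raw = (
    "ATTRIBUTE[id]: 997939\n"
    "ATTRIBUTE[description]: first line of the description\n"
    "ATTRIBUTE[foo]: this is description text, not a field\n"
    "ATTRIBUTE[status]: NEW\n"
)

TAG = re.compile(r"^ATTRIBUTE\[(\w+)\]: (.*)$")


def naive_parse(text):
    """Treat every line that starts with ATTRIBUTE[...] as a new field."""
    fields = {}
    for line in text.splitlines():
        match = TAG.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = naive_parse(raw)
# "foo" shows up as a field, and the description is cut off after its
# first line.
print(sorted(fields))
print(fields["description"])
```

A line-oriented parser has no way to tell the second description line from a real field, which is why the newlines need to be encoded.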
(In reply to Steve Tyler from comment #0)
> Also, the first line has a bug id string: "Bugzilla 741734:" that is:
> 1. Inconsistent with the format of the other lines.
> 2. Redundant with:
> ATTRIBUTE[id]: 741734
I have opened a separate bug for this problem. Changing bug summary accordingly.
Bug 997963 - raw output header "Bugzilla NNNNNN:" inconsistent and redundant
After looking at this in pdb, it appears that the description has newlines encoded as "\n" before line 698, and that the newline encoding has been removed after line 698:
$ less -N python-bugzilla-0.9.0/bin/bugzilla
690 def _format_output(bz, opt, buglist):
691     if opt.output == 'raw':
692         buglist = bz.getbugs([b.bug_id for b in buglist])
693         for b in buglist:
694             print "Bugzilla %s: " % b.bug_id
695             for a in dir(b):
696                 if a.startswith("__") and a.endswith("__"):
697                     continue
698                 print to_encoding(u"ATTRIBUTE[%s]: %s" % (a, getattr(b, a)))
699             print "\n\n"
This commit added that line:
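This doesn't happen with comments, because they are lists, not strings: when a list is formatted with %s, each element is rendered with its repr, which shows newlines as "\n".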
The newlines are removed from comment text too, if the comments list is indexed as a list:
(Pdb) print b.comments['text']
Further, this produces the same output:
(Pdb) print b.comments['text'].encode(locale.getpreferredencoding(), 'replace')
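The string-versus-list difference can be reproduced without Bugzilla. This is a minimal sketch in Python 3 (the tool itself is Python 2), with made-up sample text; it only assumes that the raw output is built with %s formatting as on line 698.

```python
text = "line one\nline two"

# Formatting a string with %s inserts it verbatim, so the newline is literal.
as_string = "ATTRIBUTE[description]: %s" % (text,)

# Formatting a list with %s uses the repr of each element, so the newline
# appears as the two characters backslash and "n".
as_list = "ATTRIBUTE[comments]: %s" % ([text],)

print(as_string)
print(as_list)
```

Indexing into the list yields a plain string again, which would explain why the newlines reappear as real line breaks in that case.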
Created attachment 787846 [details]
The attached prototype patch:
1. Fixes this bug.
2. Fixes Bug 998256.
3. Implements the ideas in Bug 997963, Comment 1 by adding a BUGZILLA keyword.
The output format retains the ATTRIBUTE keyword, but uses the Python json module to encode the value of each attribute, making it a hybrid of the raw output format and JSON. The result is generally readable, except for the description, which has newlines encoded as "\n". That is unavoidable if the format is to be reliably machine-readable.
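A rough sketch of the hybrid idea, not the code from the attached patch: format_attribute and parse_line are hypothetical helper names, and the sample description is made up. It assumes one attribute per output line, with the value passed through json.dumps.

```python
import json


def format_attribute(name, value):
    """Emit one tagged line; json.dumps escapes any embedded newlines."""
    return "ATTRIBUTE[%s]: %s" % (name, json.dumps(value))


def parse_line(line):
    """Split at the first ': ' and decode the JSON payload."""
    tag, _, payload = line.partition(": ")
    name = tag[len("ATTRIBUTE["):-1]
    return name, json.loads(payload)


description = "first line\nATTRIBUTE[foo]: still description text"
line = format_attribute("description", description)
print(line)

name, value = parse_line(line)
print(name)
print(value == description)
```

Because the encoded value never contains a raw newline, each attribute occupies exactly one line and a line-oriented parser can recover the original text.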
Two known issues are that:
1. Bug comments are missing, because the json encoder does not know how to encode comments.
2. Some spurious fields are generated:
These occur because they can be JSON encoded (no TypeError is raised).
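Both known issues follow from filtering on whether json.dumps raises TypeError. The sketch below illustrates that filter under stated assumptions: the Comment class, the encodable_attributes helper, and the attribute named hypothetical_flag are invented for the example and do not come from python-bugzilla or the patch.

```python
import json


class Comment(object):
    """Hypothetical stand-in for a value the json encoder cannot handle."""

    def __init__(self, text):
        self.text = text


def encodable_attributes(attrs):
    """Keep attributes that json.dumps accepts; record the rest as skipped."""
    encoded = {}
    skipped = []
    for name, value in sorted(attrs.items()):
        try:
            encoded[name] = json.dumps(value)
        except TypeError:
            skipped.append(name)
    return encoded, skipped


attrs = {
    "id": 997939,
    "summary": "raw output format not reliably parseable",
    "comments": [Comment("not JSON encodable")],
    "hypothetical_flag": False,  # internal value that still encodes cleanly
}

encoded, skipped = encodable_attributes(attrs)
print(skipped)
print(sorted(encoded))
```

Anything the encoder accepts is printed, wanted or not, and anything it rejects is silently dropped. An explicit list of field names, or a custom encoder for the comment objects, would be one way to address both points.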
Created attachment 787849 [details]
bugzilla-raw-output-json-prototype-1.txt example raw output with two bugs
This is example output from the patched raw output generator with two bugs.
Notable features are:
1. There is a tagged header for each bug:
BUGZILLA[timestamp]: "2013-08-18 20:22:31 UTC"
2. The bug description is machine-readable, because newlines are encoded as "\n". In particular, description lines beginning with "ATTRIBUTE" cannot confuse a parser. The second bug illustrates this: its description contains the string "ATTRIBUTE" several times.
One enhancement that would improve readability is to encode the text as a list of lines, so that each line of text is listed separately.
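One possible shape for that enhancement, as a sketch only: the helper names and sample text are hypothetical, and the choice of json.dumps with indent=1 is an assumption rather than anything in the prototype.

```python
import json


def encode_text_as_lines(text):
    """Encode text as a JSON list with one element per line of text."""
    return json.dumps(text.split("\n"), indent=1)


def decode_lines(payload):
    """Rebuild the original text from the JSON list of lines."""
    return "\n".join(json.loads(payload))


description = "first line\nATTRIBUTE[foo]: second line\nthird line"
payload = encode_text_as_lines(description)
print("ATTRIBUTE[description]: " + payload)

print(decode_lines(payload) == description)
```

With this layout every continuation line starts with a space and a quote, or with the closing bracket, so no line of text can be mistaken for a field tag. The trade-off is that a value now spans several output lines, so a parser has to read up to the closing bracket instead of handling one line at a time.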
$ PYTHONPATH=. ./bin/bugzilla query -b 986069,997939 --raw > bugzilla-raw-output-json-prototype-1.txt
I think it's reasonable to add a json output option or similar, but I'm not personally interested in implementing it. If you want to follow up, please take this to the upstream mailing list where more interested parties are watching.