Created attachment 787302 [details]
raw-not-reliably-parseable-1.txt

Description of problem:
The raw output format has bug fields tagged with a string beginning "ATTRIBUTE":
  ATTRIBUTE[id]: 741734
  ATTRIBUTE[foo]: bar
Such a string could appear at the beginning of a bug description line, which would then be incorrectly interpreted as a bug field tag.

Also, the first line has a bug id string: "Bugzilla 741734:" that is:
1. Inconsistent with the format of the other lines.
2. Redundant with:
   ATTRIBUTE[id]: 741734

Disclaimer: I did not actually submit a bug that would exhibit this problem.

Version-Release number of selected component (if applicable):
python-bugzilla-0.9.0-1.fc19.noarch

How reproducible:
Always.

Steps to Reproduce:
1. $ bugzilla query -b 741734 --raw > raw-not-reliably-parseable-1.txt

Actual results:
Incorrect parsing if a bug description line begins with a string:
  ATTRIBUTE[fff]: ...

Expected results:
Raw output format can be reliably parsed regardless of the content of the bug description.

Additional info:
The attached example is from:
$ bugzilla query -b 741734
#741734 NEW - Red Hat Kernel Manager - FEAT: /sys should provide more accurate current cpu frequency

Or, with the output manually modified to enable BZ autolinkification:
Bug 741734 NEW - Red Hat Kernel Manager - FEAT: /sys should provide more accurate current cpu frequency
Created attachment 787325 [details]
bz997939-raw-not-reliably-parseable-2.txt

The problem appears to be that newlines in the "description" field are not encoded as "\n". That is in contrast with the "comments" field. The attached raw output illustrates the problem.

Attachment generated with:
$ bugzilla query -b 997939 --raw > bz997939-raw-not-reliably-parseable-2.txt
This example explicitly illustrates the problem:
$ cat bz997939-raw-not-reliably-parseable-2.txt | egrep '^ATTRIBUTE\[(id|foo)\]'
ATTRIBUTE[id]: 741734
ATTRIBUTE[foo]: bar
ATTRIBUTE[id]: 741734
ATTRIBUTE[id]: 997939
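The failure mode can be sketched with a deliberately naive line-oriented parser. This is only an illustration in Python 3: parse_raw and the sample text are hypothetical, not part of python-bugzilla, and the sketch assumes a consumer that treats every line starting with "ATTRIBUTE[name]: " as a new field.

```python
import re

# Hypothetical line-oriented parser: every line that starts with
# "ATTRIBUTE[name]: " is taken to open a new field.
TAG = re.compile(r"^ATTRIBUTE\[(\w+)\]: (.*)$")

def parse_raw(text):
    fields = {}
    current = None
    for line in text.splitlines():
        m = TAG.match(line)
        if m:
            current = m.group(1)
            fields[current] = m.group(2)
        elif current is not None:
            # Continuation line of a multi-line value.
            fields[current] += "\n" + line
    return fields

# Illustrative raw output (not real bug data): the description itself
# contains tag-like lines, followed by the genuine id field.
raw = (
    "ATTRIBUTE[description]: The raw output has fields tagged like:\n"
    "ATTRIBUTE[id]: 741734\n"
    "ATTRIBUTE[foo]: bar\n"
    "ATTRIBUTE[id]: 997939\n"
)

fields = parse_raw(raw)
print(fields)
```

With this input the parser reports a field named foo that the bug never had, and the description loses the two lines that happened to look like tags.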
(In reply to Steve Tyler from comment #0)
...
> Also, the first line has a bug id string: "Bugzilla 741734:" that is:
> 1. Inconsistent with the format of the other lines.
> 2. Redundant with:
> ATTRIBUTE[id]: 741734
...

I have opened a separate bug for this problem. Changing bug summary accordingly.

Bug 997963 - raw output header "Bugzilla NNNNNN:" inconsistent and redundant
After looking at this in pdb, it appears that the description has newlines encoded as "\n" before line 698, and that the newline encoding has been removed after line 698:

$ less -N python-bugzilla-0.9.0/bin/bugzilla
...
690 def _format_output(bz, opt, buglist):
691     if opt.output == 'raw':
692         buglist = bz.getbugs([b.bug_id for b in buglist])
693         for b in buglist:
694             print "Bugzilla %s: " % b.bug_id
695             for a in dir(b):
696                 if a.startswith("__") and a.endswith("__"):
697                     continue
698                 print to_encoding(u"ATTRIBUTE[%s]: %s" % (a, getattr(b, a)))
699             print "\n\n"
700         return
...
https://git.fedorahosted.org/cgit/python-bugzilla.git/tree/bin/bugzilla?id=0.9.0#n698

This commit added that line:
https://git.fedorahosted.org/cgit/python-bugzilla.git/commit/?id=e40b423fd4785b8a6df25959b9f97b6c5c06642a
This doesn't happen with comments, because they are lists, not strings. The "\n" encoding disappears from comment text too, if an individual comment is indexed out of the comments list:
(Pdb) print b.comments[0]['text']

Further, this produces the same output:
(Pdb) print b.comments[0]['text'].encode(locale.getpreferredencoding(), 'replace')
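The difference between the two fields can be reproduced without Bugzilla at all. The following is a small Python 3 sketch with made-up stand-in values (description and comments here are illustrative, not real bug data): formatting a string with %s inserts it verbatim, while formatting a list with %s shows the repr() of its elements, which is where the "\n" escapes come from.

```python
# Illustrative stand-ins for a bug's attributes (not real bug data).
description = "first line\nATTRIBUTE[foo]: bar"
comments = [{"text": "first line\nsecond line"}]

# Formatting a string with %s inserts it verbatim, so the newline is real
# and the output spans two physical lines.
desc_out = "ATTRIBUTE[%s]: %s" % ("description", description)

# Formatting a list with %s uses repr() on its contents, so the newline
# appears as the two characters backslash + n and the output stays on
# one physical line.
comm_out = "ATTRIBUTE[%s]: %s" % ("comments", comments)

print(desc_out)
print(comm_out)

# Indexing into the list yields the plain string again, real newline included.
print(comments[0]["text"])
```

So the strings themselves always hold real newlines; the apparent encoding is only an effect of how a containing list is displayed.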
Created attachment 787846 [details]
[prototype] bugzilla-raw-output-json-prototype-1.patch

The attached prototype patch:
1. Fixes this bug.
2. Fixes Bug 998256.
3. Implements the ideas in Bug 997963, Comment 1 by adding a BUGZILLA keyword.

The output format retains the ATTRIBUTE keyword, but uses the Python json module[1] to encode the data for each attribute. Thus, this format is a hybrid of the raw output format and the json format.

The result is generally readable, except for the description, which has newlines encoded as "\n". This is unavoidable, if the format is to be reliably machine readable.

Two known issues are that:
1. Bug comments are missing, because the json encoder does not know how to encode comments.
2. Some spurious fields are generated:
   ATTRIBUTE[__doc__]: ...
   ATTRIBUTE[__module__]: "bugzilla.bug"
   ATTRIBUTE[__weakref__]: null
   These occur, because they can be json encoded (no TypeError is raised).

[1] http://docs.python.org/2.7/library/json.html
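The general idea of encoding each attribute value with json can be sketched as follows. This is not the attached patch, only a minimal Python 3 illustration under stated assumptions: FakeBug and format_bug are hypothetical names, the dunder filter mirrors the check in the existing raw output loop, and default=str is an assumed fallback that stringifies values the encoder cannot handle instead of raising TypeError.

```python
import json

class FakeBug:
    """Hypothetical stand-in for a bug object; not the python-bugzilla class."""
    def __init__(self):
        self.bug_id = 997939
        self.description = "first line\nATTRIBUTE[foo]: bar"

def format_bug(bug):
    lines = []
    for name in dir(bug):
        # Skip dunder names such as __doc__ and __module__.
        if name.startswith("__") and name.endswith("__"):
            continue
        value = getattr(bug, name)
        if callable(value):
            continue
        # json.dumps escapes newlines as \n, so every attribute occupies
        # exactly one physical line. default=str is an assumption, see above.
        lines.append("ATTRIBUTE[%s]: %s" % (name, json.dumps(value, default=str)))
    return "\n".join(lines)

out = format_bug(FakeBug())
print(out)
```

Because the description is emitted as a single JSON string, the tag-like text inside it can no longer start a physical line.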
Created attachment 787849 [details]
bugzilla-raw-output-json-prototype-1.txt example raw output with two bugs

This is example output from the patched raw output generator with two bugs.[1]

Notable features are:
1. There is a tagged header for each bug:
   BUGZILLA[bug_id]: 986069
   BUGZILLA[timestamp]: "2013-08-18 20:22:31 UTC"
   BUGZILLA[version]: "0.9.0"
2. The bug description is machine-readable, because newlines are encoded as "\n". In particular, lines in the description beginning "ATTRIBUTE" cannot confuse a parser. The second bug illustrates this, because it contains the string "ATTRIBUTE" several times in the description.

An enhancement that would improve readability would be to convert lines of text into elements of a list, so that each line of text could be listed separately.

[1] $ PYTHONPATH=. ./bin/bugzilla query -b 986069,997939 --raw > bugzilla-raw-output-json-prototype-1.txt
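A consumer of such a hybrid format might look roughly like the Python 3 sketch below. parse_hybrid and the sample text are hypothetical, and the sketch assumes that every bug begins with a BUGZILLA[bug_id] line, as in the header shown above, and that each value is a JSON document on a single line.

```python
import json
import re

LINE = re.compile(r"^(BUGZILLA|ATTRIBUTE)\[(\w+)\]: (.*)$")

def parse_hybrid(text):
    bugs = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # blank separator lines between bugs
        keyword, name, payload = m.groups()
        # Assumption: a BUGZILLA[bug_id] line opens each new bug.
        if keyword == "BUGZILLA" and name == "bug_id":
            bugs.append({"header": {}, "attributes": {}})
        section = "header" if keyword == "BUGZILLA" else "attributes"
        bugs[-1][section][name] = json.loads(payload)
    return bugs

# Illustrative input: the description contains a tag-like line, but it is
# escaped inside the JSON string and so never starts a physical line.
sample = (
    'BUGZILLA[bug_id]: 997939\n'
    'BUGZILLA[version]: "0.9.0"\n'
    'ATTRIBUTE[description]: "first line\\nATTRIBUTE[id]: 741734"\n'
)

bugs = parse_hybrid(sample)
description_lines = bugs[0]["attributes"]["description"].splitlines()
print(description_lines)
```

Splitting the decoded description with splitlines() also gives the per-line list suggested as a readability enhancement, without changing what is stored.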
I think it's reasonable to add a json output option or similar, but I'm not personally interested in implementing it. If you want to follow up, please take this to the upstream mailing list where more interested parties are watching.