Bug 997939
Summary: | raw output format not reliably parseable -- newlines not encoded in "description" field | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Steve Tyler <stephent98> |
Component: | python-bugzilla | Assignee: | Will Woods <wwoods> |
Status: | CLOSED DEFERRED | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 19 | CC: | bugs.michael, crobinso, dzickus, jskarvad, stephent98, wwoods |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-10-28 17:51:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
Steve Tyler
2013-08-16 15:04:43 UTC
Created attachment 787325 [details]
bz997939-raw-not-reliably-parseable-2.txt

The problem appears to be that newlines in the "description" field are not encoded as "\n". That is in contrast with the "comments" field.

The attached raw output illustrates the problem.

Attachment generated with:
$ bugzilla query -b 997939 --raw > bz997939-raw-not-reliably-parseable-2.txt

This example explicitly illustrates the problem:
$ cat bz997939-raw-not-reliably-parseable-2.txt | egrep '^ATTRIBUTE\[(id|foo)\]'
ATTRIBUTE[id]: 741734
ATTRIBUTE[foo]: bar
ATTRIBUTE[id]: 741734
ATTRIBUTE[id]: 997939

(In reply to Steve Tyler from comment #0)
...
> Also, the first line has a bug id string: "Bugzilla 741734:" that is:
> 1. Inconsistent with the format of the other lines.
> 2. Redundant with:
> ATTRIBUTE[id]: 741734
...

I have opened a separate bug for this problem. Changing bug summary accordingly.

Bug 997963 - raw output header "Bugzilla NNNNNN:" inconsistent and redundant

After looking at this in pdb, it appears that the description has newlines encoded as "\n" before line 698, and that the newline encoding has been removed after line 698:

$ less -N python-bugzilla-0.9.0/bin/bugzilla
...
690 def _format_output(bz, opt, buglist):
691     if opt.output == 'raw':
692         buglist = bz.getbugs([b.bug_id for b in buglist])
693         for b in buglist:
694             print "Bugzilla %s: " % b.bug_id
695             for a in dir(b):
696                 if a.startswith("__") and a.endswith("__"):
697                     continue
698                 print to_encoding(u"ATTRIBUTE[%s]: %s" % (a, getattr(b, a)))
699             print "\n\n"
700         return
...

https://git.fedorahosted.org/cgit/python-bugzilla.git/tree/bin/bugzilla?id=0.9.0#n698

This commit added that line:
https://git.fedorahosted.org/cgit/python-bugzilla.git/commit/?id=e40b423fd4785b8a6df25959b9f97b6c5c06642a

This doesn't happen with comments, because they are lists, not strings.
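The string-versus-list difference noted above can be reproduced outside python-bugzilla. The following is only a minimal sketch, written for Python 3 (the quoted source is Python 2) and using made-up sample values rather than data from a real bug: "%s" inserts a string verbatim, so real newlines survive, whereas "%s" applied to a list formats each element with repr(), which shows a newline as the two characters "\n".

```python
# Hypothetical sample values; not taken from a real bug.
description = "first line\nATTRIBUTE[foo]: bar"
comments = [{"text": "first line\nsecond line"}]

# A string is inserted as-is, so the result spans two physical lines
# and the second one looks like a genuine ATTRIBUTE line.
desc_line = "ATTRIBUTE[%s]: %s" % ("description", description)

# A list is formatted via repr() of its elements, so the newline stays
# escaped and the result remains on one physical line.
comments_line = "ATTRIBUTE[%s]: %s" % ("comments", comments)

print(desc_line)
print(comments_line)
```

The first print produces two lines of output and the second produces one, which is the inconsistency a line-oriented parser trips over.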
The newlines are removed from comment text too, if the comments list is indexed as a list:
(Pdb) print b.comments[0]['text']

Further, this produces the same output:
(Pdb) print b.comments[0]['text'].encode(locale.getpreferredencoding(), 'replace')

Created attachment 787846 [details]
[prototype] bugzilla-raw-output-json-prototype-1.patch

The attached prototype patch:
1. Fixes this bug.
2. Fixes Bug 998256.
3. Implements the ideas in Bug 997963, Comment 1 by adding a BUGZILLA keyword.

The output format retains the ATTRIBUTE keyword, but uses the Python json module[1] to encode the data for each attribute. Thus, this format is a hybrid of the raw output format and the json format.

The result is generally readable, except for the description, which has newlines encoded as "\n". This is unavoidable, if the format is to be reliably machine readable.

Two known issues are that:
1. Bug comments are missing, because the json encoder does not know how to encode comments.
2. Some spurious fields are generated:
ATTRIBUTE[__doc__]: ...
ATTRIBUTE[__module__]: "bugzilla.bug"
ATTRIBUTE[__weakref__]: null
These occur, because they can be json encoded (no TypeError is raised).

[1] http://docs.python.org/2.7/library/json.html

Created attachment 787849 [details]
bugzilla-raw-output-json-prototype-1.txt example raw output with two bugs
This is example output from the patched raw output generator with two bugs.[1]
Notable features are:
1. There is a tagged header for each bug:
BUGZILLA[bug_id]: 986069
BUGZILLA[timestamp]: "2013-08-18 20:22:31 UTC"
BUGZILLA[version]: "0.9.0"
2. The bug description is machine-readable, because newlines are encoded as "\n". In particular, lines in the description beginning "ATTRIBUTE" cannot confuse a parser. The second bug illustrates this, because it contains the string "ATTRIBUTE" several times in the description.
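For readers without the attachment, the idea behind the hybrid format can be sketched roughly as follows. This is not the prototype patch itself, which is not reproduced here; the helper names format_attribute and parse_attribute and the sample description are hypothetical, and the code is written for Python 3.

```python
import json


def format_attribute(name, value):
    # json.dumps escapes newlines inside strings, so each attribute
    # occupies exactly one physical line of output.
    return "ATTRIBUTE[%s]: %s" % (name, json.dumps(value))


def parse_attribute(line):
    # Split at the first "]: " and decode the JSON payload.
    # Assumes the attribute name itself does not contain "]: ".
    head, payload = line.split("]: ", 1)
    name = head[len("ATTRIBUTE["):]
    return name, json.loads(payload)


# Hypothetical description that embeds text resembling an attribute line.
description = "first line\nATTRIBUTE[foo]: bar"
line = format_attribute("description", description)
print(line)
```

Because the embedded "ATTRIBUTE[foo]: bar" text never starts a physical line, a filter such as egrep '^ATTRIBUTE' would match only the real attribute, and parse_attribute(line) recovers the original description.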
An enhancement that would improve readability would be to convert lines of text into elements of a list, so that each line of text could be listed separately.
[1] $ PYTHONPATH=. ./bin/bugzilla query -b 986069,997939 --raw > bugzilla-raw-output-json-prototype-1.txt
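The readability enhancement suggested in this comment, one list element per line of text, might look something like the sketch below. It is only one possible reading of that suggestion, uses a made-up description, and is written for Python 3.

```python
import json

# Hypothetical multi-line description.
description = "first line\nsecond line\nthird line"

# Encode the text as a JSON list with one element per line.
encoded = json.dumps(description.splitlines())
print("ATTRIBUTE[description]: %s" % encoded)

# Joining the decoded list restores the text. A trailing newline,
# if the original had one, would be lost by splitlines().
restored = "\n".join(json.loads(encoded))
```

The output stays on a single physical line, but the line boundaries of the original text are visible as list element boundaries rather than as "\n" escapes.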
I think it's reasonable to add a json output option or similar, but I'm not personally interested in implementing it. If you want to follow up, please take this to the upstream mailing list where more interested parties are watching.