Bug 885099
| Summary: | error_is_set() conditionals should not use Error * parameters | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Markus Armbruster <armbru> |
| Component: | qemu-kvm | Assignee: | Cole Robinson <crobinso> |
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.0 | CC: | acathrow, armbru, bsarathy, crobinso, hhuang, juzhang, lcapitulino, mkenneth, virt-maint |
| Target Milestone: | rc | ||
| Target Release: | 7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-07-29 19:45:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
We should search for the incorrect pattern upstream first. Perhaps I can trick Coverity into flagging it; will report back if I can. Markus, I think you meant qmp_input_type_enum(). While the bug exists, it can't be triggered in practice in 6.4 tree as far as I could check it (it was a bit tough to check though, so I wouldn't complain if someone do a second pass). Moving to 6.5. Yes, I meant qmp_input_type_enum(), sorry for the typo. This bz is about two things: 1. A bad pattern in how we handle the error object that could cause functions to do unpredictable things 2. Coverity spotted qmp_input_type_enum() as a resource leak, because qmp_input_type_enum() uses the bad pattern from item 1 The good news is: the bad pattern is only triggered when the caller doesn't pass an error object (that is, the error object is NULL). This is valid, but the bad pattern doesn't handle this correctly. I've checked users of qmp_input_type_enum() in RHEL6.5 and they all seem to pass a non-NULL error object, which means RHEL6.5 won't trigger this bug. I also checked other functions with the same problem and it *seems* that the qapi always pass non-NULL error objects so I think RHEL6 is fine. For this reason I'm moving this to RHEL7.0 so that we can do right thing there. Has anyone considered using thread local storage to track a global 'last reported error' object? Saves having to manually pass around error objects and would avoid pitfalls like the bad pattern mentioned above. Libvirt uses this to great effect. Of course that would be a larger project... I don't know much of the history of error propagation in qemu. But we already have a mixture (== unfinished conversion) of printf(), error_report(), and explicit propagation of Error objects, so we probably shouldn't start another parallel infrastructure :) (In reply to Laszlo Ersek from comment #11) > I don't know much of the history of error propagation in qemu. But we > already have a mixture (== unfinished conversion) of printf(), > error_report(), and explicit propagation of Error objects, so we probably > shouldn't start another parallel infrastructure :) Changing function signatures to pass Error objects all the way down would cause rediculous churn though, and that's what would be presently needed to have qerror_report fully replace error_report. If we switched to using TLS stored error objects, you could convert error_report directly to qerror without touching function signatures. So it would really be an implementation detail that would ease the ongoing conversions... at least how I envision it in my head :) This bug is about a specific misuse of the error_set() API. We strayed into more general discussion of possible error APIs in QEMU in comment#10..#12. This isn't the best place for such discussions, but I'm going to add some background information now, to keep the record straight. Re comment#12: I think you mean error_set(), not qerror_report(). Conversion to error_set() involves changing function signatures, as you described. qerror_report() doesn't. It sets a monitor-local last error. This is similar, but not the same as a thread-local last error. We adopted qerror_report() so we don't have to pass error objects around, precisely because passing them around causes "rediculous churn". My memory gets hazy here (perhaps Luiz can correct me if I talk nonsense), but I think we made the error monitor-local, not thread-local, because all monitors share the same thread, along with much other code. It's a state machine. With a thread-local last error, you need to take care to save and restore the last error as you switch between unrelated states. If I remember correctly (big if by now), Anthony declared that impractical in QEMU. I didn't fully understand his arguments then, and time hasn't exactly improved my understanding. In my (limited!) experience, qerror_report() worked well enough for its intended purpose, namely to help with converting existing HMP commands to QMP. However, it could not satisfy the need for a general error reporting API less primitive than errno. I'm afraid I can't further explain that need to you, as I wasn't personally involved in things with that need, at least not deeply enough to pass judgement on whether the need is real. All I can say is this is how I remember us creating infrastructure for passing around error objects after having rejected it first. qerror_report() is now a transitional interface to help with converting existing HMP commands to QMP. Its uses elsewhere are probably all mistakes. "Transitional" means it should go away completely in due time. Regarding passing "Error objects all the way down": don't. When a simple error reporting mechanism such as returning -errno is fine, don't complicate things with an Error object just for the heck of it. Use common sense, and KISS. Is error_set() the best possible solution for QEMU's error reporting needs? I don't claim to know. But it's the solution that was picked, and I don't wish to reevaluate it from first principles now (and refight all the battles). Hope this helps :) (In reply to Markus Armbruster from comment #13) > My memory gets hazy here (perhaps Luiz can correct me if I talk > nonsense), but I think we made the error monitor-local, not > thread-local, because all monitors share the same thread, along with > much other code. It's a state machine. With a thread-local last > error, you need to take care to save and restore the last error as you > switch between unrelated states. If I remember correctly (big if by > now), Anthony declared that impractical in QEMU. I didn't fully > understand his arguments then, and time hasn't exactly improved my > understanding. Correct. The reason number one against any global error is that it prevents us from having parallel command execution support. Today, it's unclear whether we really need this feature, but back then we wanted to have it and the global error would block us. Another (arguable) problem with qerror_report() is that it does too much... It mixes error setting with error reporting and that entirely depends on the context it's being called. The new Error API is more generic and solves all those problems. Also, as Markus more or less puts it, I think we can say that there was general agreement that the Error API is a good solution for error reporting in QEMU (while there was wildly discussion against qerror_report()). I wouldn't propose changing directions here, especially because most of the churn has already been done. Thanks guys for all the info. Sounds like qerror_report already does more or less what I was suggesting, but the explicit Error passing is the preferred way upstream. I addressed the problem upstream by eliminating error_is_set() entirely, in commit d2e064a. In my elimination work, I examined all the bad uses of error_is_set(), and deemed them harmless in practice. Doesn't mean all downstream uses must be equally harmless. It's probable, though. The work was worthwhile upstream, because it made the code simpler, more robust, more consistent, got rid of a number of Coverity false positives, and removed the temptation to misuse error_is_set(), possibly in not so harmless ways. The case for backporting it is weaker. It would certainly be an improvement, but is it worth the cost? Perhaps if the backport is trivial. Some 40 commits to backport, not counting possible prerequisite commits. My only pros for the backport is to avoid future conflicts. This bug is assigned to me, but given that I have very little experience with RHEL qemu I probably shouldn't be making the decision on whether to backport these patches. Let's not backport anything for now. We can revisit the decision (and reopen the bug) if the conflicts around error_is_set() bother us too much. |
Description of problem: Incorrect usage pattern: void bad(Error *errp) { do_something(&errp); if (error_is_set(errp)) { return; } do_more(); } where do_something() uses error_set() to put an error object into errp when it fails. error_set() does nothing when its Error * argument is null. error_is_set() returns false when its Error * argument is null. Works as designed. bad() breaks when a caller passes 1. a null argument: error_set() does nothing, error_is_set() returns false, and bad() calls do_more() even after do_something() fails. 2. an argument for that error_is_set() returns true: bad() doesn't call do_more() even after do_something() succeeds(). 2 is usually a bad idea, but 1 can be perfectly legitimate. Correct usage pattern: void good(Error *errp) { Error *local_error = NULL; do_something(&local_error); if (local_err) { error_propagate(errp, local_err); return; } do_more(); } Coverity flagged an instance of the incorrect pattern in qmp_input_type_number(), as RESOURCE_LEAK. Version-Release number of selected component (if applicable): qemu-kvm-0.12.1.2-2.340.el6 How reproducible: Not tried; bug found by static analysis Steps to Reproduce: Non known