Bug 1198868
Summary: | Qemu crashed when start guest with invalid pvpanic ioport on rhel6.7 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | vivian zhang <vivianzhang> |
Component: | qemu-kvm | Assignee: | Laszlo Ersek <lersek> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 6.7 | CC: | bsarathy, chayang, dyuan, jdenemar, juzhang, lersek, mkenneth, mzhan, qzhang, rbalakri, virt-maint, vivianzhang, zhwang |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-03-13 05:53:32 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
vivian zhang
2015-03-05 00:51:07 UTC
This is the failing function: /* size is the word size in byte */ int register_ioport_read(pio_addr_t start, int length, int size, IOPortReadFunc *func, void *opaque) { int i, bsize; if (ioport_bsize(size, &bsize)) { hw_error("register_ioport_read: invalid size"); return -1; } for(i = start; i < start + length; i += size) { ioport_read_table[bsize][i] = func; if (ioport_opaque[i] != NULL && ioport_opaque[i] != opaque) hw_error("register_ioport_read: invalid opaque"); ioport_opaque[i] = opaque; } return 0; } This infrastructure was completely replaced in qemu-1.6 by Jan Kiszka's series [PATCH v3 00/14] Refactor portio dispatching http://thread.gmane.org/gmane.comp.emulators.qemu/218232 upstream commits a8aec295^..5767e4e1. Backporting this series to RHEL-6 is completely out of question; the best we can do here is figure out what exactly is wrong with placing pvpanic at ioport 9, and refusing it more gracefully. Okay, I know what's going on. Ioport 9 is already occupied, (previous comment saved unwittingly when modifying priority/severity) (In reply to Laszlo Ersek from comment #3) > Okay, I know what's going on. Ioport 9 is already occupied, ... and I'll figure out later by what exactly. (In reply to vivian zhang from comment #0) > 4. this issue can not be hit on rhel7.1 Alright, so in order to establish a rhel-7.1 baseline here, I added the following snippet to a VM's domain XML on a RHEL-7.1 host: <panic> <address type='isa' iobase='0x9'/> </panic> under <devices>. Indeed the VM starts up (on RHEL-7.1). Note however the following output from the # virsh qemu-monitor-command DOMAIN --hmp info mtree command: I/O ... 0000000000000008-000000000000000f (prio 0, RW): dma-cont 0000000000000009-0000000000000009 (prio 0, RW): pvpanic ... That is, an ioport region is registered by "dma-cont" (see "hw/dma/i8257.c" -- DMA controller) at base 0x8, size 0x8. In particular, the ioport at 0x9 is called "request register", for the 8-bit DMA controller: http://wiki.osdev.org/ISA_DMA#The_Registers So, this configuration is irrevocably broken. RHEL-7.1 doesn't complain because it uses the more recent MemoryRegion API, which explicitly supports overlapping regions. For example, outputting a uint16_t at ioport base 0x8 would be dispatched differently from outputting a uint8_t at ioport 0x9, even though the second byte of the first uint16_t output would "overlap" the ioport at 0x9. RHEL-6 is not so smart wrt. overlapping ranges, and it simply refuses the conflicting port assignment when it sees it (I still have to confirm the other party in the conflict on RHEL-6.) So this is simply a user / configuration error, not a QEMU bug. (In reply to vivian zhang from comment #0) > Expect result: > 1. libvirt should check to forbid the invalid ioport configuration for panic > 2. qemu could not crashed, guest start success Unfortunately, none of these are acceptable for RHEL-6. First, libvirt has absolutely no business deciding what ioports are valid for what devices. Second, porting the MemoryRegion API to RHEL-6 qemu-kvm is out of question; it is incredibly intrusive. (Plus, anyway, in this specific configuration, the MemoryRegion API just masks the error in the configuration.) See also the domain XML documentation <http://libvirt.org/formatdomain.html#elementsPanic>: The default ioport is 0x505. Most users don't need to specify an address. I don't know what use case justifies setting 0x9 as pvpanic port. I think the most appropriate closure for this BZ would be NOTABUG. RHEL-6 qemu-kvm detects the invalid configuration and it exits with an error. The best we could do is beautify the error message. We could print "conflicting ioport assignment". However, it doesn't seem possible to say what conflicts with what without introducing new stuff to track ioport assignments, and I think that's out of scope for RHEL-6. So here's what we can do: - close this as NOTABUG. - write a patch that changes the error messages register_ioport_read: invalid opaque register_ioport_write: invalid opaque to register_ioport_read: invalid opaque (probable ioport assignment conflict) register_ioport_write: invalid opaque (probable ioport assignment conflict) Please advise as to what you would prefer. (In reply to Laszlo Ersek from comment #5) > RHEL-6 is not so smart wrt. overlapping ranges, and it simply refuses the > conflicting port assignment when it sees it (I still have to confirm the > other party in the conflict on RHEL-6.) So this is simply a user / > configuration error, not a QEMU bug. Indeed; the first function registered as read handler for ioport 9 is read_cont(), from "hw/dma.c". This call succeeds. The second (offending) call tries to register pvpanic_ioport_read() for the same, and that triggers the hw_error() call (justifiedly). Needinfo about the last part of comment 5. I propose NOTABUG, but if you would like a slightly more informative error message than currently printed, that can be done. To summarize again, mapping pvpanic to a randomly chosen ioport is a user error. RHEL-6 qemu-kvm is right to reject it. RHEL-7 qemu-kvm does not reject it, because in RHEL-7 there is some infrastructure in place that allows more overlapping configurations than possible in RHEL-6 (*), but that doesn't change the fact that in this specific case, mapping pvpanic over ioport 9 is wrong. RHEL-7 doesn't complain about it, but the 8-bit DMA controller will be broken under such a setup. In short: don't mess with the pvpanic ioport unless you're 100% sure that the port you select is unoccupied. (*) There are some use cases when non-intuitive ioport overlappings are valid. For example, ioport 0xCF9 (byte wide) overlaps with 0xCF8..0xCFB (4 byte wide), and they validly have separate roles. It's simply impossible to tell in advance, in a generic fashion, when such an overlap is valid and when not. The pvpanic setting that is the subject of this BZ is a user error. hi, Laszlo Thanks for your deep checking for this bug. I agree with that using ioport with 0x09, or some other values, such as 0x01, 0x02... as an invalid ioport is user configuration error. But as a negative test scenario, QEMU crashed caused by this negative testing is not acceptable. So, I don't think it is reasonable to close this bug with NOTABUGļ¼ I suggest we should fix qemu crashed issue firstly and meanwhile give a patch to update the reasonable error message. thanks As I said before, qemu *did not crash*. It exited on purpose, after it detected the error in configuration (the overlapping ioport assignments). The hw_error() function that RHEL-6 qemu-kvm uses for reporting this error entails the following: - print error message to stderr, - dump state of VCPUs, - call abort(). The abort() function raises SIGABRT. (See the backtrace in comment 0.) In particular, the exit status of the process will show that the process exited due to the SIGABRT signal, and not with a usual (zero or nonzero) exit code. We can discuss the characteristics of *how* qemu exits under these circumstances. For example, we could consider reporting the error with something else than hw_error(), in register_ioport_read(); calling _exit() ultimately instead of the abort() that is currently called by hw_error() internally. That would change the exit status from "terminated with SIGABRT" to "exited with nonzero code", and a core dump would not be generated. What we can't change is *whether* RHEL-6 qemu-kvm exits under the above circumstances. It simply must. To reiterate, just because qemu terminates with SIGABRT that it raises itself, with abort(), that doesn't make this a "crash". It is completely different from a SIGSEGV for example that is generated by the kernel in response to a null pointer dereference. The SIGABRT in this case is 100% intentional in qemu. We can tune the specifics of *how* qemu should exit in this case, but we can't change *whether* it will exit. It must. Given explanation in comment #9, I'm closing it as NEXTRELEASE (it's OK in RHEL7). This is very low priority and not worth the extra time to just beautify the error message, at least not until a customer hits it and complains (which is very very unlikely). |