Bug 1367369
Summary: Both guest and qemu hang after doing block stream when guest rebooting

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.3
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Qianqian Zhu <qizhu>
Assignee: John Snow <jsnow>
QA Contact: Qianqian Zhu <qizhu>
CC: chayang, juzhang, knoel, michen, mrezanin, pbonzini, qizhu, qzhang, virt-bugs, virt-maint, xfu
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Type: Bug
Last Closed: 2017-08-01 23:32:13 UTC
Attachments: complete dmesg (attachment 1258246)
Description — Qianqian Zhu, 2016-08-16 09:42:02 UTC
Comment:

Too late in the 7.3 cycle to fix this for this release. Will investigate for 7.4.

Comment (John Snow):

qianqianzhu, can you elaborate for me? What do you mean when you say "It will not hang until guest finish loading virtio-pci 0x6 on boot phase, and the address of my virtio disk is 0x7."?

If I understand you correctly, the timeline looks like this:

1. Make snapshot.
2. Reboot.
3. Issue "block stream" immediately after step #2.
   -- Block stream is now happening while the guest tries to boot.
   -- Guest appears to freeze during bringup (SeaBIOS or Linux freezes?)
   -- The guest appears to be frozen after initializing the virtio-pci device? (What text output are you using to determine this?)
   -- Block stream finishes.
   -- Guest unfreezes and boot finishes.

Is that accurate?

Comment (Qianqian Zhu):

(In reply to John Snow from comment #3)

Hi John,

The steps you listed are correct. The guest is booting Linux (not in SeaBIOS) when it freezes, and the guest dmesg looks like:

[    2.771360] [drm:qxl_pci_probe [qxl]] *ERROR* qxl too old, doesn't support client_monitors_config, use xf86-video-qxl in user mode
[    2.773122] qxl: probe of 0000:00:02.0 failed with error -22
[   47.712170] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
[   47.719395] virtio-pci 0000:00:06.0: irq 24 for MSI/MSI-X
[   47.719417] virtio-pci 0000:00:06.0: irq 25 for MSI/MSI-X

It freezes right after [    2.773122] every time; once the block stream finishes, it continues with [   47.712170] and the following steps. So I think it hangs while trying to load the virtio-pci device at 0x6.

Thanks,
Qianqian

Created attachment 1258246 [details]: complete dmesg

Here are the complete boot messages.
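The timeline above corresponds to a QMP sequence along these lines (a sketch only: the device name "drive-virtio-disk0" and the snapshot path are illustrative, not taken from this report):

```json
{ "execute": "blockdev-snapshot-sync",
  "arguments": { "device": "drive-virtio-disk0",
                 "snapshot-file": "/tmp/sn1.qcow2",
                 "format": "qcow2" } }

{ "execute": "system_reset" }

{ "execute": "block-stream",
  "arguments": { "device": "drive-virtio-disk0" } }
```

With the block-stream job still running while the guest boots, the guest stalls at virtio-pci initialization until the job completes.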
Comment (John Snow):

qianqianzhu: Thank you for the clarification and the boot log! I'll test this out today.

Comment (John Snow):

Wow, yeah, confirmed. Easy to reproduce. Even without issuing further QMP commands on boot, QEMU and the guest both freeze.

The problem appears to be that virtio_blk_data_plane_stop is called with the BQL held and then issues a bdrv_drained_begin -> bdrv_drain_recurse, which will not resolve until the block_stream job has finished. The guest writes to the VIRTIO_PCI_COMMON_STATUS register of the virtio-pci device, which triggers virtio_pci_stop_ioeventfd, which causes the drain that blocks until the block stream finishes. This path uses bdrv_drain instead of bdrv_drain_all; bdrv_drain, unlike bdrv_drain_all, does not attempt to pause any relevant jobs, so the drain is not allowed to complete until the job finishes.

Fixing it would hopefully be as simple as adding a job pause around bdrv_drained_begin and bdrv_drained_end, but since these functions are used in places where jobs are added between the drain begin/end, we need to take care not to attempt to resume newly created jobs. I'll have to poke at this a little bit, but hopefully it's not too hard.

Comment:

Yeah, pause/resume would be a good workaround. The full solution is much more complex; see https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg02016.html for some discussion between me and Kevin.

Comment:

Fixed upstream by:

600ac6a0ef5c06418446ef2f37407bddcc51b21c blockjob: add devops to blockjob backends
f4d9cc88ee69a5b04a843424e50f466e36fcad4e block-backend: add drained_begin / drained_end ops
e3796a245ad0efa65ca8d2dc6424562a8fbaeb6a blockjob: add block_job_start_shim

Included in 2.9.0-rc2; will need to be backported unless we rebase to rc2+.

Comment:

Branch has been rebased and should now include a fix.

Comment (Qianqian Zhu):

Verified on:
qemu-kvm-rhev-2.9.0-1.el7.x86_64
kernel-3.10.0-640.el7.x86_64

Steps same as comment 0.

Results: qemu works well and guest reboot succeeds during block stream. Moving to VERIFIED therefore.
Comment:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392