Bug 1814157
| Summary: | Create multipath target paths in domain namespace | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | gaojianan <jgao> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | gaojianan <jgao> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.2 | CC: | hhan, jdenemar, jgao, jsuchane, jtomko, lmen, meili, mprivozn, mtessun, pkrempa, virt-maint, xuzhang |
| Target Milestone: | rc | Keywords: | Triaged, Upstream |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-6.0.0-13.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-05-05 09:59:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
gaojianan
2020-03-17 08:55:58 UTC
I can't reproduce this upstream, but downstream it happens reliably. It looks like memory corruption, though. It also reliably does not happen if persistent reservations are not used.

==3621175== Thread 6:
==3621175== Invalid free() / delete / delete[] / realloc()
==3621175==    at 0x483AA0C: free (vg_replace_malloc.c:540)
==3621175==    by 0x57B544C: g_free (in /usr/lib64/libglib-2.0.so.0.6200.5)
==3621175==    by 0x16340AE0: g_autoptr_cleanup_generic_gfree (glib-autocleanups.h:28)
==3621175==    by 0x16340AE0: qemuDomainNamespaceSetupDisk (qemu_domain.c:15916)
==3621175==    by 0x16340CCB: qemuDomainStorageSourceAccessModify (qemu_domain.c:12050)
==3621175==    by 0x16353B06: qemuDomainAttachDiskGeneric (qemu_hotplug.c:686)
==3621175==    by 0x163551D4: qemuDomainAttachSCSIDisk (qemu_hotplug.c:981)
==3621175==    by 0x163551D4: qemuDomainAttachDeviceDiskLiveInternal (qemu_hotplug.c:1055)
==3621175==    by 0x163551D4: qemuDomainAttachDeviceDiskLive (qemu_hotplug.c:1111)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceLive (qemu_driver.c:7809)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceLiveAndConfig (qemu_driver.c:8712)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceFlags (qemu_driver.c:8764)
==3621175==    by 0x4B4AF22: virDomainAttachDeviceFlags (libvirt-domain.c:8239)
==3621175==    by 0x1478A8: remoteDispatchDomainAttachDeviceFlags (remote_daemon_dispatch_stubs.h:3713)
==3621175==    by 0x1478A8: remoteDispatchDomainAttachDeviceFlagsHelper (remote_daemon_dispatch_stubs.h:3692)
==3621175==    by 0x4A6ED7F: virNetServerProgramDispatchCall (virnetserverprogram.c:430)
==3621175==    by 0x4A6ED7F: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3621175==    by 0x4A73EE7: virNetServerProcessMsg (virnetserver.c:136)
==3621175==    by 0x4A73EE7: virNetServerHandleJob (virnetserver.c:153)
==3621175==    by 0x4991DB6: virThreadPoolWorker (virthreadpool.c:163)
==3621175== Address 0x1ae191b0 is 0 bytes inside a block of size 20 free'd
==3621175==    at 0x483AA0C: free (vg_replace_malloc.c:540)
==3621175==    by 0x57B544C: g_free (in /usr/lib64/libglib-2.0.so.0.6200.5)
==3621175==    by 0x490DDDA: virFree (viralloc.c:348)
==3621175==    by 0x4989C11: virStringListFreeCount (virstring.c:341)
==3621175==    by 0x16340ACE: qemuDomainNamespaceSetupDisk (qemu_domain.c:15960)
==3621175==    by 0x16340CCB: qemuDomainStorageSourceAccessModify (qemu_domain.c:12050)
==3621175==    by 0x16353B06: qemuDomainAttachDiskGeneric (qemu_hotplug.c:686)
==3621175==    by 0x163551D4: qemuDomainAttachSCSIDisk (qemu_hotplug.c:981)
==3621175==    by 0x163551D4: qemuDomainAttachDeviceDiskLiveInternal (qemu_hotplug.c:1055)
==3621175==    by 0x163551D4: qemuDomainAttachDeviceDiskLive (qemu_hotplug.c:1111)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceLive (qemu_driver.c:7809)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceLiveAndConfig (qemu_driver.c:8712)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceFlags (qemu_driver.c:8764)
==3621175==    by 0x4B4AF22: virDomainAttachDeviceFlags (libvirt-domain.c:8239)
==3621175==    by 0x1478A8: remoteDispatchDomainAttachDeviceFlags (remote_daemon_dispatch_stubs.h:3713)
==3621175==    by 0x1478A8: remoteDispatchDomainAttachDeviceFlagsHelper (remote_daemon_dispatch_stubs.h:3692)
==3621175==    by 0x4A6ED7F: virNetServerProgramDispatchCall (virnetserverprogram.c:430)
==3621175==    by 0x4A6ED7F: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3621175== Block was alloc'd at
==3621175==    at 0x483980B: malloc (vg_replace_malloc.c:309)
==3621175==    by 0x57B5358: g_malloc (in /usr/lib64/libglib-2.0.so.0.6200.5)
==3621175==    by 0x57CF613: g_strdup (in /usr/lib64/libglib-2.0.so.0.6200.5)
==3621175==    by 0x16340A75: qemuDomainNamespaceSetupDisk (qemu_domain.c:15944)
==3621175==    by 0x16340CCB: qemuDomainStorageSourceAccessModify (qemu_domain.c:12050)
==3621175==    by 0x16353B06: qemuDomainAttachDiskGeneric (qemu_hotplug.c:686)
==3621175==    by 0x163551D4: qemuDomainAttachSCSIDisk (qemu_hotplug.c:981)
==3621175==    by 0x163551D4: qemuDomainAttachDeviceDiskLiveInternal (qemu_hotplug.c:1055)
==3621175==    by 0x163551D4: qemuDomainAttachDeviceDiskLive (qemu_hotplug.c:1111)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceLive (qemu_driver.c:7809)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceLiveAndConfig (qemu_driver.c:8712)
==3621175==    by 0x163D1848: qemuDomainAttachDeviceFlags (qemu_driver.c:8764)
==3621175==    by 0x4B4AF22: virDomainAttachDeviceFlags (libvirt-domain.c:8239)
==3621175==    by 0x1478A8: remoteDispatchDomainAttachDeviceFlags (remote_daemon_dispatch_stubs.h:3713)
==3621175==    by 0x1478A8: remoteDispatchDomainAttachDeviceFlagsHelper (remote_daemon_dispatch_stubs.h:3692)
==3621175==    by 0x4A6ED7F: virNetServerProgramDispatchCall (virnetserverprogram.c:430)
==3621175==    by 0x4A6ED7F: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3621175==    by 0x4A73EE7: virNetServerProcessMsg (virnetserver.c:136)
==3621175==    by 0x4A73EE7: virNetServerHandleJob (virnetserver.c:153)

The issue is that the '*dmPath' string is double-free'd because the pointer is not cleared when it is made part of the string list in qemuDomainNamespaceSetupDisk. This was caused by upstream commit ce36e33c105 and later fixed accidentally by a30078cb8326461.

Fixed upstream as:

a30078cb83 qemu: Create multipath targets for PRs
v6.1.0-79-ga30078cb83

commit ce36e33c10579c4e6d0816ce6f154884891fc247
qemu: use g_strdup instead of VIR_STRDUP
merely altered the allocation function, the (lack of) pointer zeroing
and the odd freeing functions were untouched.
commit a80ebd2a2a8afa3b85c0d207d5500af6dd28f95d
qemu: Create NVMe disk in domain namespace
is the one that switched the string free function from a shallow free to freeing the whole list.
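To make the ownership pattern concrete, here is a minimal GLib sketch (illustrative only; it is not the libvirt code, it uses a GSList instead of the real string list, and the dmPath value is made up):

/* Minimal GLib sketch of the ownership bug (illustrative only, not the
 * actual libvirt code; names and the list type are simplified). */
#include <glib.h>

static void
broken(GSList **paths)
{
    g_autofree char *dmPath = g_strdup("/dev/dm-3");

    /* BUG: the list takes the pointer, but dmPath is not cleared, so
     * g_autofree frees the same block again when the function returns. */
    *paths = g_slist_prepend(*paths, dmPath);
}

static void
fixed(GSList **paths)
{
    g_autofree char *dmPath = g_strdup("/dev/dm-3");

    /* Ownership is transferred and the local pointer is NULLed, so the
     * automatic cleanup has nothing left to free. */
    *paths = g_slist_prepend(*paths, g_steal_pointer(&dmPath));
}

int
main(void)
{
    GSList *paths = NULL;

    fixed(&paths);                     /* calling broken() would double-free */
    g_slist_free_full(paths, g_free);  /* the list frees each element once */
    return 0;
}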
Note to QE: The patch linked in comment 7 does more than just fix the crasher. It also fixes the problem mentioned in bug 1711045#c61. Long story short, libvirt not only needs to allow all of the devices that a multipath target consists of in CGroups (fixed in bug 1557769), it also has to create the corresponding device nodes in the domain's namespace (per Paolo's comment). The patch I've backported does that, and fixes the crasher as well.
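For context on what "creating the nodes in the namespace" means in practice, here is a rough, hypothetical POSIX sketch: for each device backing the multipath target, replicate the host's device node (same mode and major:minor) under the domain's private /dev tree that gets mounted into the QEMU namespace. This is not libvirt's implementation; the devRoot path, the helper name and the example device are made up.

/* Hypothetical sketch, not libvirt code: recreate a host block device node
 * (e.g. one /dev/dm-N underlying a multipath target) under a domain's
 * private /dev tree, keeping the same mode and major:minor numbers.
 * Requires root and assumes the parent directories already exist. */
#include <stdio.h>
#include <limits.h>
#include <sys/stat.h>

static int
create_node_in_private_dev(const char *hostPath, const char *devRoot)
{
    struct stat sb;
    char target[PATH_MAX];

    if (stat(hostPath, &sb) < 0 || !S_ISBLK(sb.st_mode))
        return -1;                      /* not a block device on the host */

    /* devRoot stands in for the per-domain /dev that is mounted into the
     * QEMU mount namespace (the path is made up for the example). */
    snprintf(target, sizeof(target), "%s%s", devRoot, hostPath);

    /* same file type and permissions, same major:minor as on the host */
    return mknod(target, sb.st_mode, sb.st_rdev);
}

int
main(void)
{
    /* example device and root; both are placeholders */
    return create_node_in_private_dev("/dev/dm-3", "/tmp/domain-dev") == 0 ? 0 : 1;
}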
One more case associated with this crash -- copy to an SCSI-reservations destination.

pr-dest.xml:
<disk type="block" device="lun">
  <driver name="qemu" type="raw"/>
  <source dev="/dev/sdb">
    <encryption format="luks">
      <secret type="passphrase" usage="luks"/>
    </encryption>
    <reservations managed="yes"/>
  </source>
  <target dev="sdb" bus="scsi"/>
</disk>
# virsh blockcopy pc sdb --xml pr-dest.xml --wait --pivot --verbose --transient-job
(In reply to Han Han from comment #10)
> One more case associated with this crash -- Copy to scsi reservations
> destination
> pr-dest.xml:
> <disk type="block" device="lun">
> <driver name="qemu" type="raw"/>
> <source dev="/dev/sdb">
> <encryption format="luks">
> <secret type="passphrase" usage="luks"/>
> </encryption>
> <reservations managed="yes"/>
> </source>
> <target dev="sdb" bus="scsi"/>
> </disk>
>
> # virsh blockcopy pc sdb --xml pr-dest.xml --wait --pivot --verbose
> --transient-job

I guess this is the same bug, it just demonstrates itself differently: in the case of blockcopy we need to create the node in the namespace, which tickles the problematic code.

Verified on:
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
Steps:
1. Prepare a guest and a disk XML containing the "reservations" element:
# cat disk.xml
<disk device="lun" type="block">
  <target bus="scsi" dev="sdb" />
  <driver name="qemu" type="raw" />
  <source dev="/dev/sdh">
    <reservations managed="yes" />
  </source>
</disk>
2. Hot-plug the disk to the guest:
# virsh attach-device test1 disk.xml
Device attached successfully
3. Check the QEMU monitor commands in the log:
2020-03-30 02:22:12.294+0000: 630435: info : qemuMonitorIOWrite:453 : QEMU_MONITOR_IO_WRITE: mon=0x7f260c2fa6f0 buf={"execute":"object-add","arguments":{"qom-type":"pr-manager-helper","id":"pr-helper0","props":{"path":"/var/lib/libvirt/qemu/domain-4-test1/pr-helper0.sock"}},"id":"libvirt-369"}^M
len=180 ret=180 errno=0
...
2020-03-30 02:22:12.296+0000: 630438: info : qemuMonitorSend:996 : QEMU_MONITOR_SEND_MSG: mon=0x7f260c2fa6f0 msg={"execute":"blockdev-add","arguments":{"driver":"host_device","filename":"/dev/sdh","pr-manager":"pr-helper0","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"},"id":"libvirt-370"}
4. Do a blockcopy and check the QEMU monitor commands:
# virsh blockcopy test1 sdb --xml disk.xml --wait --pivot --verbose --transient-job
Block Copy: [100 %]
Successfully pivoted
QEMU monitor commands:
2020-03-30 02:33:34.707+0000: 630439: info : qemuMonitorSend:996 : QEMU_MONITOR_SEND_MSG: mon=0x7f260c2fa6f0 msg={"execute":"blockdev-add","arguments":{"driver":"host_device","filename":"/dev/sdh","pr-manager":"pr-helper0","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"},"id":"libvirt-375"}^M
fd=-1
2020-03-30 02:33:34.707+0000: 630435: info : virObjectRef:386 : OBJECT_REF: obj=0x7f260c2fa6f0
2020-03-30 02:33:34.708+0000: 630435: info : qemuMonitorIOWrite:453 : QEMU_MONITOR_IO_WRITE: mon=0x7f260c2fa6f0 buf={"execute":"blockdev-add","arguments":{"driver":"host_device","filename":"/dev/sdh","pr-manager":"pr-helper0","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"},"id":"libvirt-375"}^M
len=204 ret=204 errno=0
...
2020-03-30 02:33:34.709+0000: 630439: info : qemuMonitorSend:996 : QEMU_MONITOR_SEND_MSG: mon=0x7f260c2fa6f0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-3-format","read-only":false,"driver":"raw","file":"libvirt-3-storage"},"id":"libvirt-376"}^M
fd=-1
2020-03-30 02:33:34.709+0000: 630435: info : virObjectRef:386 : OBJECT_REF: obj=0x7f260c2fa6f0
2020-03-30 02:33:34.709+0000: 630435: info : qemuMonitorIOWrite:453 : QEMU_MONITOR_IO_WRITE: mon=0x7f260c2fa6f0 buf={"execute":"blockdev-add","arguments":{"node-name":"libvirt-3-format","read-only":false,"driver":"raw","file":"libvirt-3-storage"},"id":"libvirt-376"}^M
len=152 ret=152 errno=0
The node name has been added successfully. Works as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017