Bug 1350889 - aarch64: hotplug removing drive causes segfault in hmp_drive_del > qdict_get_str > qstring_get_str
Summary: aarch64: hotplug removing drive causes segfault in hmp_drive_del > qdict_get_str > qstring_get_str
Keywords:
Status: CLOSED DUPLICATE of bug 1341531
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: aarch64
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andrew Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-06-28 15:32 UTC by Richard W.M. Jones
Modified: 2016-08-02 13:48 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1221569
Environment:
Last Closed: 2016-08-02 13:07:30 UTC
Target Upstream Version:
Embargoed:



Description Richard W.M. Jones 2016-06-28 15:32:53 UTC
Description of problem:

I'm running the libguestfs hotplug tests on aarch64
(https://github.com/libguestfs/libguestfs/tree/master/tests/hotplug).

test-hot-add.pl is working fine.

test-hot-remove.pl fails.  It causes qemu to crash.  The kernel dumps the
following information.

[   69.703520] qemu-kvm[1869]: unhandled level 2 translation fault (11) at 0x00000010, esr 0x92000006
[   69.712449] pgd = fffffe7fe4b60000
[   69.715849] [00000010] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[   69.724119] 
[   69.725612] CPU: 2 PID: 1869 Comm: qemu-kvm Not tainted 4.5.0-0.40.el7.aarch64 #1
[   69.733069] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Jan 26 2016
[   69.740351] task: fffffe7ff152e800 ti: fffffe7ffad34000 task.ti: fffffe7ffad34000
[   69.747808] PC is at 0x2aab3378ef4
[   69.751197] LR is at 0x2aab321a0a4
[   69.754591] pc : [<000002aab3378ef4>] lr : [<000002aab321a0a4>] pstate: 80000000
[   69.761957] sp : 000003fffe88f2d0
[   69.765267] x29: 000003fffe88f2d0 x28: 000002aad0d76000 
[   69.770583] x27: 000002aab34ff000 x26: 0000000000000000 
[   69.775904] x25: 000002aacf757750 x24: 000002aacf7fa2d0 
[   69.781225] x23: 000002aacfa74400 x22: 000002aae26e4800 
[   69.786545] x21: 000002aab34f35b8 x20: 000003fffe88f380 
[   69.791864] x19: 000002aab34ff000 x18: 0000000000000001 
[   69.797184] x17: 000003ff94be20a0 x16: 000002aab34fdf68 
[   69.802504] x15: 003b9aca00000000 x14: 001d24f7bc000000 
[   69.807826] x13: ffffffffa88e9f71 x12: 0000000000000018 
[   69.813148] x11: 65722e6d6f635f5f x10: 65722e6d6f635f5f 
[   69.818470] x9 : 0000000000000020 x8 : 6972645f74616864 
[   69.823794] x7 : 0000000000000004 x6 : 000002aab316ccf4 
[   69.829117] x5 : 00000000471e3447 x4 : 0000000000000000 
[   69.834437] x3 : 0000000000000c80 x2 : 00000000000001e6 
[   69.839760] x1 : 000002aab3419d38 x0 : 0000000000000000 
[   69.845083] 

If I'm understanding all that correctly, that is just qemu segfaulting, not
any kernel problem: the fault is at user address 0x00000010 with all-zero
page-table entries, i.e. a NULL-pointer dereference (a read at offset 0x10
of a NULL struct pointer) inside the qemu-kvm process.

The stack trace inside qemu is:

Thread 1 (Thread 0x3ffb3406700 (LWP 2712)):
#0  qstring_get_str (qstring=0x0) at qobject/qstring.c:129
#1  0x000002aab63b9574 in qdict_get_str (qdict=<optimized out>, 
    key=key@entry=0x2aab6459d38 "id") at qobject/qdict.c:279
#2  0x000002aab625a0a4 in hmp_drive_del (mon=<optimized out>, 
    qdict=<optimized out>) at blockdev.c:2843
#3  0x000002aab61ac7e0 in handle_qmp_command (parser=<optimized out>, 
    tokens=<optimized out>) at /usr/src/debug/qemu-2.6.0/monitor.c:3922
#4  0x000002aab63bb0d0 in json_message_process_token (lexer=0x2aad6b5a340, 
    input=0x2aad6b00fa0, type=JSON_RCURLY, x=<optimized out>, 
    y=<optimized out>) at qobject/json-streamer.c:94
#5  0x000002aab63cfd44 in json_lexer_feed_char (
    lexer=lexer@entry=0x2aad6b5a340, ch=<optimized out>, 
    flush=flush@entry=false) at qobject/json-lexer.c:310
#6  0x000002aab63cfe2c in json_lexer_feed (lexer=0x2aad6b5a340, 
    buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:360
#7  0x000002aab63bb1b0 in json_message_parser_feed (parser=<optimized out>, 
    buffer=<optimized out>, size=<optimized out>)
    at qobject/json-streamer.c:114
#8  0x000002aab61aad58 in monitor_qmp_read (opaque=<optimized out>, 
    buf=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/monitor.c:3938
#9  0x000002aab6261284 in qemu_chr_be_write_impl (len=<optimized out>, 
    buf=<optimized out>, s=<optimized out>) at qemu-char.c:389
#10 qemu_chr_be_write (s=<optimized out>, buf=<optimized out>, 
    len=<optimized out>) at qemu-char.c:401
#11 0x000002aab6261660 in tcp_chr_read (
    chan=<error reading variable: value has been optimized out>, 
    cond=<error reading variable: value has been optimized out>, 
    opaque=0x2aad6b50880, 
    opaque@entry=<error reading variable: value has been optimized out>)
    at qemu-char.c:2895
#12 0x000002aab638b5dc in qio_channel_fd_source_dispatch (
    source=<optimized out>, callback=<optimized out>, 
    user_data=<optimized out>) at io/channel-watch.c:84
#13 0x000003ffb4dde508 in g_main_context_dispatch ()
   from /lib64/libglib-2.0.so.0
#14 0x000002aab6332cec in glib_pollfds_poll () at main-loop.c:213
#15 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:258
#16 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
#17 0x000002aab6178414 in main_loop () at vl.c:1934
#18 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4667
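
Frames #0-#2 explain the mechanism: in qemu 2.6, qdict_get_str() chains
three calls with no NULL check, and qstring_get_str() dereferences its
argument unconditionally, so a missing (or non-string) "id" entry in the
qdict crashes. Below is a compilable sketch of that path; the two function
bodies are paraphrased from qobject/qdict.c and qobject/qstring.c, while
the type definitions and stand-in helpers are simplified assumptions, not
the real qemu code:

#include <stdio.h>
#include <stddef.h>

/* Minimal stand-ins for the qemu types (simplified; the real ones live in
 * include/qapi/qmp/). On 64-bit, 'string' lands at offset 0x10, matching
 * the faulting address 0x00000010 in the kernel log above. */
typedef struct QObject { void *type; size_t refcnt; } QObject;
typedef struct QString { QObject base; char *string; } QString;
typedef struct QDict QDict;

/* Stand-ins: the real qdict_get() returns NULL for a missing key, and
 * qobject_to_qstring() returns NULL for a NULL or non-string object. */
static QObject *qdict_get(const QDict *qdict, const char *key) { return NULL; }
static QString *qobject_to_qstring(const QObject *obj) { return NULL; }

/* frame #0 (qobject/qstring.c:129): unconditional dereference */
const char *qstring_get_str(const QString *qstring)
{
    return qstring->string;   /* qstring == NULL -> read at 0x10 -> SIGSEGV */
}

/* frame #1 (qobject/qdict.c:279): no NULL check anywhere in the chain */
const char *qdict_get_str(const QDict *qdict, const char *key)
{
    return qstring_get_str(qobject_to_qstring(qdict_get(qdict, key)));
}

int main(void)
{
    /* Mirrors frame #2: hmp_drive_del() asking the qdict for "id". */
    printf("%s\n", qdict_get_str(NULL, "id"));   /* segfaults as in the trace */
    return 0;
}

Compiled with gcc and run, this crashes the same way: a SIGSEGV reading
address 0x10.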

Versions:

kernel 4.5.0-0.40.el7.aarch64  *NB* with acpi=off because of another bug
qemu-kvm-rhev-2.6.0-8.el7.aarch64
libguestfs-1.32.5-6.el7.aarch64 still using virtio-mmio

Comment 2 Andrew Jones 2016-08-02 12:57:38 UTC
I couldn't reproduce this. First I tried test-hot-remove.pl, with some added print statements:

creating temp disks...
hot adding temp disks...
removing temp disks...   (this phase took a longish time, 30-40 seconds)
adding disks again...
using disks...
removing disks...
SUCCESS...

Since that worked, I manually did virsh attach-disk/detach-disk with a rhel7.3 guest, and that worked too.

I have the latest RHELSA bits installed on both host and guest.

kernel-4.5.0-0.48.el7.aarch64 (booting with ACPI)
qemu-kvm-rhev-2.6.0-14.el7.aarch64
libguestfs-1.32.6-2.el7.aarch64 (from brewweb, batcave has 1.32.5-5.el7)

Comment 3 Richard W.M. Jones 2016-08-02 13:07:30 UTC
I cannot reproduce this now either.  It was probably fixed in
the updated qemu.

Comment 4 Richard W.M. Jones 2016-08-02 13:14:28 UTC
At a guess I would say it's this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1341531
If you look at the stack trace (attached to that bug) you'll
see it's pretty much identical to this one.

Comment 5 Andrew Jones 2016-08-02 13:45:02 UTC
(In reply to Richard W.M. Jones from comment #4)
> At a guess I would say it's this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1341531
> If you look at the stack trace (attached to that bug) you'll
> see it's pretty much identical to this one.

I agree. Thanks for following up on it. I'm satisfied enough with the two traces matching, and with the fact that the bug is gone, to let this bug rest without further investigation (i.e. without proof that it's fixed).

Comment 6 Andrew Jones 2016-08-02 13:48:19 UTC
I'm convinced enough; let's dup it.

*** This bug has been marked as a duplicate of bug 1341531 ***

