Bug 1054785 - inappropriate error message when guest crashes while connecting to agent monitor
Summary: inappropriate error message when guest crashes while connecting to agent monitor
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact: Virtualization Bugs
URL:
Whiteboard: virt
Depends On: 1047659
Blocks:
 
Reported: 2014-01-17 13:22 UTC by Peter Krempa
Modified: 2016-07-26 15:10 UTC
CC List: 18 users

Fixed In Version: libvirt-1.1.1-19.el7
Doc Type: Bug Fix
Doc Text:
Clone Of: 1047659
Environment:
Last Closed: 2014-06-13 12:38:31 UTC
Target Upstream Version:
Embargoed:


Attachments
libvirtd log (2.89 MB, application/x-bzip), 2016-04-14 09:52 UTC, Fangge Jin, no flags

Description Peter Krempa 2014-01-17 13:22:22 UTC
+++ This bug was initially created as a clone of Bug #1047659 +++

Description of problem:
When the VM crashes while libvirt is attempting to connect to the guest agent monitor socket, the regular monitor object is removed. The monitor is later accessed during the startup phase. This no longer causes a crash, but it reports an inappropriate error message:

error: invalid argument: monitor must not be NULL

instead of the real error message. This situation is likely to happen when doing migrations under heavy host load.
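
To make the failure mode concrete, here is a minimal standalone sketch (not libvirt code; the structures and helpers are made up for illustration) of the race described above: the domain lock is dropped while the agent socket is opened, the guest dies in that window and the monitor is torn down, and a later unconditional monitor access surfaces only the generic NULL-argument error:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Illustrative stand-ins for the domain object and its monitor;
 * these are NOT the real libvirt structures. */
struct monitor { int fd; };

struct domain {
    bool active;
    struct monitor *mon;        /* torn down when the guest dies */
};

/* Models qemuConnectAgent(): the domain lock is dropped while the agent
 * socket is opened, so the guest can crash in that window and the EOF
 * handler frees the monitor. */
static void connect_agent(struct domain *dom, bool guest_crashes)
{
    /* virObjectUnlock(dom) -- locks are down here */
    if (guest_crashes) {
        free(dom->mon);
        dom->mon = NULL;
        dom->active = false;
    }
    /* virObjectLock(dom) -- agent failures are treated as non-fatal */
}

/* Models a later startup step (e.g. detecting vCPU PIDs): the monitor is
 * entered unconditionally, so a NULL monitor yields only the generic
 * "invalid argument" message instead of the real cause. */
static int enter_monitor(struct domain *dom)
{
    if (!dom->mon) {
        fprintf(stderr, "error: invalid argument: monitor must not be NULL\n");
        return -1;
    }
    return 0;
}

int main(void)
{
    struct domain dom = { .active = true, .mon = malloc(sizeof(struct monitor)) };

    connect_agent(&dom, true);  /* guest crashes while the agent connects */
    return enter_monitor(&dom) < 0 ? 1 : 0;
}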

Version-Release number of selected component (if applicable):
libvirt-1.1.1-18.el7

--- Additional comment from Peter Krempa on 2014-01-14 19:29:36 CET ---

Reproducer:

commit ba95d2d1220061ce86fd0b66f0a2f19a90c6b51c
Author: Peter Krempa <pkrempa>
Date:   Tue Jan 14 19:09:36 2014 +0100

    DO NOT APPLY UPSTREAM: Reproducer
    
    Apply this patch and run a patched libvirt daemon. Then start a VM and
    kill its process after a few seconds:
    
     # virsh start VM & sleep 3; killall -9 qemu-kvm
    
    The upstream version doesn't crash, but the virsh command returns a
    cryptic error message:
     error: invalid argument: monitor must not be NULL
    
    0.10.2 and similar downstream versions segfault:
    
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fffed467700 (LWP 29256)]
    __pthread_mutex_lock (mutex=0x10) at pthread_mutex_lock.c:50
    50        unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
    Missing separate debuginfos, use: debuginfo-install VirtualBox-4.2-4.2.22_91556_el6-1.x86_64
    (gdb) bt
     #0  __pthread_mutex_lock (mutex=0x10) at pthread_mutex_lock.c:50
     #1  0x00007fffe891585e in qemuDomainObjEnterMonitorInternal (driver=0x7fffe00858f0, driver_locked=true, obj=0x7fffe01689c0, asyncJob=<value optimized out>)
         at qemu/qemu_domain.c:1008
     #2  0x00007fffe892662a in qemuProcessDetectVcpuPIDs (driver=0x7fffe00858f0, vm=0x7fffe01689c0) at qemu/qemu_process.c:1831
     #3  0x00007fffe892c417 in qemuProcessStart (conn=0x7fffdc000ae0, driver=0x7fffe00858f0, vm=0x7fffe01689c0, migrateFrom=0x0, stdin_fd=-1, stdin_path=0x0, snapshot=0x0,
         vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=1) at qemu/qemu_process.c:4022
     #4  0x00007fffe8973f3e in qemuDomainObjStart (conn=0x7fffdc000ae0, driver=0x7fffe00858f0, vm=0x7fffe01689c0, flags=<value optimized out>) at qemu/qemu_driver.c:6025

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9331744..a27eded 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -226,40 +226,42 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
         return 0;

     if (virSecurityManagerSetDaemonSocketLabel(driver->securityManager,
                                                vm->def) < 0) {
         VIR_ERROR(_("Failed to set security context for agent for %s"),
                   vm->def->name);
         goto cleanup;
     }

     /* Hold an extra reference because we can't allow 'vm' to be
      * deleted while the agent is active */
     virObjectRef(vm);

     ignore_value(virTimeMillisNow(&priv->agentStart));
     virObjectUnlock(vm);

     agent = qemuAgentOpen(vm,
                           config,
                           &agentCallbacks);

+    sleep(10);
+
     virObjectLock(vm);
     priv->agentStart = 0;

     if (virSecurityManagerClearSocketLabel(driver->securityManager,
                                            vm->def) < 0) {
         VIR_ERROR(_("Failed to clear security context for agent for %s"),
                   vm->def->name);
         goto cleanup;
     }

     if (agent == NULL)
         virObjectUnref(vm);

     if (!virDomainObjIsActive(vm)) {
         qemuAgentClose(agent);
         goto cleanup;
     }
     priv->agent = agent;

     if (priv->agent == NULL) {

--- Additional comment from Peter Krempa on 2014-01-15 18:11:14 CET ---

Fixed upstream:

commit b952cbbccafd5ead8b5a70b2608a1d5a7f03b31e
Author: Peter Krempa <pkrempa>
Date:   Tue Jan 14 19:13:30 2014 +0100

    qemu: Avoid operations on NULL monitor if VM fails early
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1047659
    
    If a VM dies very early during an attempted connect to the guest agent
    while the locks are down the domain monitor object will be freed. The
    object is then accessed later as any failure during guest agent startup
    isn't considered fatal.
    
    In the current upstream version this doesn't lead to a crash as
    virObjectLock called when entering the monitor in
    qemuProcessDetectVcpuPIDs checks the pointer before attempting to
    dereference (lock) it. The NULL pointer is then caught in the monitor
    helper code.
    
    Before the introduction of virObjectLockable - observed on 0.10.2 - the
    pointer is locked directly via virMutexLock leading to a crash.
    
    To avoid this problem we need to differentiate between the guest agent
    not being present and the VM quitting when the locks were down. The fix
    reorganizes the code in qemuConnectAgent to add the check and then adds
    special handling to the callers.
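
As a rough illustration of that approach (a standalone sketch with made-up names, not the actual patch), the agent-connect step can report "the domain died while the locks were down" separately from "no agent configured", so the caller can emit the specific error instead of tripping over a NULL monitor later:

#include <stdio.h>
#include <stdbool.h>

/* Illustrative only; not the real libvirt types or the actual patch. */
struct domain { bool active; bool has_agent_channel; };

/* Sketch of the reorganized agent-connect step: return 0 on success or
 * when no agent channel is configured, and -2 when the domain went away
 * while the locks were dropped. */
static int connect_agent(struct domain *dom)
{
    if (!dom->has_agent_channel)
        return 0;               /* nothing to do, not an error */

    /* ... lock dropped, agent socket opened, lock reacquired ... */

    if (!dom->active)
        return -2;              /* guest died while the locks were down */

    return 0;
}

/* Sketch of a caller on the process-start path: the special return value
 * lets it report what actually happened instead of tripping over a NULL
 * monitor later on. */
static int start_domain(struct domain *dom)
{
    if (connect_agent(dom) == -2) {
        fprintf(stderr, "error: internal error: guest crashed while "
                        "connecting to the guest agent\n");
        return -1;
    }
    return 0;
}

int main(void)
{
    struct domain dom = { .active = false, .has_agent_channel = true };
    return start_domain(&dom) < 0 ? 1 : 0;
}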

Comment 3 zhenfeng wang 2014-01-23 09:30:00 UTC
I can reproduce this bug with the reproducer in comment 0 on libvirt-1.1.1-18, and I get the expected error after updating to libvirt-1.1.1-19, so this bug can be marked verified.

Reproduction steps:
1. Rebuild the libvirt package with the reproducer patch from comment 0.
2. Execute the following command:
# virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
[1] 29173
error: invalid argument: monitor must not be NULL

# ps aux|grep libvirtd
root     14377  6.8  0.2 1127840 23560 ?       Ssl  16:29   2:55 /usr/sbin/libvirtd


Verification steps:
1. Rebuild the libvirt package with the reproducer patch from comment 0.
2. Execute the following command; libvirtd didn't crash and the expected error was reported:
# virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
error: Failed to start domain rheltest2
error: internal error guest crashed while connecting to the guest agent

# ps aux|grep libvirtd
root      9784  5.1  0.0 1058464 18544 ?       Sl   19:09   2:03 libvirtd --daemon


3. Start the guest; after the guest starts successfully, do S4 (suspend to disk) with the guest:

#virsh start rheltest2
#virsh dompmsuspend rheltest2 --target disk
Domain rheltest2 successfully suspended
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rheltest2                      shut off

4. Execute the command from step 2; libvirtd didn't crash and the expected error was reported:
#virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
error: Failed to start domain rheltest2
error: internal error guest crashed while connecting to the guest agent

# ps aux|grep libvirtd
root      9784  5.1  0.0 1058464 18544 ?       Sl   19:09   2:03 libvirtd --daemon


5. Start the guest; after the guest starts completely, do managedsave with the guest:

#virsh start rheltest2
#virsh managedsave rheltest2

Domain rheltest2 state saved by libvirt

6. Execute the command from step 2; libvirtd didn't crash and an error was reported:
#virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
error: internal error: guest crashed while connecting to the guest agent

# ps aux|grep libvirtd
root      9784  5.1  0.0 1058464 18544 ?       Sl   19:09   2:03 libvirtd --daemon

7. Test the above scenarios without rebuilding the libvirt package; the guest ends up in the shut off state, no error is reported, and libvirtd didn't crash:

#  virsh start rheltest2 & sleep 3; killall -9 qemu-kvm
[1] 14584

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rheltest2                      shut off

8. Start the guest, then check the qemu-guest-agent functionality with the following commands; all of them work as expected:
#virsh start rheltest2
# virsh setvcpus rheltest2 2 --guest

# virsh vcpucount rheltest2 --guest
2
#virsh dompmsuspend rheltest2 --target mem
#virsh dompmwakeup rheltest2
#virsh dompmsuspend rheltest2 --target disk
#virsh start rheltest2
#virsh shutdown rheltest2 --mode agent
#virsh reboot rheltest2 --mode agent

Comment 4 Ludek Smid 2014-06-13 12:38:31 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 6 Fangge Jin 2016-04-14 09:51:31 UTC
Hi Peter,
I'm testing this bug on build libvirt-1.3.3-1.el7.x86_64, but I can't get the error "guest crashed while connecting to the guest agent". I'm not sure whether this scenario is still testable; please have a look. Thank you.

Steps:
1. Rebuild libvirt with the following change in the source code:

    236     agent = qemuAgentOpen(vm,
    237                           &config->source,
    238                           &agentCallbacks);
    239     sleep(20);   ===> add this line
    240 
    241     virObjectLock(vm);

2. In a terminal:
# tailf /var/log/libvirt/libvirtd.log|grep qemuAgentOpen

3. In another terminal:
# virsh start rhel7
Domain rhel7 started

4. After this log message is printed, kill the qemu-kvm process:
# tailf /var/log/libvirt/libvirtd.log|grep qemuAgentOpen
2016-04-14 09:37:42.343+0000: 25074: debug : qemuAgentOpen:778 : New mon 0x7f21f41fb870 fd =22 watch=122

# kill -9 qemu-kvm

5. After 20 seconds, check the libvirtd log; no "guest crashed while connecting to the guest agent" message is found.

Comment 7 Fangge Jin 2016-04-14 09:52:49 UTC
Created attachment 1147103 [details]
libvirtd log

Comment 8 Peter Krempa 2016-07-26 15:10:22 UTC
It is no longer possible at this point to hit the message at the point where it would be emitted. The agent connection is now usually made in the callback handler for the VSERPORT_CHANGE event, and that handler takes a job. The destroy action uses a libvirt job too, so the two are interlocked. With a recent enough qemu the message won't appear.
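
For readers unfamiliar with libvirt's job model, here is a purely illustrative sketch (not the real qemu driver code) of why the two paths are serialized: each wraps its work in a per-domain job, so the agent connection and the guest teardown can no longer interleave in the connect window:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative per-domain "job" lock; not the real
 * qemuDomainObjBeginJob()/EndJob() implementation. Only one job (agent
 * connect, destroy, ...) runs at a time; any other job waits. */
struct domain_job {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool job_active;
};

static void begin_job(struct domain_job *job)
{
    pthread_mutex_lock(&job->lock);
    while (job->job_active)
        pthread_cond_wait(&job->cond, &job->lock);
    job->job_active = true;
    pthread_mutex_unlock(&job->lock);
}

static void end_job(struct domain_job *job)
{
    pthread_mutex_lock(&job->lock);
    job->job_active = false;
    pthread_cond_signal(&job->cond);
    pthread_mutex_unlock(&job->lock);
}

static struct domain_job job = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};

/* Stand-in for the VSERPORT_CHANGE handler connecting the agent. */
static void *agent_connect_path(void *arg)
{
    (void)arg;
    begin_job(&job);
    puts("agent connect: holds the job, teardown must wait");
    end_job(&job);
    return NULL;
}

/* Stand-in for the destroy/teardown path. */
static void *destroy_path(void *arg)
{
    (void)arg;
    begin_job(&job);
    puts("destroy: holds the job, agent connect must wait");
    end_job(&job);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, agent_connect_path, NULL);
    pthread_create(&t2, NULL, destroy_path, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}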

