Bug 1501450

Summary: Race starting multiple libvirtd user sessions at the same time
Product: Red Hat Enterprise Linux 7
Reporter: Sanjay Upadhyay <supadhya>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: Yanqiu Zhang <yanqzhan>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: chhu, dyuan, fjin, jdenemar, xuzhang, yafu, yalzhang, yanqzhan, zpeng
Target Milestone: rc
Keywords: Upstream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-4.5.0-12.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-06 13:13:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sanjay Upadhyay 2017-10-12 15:15:05 UTC
Description of problem:

With a user-session libvirtd, I occasionally see this:
$ ps aux | grep libvirtd
root      22671  0.0  0.0 1208324 19252 ?       Ssl  Oct10   0:00 /usr/sbin/libvirtd
stack    134841  0.0  0.0 1243896 19452 ?       Sl   Oct11   0:03 /usr/sbin/libvirtd --timeout=30
stack    139578  0.0  0.0 1242940 22360 ?       Sl   Oct11   0:02 /usr/sbin/libvirtd --timeout=30

The user session (stack) has two libvirtd daemons, each with its own qemu-kvm process for the same domain:

stack    134841  0.0  0.0 1243896 19452 ?       Sl   Oct11   0:03 /usr/sbin/libvirtd --timeout=30
stack    135357  169  9.3 13271540 12294616 ?   Sl   Oct11 2817:12 /usr/libexec/qemu-kvm -name undercloud -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu host,-avx,-avx2 -m 12288 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -uuid b303c589-f917-4677-8fcd-d8e060007e13 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/home/stack/.config/libvirt/qemu/lib/domain-2-undercloud/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot menu=off,strict=on -kernel /home/stack/overcloud-full.vmlinuz -initrd /home/stack/overcloud-full.initrd -append console=ttyS0 root=/dev/vda -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/stack/pool/undercloud.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:f8:dc:78:54:e5,bus=pci.0,addr=0x3 -netdev tap,fd=25,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:f8:dc:78:54:e3,bus=pci.0,addr=0x4 -netdev tap,fd=26,id=hostnet2 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:f8:dc:78:54:e7,bus=pci.0,addr=0x5 -netdev tap,fd=27,id=hostnet3 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:f8:dc:78:54:e9,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

stack    194774  0.5  0.0 1046000 17080 ?       Sl   08:23   0:00 /usr/sbin/libvirtd --timeout=30
stack    195006  244  4.0 13386292 5270676 ?    Sl   08:23   2:04 /usr/libexec/qemu-kvm -name undercloud -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu host,-avx,-avx2 -m 12288 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -uuid b303c589-f917-4677-8fcd-d8e060007e13 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/home/stack/.config/libvirt/qemu/lib/domain-1-undercloud/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot menu=off,strict=on -kernel /home/stack/overcloud-full.vmlinuz -initrd /home/stack/overcloud-full.initrd -append console=ttyS0 root=/dev/vda -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/stack/pool/undercloud.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:f8:dc:78:54:e5,bus=pci.0,addr=0x3 -netdev tap,fd=24,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:f8:dc:78:54:e3,bus=pci.0,addr=0x4 -netdev tap,fd=25,id=hostnet2 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:f8:dc:78:54:e7,bus=pci.0,addr=0x5 -netdev tap,fd=26,id=hostnet3 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:f8:dc:78:54:e9,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:1 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

Version-Release number of selected component (if applicable):

The running libvirt version is:

libvirt.x86_64                      3.2.0-14.el7_4.3

How reproducible:
Intermittent. It generally shows up after about a day, when the domain is set to autostart and we log in as the user and run any virsh command.

Steps to Reproduce:
1. Start libvirtd as a user process.
2. Create a domain and let it run for a day.
3. Log into the host, switch to that user, and run any virsh command.

Actual results:


Expected results:


Additional info:

This bug is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1208176, but I am not sure the cause is the same, since a domain without autostart seems to work fine.

Comment 2 Sanjay Upadhyay 2017-10-17 03:40:02 UTC
Here are some more details, as I am able to reproduce it.

Create a polkit policy such as:
/* Allow users in the wheel group to manage the libvirt
daemon without authentication */
polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.unix.manage" &&
        subject.isInGroup("wheel")) {
            return polkit.Result.YES;
    }
});

Then restart libvirtd.

1. Log in as a user and create a VM.
2. Set the VM to autostart.
3. Let the VM run for 24 hours.
4. Log in from another window to the same host as the same user.
5. Run virsh list --all (or any virsh command that communicates with libvirtd).
6. A new libvirtd --timeout=30 starts up, along with a new qemu-kvm for the same VM.

[root@ho301 ~]# su - stack
Last login: Tue Oct 17 01:08:22 UTC 2017 from 10.60.17.37 on pts/2
[stack@ho301 ~]$ ps -f -C libvirtd,qemu-kvm,virtlogd
UID        PID  PPID  C STIME TTY          TIME CMD
root      1025     1  0 Oct10 ?        00:00:01 /usr/sbin/libvirtd
stack    18474     1  0 00:52 ?        00:00:04 /usr/sbin/libvirtd --timeout=30
stack    19002     1  0 00:56 ?        00:00:00 /usr/sbin/virtlogd --timeout=30
stack    19013     1 99 00:56 ?        02:35:22 /usr/libexec/qemu-kvm -name undercloud -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu SandyBridge,+vme,+vmx,+osxsa

After virsh list --all (this hangs):

[root@ho301 ~]# ps -f -C libvirtd,qemu-kvm
UID        PID  PPID  C STIME TTY          TIME CMD
root      1025     1  0 Oct10 ?        00:00:01 /usr/sbin/libvirtd
stack    18474     1  0 00:52 ?        00:00:04 /usr/sbin/libvirtd --timeout=30
stack    19013     1 99 00:56 ?        02:36:34 /usr/libexec/qemu-kvm -name undercloud -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu SandyBridge,+vme,+vmx,+osxsa
stack    26366     1  0 03:31 ?        00:00:00 /usr/sbin/libvirtd --timeout=30
stack    26457     1  0 03:31 ?        00:00:00 /usr/libexec/qemu-kvm -name undercloud -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu SandyBridge,+vme,+vmx,+osxsa
[root@ho301 ~]# date
Tue Oct 17 03:33:01 UTC 2017

Now there are two sets of tap interfaces for the same VM, and networking breaks.
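
The duplicated processes in the listing above can also be spotted mechanically. The following is a minimal Python sketch, assuming `ps -f` output in the column layout shown in this comment. `find_duplicates` and `SAMPLE` are names introduced here for illustration only, and `SAMPLE` abbreviates the qemu-kvm command lines from the listing.

```python
from collections import defaultdict

# Abbreviated copy of the `ps -f -C libvirtd,qemu-kvm` listing above
# (qemu-kvm command lines shortened for readability).
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      1025     1  0 Oct10 ?        00:00:01 /usr/sbin/libvirtd
stack    18474     1  0 00:52 ?        00:00:04 /usr/sbin/libvirtd --timeout=30
stack    19013     1 99 00:56 ?        02:36:34 /usr/libexec/qemu-kvm -name undercloud -S
stack    26366     1  0 03:31 ?        00:00:00 /usr/sbin/libvirtd --timeout=30
stack    26457     1  0 03:31 ?        00:00:00 /usr/libexec/qemu-kvm -name undercloud -S
"""


def find_duplicates(ps_output):
    """Return (dup_daemons, dup_guests) found in `ps -f` style output.

    dup_daemons maps a user to the PIDs of its session daemons
    (libvirtd started with --timeout) when there is more than one.
    dup_guests maps (user, guest name) to qemu-kvm PIDs when the same
    guest name is running more than once for that user.
    """
    daemons = defaultdict(list)
    guests = defaultdict(list)
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0] == "UID":
            continue  # skip the header and anything malformed
        user, pid, argv = fields[0], int(fields[1]), fields[7].split()
        exe = argv[0].rsplit("/", 1)[-1]
        if exe == "libvirtd" and any(a.startswith("--timeout") for a in argv[1:]):
            daemons[user].append(pid)
        elif exe == "qemu-kvm" and "-name" in argv:
            idx = argv.index("-name")
            if idx + 1 < len(argv):
                guests[(user, argv[idx + 1])].append(pid)
    dup_daemons = {u: pids for u, pids in daemons.items() if len(pids) > 1}
    dup_guests = {k: pids for k, pids in guests.items() if len(pids) > 1}
    return dup_daemons, dup_guests
```

Calling `find_duplicates(SAMPLE)` reports user stack with two session daemons and two qemu-kvm processes named undercloud. The system daemon owned by root is ignored because it runs without --timeout. On a live host the same function could be fed the captured output of `ps -f -C libvirtd,qemu-kvm`.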

Comment 5 Michal Privoznik 2019-02-12 16:17:45 UTC
Use of autostart with the session daemon is discouraged, precisely because the session daemon may come and go. Fixed upstream as:

commit 61b4e8aaf1bce07f282c152de556c3d6aa8d65be
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Dec 17 14:42:51 2018 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Dec 17 18:27:32 2018 +0100

    src: Document autostart for session demon
    
    The autostart under session daemon might not behave as you'd
    expect it to behave. This patch is inspired by latest
    libvirt-users discussion:
    
    https://www.redhat.com/archives/libvirt-users/2018-December/msg00047.html
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Daniel P. Berrangé <berrange>

v4.10.0-92-g61b4e8aaf1

Comment 9 Yanqiu Zhang 2019-04-12 06:08:38 UTC
Verified with libvirt-debuginfo-4.5.0-12.el7.x86_64:

#vim /usr/src/debug/libvirt-4.5.0/src/libvirt-domain.c 
/**
 * virDomainGetAutostart:
 * @domain: a domain object
 * @autostart: the value returned
 *
 * Provides a boolean value indicating whether the domain
 * configured to be automatically started when the host
 * machine boots.
 *
 * Please note that this might result in unexpected behaviour if
 * used for some session URIs. Since the session daemon is started
 * with --timeout it comes and goes and as it does so it
 * autostarts domains which might have been shut off recently.
 *
 * Returns -1 in case of error, 0 in case of success
 */
int
virDomainGetAutostart(virDomainPtr domain,
                      int *autostart)
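
Given the caveat in the doc comment above, it can be useful to check which session domains have autostart set. The following is a minimal Python sketch, assuming the libvirt-python bindings are installed. The helper names `autostarted_domains`, `disable_autostart` and `report` are introduced here for illustration and are not part of libvirt. Disabling autostart is only one possible response to the caveat, not something the fix itself does.

```python
def autostarted_domains(conn):
    """Return the names of domains on `conn` that have autostart enabled.

    `conn` is expected to behave like a libvirt.virConnect object:
    listAllDomains() yields domains offering name() and autostart().
    """
    return [dom.name() for dom in conn.listAllDomains() if dom.autostart()]


def disable_autostart(conn):
    """Turn autostart off wherever it is set; return the affected names."""
    changed = []
    for dom in conn.listAllDomains():
        if dom.autostart():
            dom.setAutostart(0)
            changed.append(dom.name())
    return changed


def report(uri="qemu:///session"):
    """Print the autostarted domains of a session connection (needs libvirt)."""
    import libvirt  # libvirt-python bindings, imported lazily on purpose

    conn = libvirt.open(uri)
    try:
        for name in autostarted_domains(conn):
            print("autostart is enabled for session domain:", name)
    finally:
        conn.close()
```

Run as the affected user, `report()` lists the domains that a freshly spawned session daemon would try to autostart. The two helpers take the connection as an argument so they can be exercised without a running daemon.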

Comment 11 errata-xmlrpc 2019-08-06 13:13:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294