Bug 1388389

Summary: qemu i6300 watchdog does not reboot after 'halt -fin'

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: michal novacek <mnovacek>
Assignee: Bandan Das <bdas>
QA Contact: FuXiangChun <xfu>
CC: bdas, chayang, jinzhao, juzhang, knoel, michen, mnovacek, rbalakri, virt-maint, xfu
Type: Bug
Last Closed: 2017-08-24 12:34:47 UTC

Description michal novacek 2016-10-25 08:39:03 UTC
Created attachment 1213753 [details]
qemu machine configuration xml

Description of problem:
I have a pacemaker cluster with sbd (storage-based death) fencing configured
on qemu nodes. SBD fencing means that a cluster node is rebooted as soon as it
loses pacemaker quorum.

Everything works correctly when inducing a kernel panic with 'echo c > /proc/sysrq-trigger' or when cutting the node off from the other nodes.

However, running 'halt -fin' just leaves the node hanging.

Version-Release number of selected component (if applicable):
RHEL 7.3 and RHEL 6.7 (with the appropriate official versions)

How reproducible: always

Steps to Reproduce:
1. configure a pacemaker cluster with sbd fencing (the node reboots on quorum loss)
2. on one of the cluster nodes issue 'halt -fin'

Actual results: the node does nothing

Expected results: the node is rebooted, as with 'echo c > /proc/sysrq-trigger'

Additional info:
I'm filing this against qemu-kvm because it seems the most likely component
to blame. I can easily provide a configured cluster to reproduce.
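
For reference, the sbd side of the guests follows the usual /etc/sysconfig/sbd
layout; the snippet below is an illustrative sketch with placeholder values,
not the exact test configuration:

# /etc/sysconfig/sbd (illustrative placeholders, not the exact test values)
SBD_DEVICE="/dev/disk/by-id/<shared-sbd-disk>"
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5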

Configuration of the watchdog card in qemu machines looks like this:

<devices>
    ...
    <watchdog model='i6300esb' action='reset'>
        <alias name='watchdog0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>
    ...
</devices>

Comment 2 Karen Noel 2016-10-27 14:26:42 UTC
> Description of problem:
> I have a pacemaker cluster with sbd (storage-based death) fencing configured
> on qemu nodes. SBD fencing means that a cluster node is rebooted as soon as
> it loses pacemaker quorum.

Is clustering in the hosts or between guests? Assume hosts.

> Everything works correctly when inducing a kernel panic with 'echo c > /proc/sysrq-trigger' or when cutting the node off from the other nodes.

Is this on the host or in a guest? Guest?

> Version-Release number of selected component (if applicable):
> RHEL 7.3 and RHEL 6.7 (with the appropriate official versions)

Please distinguish between host and guest versions. RHEL 7 host and RHEL 6 guests?

For the host, please provide the exact package versions for kernel, qemu and libvirt.

For the guests, please provide the exact kernel version. 

Also, please provide the full qemu command line. The XML will give us a start. Interesting that the machine type is rhel6.3.0... Thanks.

Comment 3 michal novacek 2016-11-21 09:50:14 UTC
It's a cluster between guests. The host is not part of the cluster.

As for the versions, it happens in every combination I tried (different or same host/guest versions, RHEL 6 and RHEL 7).

As an example, here is one configuration where it fails:

host (rhel6.7):
[root@big-01 ~]# rpm -q qemu libvirt kernel
package qemu is not installed
libvirt-0.10.2-54.el6_7.2.x86_64
kernel-2.6.32-573.8.1.el6.x86_64

guests (rhel6.9):
[root@virt-031 ~]# rpm -q qemu libvirt kernel
package qemu is not installed
package libvirt is not installed
kernel-2.6.32-671.el6.x86_64
[root@virt-031 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.9 Beta (Santiago)

The full qemu-kvm command line:
/usr/libexec/qemu-kvm -name virt-031.cluster-qe.lab.eng.brq.redhat.com -S -M rhel6.3.0 -enable-kvm -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d4c1b147-1303-3ddf-5568-35d7fccd1d3d -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/virt-031.cluster-qe.lab.eng.brq.redhat.com.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/storage-big/root-virt-031.cluster-qe.lab.eng.brq.redhat.com,if=none,id=drive-virtio-disk0,format=raw,cache=unsafe,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=140,id=hostnet0,vhost=on,vhostfd=141 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=1a:00:00:00:00:1f,bus=pci.0,addr=0x3,bootindex=1 -netdev tap,fd=144,id=hostnet1,vhost=on,vhostfd=145 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:01:00:1f,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x7 -watchdog-action reset -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on

Comment 5 michal novacek 2017-06-28 08:14:42 UTC
Did you succeed in reproducing the problem?

Comment 6 Bandan Das 2017-07-18 22:17:15 UTC
(In reply to michal novacek from comment #5)
> Did you succeed in reproducing the problem?

Sorry for the delay. I can reproduce this using "halt -fin". By the way, what is the "-i" option for?

However, I am wondering what the expected behavior is. According to the watchdog man page, the watchdog daemon can be stopped without causing a reboot if /dev/watchdog is closed correctly. I am wondering whether halt (even with the -f flag) results in /dev/watchdog being closed cleanly.

CONFIG_WATCHDOG_NOWAYOUT (which overrides this behavior) is not enabled in the RHEL kernel. I will try to build a kernel with that option enabled and see if it makes a difference.
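
For reference, this is roughly how the close semantics can be checked from a
guest shell (a sketch; it assumes the i6300esb driver's "magic close" support
and that nothing else, such as sbd, is currently holding /dev/watchdog open):

# Is NOWAYOUT built into the running guest kernel?
grep CONFIG_WATCHDOG_NOWAYOUT /boot/config-$(uname -r)

# Arm the watchdog, then disarm it with a clean ("magic") close:
exec 3> /dev/watchdog   # opening the device starts the timer
echo V >&3              # 'V' tells the driver that a clean stop is intended
exec 3>&-               # closing the fd stops the timer; no reset follows

# Without the 'V' (or with NOWAYOUT enabled) the timer keeps running after the
# close, and the guest is reset once it expires.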

Comment 7 Bandan Das 2017-07-20 17:37:47 UTC
I compiled a guest kernel with CONFIG_WATCHDOG_NOWAYOUT enabled and, sure enough, 'halt -fin' now results in a reboot. This confirms my suspicion from comment 6: "halt -fin" closes /dev/watchdog cleanly, which disarms the watchdog so it never fires.
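
(Rebuilding is not strictly necessary to repeat this, by the way; the i6300esb
guest driver is modular in RHEL and, as far as I recall, exposes an equivalent
nowayout module parameter. Something along these lines should behave the same,
as an untested sketch assuming nothing currently holds /dev/watchdog open:)

modprobe -r i6300esb
modprobe i6300esb nowayout=1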

Is this test procedure essential? More importantly, do you know how real watchdog hardware behaves with respect to "halt -fin"?

Comment 8 michal novacek 2017-08-24 12:34:47 UTC
The procedure with 'halt -f' (the -in part does not really do anything here) comes from another test we use for cluster fencing.

It seems that the behavior is expected and not a bug.

Thanks for the valuable feedback; it saved me a ton of time.

Closing as NOTABUG.