Bug 1388389 - qemu i6300 watchdog does not reboot after 'halt -fin'
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Bandan Das
QA Contact: FuXiangChun
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-25 08:39 UTC by michal novacek
Modified: 2019-03-27 06:34 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-24 12:34:47 UTC
Target Upstream Version:


Attachments
qemu machine configuration xml (2.67 KB, text/plain)
2016-10-25 08:39 UTC, michal novacek

Description michal novacek 2016-10-25 08:39:03 UTC
Created attachment 1213753 [details]
qemu machine configuration xml

Description of problem:
I have a pacemaker cluster with SBD (storage-based death) fencing configured
on qemu nodes. SBD fencing means that a cluster node is rebooted as soon as it
loses pacemaker quorum.

All works correctly when inducing kernel panic with 'echo c > /proc/sysrq-trigger' or cutting the node from other nodes.

However, running 'halt -fin' just leaves the node hanging.

Version-Release number of selected component (if applicable):
RHEL 7.3 and RHEL 6.7 (with the appropriate official versions)

How reproducible: always

Steps to Reproduce:
1. configure pacemaker cluster with sbd fencing (node reboot on quorum loss)
2. on one of the cluster nodes issue 'halt -fin'

Actual results: the node does nothing

Expected results: the node reboots, as with 'echo c > /proc/sysrq-trigger'

Additional info:
I'm filing this against qemu-kvm because it seems to me the most likely
component to blame. I can easily provide a configured cluster for reproduction.

Configuration of the watchdog card in qemu machines looks like this:

<devices>
    ...
    <watchdog model='i6300esb' action='reset'>
        <alias name='watchdog0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>
    ...
</devices>
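For reference, the <watchdog> element above translates into qemu options along these lines (a sketch taken from the full command line quoted in comment 3; exact syntax can vary with the qemu version):

```shell
# libvirt turns the <watchdog model='i6300esb' action='reset'> element
# into roughly these qemu-kvm options:
qemu-kvm \
    -device i6300esb,id=watchdog0,bus=pci.0,addr=0x7 \
    -watchdog-action reset
```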

Comment 2 Karen Noel 2016-10-27 14:26:42 UTC
> Description of problem:
> I'm having pacemaker cluster with sbd (storage based death) fencing configured
> on qemu nodes. SBD fencing means that cluster node is rebooted as soon as it
> looses pacemaker quorum.

Is clustering in the hosts or between guests? Assume hosts.

> All works correctly when inducing kernel panic with 'echo c > /proc/sysrq-trigger' or cutting the node from other nodes.

Is this on the host or in a guest? Guest?

> Version-Release number of selected component (if applicable):
> RHEL7.3 and RHEL6.7 (with appropriate offical version)

Please distinguish between host and guest versions. RHEL 7 host and RHEL 6 guests?

For the host, please provide the exact package versions for kernel, qemu and libvirt.

For the guests, please provide the exact kernel version. 

Also, please provide the full qemu command line. The XML will give us a start. Interesting that the machine type is rhel6.3.0... Thanks.

Comment 3 michal novacek 2016-11-21 09:50:14 UTC
It's cluster between guests. Host is not part of the cluster. 

As for the versions, it happens in all the instances I tried (different or same host/guest versions, rhel6 and rhel7). 

As an example I have the following configuration where it fails:

host (rhel6.7):
[root@big-01 ~]# rpm -q qemu libvirt kernel
package qemu is not installed
libvirt-0.10.2-54.el6_7.2.x86_64
kernel-2.6.32-573.8.1.el6.x86_64

guests (rhel6.9):
[root@virt-031 ~]# rpm -q qemu libvirt kernel
package qemu is not installed
package libvirt is not installed
kernel-2.6.32-671.el6.x86_64
[root@virt-031 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.9 Beta (Santiago)

full line to run qemu-kvm:
/usr/libexec/qemu-kvm -name virt-031.cluster-qe.lab.eng.brq.redhat.com -S -M rhel6.3.0 -enable-kvm -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d4c1b147-1303-3ddf-5568-35d7fccd1d3d -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/virt-031.cluster-qe.lab.eng.brq.redhat.com.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/storage-big/root-virt-031.cluster-qe.lab.eng.brq.redhat.com,if=none,id=drive-virtio-disk0,format=raw,cache=unsafe,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=140,id=hostnet0,vhost=on,vhostfd=141 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=1a:00:00:00:00:1f,bus=pci.0,addr=0x3,bootindex=1 -netdev tap,fd=144,id=hostnet1,vhost=on,vhostfd=145 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:01:00:1f,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x7 -watchdog-action reset -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on

Comment 5 michal novacek 2017-06-28 08:14:42 UTC
Did you succeed in reproducing the problem?

Comment 6 Bandan Das 2017-07-18 22:17:15 UTC
(In reply to michal novacek from comment #5)
> Did you succeed in reproducing the problem?

Sorry for the delay. I can reproduce using "halt -fin". BTW, what is the "-i" option?

However, I am wondering what the expected behavior is. According to the watchdog manpage, the watchdog daemon can be stopped without causing a reboot if /dev/watchdog is closed correctly. I am wondering whether halt (even with the -f flag) results in closing /dev/watchdog.

CONFIG_WATCHDOG_NOWAYOUT (which overrides this behavior) is not enabled in the RHEL kernel. I will try to build a kernel with that option enabled and see if it makes a difference.
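The magic-close semantics described above can be sketched as a small model (hypothetical Python for illustration, not kernel code; the real logic lives in the kernel's watchdog drivers):

```python
# Minimal model of the Linux watchdog "magic close" semantics.
# Illustrative only -- class and method names here are hypothetical.

class WatchdogDevice:
    """Models /dev/watchdog from the guest's point of view."""

    def __init__(self, nowayout=False):
        self.nowayout = nowayout   # models CONFIG_WATCHDOG_NOWAYOUT
        self.armed = False
        self.expect_close = False

    def open(self):
        # Opening /dev/watchdog arms the timer; it must now be petted
        # periodically or it fires (a reset, in this bug's configuration).
        self.armed = True
        self.expect_close = False

    def write(self, data):
        # Writing the magic character 'V' announces an orderly shutdown.
        if b"V" in data:
            self.expect_close = True

    def close(self):
        # A close preceded by 'V' disarms the timer -- unless the kernel
        # was built with CONFIG_WATCHDOG_NOWAYOUT, which keeps it armed.
        if self.expect_close and not self.nowayout:
            self.armed = False
        self.expect_close = False


# Stock kernel (NOWAYOUT off): an orderly close disarms the timer, so if
# 'halt -fin' lets the daemon close /dev/watchdog cleanly, the i6300esb
# never fires and the guest just hangs.
wd = WatchdogDevice(nowayout=False)
wd.open(); wd.write(b"V"); wd.close()
assert not wd.armed

# NOWAYOUT kernel: the timer stays armed regardless, so the watchdog
# fires and qemu performs the configured 'reset'.
wd = WatchdogDevice(nowayout=True)
wd.open(); wd.write(b"V"); wd.close()
assert wd.armed
```

Under this model, whether the guest resets after 'halt -fin' hinges entirely on whether /dev/watchdog receives an orderly close before the kernel stops.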

Comment 7 Bandan Das 2017-07-20 17:37:47 UTC
I compiled a guest kernel with CONFIG_WATCHDOG_NOWAYOUT and, sure enough, 'halt -fin' now results in a reboot. This confirms my suspicion in comment 6 that 'halt -fin' closes /dev/watchdog cleanly, which means the watchdog never fires.

Is this test procedure essential? More importantly, do you know how real watchdog hardware behaves with respect to "halt -fin"?

Comment 8 michal novacek 2017-08-24 12:34:47 UTC
The procedure with 'halt -f' (the -in flags don't really do anything) comes from another test we use for cluster fencing.

It seems that the behavior is expected and not a bug.

Thanks for the valuable feedback, it saved me a ton of time.

Closing as NOTABUG.

