Created attachment 1213753 [details]
qemu machine configuration xml

Description of problem:
I have a pacemaker cluster with sbd (storage based death) fencing configured on qemu nodes. SBD fencing means that a cluster node is rebooted as soon as it loses pacemaker quorum. All works correctly when inducing a kernel panic with 'echo c > /proc/sysrq-trigger' or when cutting the node off from the other nodes. However, running 'halt -fin' just leaves the node hanging.

Version-Release number of selected component (if applicable):
RHEL7.3 and RHEL6.7 (with the appropriate official versions)

How reproducible:
always

Steps to Reproduce:
1. configure a pacemaker cluster with sbd fencing (node reboot on quorum loss)
2. on one of the cluster nodes issue 'halt -fin'

Actual results:
the node does nothing

Expected results:
the node is rebooted, as with 'echo c > /proc/sysrq-trigger'

Additional info:
I'm logging this against qemu-kvm because it seems to me to be the most likely component to blame. I can easily provide a configured cluster to reproduce. The watchdog configuration in the qemu machines looks like this:

<devices>
  ...
  <watchdog model='i6300esb' action='reset'>
    <alias name='watchdog0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
  </watchdog>
  ...
</devices>
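For context on the mechanism involved: sbd holds /dev/watchdog open and keeps feeding it while the node is healthy; when quorum is lost it stops feeding it, so the i6300esb timer expires and qemu performs the configured action ("reset" in the XML above). The minimal C sketch below is not sbd's actual code, just the standard Linux watchdog device pattern; node_has_quorum() is a hypothetical stand-in for the real quorum check.

/*
 * Minimal sketch of the Linux watchdog usage pattern (not sbd's code):
 * open /dev/watchdog, "pet" it while the node is healthy, and simply
 * stop petting it when quorum is lost so the timer expires and the
 * hypervisor resets the guest.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* hypothetical helper standing in for the real quorum check */
static int node_has_quorum(void)
{
    return 1;
}

int main(void)
{
    int timeout = 30;                           /* seconds */
    int fd = open("/dev/watchdog", O_WRONLY);   /* opening arms the watchdog */

    if (fd < 0) {
        perror("open /dev/watchdog");
        return 1;
    }
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);

    while (node_has_quorum()) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);          /* pet the watchdog */
        sleep(timeout / 3);
    }

    /* Quorum lost: stop petting and never close the device cleanly.
     * The timer expires and the configured action is taken. */
    pause();
    return 0;
}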
> Description of problem:
> I have a pacemaker cluster with sbd (storage based death) fencing
> configured on qemu nodes. SBD fencing means that a cluster node is
> rebooted as soon as it loses pacemaker quorum.

Is the clustering on the hosts or between guests? I'll assume hosts.

> All works correctly when inducing a kernel panic with
> 'echo c > /proc/sysrq-trigger' or when cutting the node off from the
> other nodes.

Is this on the host or in a guest? Guest?

> Version-Release number of selected component (if applicable):
> RHEL7.3 and RHEL6.7 (with the appropriate official versions)

Please distinguish between host and guest versions. RHEL7 host and RHEL6 guests?

For the host, please provide the exact package versions for kernel, qemu and libvirt. For the guests, please provide the exact kernel version.

Also, please provide the full qemu command line. The XML will give us a start. Interesting that the machine type is rhel6.3.0...

Thanks.
It's a cluster between guests. The host is not part of the cluster.

As for the versions, it happens in all the instances I tried (different or same host/guest versions, rhel6 and rhel7). As an example, I have the following configuration where it fails:

host (rhel6.7):

[root@big-01 ~]# rpm -q qemu libvirt kernel
package qemu is not installed
libvirt-0.10.2-54.el6_7.2.x86_64
kernel-2.6.32-573.8.1.el6.x86_64

guests (rhel6.9):

[root@virt-031 ~]# rpm -q qemu libvirt kernel
package qemu is not installed
package libvirt is not installed
kernel-2.6.32-671.el6.x86_64

[root@virt-031 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.9 Beta (Santiago)

Full command line used to run qemu-kvm:

/usr/libexec/qemu-kvm -name virt-031.cluster-qe.lab.eng.brq.redhat.com -S -M rhel6.3.0 -enable-kvm -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d4c1b147-1303-3ddf-5568-35d7fccd1d3d -nographic -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/virt-031.cluster-qe.lab.eng.brq.redhat.com.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/storage-big/root-virt-031.cluster-qe.lab.eng.brq.redhat.com,if=none,id=drive-virtio-disk0,format=raw,cache=unsafe,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=140,id=hostnet0,vhost=on,vhostfd=141 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=1a:00:00:00:00:1f,bus=pci.0,addr=0x3,bootindex=1 -netdev tap,fd=144,id=hostnet1,vhost=on,vhostfd=145 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:01:00:1f,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x7 -watchdog-action reset -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
Did you succeed in reproducing the problem?
(In reply to michal novacek from comment #5)
> Did you succeed in reproducing the problem?

Sorry for the delay. I can reproduce using "halt -fin". By the way, what does the "-i" option do?

However, I am wondering what the expected behavior is. According to the watchdog man page, the watchdog daemon can be stopped without causing a reboot if /dev/watchdog is closed correctly. I am wondering whether halt (even with the -f flag) ends up closing /dev/watchdog cleanly. CONFIG_WATCHDOG_NOWAYOUT (which overrides this behavior) is not enabled in the RHEL kernel. I will try to build a kernel with that option enabled and see if it makes a difference.
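For reference, the "closed correctly" case is the watchdog API's "magic close" feature. A minimal sketch of it (assuming a standard driver built without CONFIG_WATCHDOG_NOWAYOUT; this is not taken from halt's source):

/*
 * "Magic close" sketch: writing 'V' before close() tells the driver
 * the close is intentional, so it disarms the timer and no reset
 * follows.  Closing without the 'V', or never closing at all, leaves
 * the watchdog armed.
 */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/watchdog", O_WRONLY);   /* arms the watchdog */
    if (fd < 0)
        return 1;

    write(fd, "\0", 1);                         /* any write counts as a keepalive */

    write(fd, "V", 1);                          /* magic character: expect close */
    close(fd);                                  /* watchdog disarmed, no reboot */
    return 0;
}

With CONFIG_WATCHDOG_NOWAYOUT enabled the driver ignores the 'V' and keeps the timer armed, which is what the rebuilt kernel in the next comment exercises.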
I compiled a guest kernel with CONFIG_WATCHDOG_NOWAYOUT and, sure enough, "halt -fin" now results in a reboot. This confirms my suspicion in comment 6 that "halt -fin" closes /dev/watchdog cleanly, which means the watchdog never fires.

Is this test procedure essential? More importantly, do you know how real watchdog hardware behaves with respect to "halt -fin"?
The procedure with 'halt -f' (the -in flags don't really add anything here) comes from another test we use to exercise cluster fencing.

It seems that the behavior is expected and not a bug. Thanks for the valuable feedback, it saved me a ton of time. Closing as NOTABUG.