Description of problem:
After force-killing the sanlock service and starting it again, configuring or restarting something that needs to use it (such as libvirt) causes wdmd to keep logging "test failed rem ..." messages. This leads the host to reboot automatically on low-memory boxes; in my testing, two 8 GB machines rebooted, while a 12 GB machine recovered after hanging for a long while.
Version-Release number of selected component (if applicable):
sanlock-2.6-2.el6.x86_64
libvirt-0.10.2-12.el6.x86_64
kernel-2.6.32-345.el6.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Configure libvirt to use sanlock (a quick check of the resulting lease file is sketched after the config output below)
#tail -5 /etc/libvirt/qemu-sanlock.conf
user = "sanlock"
group = "sanlock"
host_id = 1
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"
# tail -1 /etc/libvirt/qemu.conf
lock_manager = "sanlock"
# getsebool -a | grep sanlock
sanlock_use_fusefs --> off
sanlock_use_nfs --> on
sanlock_use_samba --> off
virt_use_sanlock --> on
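As a quick check after restarting libvirtd, the lease directory configured above should contain the lease file that libvirt's sanlock plugin creates when auto_disk_leases = 1. This is only a sketch; the file name __LIBVIRT__DISKS__ is an assumption based on the resource name seen in the wdmd logs below.
# service libvirtd restart
# ls -l /var/lib/libvirt/sanlock/
// expect a __LIBVIRT__DISKS__ lease file to be present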
2. Kill the sanlock service and start it again
# ps aux | grep sanlock
root 1740 0.0 0.0 103244 824 pts/0 S+ 17:11 0:00 grep sanlock
root 1773 0.0 0.0 13548 3316 ? SLs Dec17 0:00 wdmd -G sanlock
sanlock 1795 0.0 0.3 342108 23068 ? SLsl Dec17 0:16 sanlock daemon -U sanlock -G sanlock
root 1796 0.0 0.0 23076 288 ? S Dec17 0:00 sanlock daemon -U sanlock -G sanlock
root 1810 0.0 0.0 18832 396 ? Ss Dec17 0:00 fence_sanlockd -w
#kill -9 1795
#service sanlock start
#service libvirtd restart
3. Check it
// use an unrelated virsh command
#virsh nodeinfo
// it hangs here, and many log messages like the one below appear
wdmd[1819]: test failed rem 59 now 229186 ping 229175 close 229185 renewal 229106 expire 229186 client 1841 sanlock___LIBVIRT__DISKS__:1
BTW, the log messages appear even without running a virsh command; the command is just a direct way to check whether the host hangs. The repeated requests eventually make the host reboot automatically. In my testing, two machines with 8 GB of memory hit this situation; another with 12 GB recovered after hanging for a long while.
4. If the virsh command works fine, just force-kill sanlock and restart it again.
Actual results:
virsh hangs and then the host reboots automatically
Expected results:
Should work well
Additional info:
A part of /var/log/message log:
wdmd[1819]: test warning now 229185 ping 229175 close 0 renewal 229106 expire 229186 client 1841 sanlock___LIBVIRT__DISKS__:1
Dec 17 10:36:20 localhost wdmd[1819]: /dev/watchdog closed unclean
Dec 17 10:36:20 localhost kernel: iTCO_wdt: Unexpected close, not stopping watchdog!
Dec 17 10:36:21 localhost wdmd[1819]: test failed rem 59 now 229186 ping 229175 close 229185 renewal 229106 expire 229186 client 1841 sanlock___LIBVIRT__DISKS__:1
Dec 17 10:36:22 localhost wdmd[1819]: test failed rem 58 now 229187 ping 229175 close 229185 renewal 229106 expire 229186 client 1841 sanlock___LIBVIRT__DISKS__:1
Dec 17 10:36:23 localhost wdmd[1819]: test failed rem 57 now 229188 ping 229175 close 229185 renewal 229106 expire 229186 client 1841 sanlock___LIBVIRT__DISKS__:1
........snip
Dec 17 10:37:14 localhost wdmd[1819]: test failed rem 6 now 229239 ping 229175 close 229185 renewal 229106 expire 229186 client 1841 sanlock___LIBVIRT__DISKS__:1
Dec 17 10:37:15 localhost wdmd[1819]: test failed rem 5 now 229240 ping 229175 close 229185 renewal 229106 expire
------------------------ The reboot happens around this point ---------------
Dec 17 10:38:09 localhost kernel: imklog 5.8.10, log source = /proc/kmsg started.
Dec 17 10:38:09 localhost rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1666" x-info="http://www.rsyslog.com"] start
Dec 17 10:38:09 localhost kernel: Initializing cgroup subsys cpuset
Dec 17 10:38:09 localhost kernel: Initializing cgroup subsys cpu
Dec 17 10:38:09 localhost kernel: Linux version 2.6.32-345.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Wed Nov 28 21:10:19 EST 2012
libvirt is using sanlock when you restart it, so the reboot is expected.
If you don't want a reboot, then you need to cleanly shut down libvirt
before restarting sanlock.
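A minimal sketch of that ordering, using the service names already shown above (this assumes the stock RHEL 6 init scripts and that no guests need to keep running while sanlock is down):
# service libvirtd stop
# service sanlock restart
# service libvirtd start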
(Also, you should not be running fence_sanlockd, that is *only* for use with the fence_sanlock agent in the cluster product.)
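If fence_sanlockd is not needed (i.e. the fence_sanlock agent is not in use), it can be switched off with the standard RHEL 6 service tools. This is a sketch; it assumes fence_sanlockd was started from an init script of the same name:
# service fence_sanlockd stop
# chkconfig fence_sanlockd off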
Sorry, after thinking about this again, comment 2 was probably not entirely correct. If a pid holding a lease (like libvirt) restarts, sanlock should simply release the leases; it should not cause a wdmd reboot. To get a wdmd reboot, access to the lease storage (the __LIBVIRT__DISKS__ file) must have been lost. If you include the sanlock errors/warnings from /var/log/messages or /var/log/sanlock.log, I can probably explain what happened.
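A short sketch of how to collect that information (the sanlock client subcommands below are standard in sanlock 2.x; the log paths are the defaults mentioned above):
# grep -E 'sanlock|wdmd' /var/log/messages
# sanlock client status
# sanlock client log_dump
# cat /var/log/sanlock.log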
(In reply to comment #4)
> Ah, I think I see now -- you appear to be doing kill -9 on the sanlock
> daemon while it's being used. That correctly causes wdmd to reboot the
> machine.
Thanks for your reply, David.
Hmm... looks like this can be closed as not a bug, though it's not very convenient for use and testing...
Are there any plans to improve this? Some users like me just want to restart the sanlock service after changing the config files, and when the sanlock service is held up by something else, force-killing and restarting it is a fast and simple method, though I know it's not a recommended action :)
To shut down sanlock without causing a wdmd reboot, you can run the following command: "sanlock client shutdown -f 1"
This will cause sanlock to kill any pids that are holding leases, release those leases, and then exit.
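For example, a restart after editing the config files could then look like this (a sketch, assuming the init scripts shown earlier in this bug and that any leases held by libvirt may be dropped):
# sanlock client shutdown -f 1
# service sanlock start
# service libvirtd restart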
Comment 7 - RHEL Program Management - 2012-12-24 06:49:43 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.