Bug 985824
| Summary: | libvirt sanlock stopping problem | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mustafa Cantürk <zifiribir> |
| Component: | sanlock | Assignee: | David Teigland <teigland> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.4 | CC: | agk, ajia, cluster-maint, cwei, devin.bougie, dyuan, lsu, rbalakri, teigland, xuzhang, zifiribir |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-09-30 14:14:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Hi Mustafa,
I'm a libvirt QE. In my experience, you can use "sanlock client shutdown -f 1" to force-stop the sanlock service; otherwise it will be held by wdmd or something else. From your comment, I don't think this is a libvirt issue, and maybe not a sanlock issue either. Would you provide more information in the format below for further debugging? For example: package versions (sanlock, wdmd, libvirt), service statuses, the contents of your sanlock and libvirtd config files, logs (/var/log/messages, /var/log/libvirt/libvirtd.log, sanlock client log_dump), your expected result and the actual result, your machines' status, etc. Thanks.

Descriptions:
XXXXX

Version-Release number of selected component (if applicable):
// libvirt, sanlock, kernel

How reproducible:

Steps:
1
2
3

Expected Result:
XX

Actual Result:
XX

Additional info:
XX

Created attachment 791402 [details]
/var/log/messages of kvm1, after try to poweroff kvm2
Created attachment 791403 [details]
sanlock client log_dump of kvm1, after try to poweroff kvm2
Created attachment 791405 [details]
/var/log/libvirt/libvirtd.log of kvm1, after try to poweroff kvm2
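The force-stop suggested in the first reply can be wrapped in a small helper. This is a sketch only: `sanlock client shutdown -f 1` is the command form mentioned above, and with `-f 1` sanlock force-removes lockspaces, killing any lease-holding processes first.

```shell
# Sketch: force-stop sanlock even when lockspaces are still attached.
# WARNING: -f 1 makes sanlock kill any process holding a lease first.
force_stop_sanlock() {
    sanlock client shutdown -f 1 || return 1
    service sanlock status
}
```

This is the heavy hammer the reporter wants to avoid; the rest of the thread looks for a cleaner shutdown sequence.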
Hi, I was on vacation; sorry for the delay. I don't want to force-stop the sanlock service. When "service sanlock stop" is executed, it should stop. I don't use wdmd. I have two machines in the cluster: kvm2 (the machine being shut down) and kvm1 (the other machine). The problem is: I want to shut down kvm2, but kvm2 does not shut down because GFS2 cannot be unmounted. This situation also kills the virtual machines on kvm1. The logs belong to kvm1; I uploaded them as attachments.

Versions:
cman-3.0.12.1-49.el6.x86_64
gfs2-utils-3.0.12.1-49.el6.x86_64
libvirt-0.10.2-18.el6_4.8.x86_64
sanlock-2.6-2.el6.x86_64
libvirt-lock-sanlock-0.10.2-18.el6_4.8.x86_64

Service statuses:
[root@kvm1 ~]# service cman status
cluster is running.
[root@kvm1 ~]# service gfs2 status
Configured GFS2 mountpoints: /KVM
Active GFS2 mountpoints: /KVM
[root@kvm1 ~]# service sanlock status
sanlock (pid 1856 1855) is running...
[root@kvm1 ~]# service wdmd status
wdmd is stopped
[root@kvm1 ~]# service libvirtd status
libvirtd (pid 2660) is running...

Expected result: the machine being shut down (kvm2) shuts down.
Actual result: kvm2 cannot shut down because the GFS2 volume cannot be unmounted.
---
As I understand it, the problem arises from the "libvirt-lock-sanlock" package. Libvirt uses the "sanlock.so" module in this package to manage locks. When the libvirtd service starts, "sanlock.so" creates a lockspace named "__LIBVIRT__DISKS__". On my system the lockspace uses the file "/KVM/Sanlock/__LIBVIRT__DISKS__", which is inside the GFS2 volume. When the libvirtd service stops (because the machine is shutting down), "sanlock.so" does not remove the lockspace! The sanlock lease file on the GFS2 volume therefore remains open, so the GFS2 volume cannot be unmounted during shutdown of kvm2: there is an open file. The machine nevertheless leaves the cluster network.
By the way, the other machine in effect says: "the machine shutting down (kvm2) left the cluster network but didn't unmount the GFS2 volume; I must cut my GFS2 I/O." kvm1 then cuts its own GFS2 I/O, and the sanlock process kills all virtual machine processes. I lost all my virtual machines on kvm1.

Kernel version:
[root@kvm1 ~]# uname -r
2.6.32-358.6.2.el6.x86_64

Libvirt cannot remove the lockspace file, since it is stored on the shared storage and is used by all hosts. If the file is still open on kvm2, I guess it could be because sanlock was not shut down yet. The shutdown procedure should stop the libvirt-guests, libvirtd, and sanlock services before attempting to unmount GFS. If it is not doing that, it's a problem of GFS stopping too early. Can you check that the shutdown sequence is correct?

(In reply to Jiri Denemark from comment #10)
> Libvirt cannot remove the lockspace file, since it is stored on the shared
> storage and is used by all hosts. If the file is still open on kvm2, I guess
> it could be because sanlock was not shut down yet. The shutdown procedure
> should stop the libvirt-guests, libvirtd, and sanlock services before
> attempting to unmount GFS. If it is not doing that, it's a problem of GFS
> stopping too early. Can you check that the shutdown sequence is correct?

No. I migrate all the virtual machines manually to the other node before shutting down kvm2. No virtual machine resource or sanlock resource remains on kvm2. The last resource remaining on the GFS2 filesystem is the sanlock lockspace file. I can stop the libvirtd service manually, and I can remove the lockspace manually via "sanlock rem_lockspace -s {lockspace_name}". However, the sanlock lockspace should be removed when the libvirtd service stops. I looked at the libvirt sanlock module (/usr/lib64/libvirt/lock-driver/sanlock.so) source code and didn't find any code that removes the sanlock lockspace. The libvirt sanlock module doesn't remove the lockspace, because there is no code to remove it. Well, I can remove it the manual way.
Could you please check the shutdown sequence as requested in comment #10? If the sanlock service stops before GFS is unmounted, I believe unmounting GFS should succeed. No, the sanlock service does not stop during the shutdown sequence. The sequence does try to stop sanlock before GFS2, but it cannot. Our main problem is that sanlock does not stop.
The sanlock service doesn't stop because a libvirt lockspace is still attached to it. When I remove the libvirt lockspace (not the file), it shuts down.
I'm providing two screenshots: {shutdown sequence} and {stopping sanlock manually, successfully}. In this scenario I'm shutting down kvm1; the other node is kvm2.
The second screenshot shows why sanlock does not stop. I stopped the libvirtd service, then tried to stop the sanlock service. It cannot stop because the libvirt lockspace is still attached to it. I removed the libvirt lockspace from the sanlock service (without removing the lockspace file), tried stopping the sanlock service again, and it stopped successfully.
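The manual sequence that works (stop libvirtd, detach the lockspace, stop sanlock) can be sketched as a script. The lockspace spec below is illustrative, matching the `__LIBVIRT__DISKS__` entry shown in the `sanlock status` output elsewhere in this report.

```shell
# Sketch of the manual shutdown order described above.
# The lockspace spec (host_id 1, lease file under /KVM/Sanlock) is illustrative.
manual_stop() {
    service libvirtd stop
    # Detach the leftover libvirt lockspace (this does NOT delete the lease file):
    sanlock client rem_lockspace -s "__LIBVIRT__DISKS__:1:/KVM/Sanlock/__LIBVIRT__DISKS__:0"
    service sanlock stop
}
```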
Created attachment 894882 [details]
Shutdown sequence
Created attachment 894883 [details]
stopping sanlock in manual way successfully
Ah, good, that makes it clear to me. I was even able to reproduce this by enabling auto_disk_leases in qemu-sanlock.conf. However, I don't think sanlock should rely on libvirt to remove the lockspace: if libvirtd crashes or deadlocks, it won't do it, and sanlock should still be able to stop. David, what do you think? Am I missing something obvious, or is this really a sanlock issue?

Hi,
I'm providing my {sanlock configuration}.
--
This is not a sanlock issue; it is a libvirt-sanlock driver issue.
As I understand it, sanlock is a generic SAN locking mechanism; it has no relationship to virtualization.
Libvirtd is a virtualization daemon; it has no relationship to the SAN concept or its locking mechanisms.
There is one place where the "virtualization" and "SAN locking" concepts stand together: the libvirt-sanlock driver.
Without the libvirt-sanlock driver, the libvirtd service does not create the "__LIBVIRT__DISKS__" lockspace.
I think this code:
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/locking/lock_driver_sanlock.c;h=113fd479c767583724c9f9185d3d42e427dfc3d6;hb=v0.10.2-maint
virLockManagerSanlockSetupLockspace function:
-> stage 1: creates the "__LIBVIRT__DISKS__" lockspace file
-> stage 2: attaches/registers the node to the "__LIBVIRT__DISKS__" lockspace.
Creating the "__LIBVIRT__DISKS__" lockspace and attaching/registering the node to it are triggered by "service libvirtd start".
Before the libvirtd service starts, no "__LIBVIRT__DISKS__" lockspace exists in sanlock. After it starts, the "sanlock status" command shows the "__LIBVIRT__DISKS__" lockspace.
I expect libvirtd to detach/unregister from the "__LIBVIRT__DISKS__" lockspace when I execute "service libvirtd stop" (if no sanlock resources are held / no virtual machines are running).
I think code should be added that unregisters from the lockspace (when no resources are held and no virtual machines are running) on "service libvirt-guests stop" and "service libvirtd stop".
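The lifecycle described above can be checked with a small probe. This is a sketch; it assumes `sanlock client status` lists attached lockspaces as `s <spec>` lines, as in the outputs quoted in this report.

```shell
# Sketch: after stopping libvirtd, check whether the lockspace was left behind.
check_lockspace_leak() {
    service libvirtd stop
    if sanlock client status | grep -q "__LIBVIRT__DISKS__"; then
        echo "lockspace leaked"      # the behavior reported in this bug
    else
        echo "lockspace removed"     # the behavior the reporter expects
    fi
}
```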
Created attachment 895137 [details]
sanlock configuration
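For reference, the libvirt side of this setup lives in /etc/libvirt/qemu-sanlock.conf; a minimal version matching the paths in this report would look like the following (values illustrative — host_id must be unique per host):

```
# /etc/libvirt/qemu-sanlock.conf (illustrative)
auto_disk_leases = 1
host_id = 1
disk_lease_dir = "/KVM/Sanlock"
```

With auto_disk_leases enabled, starting libvirtd creates and joins the __LIBVIRT__DISKS__ lockspace under disk_lease_dir, which is what keeps the GFS2 volume busy here.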
"sanlock shutdown -f 1" will stop sanlock when lockspaces exist. In the general case we assume that lockspaces imply sanlock is being used. PS. I would not recommend using sanlock on top of gfs2. Use file locks on top of gfs2, or use a shared block device under sanlock. Hi David, (In reply to David Teigland from comment #20) > PS. I would not recommend using sanlock on top of gfs2. Use file locks on > top of gfs2, or use a shared block device under sanlock. I would greatly appreciate a little more clarification on this. We followed the virtual machine disk locking documentation (http://libvirt.org/locking.html), and have our disk_lease_dir set to a shared GFS2 file system. Likewise, our VM XML definitions are on the same GFS2 file system. However, our KVM virtual machines are using clustered logical volumes for their block devices. For example, here are a few configuration excerpts. - From /etc/cluster/cluster.conf: <vm autostart="0" domain="fd14" migrate="live" name="lnx91" path="/gfs/cluster/libvirt/qemu" recovery="relocate"/> - From /gfs/cluster/libvirt/qemu/lnx91.xml: <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none' io='native'/> <source dev='/dev/vgift1/lnx91'/> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> - From /etc/libvirt/quemu-sanlock.conf: host_id = 4 auto_disk_leases = 1 disk_lease_dir = "/gfs/cluster/libvirt/sanlock" In general this works well, but periodically we see warnings and eventual errors that break sanlock and subsequently each clustered VM. This is consistent across all three of our SL6 clusters (two at SL6.4, one at fully updated SL6.5). We see no other indications or problems with our iSCSI setup or GFS2 file systems. Please see below for excerpts from /var/log/messages. I would greatly appreciate any suggestions for resolving or debugging this problem. 
I would be happy to provide entire configuration or log files, or to open a separate ticket or thread on linux-cluster if appropriate. Many thanks, Devin
------
May 5 20:10:09 lnx904 sanlock[20306]: 2014-05-05 20:10:09-0400 4519547 [20460]: s1 renewed 4519542 delta_length 28 too long
May 5 20:23:24 lnx904 sanlock[20306]: 2014-05-05 20:23:24-0400 4520341 [20460]: s1 renewed 4520312 delta_length 37 too long
May 5 20:36:11 lnx904 sanlock[20306]: 2014-05-05 20:36:11-0400 4521109 [20460]: s1 renewed 4521082 delta_length 27 too long
May 5 20:45:31 lnx904 sanlock[20306]: 2014-05-05 20:45:31-0400 4521668 [20460]: s1 renewed 4521652 delta_length 24 too long
May 5 20:58:44 lnx904 sanlock[20306]: 2014-05-05 20:58:44-0400 4522461 [20460]: s1 renewed 4522438 delta_length 25 too long
May 5 21:12:13 lnx904 sanlock[20306]: 2014-05-05 21:12:13-0400 4523270 [20460]: s1 renewed 4523249 delta_length 21 too long
May 5 21:26:00 lnx904 sanlock[20306]: 2014-05-05 21:26:00-0400 4524097 [20460]: s1 renewed 4524072 delta_length 31 too long
May 5 21:39:44 lnx904 sanlock[20306]: 2014-05-05 21:39:44-0400 4524922 [20306]: s1 check_our_lease warning 60 last_success 4524862
May 5 21:39:45 lnx904 sanlock[20306]: 2014-05-05 21:39:45-0400 4524923 [20306]: s1 check_our_lease warning 61 last_success 4524862
May 5 21:39:46 lnx904 sanlock[20306]: 2014-05-05 21:39:46-0400 4524924 [20306]: s1 check_our_lease warning 62 last_success 4524862
May 5 21:39:47 lnx904 sanlock[20306]: 2014-05-05 21:39:47-0400 4524925 [20306]: s1 check_our_lease warning 63 last_success 4524862
May 5 21:39:48 lnx904 sanlock[20306]: 2014-05-05 21:39:48-0400 4524926 [20306]: s1 check_our_lease warning 64 last_success 4524862
May 5 21:39:49 lnx904 sanlock[20306]: 2014-05-05 21:39:49-0400 4524927 [20306]: s1 check_our_lease warning 65 last_success 4524862
May 5 21:39:50 lnx904 sanlock[20306]: 2014-05-05 21:39:50-0400 4524928 [20306]: s1 check_our_lease warning 66 last_success 4524862
May 5 21:39:51 lnx904 sanlock[20306]: 2014-05-05 21:39:51-0400 4524929 [20306]: s1 check_our_lease warning 67 last_success 4524862
May 5 21:39:52 lnx904 sanlock[20306]: 2014-05-05 21:39:52-0400 4524930 [20306]: s1 check_our_lease warning 68 last_success 4524862
May 5 21:39:53 lnx904 sanlock[20306]: 2014-05-05 21:39:53-0400 4524931 [20306]: s1 check_our_lease warning 69 last_success 4524862
May 5 21:39:54 lnx904 sanlock[20306]: 2014-05-05 21:39:54-0400 4524931 [20460]: s1 renewed 4524900 delta_length 48 too long
May 5 21:53:38 lnx904 sanlock[20306]: 2014-05-05 21:53:38-0400 4525755 [20306]: s1 check_our_lease warning 60 last_success 4525695
May 5 21:53:39 lnx904 sanlock[20306]: 2014-05-05 21:53:39-0400 4525756 [20306]: s1 check_our_lease warning 61 last_success 4525695
May 5 21:53:39 lnx904 sanlock[20306]: 2014-05-05 21:53:39-0400 4525757 [20460]: s1 renewed 4525724 delta_length 41 too long
May 5 22:21:42 lnx904 sanlock[20306]: 2014-05-05 22:21:42-0400 4527439 [20460]: s1 renewed 4527420 delta_length 37 too long
May 5 22:35:47 lnx904 sanlock[20306]: 2014-05-05 22:35:47-0400 4528285 [20306]: s1 check_our_lease warning 60 last_success 4528225
May 5 22:35:48 lnx904 sanlock[20306]: 2014-05-05 22:35:48-0400 4528286 [20306]: s1 check_our_lease warning 61 last_success 4528225
May 5 22:35:49 lnx904 sanlock[20306]: 2014-05-05 22:35:49-0400 4528287 [20306]: s1 check_our_lease warning 62 last_success 4528225
May 5 22:35:50 lnx904 sanlock[20306]: 2014-05-05 22:35:50-0400 4528288 [20306]: s1 check_our_lease warning 63 last_success 4528225
May 5 22:35:51 lnx904 sanlock[20306]: 2014-05-05 22:35:51-0400 4528289 [20306]: s1 check_our_lease warning 64 last_success 4528225
May 5 22:35:52 lnx904 sanlock[20306]: 2014-05-05 22:35:52-0400 4528290 [20306]: s1 check_our_lease warning 65 last_success 4528225
May 5 22:35:53 lnx904 sanlock[20306]: 2014-05-05 22:35:53-0400 4528290 [20460]: s1 renewed 4528266 delta_length 45 too long
May 5 23:30:55 lnx904 sanlock[20306]: 2014-05-05 23:30:55-0400 4531592 [20460]: s1 renewed 4531559 delta_length 35 too long
May 6 21:26:28 lnx904 sanlock[20306]: 2014-05-06 21:26:28-0400 4610525 [20306]: s1 check_our_lease warning 60 last_success 4610465
May 6 21:26:29 lnx904 sanlock[20306]: 2014-05-06 21:26:29-0400 4610526 [20306]: s1 check_our_lease warning 61 last_success 4610465
May 6 21:26:30 lnx904 sanlock[20306]: 2014-05-06 21:26:30-0400 4610527 [20306]: s1 check_our_lease warning 62 last_success 4610465
May 6 21:26:31 lnx904 sanlock[20306]: 2014-05-06 21:26:31-0400 4610528 [20306]: s1 check_our_lease warning 63 last_success 4610465
May 6 21:26:32 lnx904 sanlock[20306]: 2014-05-06 21:26:32-0400 4610529 [20306]: s1 check_our_lease warning 64 last_success 4610465
May 6 21:26:33 lnx904 sanlock[20306]: 2014-05-06 21:26:33-0400 4610530 [20306]: s1 check_our_lease warning 65 last_success 4610465
May 6 21:26:34 lnx904 sanlock[20306]: 2014-05-06 21:26:34-0400 4610531 [20306]: s1 check_our_lease warning 66 last_success 4610465
May 6 21:26:35 lnx904 sanlock[20306]: 2014-05-06 21:26:35-0400 4610532 [20306]: s1 check_our_lease warning 67 last_success 4610465
May 6 21:26:36 lnx904 sanlock[20306]: 2014-05-06 21:26:36-0400 4610533 [20306]: s1 check_our_lease warning 68 last_success 4610465
May 6 21:26:37 lnx904 sanlock[20306]: 2014-05-06 21:26:37-0400 4610534 [20306]: s1 check_our_lease warning 69 last_success 4610465
May 6 21:26:38 lnx904 sanlock[20306]: 2014-05-06 21:26:38-0400 4610535 [20306]: s1 check_our_lease warning 70 last_success 4610465
May 6 21:26:39 lnx904 sanlock[20306]: 2014-05-06 21:26:39-0400 4610536 [20306]: s1 check_our_lease warning 71 last_success 4610465
May 6 21:26:40 lnx904 sanlock[20306]: 2014-05-06 21:26:40-0400 4610537 [20306]: s1 check_our_lease warning 72 last_success 4610465
May 6 21:26:41 lnx904 sanlock[20306]: 2014-05-06 21:26:41-0400 4610538 [20306]: s1 check_our_lease warning 73 last_success 4610465
May 6 21:26:42 lnx904 sanlock[20306]: 2014-05-06 21:26:42-0400 4610539 [20306]: s1 check_our_lease warning 74 last_success 4610465
May 6 21:26:43 lnx904 sanlock[20306]: 2014-05-06 21:26:43-0400 4610540 [20306]: s1 check_our_lease warning 75 last_success 4610465
May 6 21:26:44 lnx904 wdmd[3489]: test warning now 4610541 ping 4610531 close 4174791 renewal 4610465 expire 4610545 client 20306 sanlock___LIBVIRT__DISKS__:4
May 6 21:26:44 lnx904 sanlock[20306]: 2014-05-06 21:26:44-0400 4610541 [20306]: s1 check_our_lease warning 76 last_success 4610465
May 6 21:26:45 lnx904 wdmd[3489]: test warning now 4610542 ping 4610531 close 4610541 renewal 4610465 expire 4610545 client 20306 sanlock___LIBVIRT__DISKS__:4
May 6 21:26:45 lnx904 sanlock[20306]: 2014-05-06 21:26:45-0400 4610542 [20306]: s1 check_our_lease warning 77 last_success 4610465
May 6 21:26:46 lnx904 wdmd[3489]: test warning now 4610543 ping 4610531 close 4610541 renewal 4610465 expire 4610545 client 20306 sanlock___LIBVIRT__DISKS__:4
May 6 21:26:46 lnx904 sanlock[20306]: 2014-05-06 21:26:46-0400 4610543 [20306]: s1 check_our_lease warning 78 last_success 4610465
May 6 21:26:47 lnx904 wdmd[3489]: test warning now 4610544 ping 4610531 close 4610541 renewal 4610465 expire 4610545 client 20306 sanlock___LIBVIRT__DISKS__:4
May 6 21:26:47 lnx904 sanlock[20306]: 2014-05-06 21:26:47-0400 4610544 [20306]: s1 check_our_lease warning 79 last_success 4610465
May 6 21:26:48 lnx904 wdmd[3489]: test failed rem 56 now 4610545 ping 4610531 close 4610541 renewal 4610465 expire 4610545 client 20306 sanlock___LIBVIRT__DISKS__:4
May 6 21:26:48 lnx904 sanlock[20306]: 2014-05-06 21:26:48-0400 4610545 [20306]: s1 check_our_lease failed 80
May 6 21:26:48 lnx904 sanlock[20306]: 2014-05-06 21:26:48-0400 4610545 [20306]: s1 kill 20901 sig 15 count 1
May 6 21:26:48 lnx904 sanlock[20306]: 2014-05-06 21:26:48-0400 4610545 [20306]: dead 20901 ci 2 count 1
May 6 21:26:48 lnx904 sanlock[20306]: 2014-05-06 21:26:48-0400 4610546 [20306]: s1 all pids clear
May 6 21:27:44 lnx904 sanlock[20306]: 2014-05-06 21:27:44-0400 4610601 [20460]: s1 renewed 4610601 delta_length 115 too long

Just think about
what sanlock and gfs2/dlm are conceptually and existentially, and it doesn't make any sense to put sanlock on top of gfs2/dlm. sanlock was designed explicitly for the case where no inter-node locking is available, so it uses disk I/O to provide a primitive equivalent. gfs2/dlm are designed explicitly to use sophisticated, tightly coupled inter-node locking. If you have gfs2/dlm, you already have far more locking capability than you need, and there's no need to emulate locks by writing to disk blocks which gfs2/dlm then translate into specific locking. An analogy would be a program best written in C++: instead of writing it in C++, you write it in assembly language with a translator that turns your assembly into C++, which you then compile into the final program.

Hi Devin,
We encountered the "delta_length 115 too long" problem in our GFS2 cluster too. In our case, the GFS2 disk was more than 85 percent full. GFS2 slows down when disk capacity exceeds roughly 85 percent, because it has to search for free inodes. Under those circumstances, sanlock struggles to write its own leases, and if sanlock can't write its leases, it kills all virtual machines on the node. Maybe you have the same case. Our solution was to extend the disk.
--
Hi David,
I want to stop sanlock the natural way. If I have to execute a command manually, I can just as well execute the "sanlock rem_lockspace ..." command; I already have that workaround. However, if I change the stop() function in "/etc/init.d/sanlock" to remove the lockspace before stopping, that file will be overwritten by the next "yum update". That is unmanageable and complex, so it is not a solution; I need a simple, manageable way. I think this is a more general problem: the GFS2-sanlock-libvirt triple cannot work together in a compatible way. Can libvirtd use GFS2/DLM as a locking mechanism? How? Also, I didn't understand what you said about the "shared block device" mechanism. Could you explain it?
From a product perspective, sanlock is only used by the RHEV product, so the RHEV requirements are the only ones that have had any influence on its design and behavior. No one in RH has tried to use gfs2/sanlock/libvirt together, and I doubt that we would. That said, I'd like to get it to work as you need. We could come up with various workarounds for the init script, including:
- a shutdown force option in /etc/sysconfig/sanlock
- a separate script that coordinates stopping the various pieces that you have put together.
My best advice, though, is to either:
- Use file locks on gfs2. The virtlockd was written to do this.
- Use plain shared LVs under sanlock. sanlock and the algorithms it uses are designed around a shared block device. The sanlock man page gives some examples of this, but I don't know how to configure libvirt to do this.

(In reply to David Teigland from comment #24)
> My best advice, though, is to either:
> - Use file locks on gfs2. The virtlockd was written to do this.
Does RHEL6 support virtlockd? Unfortunately I haven't found it yet.
> - Use plain shared LVs under sanlock.
I've long felt this was the right approach and that it didn't make sense to configure sanlock to store the leases on gfs2. However, I haven't yet figured out how to do it. Moreover, the virtual machine disk locking documentation states that "The sanlock plugin needs to create leases in a directory that is on a filesystem shared between all hosts running virtual machines. Obvious choices for this include NFS or GFS2." (http://libvirt.org/locking.html#sanlockstorage). Thanks again for all of your time and help. I feel like I'm hijacking this thread, so maybe I'll continue this on linux-cluster or in a new bugzilla report. (And in response to Mustafa, our GFS2 file systems are only about 10% full. Thanks, though!)

Hi David,
How can I add the "shutdown force" option to the /etc/sysconfig/sanlock file? As far as I know, the SANLOCKOPTS variable only holds startup parameters.
I looked through the sanlock man page and didn't find a parameter to force shutdown at program start. If you know of one, could you help? I had written a script to manage the shutdown process, but it was too complex and I gave it up; managing the shutdown manually is simpler :) Indeed, I opened this bugzilla case in the hope that Red Hat would change the code in the repository and make GFS2/sanlock/libvirt work together in a compatible way in RHEL6, because this case is problematic. As I understand it, should we wait for RHEL 7 for a proper solution to locking virtual machines between nodes?

RHEL7 comes with virtlockd. Could you try to write a patch for the sanlock init script so that it could optionally use "sanlock shutdown -f 1" if configured under sysconfig? If that works for you, I'm suggesting that we could try to include it in a RHEL6 update.

Thanks David. I will try to write it. When it is complete, I will post it here and e-mail it to you.

Hi Mustafa,
Getting back to your initial bug report, we don't see this problem on any of our EL6 clusters (two 9-node, one 4-node). We are able to restart cluster members cleanly even if they're running a VM and have an active sanlock lease. Other than the fact that we don't run any two-node clusters, our libvirtd and sanlock configuration looks very similar to yours. The one problem I see when rebooting a cluster member is that VMs are restarted on different cluster members rather than being migrated. I would like to see an attempt at migrating clustered VMs first, and only if that fails, a restart. If possible, I might try adding a third cluster member and seeing if that resolves your problem. Devin

Hi Devin,
I will try a three-node cluster in another test environment. Does the sanlock service stop cleanly? Is there a __LIBVIRT__DISKS__ lockspace? Is the node unregistered from the __LIBVIRT__DISKS__ lockspace, and how? Could you provide that information? Are you using fencing? On my two-node cluster, fencing deals with the stuck node by forcibly resetting it.
However, I don't find this situation healthy. I do migration manually: I wrote a script that moves the virtual machines to the other node, and I run it by hand before shutting a node down. I'm not using rgmanager to manage my virtual machines. Are you using rgmanager?

Hi Mustafa,
(In reply to Mustafa Cantürk from comment #30)
> Does the sanlock service stop cleanly?
Yes.
> Is there a __LIBVIRT__DISKS__ lockspace?
Yes.
[root@lnx904 ~]# sanlock client status
daemon ff492f80-b71b-4cdf-99a2-80e7191ffac9.lnx904.cla
p -1 helper
p -1 listener
p 16122 lnx91
p -1 status
s __LIBVIRT__DISKS__:4:/gfs/cluster/libvirt/sanlock/__LIBVIRT__DISKS__:0
r __LIBVIRT__DISKS__:b5aba85e1694b1e744fbe53e007bc251:/gfs/cluster/libvirt/sanlock/b5aba85e1694b1e744fbe53e007bc251:0:30 p 16122
> Is the node unregistered from the __LIBVIRT__DISKS__ lockspace, and how?
> Could you provide that information?
If I issue a "shutdown -r now" (for example) on a node that currently hosts a VM, rgmanager stops the VM, sanlock shuts down gracefully, the GFS file systems unmount, and the system reboots gracefully. Here are a few excerpts from /var/log/messages with respect to the shutdown sequence and rgmanager.
------
...
May 15 10:35:12 lnx904 init: tty (/dev/tty2) main process (4142) killed by TERM signal
May 15 10:35:12 lnx904 init: tty (/dev/tty3) main process (4144) killed by TERM signal
May 15 10:35:12 lnx904 init: tty (/dev/tty4) main process (4147) killed by TERM signal
May 15 10:35:12 lnx904 init: tty (/dev/tty5) main process (4149) killed by TERM signal
May 15 10:35:12 lnx904 init: tty (/dev/tty6) main process (4151) killed by TERM signal
May 15 10:35:12 lnx904 init: serial (ttyS0) main process (4153) killed by TERM signal
May 15 10:35:17 lnx904 modclusterd: shutdown succeeded
May 15 10:35:18 lnx904 rgmanager[3895]: Shutting down
May 15 10:35:18 lnx904 rgmanager[3895]: Shutting down
May 15 10:35:18 lnx904 rgmanager[3895]: Stopping service vm:lnx91
...
May 15 10:35:44 lnx904 modclusterd: startup succeeded
...
May 15 10:36:03 lnx904 kernel: br0: port 2(vnet0) entering disabled state
May 15 10:36:03 lnx904 kernel: device vnet0 left promiscuous mode
May 15 10:36:03 lnx904 kernel: br0: port 2(vnet0) entering disabled state
May 15 10:36:03 lnx904 rgmanager[3895]: Service vm:lnx91 is stopped
May 15 10:36:04 lnx904 rgmanager[3895]: Disconnecting from CMAN
...
May 15 10:36:19 lnx904 rgmanager[3895]: Exiting
May 15 10:36:21 lnx904 ricci: shutdown succeeded
May 15 10:36:21 lnx904 oddjobd: oddjobd shutdown succeeded
May 15 10:36:44 lnx904 saslauthd[4081]: server_exit : master exited: 4081
...
May 15 10:36:50 lnx904 xinetd[3286]: Exiting...
May 15 10:36:51 lnx904 kernel: nfsd: last server has exited, flushing export cache
May 15 10:36:51 lnx904 rpc.mountd[7594]: Caught signal 15, un-registering and exiting.
May 15 10:36:54 lnx904 pcscd: pcscdaemon.c:581:signal_trap() Preparing for suicide
May 15 10:36:55 lnx904 pcscd: readerfactory.c:1267:RFCleanupReaders() entering cleaning function
May 15 10:36:55 lnx904 pcscd: pcscdaemon.c:531:at_exit() cleaning /var/run
May 15 10:36:56 lnx904 ntpd[3294]: ntpd exiting on signal 15
May 15 10:36:56 lnx904 gfs_controld[2675]: cpg_dispatch error 2
May 15 10:37:04 lnx904 init: Disconnected from system bus
May 15 10:37:04 lnx904 multipathd: --------shut down-------
May 15 10:37:04 lnx904 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
May 15 10:37:05 lnx904 auditd[1614]: The audit daemon is exiting.
May 15 10:37:05 lnx904 kernel: type=1305 audit(1400164625.036:77937): audit_pid=0 old=1614 auid=4294967295 ses=4294967295 res=1
May 15 10:37:05 lnx904 kernel: type=1305 audit(1400164625.088:77938): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 res=1
May 15 10:37:05 lnx904 kernel: Kernel logging (proc) stopped.
May 15 10:37:05 lnx904 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="30900" x-info="http://www.rsyslog.com"] exiting on signal 15.
May 15 10:40:00 lnx904 kernel: imklog 5.8.10, log source = /proc/kmsg started.
...
------
> Are you using fencing? On my two-node cluster, fencing deals with the stuck
> node by forcibly resetting it. However, I don't find this situation healthy.
Yes, we are using fencing. Moreover, we can reboot a system without manually fencing it.
------
[root@lnx904 ~]# fence_tool ls
fence domain
member count 4
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3 4
------
> I do migration manually: I wrote a script that moves the virtual machines to
> the other node, and I run it by hand before shutting a node down. I'm not
> using rgmanager to manage my virtual machines. Are you using rgmanager?
Yes, and we also migrate machines manually before manually rebooting a system. However, it would be nice if rgmanager could at least attempt to migrate a VM before stopping it when the host shuts down. I hope this helps. Devin

Thanks for your help, Devin. However, it does not work in a three-node cluster either: the sanlock service still doesn't stop. Have you ever monitored the restart process from iLO, iDRAC, or similar to see the local console/tty screen? There I can see sanlock failing to stop and GFS2 failing to unmount; I couldn't see it over SSH, because the SSH connections drop.

Hi Mustafa,
(In reply to Mustafa Cantürk from comment #32)
> Have you ever monitored the restart process from iLO, iDRAC, or similar to
> see the local console/tty screen? There I can see sanlock failing to stop
> and GFS2 failing to unmount; I couldn't see it over SSH, because the SSH
> connections drop.
> However, in my situation the reboot is not caused by my "reboot" command;
> the master cluster node fences the machine. Could you look at
> /var/log/cluster/fenced.log on your master cluster node? Maybe your machine
> reboots are caused by fencing? If not, what unregisters the
> __LIBVIRT__DISKS__ lockspace in your situation? Do you have any idea?
After taking a closer look, we also see that sanlock is not stopping, that the GFS2 file system used by sanlock is not unmounting cleanly, and that the system is being fenced. We've been strongly advised that the multiple sources of documentation that recommend using GFS2 for the disk_lease_dir are wrong. It would be much better to store the leases on block devices; since we don't know how to do that, we'll use an NFS file system instead of GFS2. Of course the optimal, long-term solution is to move to oVirt. We'll test moving the disk_lease_dir to NFS, and I'll follow up on whether that appears to resolve our issues. Thanks, Devin

Hi Devin,
Thanks for the feedback and design recommendations. If I had 3 KVM clusters, I might consider NFS nodes (2 nodes are needed for failover). You have 22 KVM nodes; I have 2. In my case, 2 extra NFS nodes are unnecessary: I don't want to allocate two machines in my data center just for lock management; it would waste them. Writing the code and shipping it in the RHEL repositories is a simpler solution. Besides, you still have the lease-writing problem, so this will not solve your issue. You might also consider OpenStack Compute for large virtualization environments. Thanks.

Created attachment 898431 [details]
proposed patch
I wrote the patch for the sanlock init script. It works in our test environment; I thought it through and tested it from many angles.
On the design side, the patch should really be written for libvirtd and libvirt-guests. However, I think there will be no side effects from having this code block in the sanlock init script.
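For readers who cannot access the attachment, the kind of stop()-side cleanup being proposed might look like this. This is a hypothetical sketch, not the actual patch; it assumes `sanlock client gets` prints attached lockspaces as `s <name:host_id:path:offset>` lines.

```shell
# Hypothetical illustration of the proposed init-script change: detach any
# leftover lockspaces before asking the daemon to stop.
stop_with_lockspace_cleanup() {
    specs=$(sanlock client gets 2>/dev/null | awk '$1 == "s" {print $2}')
    for spec in $specs; do
        sanlock client rem_lockspace -s "$spec"
    done
    service sanlock stop
}
```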
Comment on attachment 898431 [details]
proposed patch
It is a zip file; you may save the attachment and extract it.
Moving to sanlock as suggested by comment 27. Mustafa, are you still having this problem? Would a patch for this still be useful to you?

Hi David,
Yes, we are still having the problem, and the patch would be useful for us. Thanks.

Created attachment 982473 [details]
sanlock init script
Could you try the attached sanlock init script and see if it does what you need?
You'll need to set SANLOCK_SHUTDOWN_FORCE="yes" in /etc/sysconfig/sanlock.
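The toggle David describes could be consumed by the init script roughly like this. It is a sketch under the assumption that the script sources /etc/sysconfig/sanlock; the real logic is in attachment 982473, which is not reproduced here.

```shell
# Sketch: translate the sysconfig toggle into shutdown arguments.
# SANLOCK_SHUTDOWN_FORCE is assumed to come from /etc/sysconfig/sanlock.
shutdown_args() {
    if [ "$SANLOCK_SHUTDOWN_FORCE" = "yes" ]; then
        echo "-f 1"    # force: remove lockspaces, killing lease holders first
    else
        echo "-f 0"    # do not force; shutdown fails while lockspaces exist
    fi
}
```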
Hi David,
This script changes sanlock's behaviour. If someone accidentally restarts the sanlock service while SANLOCK_SHUTDOWN_FORCE="yes", all virtual machines on the node will be shut down. The libvirt services are strictly related to the sanlock service; if we shut it down without checking them, things can go wrong in some situations. What is wrong with my patch proposal? It handles removing a leftover libvirt lockspace and helps sanlock shut down cleanly. It does not remove the design error between libvirt and sanlock, but it helps them work despite this erroneous design.

Any changes we make to the sanlock init script must be very small and must have no impact on the existing behavior, so this is the best we could do (and even this minimal patch may not make it into RHEL6 at this point). Stopping the sanlock service accidentally is not a case we can be very concerned about. It is too late in RHEL6 to be making sensitive changes like this, especially for modes of usage that we don't recommend. (If my minimal patch proves useful, there is a chance it could be applied if the sanlock package were being updated in RHEL6 for some other reason.)
Created attachment 775244 [details]
cluster node cannot be shut down

Hi, I'm trying to install a two-node Red Hat 6.4 KVM cluster. I use sanlock to protect the VM images and GFS2 to store them. I also keep the libvirt/qemu VM configuration files and the sanlock lease files on the same GFS2 filesystem. SELinux is disabled. The problem is: when a cluster node shuts down, the sanlock service does not stop, because the libvirtd lockspace is still active. Consequently the GFS2 filesystem does not unmount, because sanlock holds the lockspace file open. On the other node, all virtual machines are then killed by sanlock (because that node cuts its I/O to GFS2). Fencing is a workaround, but I want sanlock to stop in a regular way. I attached a screenshot showing that the cluster node cannot be shut down.

Active sanlock lockspace after libvirtd is stopped:
[root@kvm1 ~]# service libvirtd status
libvirtd is stopped
[root@kvm1 ~]# sanlock status
daemon 5e71a3f4-beb6-4e87-b38e-621dceed1982.kvm1.test.
p -1 helper
p -1 listener
p -1 status
[root@kvm1 ~]# service libvirtd start
Starting libvirtd daemon: [ OK ]
[root@kvm1 ~]# sanlock status
daemon 5e71a3f4-beb6-4e87-b38e-621dceed1982.kvm1.test.
p -1 helper
p -1 listener
p -1 add_lockspace
p -1 status
s __LIBVIRT__DISKS__:1:/KVM/Sanlock/__LIBVIRT__DISKS__:0
[root@kvm1 ~]# service libvirtd stop
Stopping libvirtd daemon: [ OK ]
[root@kvm1 ~]# sanlock status
daemon 5e71a3f4-beb6-4e87-b38e-621dceed1982.kvm1.test.
p -1 helper
p -1 listener
p -1 add_lockspace
p -1 status
s __LIBVIRT__DISKS__:1:/KVM/Sanlock/__LIBVIRT__DISKS__:0