Red Hat Bugzilla – Bug 462609
In a HALVM (multi-LV) RHCS 5.2 configuration, only one service relying on a given LVM/FS resource can be used
Last modified: 2009-04-16 18:38:40 EDT
Description of problem:
In a HALVM (multi-LV) configuration on a 2-node cluster, it is not possible to have two (or more) services using the same LVM/FS resources.
The first defined service will start/relocate/stop correctly.
The second and subsequent defined services will appear to start/relocate/stop properly, but the FS never gets mounted.
There is no error in /var/log/messages, and the status shows green.
Version-Release number of selected component (if applicable):
5.2 (updated as of 09/17/08)
Reproduced on-site on real cluster config (2 x rhel5.2 nodes)
Reproduced here in a lab in a xen-VM virtual cluster (2 x rhel 5.2 guests)
Steps to Reproduce:
1. Configure a HALVM multi-LV cluster
a) lvm.conf correctly configured, with the initrd rebuilt more recently than lvm.conf
b) LVM resource defined as follows (no lv_name set):
<lvm name="vgiscsi2" vg_name="vgiscsi2"/>
c) FS resource defined as follows:
<fs device="/dev/vgiscsi2/lviscsi2" force_fsck="0" force_unmount="0" fsid="26297" fstype="ext3" mountpoint="/data_halvm" name="data_halvm" self_fence="0"/>
d) Two services, identical except for their names, defined as follows:
<service autostart="0" domain="domain1" exclusive="0" name="HALVM2_fs2" recovery="restart">
<service autostart="0" domain="domain1" exclusive="0" name="HALVM_fs" recovery="restart">
Suppose HALVM2_fs2 has been defined BEFORE HALVM_fs
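Putting steps b) through d) together, the relevant <rm> section of cluster.conf would look roughly like this. This is a sketch: the exact nesting of the resource references under each service is an assumption based on a typical HALVM configuration, not a copy of the reporter's file.

```xml
<rm>
  <resources>
    <lvm name="vgiscsi2" vg_name="vgiscsi2"/>
    <fs device="/dev/vgiscsi2/lviscsi2" force_fsck="0" force_unmount="0"
        fsid="26297" fstype="ext3" mountpoint="/data_halvm"
        name="data_halvm" self_fence="0"/>
  </resources>
  <!-- Both services reference the SAME lvm and fs resources. -->
  <service autostart="0" domain="domain1" exclusive="0"
           name="HALVM2_fs2" recovery="restart">
    <lvm ref="vgiscsi2">
      <fs ref="data_halvm"/>
    </lvm>
  </service>
  <service autostart="0" domain="domain1" exclusive="0"
           name="HALVM_fs" recovery="restart">
    <lvm ref="vgiscsi2">
      <fs ref="data_halvm"/>
    </lvm>
  </service>
</rm>
```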
2. Start HALVM2_fs2 service
- The service starts and the filesystem (/data_halvm) gets mounted
- /var/log/messages displays
Sep 17 16:13:06 guest127 clurgmgrd: <notice> Starting disabled service service:HALVM2_fs2
Sep 17 16:13:07 guest127 kernel: kjournald starting. Commit interval 5 seconds
Sep 17 16:13:07 guest127 kernel: EXT3 FS on dm-0, internal journal
Sep 17 16:13:07 guest127 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 17 16:13:07 guest127 clurgmgrd: <notice> Service service:HALVM2_fs2 started
Sep 17 16:13:10 guest127 clurgmgrd: : <notice> Getting status
3. Stop HALVM2_fs2 service
Sep 17 16:13:24 guest127 clurgmgrd: <notice> Stopping service service:HALVM2_fs2
Sep 17 16:13:25 guest127 clurgmgrd: <notice> Service service:HALVM2_fs2 is disabled
4. Start HALVM_fs service
- The command succeeds; the status of the HALVM_fs service in Luci turns green.
- The service starts but the filesystem does NOT get mounted:
Sep 17 16:13:42 guest127 clurgmgrd: <notice> Starting disabled service service:HALVM_fs
Sep 17 16:13:43 guest127 clurgmgrd: <notice> Service service:HALVM_fs started
==> note that no "kernel: EXT3-fs" mount messages appear this time
5. Stop HALVM_fs Service and Start HALVM2_fs2
HALVM_fs stops correctly
HALVM2_fs2 starts correctly and filesystem is mounted.
HALVM2_fs2 will always be the functional service (unless it is deleted).
6. Stop the HALVM2_fs2 service, then delete it.
Start the HALVM_fs service: it now starts, and the filesystem gets mounted.
Only one service can use a given LVM/FS combination and be functional.
If multiple services are defined, only the first one defined functions as intended.
If that first service is deleted, the remaining one becomes functional.
It should be possible to define N services using the same LVM/FS combination and use any ONE of them at a time, as needed.
This is by design, actually, but we'll call it a bug. The second reference to the fs resource is dropped at configuration time. If you look at the output of "rg_test test /etc/cluster/cluster.conf", it will say something like:
Warning: Max references exceeded for resource data_halvm (type fs)
This is because the states of individual resources are not stored in shared state, only the status of the service as a whole is.
This is something we can fix with Pacemaker in a future release of RHEL (Pacemaker stores the state of everything), but fixing it within rgmanager would take a fair bit of work. We would need to either:
* distribute states of all resource instances cluster-wide (and add reference counts to some unused portion of the state structure), or
* check for conflicts at run (e.g. start) time - i.e. if a running service holds a reference to a resource that we also reference, we fail to start.
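The second option amounts to a simple reference check at start time. A minimal Python sketch of the idea follows; none of these names exist in rgmanager, and the resource-reference strings are invented for illustration:

```python
# Hypothetical sketch of the run-time conflict check described above.
# This only illustrates the idea; it is not rgmanager code.

def can_start(service, running_services, refs):
    """Refuse to start `service` if any already-running service
    holds a reference to a resource that `service` also references."""
    wanted = refs[service]               # resources this service references
    for other in running_services:
        if refs[other] & wanted:         # shared resource -> conflict
            return False
    return True

# Both services reference the same lvm and fs resources,
# mirroring the configuration in this bug report.
refs = {
    "HALVM2_fs2": {"lvm:vgiscsi2", "fs:data_halvm"},
    "HALVM_fs":   {"lvm:vgiscsi2", "fs:data_halvm"},
}

# With nothing running, HALVM_fs may start:
print(can_start("HALVM_fs", set(), refs))            # True
# With HALVM2_fs2 already running, HALVM_fs must fail to start:
print(can_start("HALVM_fs", {"HALVM2_fs2"}, refs))   # False
```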
If you have a really good use case where this behavior is specifically required, please add it; otherwise this will be closed NOTABUG.
You can always use central_processing and define three services:
* one LVM
* one service-A
* one service-B
... make service-A and service-B depend on LVM, then add special policies (e.g. write some of your own) to ensure service-A and service-B run only on the same node as LVM, if running, but can never run together. Be creative :>
Corner cases like this are why we added central_processing.
We can work on adding an example to the event scripting interface that achieves the desired behavior using 3 services.
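A rough sketch of what the three-service layout might look like in cluster.conf follows. This is an assumption-laden illustration, not a tested configuration: the depend/depend_mode attributes are standard rgmanager service attributes, but the anti-collocation policy keeping service-A and service-B apart still has to be written as a central_processing event script, which is not shown here.

```xml
<rm central_processing="1">
  <!-- The shared storage lives in its own service. -->
  <service autostart="0" domain="domain1" name="LVM" recovery="restart">
    <lvm name="vgiscsi2" vg_name="vgiscsi2">
      <fs device="/dev/vgiscsi2/lviscsi2" fstype="ext3"
          mountpoint="/data_halvm" name="data_halvm"/>
    </lvm>
  </service>
  <!-- Both depend on the LVM service (hard: they stop if LVM stops).
       A custom event-script policy must additionally ensure they run
       on the same node as LVM and never at the same time. -->
  <service autostart="0" name="service-A"
           depend="service:LVM" depend_mode="hard"/>
  <service autostart="0" name="service-B"
           depend="service:LVM" depend_mode="hard"/>
</rm>
```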