462609 – in HALVM (Multi-LV) RHCS 5.2 configuration, only 1 service -relying on a given LVM/FS resources- can be used

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	rgmanager
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-09-17 14:28 UTC by Herve Lemaitre
Modified:	2009-04-16 22:38 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-12-03 18:08:05 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Herve Lemaitre 2008-09-17 14:28:34 UTC

Description of problem:
In a multi-LV HALVM configuration on a 2-node cluster, two or more services cannot use the same LVM/FS resources.
The first service defined starts, relocates, and stops correctly.
The second and any subsequent service also starts, relocates, and stops without error, but its FS is never mounted.
No error appears in /var/log/messages, and the service status is green.

Version-Release number of selected component (if applicable):
5.2 (updated as of 09/17/08)

How reproducible:
Always
Reproduced on-site on real cluster config (2 x rhel5.2 nodes)
Reproduced here in a lab in a xen-VM virtual cluster (2 x rhel 5.2 guests)

Steps to Reproduce:
1. Configure a HALVM multi-LV cluster
  a) lvm.conf and initrd (newer than lvm.conf) correctly configured
  b) LVM resource defined as follows (no lv_name):
    <lvm name="vgiscsi2" vg_name="vgiscsi2"/>
  c) FS resource defined as follows
    <fs device="/dev/vgiscsi2/lviscsi2" force_fsck="0" force_unmount="0" fsid="26297" fstype="ext3" mountpoint="/data_halvm" name="data_halvm" self_fence="0"/>
  d) 2 identical services defined as follows
     <service autostart="0" domain="domain1" exclusive="0" name="HALVM2_fs2" recovery="restart">
       <lvm ref="vgiscsi2">
         <fs ref="data_halvm"/>
       </lvm>
     </service>
     <service autostart="0" domain="domain1" exclusive="0" name="HALVM_fs" recovery="restart">
       <lvm ref="vgiscsi2">
         <fs ref="data_halvm"/>
       </lvm>
     </service>

Assume HALVM2_fs2 was defined before HALVM_fs.

2. Start HALVM2_fs2 service
  - The service starts and the filesystem (/data_halvm) gets mounted 
  - /var/log/messages displays
Sep 17 16:13:06 guest127 clurgmgrd[2065]: <notice> Starting disabled service service:HALVM2_fs2 
Sep 17 16:13:07 guest127 kernel: kjournald starting.  Commit interval 5 seconds
Sep 17 16:13:07 guest127 kernel: EXT3 FS on dm-0, internal journal
Sep 17 16:13:07 guest127 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 17 16:13:07 guest127 clurgmgrd[2065]: <notice> Service service:HALVM2_fs2 started 
Sep 17 16:13:10 guest127 clurgmgrd: [2065]: <notice> Getting status 


3. Stop HALVM2_fs2 service
  /var/log/messages displays
Sep 17 16:13:24 guest127 clurgmgrd[2065]: <notice> Stopping service service:HALVM2_fs2 
Sep 17 16:13:25 guest127 clurgmgrd[2065]: <notice> Service service:HALVM2_fs2 is disabled 

4. Start HALVM_fs service
  - The command succeeds and the status of the HALVM_fs service in Luci turns green.
  - The service starts, but the filesystem is NOT mounted.
  /var/log/messages displays
Sep 17 16:13:42 guest127 clurgmgrd[2065]: <notice> Starting disabled service service:HALVM_fs
Sep 17 16:13:43 guest127 clurgmgrd[2065]: <notice> Service service:HALVM_fs started

==> Unlike in step 2, no "kernel: EXT3-fs" mount message is logged.
  
5. Stop HALVM_fs Service and Start HALVM2_fs2 
  HALVM_fs stops correctly
  HALVM2_fs2 starts correctly and filesystem is mounted.
  HALVM2_fs2 remains the only functional service unless it is deleted.

6. Stop HALVM2_fs2 Service 
   Delete HALVM2_fs2 Service
   Start HALVM_fs Service : it starts, and filesystem gets mounted.


Actual results:
Only one service can use a given LVM/FS combination and be functional.
If multiple services are defined, only the first one defined works as intended.
If the first service is deleted, the remaining service becomes functional.
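
Because the service reports success whether or not the filesystem is mounted, the symptom only shows up when the mount table is inspected directly. The following is a minimal Python sketch of such a check. It assumes the standard /proc/mounts layout (device, mount point, type, options); the device name in the sample text is illustrative and not taken from the affected system.

```python
def is_mounted(mountpoint, mounts_text):
    """Return True if mountpoint is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # The second field of each entry is the mount point.
        if len(fields) >= 2 and fields[1] == mountpoint:
            return True
    return False


def read_mounts(path="/proc/mounts"):
    """Read the kernel mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Illustrative sample; the device-mapper name is an assumption.
SAMPLE_MOUNTS = """\
/dev/root / ext3 rw 0 0
/dev/mapper/vgiscsi2-lviscsi2 /data_halvm ext3 rw 0 0
"""

if __name__ == "__main__":
    print(is_mounted("/data_halvm", read_mounts()))
```

Run on the node that owns the service after step 4, this would be expected to print False for /data_halvm even though the service status is green. Mount points containing spaces are escaped in /proc/mounts and are not handled by this sketch.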

Expected results:
It should be possible to define N services that use the same LVM/FS combination and to run any ONE of them at a time as needed.

Additional info:

Comment 1 Lon Hohberger 2008-12-03 18:08:05 UTC

This is by design, actually, but we'll call it a bug. The second reference to the fs resource is dropped at configuration time. If you look at the output of "rg_test test /etc/cluster/cluster.conf", it will say something like:

Warning: Max references exceeded for resource data_halvm (type fs)

This is because the states of individual resources are not kept in shared state; only the status of the service as a whole is.
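
To spot the condition before starting any service, the configuration can be scanned for fs resources referenced by more than one service. The following Python sketch is only an illustration, not rgmanager code: the function name and the limit of one reference per fs resource are assumptions inferred from the warning above, and the sample XML is a trimmed version of the reporter's configuration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: only one reference per fs resource is honored, as the
# warning above implies. The real limit is defined inside rgmanager.
ASSUMED_MAX_REFS = 1


def over_referenced(cluster_conf_xml, tag="fs", max_refs=ASSUMED_MAX_REFS):
    """Map each resource name to the services referencing it,
    keeping only resources referenced more than max_refs times."""
    root = ET.fromstring(cluster_conf_xml)
    users = defaultdict(list)
    for service in root.iter("service"):
        for node in service.iter(tag):
            ref = node.get("ref")
            if ref is not None:
                users[ref].append(service.get("name"))
    return {name: svcs for name, svcs in users.items() if len(svcs) > max_refs}


# Trimmed version of the configuration from the description.
SAMPLE = """
<rm>
  <resources>
    <lvm name="vgiscsi2" vg_name="vgiscsi2"/>
    <fs device="/dev/vgiscsi2/lviscsi2" fstype="ext3"
        mountpoint="/data_halvm" name="data_halvm"/>
  </resources>
  <service name="HALVM2_fs2">
    <lvm ref="vgiscsi2"><fs ref="data_halvm"/></lvm>
  </service>
  <service name="HALVM_fs">
    <lvm ref="vgiscsi2"><fs ref="data_halvm"/></lvm>
  </service>
</rm>
"""

if __name__ == "__main__":
    print(over_referenced(SAMPLE))
```

For the sample, the result lists data_halvm together with both service names, which matches the resource named in the warning.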

This is something we can fix with pacemaker in a future release of RHEL (which stores the states of everything), but not without a fair bit of work within rgmanager. We would either need to:

* distribute states of all resource instances cluster-wide (and add reference counts to some unused portion of the state structure), or
* check for conflicts at run time (e.g. at start): if a running service holds a reference to a resource that we also reference, we fail to start.
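
The second option can be pictured with a small Python model. This is a sketch of the idea only, not how rgmanager is implemented: the function names, the set of running services, and the mapping from service name to referenced resources are all hypothetical.

```python
def conflicting_services(candidate, running, references):
    """Return the running services that share a resource with candidate.

    references maps a service name to the set of resources it references
    (hypothetical data shape, chosen for this sketch).
    """
    wanted = references[candidate]
    return sorted(
        svc for svc in running
        if svc != candidate and references[svc] & wanted
    )


def try_start(candidate, running, references):
    """Start candidate only if no running service holds a shared resource."""
    conflicts = conflicting_services(candidate, running, references)
    if conflicts:
        return False, conflicts
    running.add(candidate)
    return True, []


if __name__ == "__main__":
    refs = {
        "HALVM2_fs2": {"lvm:vgiscsi2", "fs:data_halvm"},
        "HALVM_fs": {"lvm:vgiscsi2", "fs:data_halvm"},
    }
    running = set()
    print(try_start("HALVM2_fs2", running, refs))
    print(try_start("HALVM_fs", running, refs))
```

With this kind of check, both services could stay defined, and the second start would be refused only while the first is actually running.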

If you have a really good use case where this behavior is specifically required, please add it; otherwise this will be a NOTABUG.

You can always use central_processing and define three services:

* one LVM
* one service-A
* one service-B

... make service-A and service-B each depend on LVM, then add special policies (e.g. write some of your own) to ensure that service-A and service-B, when running, exist only on the same node as LVM but can never run together. Be creative :>
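
The placement rule described above can be modelled in a few lines of Python. This only illustrates the decision logic; it is not an rgmanager event script and uses no rgmanager API. The function name and the state mapping are hypothetical, and the sketch assumes that a stopped LVM service means neither dependent service may start.

```python
LVM = "LVM"
EXCLUSIVE_PAIR = ("service-A", "service-B")


def allowed_node(service, state):
    """Return the node on which service may start, or None if it must not.

    state maps each service name to the node it runs on, or to None
    when the service is stopped (hypothetical data shape).
    """
    if service not in EXCLUSIVE_PAIR:
        raise ValueError(f"no policy defined for {service!r}")
    lvm_node = state.get(LVM)
    if lvm_node is None:
        return None  # dependency on the LVM service is not satisfied
    first, second = EXCLUSIVE_PAIR
    sibling = second if service == first else first
    if state.get(sibling) is not None:
        return None  # the two services may never run together
    return lvm_node  # must be co-located with the LVM service


if __name__ == "__main__":
    state = {"LVM": "node1", "service-A": None, "service-B": None}
    print(allowed_node("service-A", state))
```

A real policy would also have to decide what happens to service-A or service-B when the LVM service relocates or fails, which this model leaves out.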

Corner cases like this are why we added central_processing.

See:

http://sources.redhat.com/cluster/wiki/EventScripting

We can work on adding an example to the event scripting interface to achieve this desired behavior using 3 services.
