Bug 224462
| Summary: | clurgmgrd claim "service started" but it is not | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Roger Pena-Escobio <orkcu> |
| Component: | rgmanager | Assignee: | Lon Hohberger <lhh> |
| Status: | CLOSED NOTABUG | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4 | CC: | cluster-maint, tmarshal |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Last Closed: | 2007-01-26 17:58:54 UTC | Type: | --- |
Quite the interesting configuration there :) Ok, to start, have a look at:
# rg_test test /etc/cluster/cluster.conf
Unique/primary not unique type clusterfs, name=WWWData
Error storing clusterfs resource
Unique/primary not unique type clusterfs, name=WWWSoft
Error storing clusterfs resource
...
When rgmanager detects collisions between attributes of a resource type which
are required to be unique across the resource type, it stops parsing that branch
of the tree (so, your references to scripts in the apache26 service are not even
present in service trees that rgmanager constructs internally - see the bottom
of the output of rg_test). The apache26 service has two resource collisions
with the apache25 service:
(1) WWWData is defined twice, with basically identical components (except fsid,
which does not affect your configuration).
You should put this one in your <resources> block and pass it by reference (like
you did with scripts).
(2) WWWSoft is defined twice with a different device, but the same mount point,
causing a naming & mount point collision.
You need to rename one to something else to resolve the naming collision.
The mount point is also the same, and that must be unique. However, you can
lift the uniqueness requirement by tweaking the metadata in
/usr/share/cluster/clusterfs.sh:
* set "unique" to "0" for the "mountpoint" parameter.
* restart rgmanager on both nodes
Most users should *not* do this, but in your case, it looks safe to do (since
the two services will never coexist on the same node due to restricted failover
domains).
Warning: do not change the primary attribute ("name", in most cases), or you
will probably break stuff.
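A minimal sketch of the metadata tweak described above. It assumes the parameter is declared on a single line of the form `<parameter name="mountpoint" unique="1" ...>`; check your copy of clusterfs.sh first, since the layout may differ. The function name `set_mountpoint_nonunique` is made up for illustration.

```sh
#!/bin/sh
# Sketch: flip unique="1" to unique="0" on the "mountpoint" parameter only.
# Assumption: the parameter is declared on one line, e.g.
#   <parameter name="mountpoint" unique="1" required="1">
# Other parameters (including the primary "name") are left untouched.
set_mountpoint_nonunique() {
    agent=$1
    # Keep a backup copy next to the original before rewriting it.
    cp -p "$agent" "$agent.orig" || return 1
    # Only lines that declare the mountpoint parameter are edited.
    sed '/<parameter name="mountpoint"/s/unique="1"/unique="0"/' \
        "$agent.orig" > "$agent"
}

# Usage (as root, on each node, followed by an rgmanager restart):
#   set_mountpoint_nonunique /usr/share/cluster/clusterfs.sh
```

Comparing the file against the `.orig` backup afterwards with `diff` is a quick way to confirm that exactly one line changed.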
Anyway, if you change the 'unique' flag of the 'mountpoint' parameter to 0 in
/usr/share/cluster/clusterfs.sh, and restart rgmanager, the following
configuration should work:
<rm>
  <failoverdomains>
    <failoverdomain name="mysql" ordered="0" restricted="1">
      <failoverdomainnode name="blade21" priority="1"/>
      <failoverdomainnode name="blade22" priority="1"/>
    </failoverdomain>
    <failoverdomain name="apache25" ordered="0" restricted="1">
      <failoverdomainnode name="blade25" priority="1"/>
    </failoverdomain>
    <failoverdomain name="apache26" ordered="0" restricted="1">
      <failoverdomainnode name="blade26" priority="1"/>
    </failoverdomain>
    <failoverdomain name="ftp" ordered="0" restricted="1">
      <failoverdomainnode name="blade25" priority="1"/>
      <failoverdomainnode name="blade26" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <resources>
    <script file="/etc/init.d/httpd" name="apache start-stop"/>
    <script file="/etc/init.d/vsftpd" name="vsftpd"/>
    <clusterfs device="/dev/emcpowerd1" force_unmount="0" fsid="41107"
               fstype="gfs" mountpoint="/opt/www" name="WWWData" options=""/>
    </clusterfs>
  </resources>
  <service autostart="1" domain="mysql" name="mysqld" recovery="restart">
    <fs device="/dev/mapper/MysqlData-VarLibMysql" force_fsck="0"
        force_unmount="1" fsid="30618" fstype="ext3"
        mountpoint="/var/lib/mysql" name="MysqlData" options="" self_fence="1"/>
    <ip address="172.17.0.123" monitor_link="1"/>
    <script file="/etc/init.d/mysqld" name="mysql start-stop"/>
  </service>
  <service autostart="1" domain="apache25" name="apache25">
    <clusterfs ref="WWWData"/>
    <clusterfs device="/dev/emcpowera1" force_unmount="0" fsid="30342"
               fstype="gfs" mountpoint="/opt/soft" name="WWWSoft1" options=""/>
    <script ref="vsftpd"/>
    <script ref="apache start-stop"/>
  </service>
  <service autostart="1" domain="apache26" name="apache26">
    <clusterfs ref="WWWData"/>
    <clusterfs device="/dev/emcpowerb1" force_unmount="0" fsid="30343"
               fstype="gfs" mountpoint="/opt/soft" name="WWWSoft2" options=""/>
    <script ref="vsftpd"/>
    <script ref="apache start-stop"/>
  </service>
</rm>
Now, if you don't change /usr/share/cluster/clusterfs.sh, you'll have to change
the mount points and make the apache scripts context-sensitive. With the above
configuration, you can do this by checking "OCF_RESKEY_service_name" in the
script and starting apache with a different config based on its value. For
example (untested; the idea is that it starts httpd based on the service it's
part of, and uses /etc/httpd/conf/httpd-<service_name>.conf):
--- /etc/init.d/httpd.old 2007-01-26 12:08:59.000000000 -0500
+++ /etc/init.d/httpd 2007-01-26 12:10:33.000000000 -0500
@@ -57,6 +57,9 @@
 # when not running is also a failure. So we just do it the way init scripts
 # are expected to behave here.
 start() {
+        if [ "$OCF_RESKEY_service_name" ]; then
+                OPTIONS="$OPTIONS -f /etc/httpd/conf/httpd-${OCF_RESKEY_service_name}.conf"
+        fi
         echo -n $"Starting $prog: "
         check13 || exit 1
         LANG=$HTTPD_LANG daemon $httpd $OPTIONS
If you choose to do it this way, WWWSoft1 and WWWSoft2 in the above example
configuration will need different mount points (/opt/soft1 and /opt/soft2, for
example), and whatever points at /opt/soft in /etc/httpd/conf/httpd-apache25.conf
and httpd-apache26.conf will need to be set to the matching mount point.
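The selection logic from the diff above can also be tried on its own, outside the init script. The sketch below is an illustration only: `select_httpd_options` and `httpd_conf_dir` are hypothetical names, and falling back to the unchanged options when the per-service file is not readable is an added assumption, not something the diff does.

```sh
#!/bin/sh
# Sketch: derive httpd options from the service name that rgmanager
# exports as OCF_RESKEY_service_name (per the comment above).
# httpd_conf_dir and select_httpd_options are hypothetical names.
httpd_conf_dir=${httpd_conf_dir:-/etc/httpd/conf}

select_httpd_options() {
    opts=$1
    conf="$httpd_conf_dir/httpd-${OCF_RESKEY_service_name:-}.conf"
    # Added assumption: only pass -f when the per-service file is readable,
    # so httpd still starts with its default config otherwise.
    if [ -n "${OCF_RESKEY_service_name:-}" ] && [ -r "$conf" ]; then
        opts="${opts:+$opts }-f $conf"
    fi
    printf '%s\n' "$opts"
}

# Usage inside start():
#   OPTIONS=$(select_httpd_options "$OPTIONS")
```

Setting OCF_RESKEY_service_name by hand in a shell and calling the function shows which config file each service would pick up before the cluster is involved.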
While you get things up and running, I will investigate the possibility of
allowing non-primary (but unique) namespace collisions across disjoint
restricted failover domains. This will not be solved overnight, mind you (and
may fall into the realm of the dependency code we're working on).
Generally, you should always design your services as though they can coexist - unless there is a device disconnect between the nodes (for example, /dev/emcpowera1 is not connected to blade25 and /dev/emcpowerb1 is not connected to blade26).

Oh, the above configuration has an extraneous "</clusterfs>" in the <resources> section. Remove it before use ;)

Created attachment 146689
Original configuration
Created attachment 146690
rg_test output of original configuration
Created attachment 146691
altered configuration
Created attachment 146692
rg_test output of new configuration, clusterfs.sh not modified yet
Created attachment 146693
rg_test output of new configuration, clusterfs.sh modified to set mountpoint unique="0"
Created attachment 146694
clusterfs.sh with unique for mountpoint set to 0
[Note: from RHEL5 branch, but should work on RHEL4]
I've filed a separate bugzilla feature request to allow reuse of "unique" attributes if the services will never collide, and to add syslog logging (rather than just "printf") when resource collisions occur: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=224608

The current behavior concerning resource collisions is not a bug, but it may be possible to extend it as described previously (and in the above-noted bugzilla).

Additionally, the collisions might be something we can check for in the GUIs (system-config-cluster and Conga), so that this does not quietly hit other users.
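As a rough illustration of the kind of collision check discussed in this thread, the sketch below lists attribute values that appear on more than one inline <clusterfs> definition. It is a text-level approximation, not how rgmanager or rg_test does it: `dup_clusterfs_attr` is a hypothetical helper, it relies on `grep -o` (available in GNU and BSD grep), and it assumes no ">" appears inside attribute values.

```sh
#!/bin/sh
# Sketch: print each value of attribute $2 that occurs on more than one
# inline <clusterfs ...> definition in the file $1.
# dup_clusterfs_attr is a hypothetical helper, not part of rgmanager.
dup_clusterfs_attr() {
    conf=$1
    attr=$2
    # Flatten the file so tags wrapped over several lines match as one.
    tr '\n' ' ' < "$conf" |
        # One opening <clusterfs ...> tag per output line.
        grep -o '<clusterfs [^>]*>' |
        # Extract the attribute value; tags without it (such as
        # <clusterfs ref="..."/> when asking for "name") print nothing.
        sed -n 's/.*[[:space:]]'"$attr"'="\([^"]*\)".*/\1/p' |
        # Keep only values seen more than once.
        sort | uniq -d
}

# Usage:
#   dup_clusterfs_attr /etc/cluster/cluster.conf name
#   dup_clusterfs_attr /etc/cluster/cluster.conf mountpoint
```

Run against the original configuration from this report, the `name` check would list WWWData and WWWSoft. rg_test remains the authoritative check, since it applies the uniqueness rules from the resource agent metadata.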
Description of problem:
If I have two identical services with different failover domains, one of the services starts, but the other one claims that it started without actually doing anything.

Version-Release number of selected component (if applicable):
rgmanager-1.9.54-1

Steps to Reproduce:
Just try this cluster.conf:

<rm>
  <failoverdomains>
    <failoverdomain name="mysql" ordered="0" restricted="1">
      <failoverdomainnode name="blade21" priority="1"/>
      <failoverdomainnode name="blade22" priority="1"/>
    </failoverdomain>
    <failoverdomain name="apache25" ordered="0" restricted="1">
      <failoverdomainnode name="blade25" priority="1"/>
    </failoverdomain>
    <failoverdomain name="apache26" ordered="0" restricted="1">
      <failoverdomainnode name="blade26" priority="1"/>
    </failoverdomain>
    <failoverdomain name="ftp" ordered="0" restricted="1">
      <failoverdomainnode name="blade25" priority="1"/>
      <failoverdomainnode name="blade26" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <resources>
    <script file="/etc/init.d/httpd" name="apache start-stop"/>
    <script file="/etc/init.d/vsftpd" name="vsftpd"/>
  </resources>
  <service autostart="1" domain="mysql" name="mysqld" recovery="restart">
    <script file="/etc/init.d/mysqld" name="mysql start-stop">
      <ip address="172.17.0.123" monitor_link="1"/>
    </script>
    <fs device="/dev/mapper/MysqlData-VarLibMysql" force_fsck="0"
        force_unmount="1" fsid="30618" fstype="ext3"
        mountpoint="/var/lib/mysql" name="MysqlData" options="" self_fence="1"/>
  </service>
  <service autostart="1" domain="apache25" name="apache25">
    <clusterfs device="/dev/emcpowerd1" force_unmount="0" fsid="41106"
               fstype="gfs" mountpoint="/opt/www" name="WWWData" options="">
      <script ref="vsftpd"/>
    </clusterfs>
    <clusterfs device="/dev/emcpowera1" force_unmount="0" fsid="30342"
               fstype="gfs" mountpoint="/opt/soft" name="WWWSoft" options="">
      <script ref="apache start-stop"/>
    </clusterfs>
  </service>
  <service autostart="1" domain="apache26" name="apache26">
    <clusterfs device="/dev/emcpowerd1" force_unmount="0" fsid="41107"
               fstype="gfs" mountpoint="/opt/www" name="WWWData" options="">
      <script ref="vsftpd"/>
    </clusterfs>
    <clusterfs device="/dev/emcpowerb1" force_unmount="0" fsid="30343"
               fstype="gfs" mountpoint="/opt/soft" name="WWWSoft" options="">
      <script ref="apache start-stop"/>
    </clusterfs>
  </service>
</rm>

Actual results:
Jan 25 14:23:57 blade26 clurgmgrd[3494]: <notice> Starting disabled service apache26
Jan 25 14:23:57 blade26 clurgmgrd[3494]: <notice> Service apache26 started

Expected results:
Jan 25 15:32:31 blade25 clurgmgrd[3990]: <notice> Starting disabled service apache25
Jan 25 15:32:31 blade25 clurgmgrd: [3990]: <info> Executing /etc/init.d/vsftpd start
Jan 25 15:32:31 blade25 vsftpd: vsftpd vsftpd succeeded
Jan 25 15:32:31 blade25 clurgmgrd: [3990]: <info> Executing /etc/init.d/httpd start
Jan 25 15:32:31 blade25 httpd: httpd startup succeeded
Jan 25 15:32:31 blade25 clurgmgrd[3990]: <notice> Service apache25 started
Jan 25 15:32:40 blade25 clurgmgrd: [3990]: <info> Executing /etc/init.d/vsftpd status
Jan 25 15:32:40 blade25 clurgmgrd: [3990]: <info> Executing /etc/init.d/httpd status

Additional info:
I tried configuring another service with a different name but the same resources, and the same thing happens. As I said, the cluster claims everything is OK, but it isn't:

[root@blade26 ~]# clustat
Member Status: Quorate

  Member Name      Status
  ------ ----      ------
  blade25          Online, rgmanager
  blade26          Online, Local, rgmanager
  blade21          Online, rgmanager
  blade22          Online, rgmanager

  Service Name     Owner (Last)     State
  ------- ----     ----- ------     -----
  mysqld           blade21          started
  apache25         blade25          started
  apache26         blade26          started
[root@blade26 ~]# ps ax | grep http
16087 pts/0    S+     0:00 grep http
[root@blade26 ~]#