Bug 707118

Summary: rgmanager rebase
Product: Red Hat Enterprise Linux 6
Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: rgmanager
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 6.2
CC: cluster-maint, jkortus, ssaha
Target Milestone: rc
Target Release: ---
Keywords: Rebase
Hardware: Unspecified
OS: Unspecified
Whiteboard: Rebase
Fixed In Version: rgmanager-3.0.12.1-1.el6
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text: This update adds a couple of minor features and fixes a few bugs, and will make future updates easier.
Story Points: ---
Clones: 917776 (view as bug list)
Last Closed: 2011-12-06 06:59:42 EST
Bug Blocks: 917776

Description Fabio Massimo Di Nitto 2011-05-24 02:54:16 EDT
Initial evaluation shows that we can rebase safely based on a new tarball
released from the RHEL6 branch.

Pros:

1) import a few minor fixes from upstream
2) kill patch queue from distcvs
3) simplify build / spec file

Verification process:

from distcvs RHEL6 branch in current release:

sed -i -e 's#\-b .*##g' rgmanager.spec
make prep

This applies all patches to the current 3.0.12 tree without extensions
(which makes the diff easier to parse).

Then replace the 3.0.12 tarball with a 3.0.12.1 tarball created from the
current git RHEL6 branch.

diff -Naurd rgmanager-3.0.12 rgmanager-3.0.12.1 | filterdiff -x '*.orig'
| lsdiff

fixes imported from upstream:

rgmanager: fix compiler warning in clulog.c
rgmanager: Pause during exit if we stopped services (rhbz#619468)
rgmanager: Add resource-defaults section
rgmanager: Fix reference count handling (rhbz#692771)
rgmanager: Fix clustat help & version operations
rgmanager: Make clustat -f not query CCS/objdb

missing *.sl scripts (fixed upstream, needs a backport to RHEL6):
https://bugzilla.redhat.com/show_bug.cgi?id=693517 (POST)

TODO: update RHEL6 branch release script to drop cluster_conf.html from
rgmanager tarball.
Comment 2 Fabio Massimo Di Nitto 2011-05-26 06:50:11 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This update adds a couple of minor features and fixes a few bug, and will make future updates easier.
Comment 3 Fabio Massimo Di Nitto 2011-05-26 06:50:21 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-This update adds a couple of minor features and fixes a few bug, and will make future updates easier.+This update adds a couple of minor features and fixes a few bugs, and will make future updates easier.
Comment 4 Fabio Massimo Di Nitto 2011-06-21 13:59:08 EDT
Unit test results to be added in BZ:

> rgmanager: fix compiler warning in clulog.c
> rgmanager: Pause during exit if we stopped services (rhbz#619468)
> rgmanager: Add resource-defaults section
> rgmanager: Fix reference count handling (rhbz#692771)
> rgmanager: Fix clustat help & version operations
> rgmanager: Make clustat -f not query CCS/objdb
> TODO: update RHEL6 branch release script to drop cluster_conf.html from
> rgmanager tarball.
Comment 5 Fabio Massimo Di Nitto 2011-06-21 14:07:24 EDT
(In reply to comment #4)
> Unit test results to be added in BZ:
> 
> > rgmanager: fix compiler warning in clulog.c

Jun 21 19:59:12 rgmanager [ip] Link detected on eth0
Jun 21 19:59:32 rgmanager [ip] Checking 192.168.0.128, Level 0
Jun 21 19:59:32 rgmanager [ip] 192.168.0.128 present on eth0
Jun 21 19:59:32 rgmanager [ip] Link for eth0: Detected
Jun 21 19:59:32 rgmanager [ip] Link detected on eth0

clulog is still logging.

> > rgmanager: Fix clustat help & version operations

[root@rhel6-node1 cluster]# clustat -h
usage: clustat <options>
    -i <interval>      Refresh every <interval> seconds.  May not be used
                       with -x.
    -I                 Display local node ID and exit
    -m <member>        Display status of <member> and exit
    -s <service>       Display status of <service> and exit
    -v                 Display version and exit
    -x                 Dump information as XML
    -Q                 Return 0 if quorate, 1 if not (no output)
    -f                 Enable fast clustat reports
    -l                 Use long format for services

[root@rhel6-node1 cluster]# clustat -v
clustat version 3.0.12.1

> > rgmanager: Make clustat -f not query CCS/objdb

$ strace -o with_ccs clustat
$ grep corosync with_ccs
connect(5, {sa_family=AF_FILE, path=@"corosync.ipc"}, 110) = 0
(followed by IPC connections generated by libccs usage to access objdb in corosync)

$ strace -o without_ccs clustat -f
$ grep corosync without_ccs
(empty == no connections to/from libccs)

> > TODO: update RHEL6 branch release script to drop cluster_conf.html from
> > rgmanager tarball.

[fabbione@daikengo rpms]$ rpm -q -p rgmanager-3.0.12.1-1.el6.x86_64.rpm |grep cluster_conf
[fabbione@daikengo rpms]$
Comment 6 Fabio Massimo Di Nitto 2011-06-21 14:08:52 EDT
Lon, we need unit test results from your side for the following changes:

> rgmanager: Pause during exit if we stopped services (rhbz#619468)
> rgmanager: Add resource-defaults section
> rgmanager: Fix reference count handling (rhbz#692771)
Comment 7 Lon Hohberger 2011-06-21 14:19:32 EDT
Test 1: Upgrade test w/o resource-agents should fail:

[root@snap ~]# rpm -Uvh rgmanager-3.0.12.1-1.el6.x86_64.rpm
error: Failed dependencies:
        resource-agents >= 3.9.1-1 is needed by rgmanager-3.0.12.1-1.el6.x86_64
        resource-agents is needed by (installed) pacemaker-1.1.5-5.el6.x86_64

OK

Test 2: Upgrade test w/ resource-agents should pass:

[root@snap ~]# rpm -Uvh rgmanager-3.0.12.1-1.el6.x86_64.rpm resource-agents-3.9.1-1.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:resource-agents        ########################################### [ 50%]
   2:rgmanager              ########################################### [100%]
[root@snap ~]# echo $?
0

Test 3: verify installation

[root@snap ~]# rpm -qV rgmanager
[root@snap ~]# echo $?
0
Comment 8 Fabio Massimo Di Nitto 2011-06-21 14:37:10 EDT
(In reply to comment #7)
> Test 1: Upgrade test w/o resource-agents should fail:

> Test 2: Upgrade test w/ resource-agents should pass:

> Test 3: verify installation

Those test results are in the respective BZs #693517 and #693518.
Comment 9 Lon Hohberger 2011-06-21 14:46:34 EDT
Test 4: resource-defaults test

Using this configuration:

<?xml version="1.0"?>
<cluster config_version="1" name="lhh-rd-test">
        ...
        <rm>
                <vm name="foo" autostart="0" />
                <vm name="bar" autostart="0" migrate="pause" />
        </rm>
</cluster>

The following output appears in /var/lib/cluster/rgmanager-dump when you create a dump of rgmanager using kill -USR1 `pidof -s rgmanager`:

[snip]
=== Resource Tree ===
vm (S0) {
  name = "foo";
  autostart = "1";
  exclusive = "0";
  use_virsh = "";
  migrate = "live";
  snapshot = "";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  status_program = "";
  hypervisor = "auto";
  hypervisor_uri = "auto";
  migration_uri = "auto";
}
vm (S0) {
  name = "bar";
  autostart = "1";
  exclusive = "0";
  use_virsh = "";
  migrate = "live";
  snapshot = "";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  status_program = "";
  hypervisor = "auto";
  hypervisor_uri = "auto";
  migration_uri = "auto";
}

* Add: a resource-default for 'migrate', set it to 'pause'
  * Expected results
    * The 'migrate' attribute of 'foo' should change to 'pause'
    * The 'migrate' attribute of 'bar' should remain as 'live'
* Add: a dummy migration_mapping parameter, set it to 'node1:node1-p,node2:node2-p'
  * Expected results
    * The 'migration_mapping' attribute of both instances should change
      to the value specified

The resulting cluster.conf looks like:

<?xml version="1.0"?>
<cluster config_version="2" name="lhh-rd-test">
        ...
        <rm>
                <resource-defaults>
                        <vm migrate="pause" migration_mapping="node1:node1-p,node2:node2-p" />
                </resource-defaults>

                <vm name="foo" autostart="0" />
                <vm name="bar" autostart="0" migrate="pause" />
        </rm>
</cluster>
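The merge rule Test 4 exercises — defaults fill in only the attributes a resource instance does not set itself — can be sketched as follows. This is a simplified, hypothetical model for illustration, not rgmanager's actual C implementation; the function name `apply_resource_defaults` is invented here.

```python
# Sketch of <resource-defaults> semantics: defaults supply values for
# attributes the instance leaves unset; explicit attributes win.
def apply_resource_defaults(defaults, instance):
    merged = dict(defaults)   # start from the section's defaults...
    merged.update(instance)   # ...and let the instance's own attributes override
    return merged

defaults = {"migrate": "pause",
            "migration_mapping": "node1:node1-p,node2:node2-p"}

foo = {"name": "foo", "autostart": "0"}
bar = {"name": "bar", "autostart": "0", "migrate": "pause"}

print(apply_resource_defaults(defaults, foo)["migrate"])            # pause
print(apply_resource_defaults(defaults, bar)["migration_mapping"])  # node1:node1-p,node2:node2-p
```

Under this model, 'foo' picks up migrate="pause" from the defaults, and both instances pick up the migration_mapping value, matching the expected results above.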

Killing rgmanager with -USR1 again produces the following in /var/lib/cluster/rgmanager-dump for the resource-tree:

=== Resource Tree ===
vm (S2) {
  name = "foo";
  autostart = "0";
  exclusive = "0";
  migration_mapping = "node1:node1-p,node2:node2-p";
  use_virsh = "";
  migrate = "pause";
  snapshot = "";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  status_program = "";
  hypervisor = "auto";
  hypervisor_uri = "auto";
  migration_uri = "auto";
}
vm (S2) {
  name = "bar";
  autostart = "0";
  exclusive = "0";
  migration_mapping = "node1:node1-p,node2:node2-p";
  use_virsh = "";
  migrate = "live";
  snapshot = "";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  status_program = "";
  hypervisor = "auto";
  hypervisor_uri = "auto";
  migration_uri = "auto";
}

Result: PASS for rgmanager, however:
  1) rg_test is missing support for resource-defaults.  I have noted this
     as bug 715067
  2) the cluster schema I had installed did not have resource-defaults 
     support; this might be user error
Comment 10 Lon Hohberger 2011-06-21 14:52:32 EDT
Upon further investigation of bug 715067, it turned out to be user error.
Comment 11 Lon Hohberger 2011-06-21 15:41:10 EDT
Test 5: reference counts for clustered file systems

        <resources>
                <clusterfs name="gfs2" fstype="gfs2" device="/dev/vdb2" mountpoint="/mnt/gfs2" force_unmount="1" />
        </resources>
        <service name="1" autostart="0">
                <clusterfs ref="gfs2"/>
        </service>
        <service name="2" autostart="0">
                <clusterfs ref="gfs2"/>
        </service>
        <service name="3" autostart="0">
                <clusterfs ref="gfs2"/>
        </service>

Test 5.1: one service, one node online
  * enable service 1: OK, file system was mounted after enable
  * disable service 1: OK, file system was not mounted after disable

Test 5.2: two services, one node online
  * enable service 1: OK, file system mounted
  * enable service 2: OK, file system still mounted
  * disable service 1: OK, file system still mounted
  * disable service 1 (3 more times): OK, file system still mounted
  * enable service 1: OK, file system still mounted
  * disable service 2: OK, file system still mounted
  * disable service 2 (3 more times): OK, file system still mounted
  * disable service 1: OK, file system not mounted any more :)

Test 5.3: three services, two nodes online
  * enable 1 on node1: OK, file system mounted on 1 (only)
  * enable 2 on node2: OK, file system mounted on both hosts
  * enable 3 on node1: OK, file system mounted on both
  * relocate 3 to node2: OK, file system mounted on both
  * relocate 2 to node1: OK, file system mounted on both
  * disable 3: OK, file system mounted only on node 1
  * disable 1: OK, file system still mounted on node 1
  * disable 1 (3x): OK, file system still mounted on node 1
  * disable 2: OK, file system not mounted

PASS
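The reference counting exercised in Tests 5.1-5.3 can be sketched as the model below. It is a simplified illustration, not rgmanager's implementation; note that because repeated disables of the same service leave the mount untouched, the model tracks *which* services hold the mount (a set) rather than a plain counter.

```python
# Simplified model of clusterfs mount reference counting: the file
# system stays mounted as long as at least one service references it,
# and disabling an already-disabled service is a no-op.
class SharedMount:
    def __init__(self):
        self.holders = set()   # services currently holding the mount
        self.mounted = False

    def enable(self, service):
        self.holders.add(service)
        self.mounted = True            # first holder triggers the mount

    def disable(self, service):
        self.holders.discard(service)  # repeated disables are harmless
        if not self.holders:
            self.mounted = False       # last holder releases the mount

m = SharedMount()
m.enable("service:1")
m.enable("service:2")
m.disable("service:1")
m.disable("service:1")   # disabled 3 more times in Test 5.2: still mounted
print(m.mounted)         # True - service:2 still holds it
m.disable("service:2")
print(m.mounted)         # False - last reference released
```

This reproduces the Test 5.2 behavior: the mount only disappears when the last referencing service is disabled, no matter how many redundant disables happen in between.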
Comment 12 Lon Hohberger 2011-06-21 15:45:11 EDT
Test 6:

* Using config from test 5...
* Start a service and run 'service rgmanager stop'

Jun 21 15:42:37 rgmanager Shutting down
Jun 21 15:42:37 rgmanager Stopping service service:1
Jun 21 15:42:37 rgmanager [clusterfs] unmounting /mnt/gfs2
Jun 21 15:42:37 rgmanager Service service:1 is stopped
Jun 21 15:42:38 rgmanager Disconnecting from CMAN
Jun 21 15:42:53 rgmanager Exiting

Note the 15-second delay between the 'disconnecting' and 'exiting' messages; this is the pause added by the fix.

* Restart rgmanager
* Ensure no services are running
* Stop rgmanager using 'service rgmanager stop'

Jun 21 15:44:25 rgmanager DBus Released
Jun 21 15:44:25 rgmanager Stopped 0 services
Jun 21 15:44:25 rgmanager Disconnecting from CMAN
Jun 21 15:44:25 rgmanager Exiting

With no services running, there was no delay between 'disconnecting' and 'exiting' messages.

PASS
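The shutdown behavior verified in Test 6 boils down to a simple rule: pause before exiting only if services were actually stopped during shutdown. A minimal sketch, assuming the 15-second figure observed in the log is a fixed delay (an assumption for illustration, not confirmed from rgmanager's source):

```python
# Sketch of rhbz#619468's fix: delay exit (seconds) only when the
# shutdown path stopped one or more services; exit immediately otherwise.
def exit_delay(services_stopped):
    return 15 if services_stopped > 0 else 0

print(exit_delay(1))  # 15 -> the pause seen between 'Disconnecting' and 'Exiting'
print(exit_delay(0))  # 0  -> no pause when 'Stopped 0 services'
```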
Comment 13 Lon Hohberger 2011-06-21 15:46:40 EDT
(In reply to comment #6)
> Lon, we need unit test results from your side for the following changes:
> 
> > rgmanager: Pause during exit if we stopped services (rhbz#619468)

Test 5: https://bugzilla.redhat.com/show_bug.cgi?id=707118#c11

> > rgmanager: Add resource-defaults section

Test 4: https://bugzilla.redhat.com/show_bug.cgi?id=707118#c9

> > rgmanager: Fix reference count handling (rhbz#692771)

Test 6: https://bugzilla.redhat.com/show_bug.cgi?id=707118#c12

All PASS
Comment 16 Lon Hohberger 2011-11-07 11:37:43 EST
(In reply to comment #9)
> Test 4: resource-defaults test

Resource-defaults is not intended to be supported with this rebase/release; neither the cluster package's schema nor the administration tools support it.
Comment 18 errata-xmlrpc 2011-12-06 06:59:42 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1595.html