Bug 1000986 - upgrade: peer probe fails to add new machine in cluster once the ISO upgrade is performed.
Summary: upgrade: peer probe fails to add new machine in cluster once the ISO upgrade ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Kaushal
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 840810
 
Reported: 2013-08-26 09:13 UTC by Rahul Hinduja
Modified: 2013-09-23 22:25 UTC (History)
8 users

Fixed In Version: glusterfs-3.4.0.25rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:25:05 UTC
Target Upstream Version:



Description Rahul Hinduja 2013-08-26 09:13:02 UTC
Description of problem:
======================

Once the systems are upgraded from RHS 2.0 to RHS 2.1 using the ISO method (where /var/lib/glusterd is copied back), the operating-version remains 1. When we then try to peer probe a new RHS 2.1 machine, the probe fails because the new machine has operating-version 2.

[root@dj ~]# gluster peer probe 10.70.34.93
peer probe: failed: Peer 10.70.34.93 is already at a higher op-version
[root@dj ~]# 
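A mismatch like the one above can be confirmed by comparing the op-version each node persists in its glusterd.info file. A minimal check, simulated here on temp copies so it can run anywhere (the real file is /var/lib/glusterd/glusterd.info; the helper name and sample UUIDs are illustrative):

```shell
#!/bin/sh
# Read the persisted op-version from a glusterd state directory.
get_op_version() {
    awk -F= '$1 == "operating-version" { print $2 }' "$1/glusterd.info"
}

# Simulate the two states seen in this bug with temp copies.
OLD=$(mktemp -d)   # stands in for the ISO-upgraded node (config copied back)
NEW=$(mktemp -d)   # stands in for the freshly installed RHS 2.1 node
printf 'UUID=aaaaaaaa-0000-0000-0000-000000000001\noperating-version=1\n' > "$OLD/glusterd.info"
printf 'UUID=bbbbbbbb-0000-0000-0000-000000000002\noperating-version=2\n' > "$NEW/glusterd.info"

if [ "$(get_op_version "$OLD")" != "$(get_op_version "$NEW")" ]; then
    echo "op-version mismatch: peer probe would fail"
fi
```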

logs:
=====

Logs from where tried to probe:
===============================

[2013-08-26 01:49:18.236043] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-08-26 01:49:18.240441] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2013-08-26 01:49:18.242335] E [glusterd-handshake.c:901:__glusterd_mgmt_hndsk_version_cbk] 0-management: failed to validate the operating version of peer (10.70.34.93)
[2013-08-26 01:49:18.242554] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=2 total=2
[2013-08-26 01:49:18.242643] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=2 total=2


New RHS2.1 machine logs:
========================

[2013-08-26 06:59:56.677515] E [glusterd-store.c:1407:glusterd_retrieve_uuid] 0-: Unable to get store handle!
[2013-08-26 06:59:56.677526] I [glusterd-store.c:1377:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 2


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.4.0.20rhs-2.el6rhs.x86_64


How reproducible:
=================
1/1


Steps to Reproduce:
===================
1. Have a RHS2.0 setup
2. Upgrade to RHS2.1 using ISO method where the /var/lib/glusterd is copied back to retain configuration as before.
3. Peer probe the new machine having RHS2.1 in cluster. 
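Step 2's copy-back is what re-introduces the old op-version. A sketch of the effect, simulated with temp directories (on a real node the path is /var/lib/glusterd and glusterd would need to be stopped during the copy; UUIDs are illustrative):

```shell
#!/bin/sh
# BACKUP stands in for the RHS 2.0 config saved before the ISO reinstall;
# NEW stands in for the freshly installed node's /var/lib/glusterd.
BACKUP=$(mktemp -d)
NEW=$(mktemp -d)
printf 'UUID=old-uuid\noperating-version=1\n' > "$BACKUP/glusterd.info"
printf 'UUID=new-uuid\noperating-version=2\n' > "$NEW/glusterd.info"

# Step 2: copy the saved configuration back over the fresh install.
cp -a "$BACKUP/." "$NEW/"

# The fresh install's operating-version=2 has been overwritten with the old
# value 1, which is what later makes the peer probe in step 3 fail.
grep '^operating-version=' "$NEW/glusterd.info"
```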

Actual results:
===============

Peer probe fails.


Expected results:
=================

Peer probe should be successful.


Additional info:
================

There is a similar upstream bug 885591, which describes peer probe failing between two different versions of glusterfs. That bug is in the verified state and has not been picked up for RHS.

Comment 4 Rahul Hinduja 2013-08-26 12:28:33 UTC
Since the issue where no error is reported when the system is already part of a cluster is different from the original operating-version mismatch after upgrade, I raised a separate bug 1001056 to track it.

Comment 5 Sayan Saha 2013-08-26 19:31:59 UTC
I need the following test to be performed to assess the seriousness of this bug

1. Start with a 2 or 4 node RHSS 2.0 u4/5/6 cluster.
2. Update this using the ISO method
3. After all the servers are up, peer probe and re-create the 2/4 node cluster. Now all nodes should be at 2.1.
4. Try adding new RHSS 2.1 servers to the existing cluster and then rebalance.

Let me know the results.

Comment 6 Rahul Hinduja 2013-08-27 10:51:23 UTC
(In reply to Sayan Saha from comment #5)
> I need the following test to be performed to assess the seriousness of this
> bug
> 
> 1. Start with a 2 or 4 node RHSS 2.0 u4/5/6 cluster.

Done

> 2. Update this using the ISO method

Done

> 3. After all the servers are up, peer probe and re-create the cluster 2/4
> node cluster. Now all nodes should be at 2.1.

Once all the servers are up with RHS2.1, the upgrade procedure copies back the configuration files under /var/lib/glusterd and starts the volume. This automatically re-creates the cluster, and the volumes are up and running; we do not need to probe the machines again to re-create the cluster.

If we do a peer probe before copying the configuration files, the peer probe works, because at that point the RHS2.1 machines are fresh installations.

But for the upgrade we need to copy back /var/lib/glusterd, and that is where it fails, because after the copy the operating-version changes back to 1.

> 4. Try adding new RHSS 2.1 servers to the existing cluster and then
> rebalance.

This step cannot be performed, because probing the new RHS2.1 machine fails: the new system has operating-version 2.

> 
> Let me know the results.

Comment 7 Amar Tumballi 2013-08-28 08:36:42 UTC
http://review.gluster.org/#/c/5450/3/doc/release-notes/3.4.0.md has been added to the upstream code; it describes the workarounds.

Should we try this and, if it works, go ahead with it for now? The code fix looks much harder at this time.
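If the workaround in those release notes is along the lines of lowering the new node's persisted op-version to match the upgraded cluster (an assumption; the linked notes are authoritative), the edit itself would look roughly like this, simulated here on a temp copy of glusterd.info:

```shell
#!/bin/sh
# On a real node the file is /var/lib/glusterd/glusterd.info, and glusterd
# must be stopped before editing and restarted afterwards.
INFO=$(mktemp)
printf 'UUID=new-uuid\noperating-version=2\n' > "$INFO"

# Lower the persisted op-version so it matches the upgraded cluster.
sed -i 's/^operating-version=.*/operating-version=1/' "$INFO"

grep '^operating-version=' "$INFO"
```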

Comment 8 Amar Tumballi 2013-08-29 08:45:31 UTC
https://code.engineering.redhat.com/gerrit/12129

Comment 9 Rahul Hinduja 2013-09-02 10:16:41 UTC
Verified with the upgraded build: glusterfs-server-3.4.0.30rhs-2.el6rhs.x86_64

1. Upgraded the cluster from update5 to RHS2.1
2. Tried to probe the new machine (dj) in cluster. Probe is successful.

Before the machine (dj) was probed, its operating-version was 2; the probe then reduced the operating-version to 1, as the fix describes.

Before probing:
==============

[root@dj ~]# cat /var/lib/glusterd/glusterd.info 
UUID=0d06c2c6-a5dd-4179-bd83-dbdaf66233df
operating-version=2
[root@dj ~]# 

After Probing:
==============
[root@dj ~]# cat /var/lib/glusterd/glusterd.info 
UUID=0d06c2c6-a5dd-4179-bd83-dbdaf66233df
operating-version=1

[root@upgrade-1 ~]# gluster peer probe 10.70.34.90
peer probe: success. 
[root@upgrade-1 ~]# 


Moving the bug to verified state.

Comment 10 Scott Haines 2013-09-23 22:25:05 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

