Bug 963082

Summary: glusterd: RPM upgrade from 3.4.0.7rhs-1.el6rhs.x86_64 to 3.4.0.8rhs-1.el6rhs.x86_64 (without stopping glusterd) is causing problems - a peer is shown as disconnected, a new peer cannot be added to the cluster, glusterd.info is missing
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rachana Patel <racpatel>
Component: glusterd    Assignee: Amar Tumballi <amarts>
Status: CLOSED ERRATA QA Contact: amainkar
Severity: urgent Docs Contact:
Priority: urgent    
Version: 2.1    CC: rhs-bugs, sdharane, vbellur, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.8rhs-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-23 22:39:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rachana Patel 2013-05-15 06:26:29 UTC
Description of problem:
glusterd: RPM upgrade from 3.4.0.7rhs-1.el6rhs.x86_64 to 3.4.0.8rhs-1.el6rhs.x86_64 (without stopping glusterd) is causing problems - a peer is shown as disconnected, a new peer cannot be added to the cluster, glusterd.info is missing

Version-Release number of selected component (if applicable):
3.4.0.8rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Had a cluster of 3 servers. Upgraded the RPMs without stopping glusterd,
from 3.4.0.7rhs-1.el6rhs.x86_64 to 3.4.0.8rhs-1.el6rhs.x86_64.

2. After that:

a) One peer is always shown as disconnected:
[root@mia ~]# gluster peer status
Number of Peers: 2

Hostname: fan.lab.eng.blr.redhat.com
Uuid: 7b693192-9015-46d9-9b46-d7e5154bd9c8
State: Peer in Cluster (Connected)

Hostname: 10.70.34.80
Uuid: daa7bdb9-de87-4b6e-9f86-e8bff3d47fc0
State: Peer in Cluster (Disconnected)

[root@fan ~]# gluster peer status
Number of Peers: 2

Hostname: mia.lab.eng.blr.redhat.com
Uuid: d665808d-a42a-4eac-bf05-ca53c595486d
State: Peer in Cluster (Connected)

Hostname: 10.70.34.80
Uuid: daa7bdb9-de87-4b6e-9f86-e8bff3d47fc0
State: Peer in Cluster (Disconnected)

[root@fred ~]# gluster peer status
Number of Peers: 2

Hostname: mia.lab.eng.blr.redhat.com
Uuid: d665808d-a42a-4eac-bf05-ca53c595486d
State: Peer in Cluster (Connected)

Hostname: fan.lab.eng.blr.redhat.com
Uuid: 7b693192-9015-46d9-9b46-d7e5154bd9c8
State: Peer in Cluster (Connected)


b) Not able to add a new peer to the cluster:

[root@mia ~]# gluster peer probe cutlass.lab.eng.blr.redhat.com
peer probe: failed: Failed to get handshake ack from remote server

c) glusterd.info is missing on 2 servers:

[root@cutlass ~]# ls -lh /var/lib/glusterd/
total 16K
drwxr-xr-x. 2 root root 4.0K May 14 23:16 geo-replication
drwxr-xr-x. 2 root root 4.0K May 14 00:12 groups
drwxr-xr-x. 3 root root 4.0K May 14 20:20 hooks
drwxr-xr-x. 2 root root 4.0K May 14 23:22 peers

[root@fred ~]# ls /var/lib/glusterd/
geo-replication  groups  hooks  peers  vols


d) The log is filled with the messages below:
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
<snip>

  
Actual results:


Expected results:


Additional info:

Comment 1 Rachana Patel 2013-05-15 06:28:54 UTC
d) The log is filled with the messages below:
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
<snip>

[2013-05-15 07:26:36.154030] E [glusterd-store.c:1690:glusterd_store_global_info] 0-management: Failed to store glusterd global-info
[2013-05-15 07:26:36.154041] E [glusterd-handshake.c:557:__glusterd_mgmt_hndsk_versions_ack] 0-management: Failed to store op-version
[2013-05-15 07:26:38.805311] I [glusterd-handshake.c:553:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 1
[2013-05-15 07:26:38.805357] E [glusterd-store.c:1648:glusterd_store_global_info] 0-management: chmod error for glusterd.info: No such file or directory
[2013-05-15 07:26:38.805372] E [glusterd-store.c:1690:glusterd_store_global_info] 0-management: Failed to store glusterd global-info
[2013-05-15 07:26:38.805383] E [glusterd-handshake.c:557:__glusterd_mgmt_hndsk_versions_ack] 0-management: Failed to store op-version
[2013-05-15 07:26:39.158819] I [glusterd-handshake.c:553:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 1
[2013-05-15 07:26:39.158854] E [glusterd-store.c:1648:glusterd_store_global_info] 0-management: chmod error for glusterd.info: No such file or directory
[2013-05-15 07:26:39.158868] E [glusterd-store.c:1690:glusterd_store_global_info] 0-management: Failed to store glusterd global-info
[2013-05-15 07:26:39.158879] E [glusterd-handshake.c:557:__glusterd_mgmt_hndsk_versions_ack] 0-management: Failed to store op-version
[2013-05-15 07:26:41.809993] I [glusterd-handshake.c:553:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 1
[2013-05-15 07:26:41.810071] E [glusterd-store.c:1648:glusterd_store_global_info] 0-management: chmod error for glusterd.info: No such file or directory

Comment 3 Amar Tumballi 2013-05-16 08:43:53 UTC
Below is a summary of what was broken with the 3.4.0.7rhs RPMs. With 3.4.0.8rhs all of these are fixed (and hence all subsequent updates will work fine).

We can't do anything about the already-released 3.4.0.7rhs binaries other than point to the workaround; the issue will not recur when upgrading from 3.4.0.8rhs to anything newer.

--------------------
Hi all,
Another small update on the steps to take when upgrading from build 7 to any newer release.

1. Backup /var/lib/glusterd
2. Upgrade
3. Stop gluster
4. Restore /var/lib/glusterd
5. Delete the /var/lib/glusterd/options file if empty. This will be recreated by glusterd.
6. Start gluster and continue with your testing.
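The backup and restore steps above (1 and 4) can be sketched in shell. This is a minimal illustration using scratch directories created with mktemp; on a real server you would substitute /var/lib/glusterd and a backup path of your choice, and the upgrade itself would happen between the two copies:

```shell
# Sketch of steps 1 and 4 using scratch directories; the paths are
# stand-ins, not the live glusterd state directory.
SRC=$(mktemp -d)                      # stand-in for /var/lib/glusterd
BACKUP="$(mktemp -d)/glusterd-backup" # stand-in for a real backup location

echo "UUID=example" > "$SRC/glusterd.info"  # simulate state worth preserving

cp -a "$SRC" "$BACKUP"      # step 1: back up before the upgrade

rm -f "$SRC/glusterd.info"  # simulate build 7's packaging bug deleting
                            # files during the upgrade (step 2)

rm -rf "$SRC"               # step 4: restore after stopping gluster
cp -a "$BACKUP" "$SRC"
```

After the restore, glusterd.info and the other preserved files are back in place, and gluster can be started again.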

The /var/lib/glusterd/options file being empty causes syncing problems on glusterd restart. Build 7 cleared this file. If you haven't run any server-quorum tests with build 7, this file is most likely still empty.

So, if anyone is facing any volume syncing issues, do step 5 and restart glusterd.
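Step 5's check (remove the options file only when it is empty, so glusterd can recreate it on the next start) can be sketched as below; again, the mktemp directory is a stand-in for the live /var/lib/glusterd:

```shell
# Remove the options file only if it exists and is empty;
# glusterd recreates it on the next start.
GLUSTERD_DIR=$(mktemp -d)       # stand-in for /var/lib/glusterd
touch "$GLUSTERD_DIR/options"   # simulate the empty file left by build 7

if [ -f "$GLUSTERD_DIR/options" ] && [ ! -s "$GLUSTERD_DIR/options" ]; then
    rm "$GLUSTERD_DIR/options"
fi
```

The `-s` test guards against deleting a non-empty options file, which would hold real server-quorum settings.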

Thanks,
Kaushal

----- Original Message -----
> From: "Kaushal M" <kaushal>
> To: storage-qa
> Sent: Wednesday, May 15, 2013 12:10:08 PM
> Subject: Re: Warning on upgrade from gluster v3.4.0.7 to v3.4.0.8
>
> A small clarification. The upgrade will not delete all the files in
> /var/lib/glusterd. Only some files/directories, like glusterd.info and the nfs
> directory, can be deleted. This is due to a packaging bug in build 7, in
> which these files/directories were part of the package itself.
> This may be avoided by uninstalling and reinstalling instead of upgrading (I
> haven't tested this). But to be on the safe side, back up and restore the
> /var/lib/glusterd directory.
>
> - Kaushal
>
> ----- Original Message -----
>> From: "Kaushal M" <kaushal>
>> To: storage-qa
>> Sent: Wednesday, May 15, 2013 11:48:05 AM
>> Subject: Warning on upgrade from gluster v3.4.0.7 to v3.4.0.8
>>
>> Hi all,
>>
>> Because of bugs in the packaging of build 7, an upgrade from build 7 to build 8
>> will cause files in /var/lib/glusterd/ to be deleted. As you can probably
>> guess, this will lead to all sorts of problems.
>> So, before upgrading, back up your /var/lib/glusterd directory. Follow the
>> steps below to make sure you don't break your existing setup:
>>
>> 1. Backup /var/lib/glusterd
>> 2. Upgrade
>> 3. Stop gluster
>> 4. Restore /var/lib/glusterd
>> 5. Start gluster and continue with your testing.
>>
>>
>> Regards,
>> Kaushal
>
---------------

Comment 4 Rachana Patel 2013-05-22 05:52:16 UTC
Going with the workaround and marking this bug as verified. If the problem occurs again on an RPM upgrade from 3.4.0.8rhs onward, I will reopen it.

Comment 5 Scott Haines 2013-09-23 22:39:47 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
