Bug 1031613 - [RHS-C] Adding 2nd RHS node failed with error "Failed to fetch gluster peer list from server Server1 on Cluster"
Summary: [RHS-C] Adding 2nd RHS node failed with error "Failed to fetch gluster peer list from server Server1 on Cluster"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 2.1.2
Assignee: Bala.FA
QA Contact: Prasanth
URL:
Whiteboard:
Depends On: 972619
Blocks:
 
Reported: 2013-11-18 11:48 UTC by Prasanth
Modified: 2015-05-15 18:19 UTC
CC List: 8 users

Fixed In Version: rhsc-2.1.2-0.25.master.el6_5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-25 08:04:17 UTC
Target Upstream Version:


Attachments
screenshot of error (140.70 KB, image/png)
2013-11-18 11:48 UTC, Prasanth


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0208 0 normal SHIPPED_LIVE Red Hat Storage 2.1 enhancement and bug fix update #2 2014-02-25 12:20:30 UTC

Description Prasanth 2013-11-18 11:48:56 UTC
Created attachment 825562 [details]
screenshot of error

Description of problem:

Adding the 2nd RHS node failed and the host went to a non-operational state.

Version-Release number of selected component (if applicable): cb8 build

---------------------
[root@vm10 ~]# rpm -qa |grep vdsm
vdsm-gluster-4.13.0-21.el6rhs.noarch
vdsm-cli-4.13.0-21.el6rhs.noarch
vdsm-python-4.13.0-21.el6rhs.x86_64
vdsm-python-cpopen-4.13.0-21.el6rhs.x86_64
vdsm-4.13.0-21.el6rhs.x86_64
vdsm-reg-4.13.0-21.el6rhs.noarch
vdsm-xmlrpc-4.13.0-21.el6rhs.noarch

[root@vm10 ~]# rpm -qa |grep glusterfs
glusterfs-server-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.43.1u2rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.7.el6rhs.x86_64
glusterfs-rdma-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.43.1u2rhs-1.el6rhs.x86_64
---------------------


How reproducible: Seen once in cb8 and a couple of times in older builds as well


Steps to Reproduce:

PS: Kindly note that both of my RHS nodes were freshly installed from http://download.lab.bos.redhat.com/composes/nightly/RHSS-2.1u2-20131117.n.0/2/RHS/x86_64/iso/

1. Install the latest cb8 build as per http://rhsm.pad.engineering.redhat.com/rhsc-build-cb8
2. Install 2 RHS nodes with http://download.lab.bos.redhat.com/composes/nightly/RHSS-2.1u2-20131117.n.0/2/RHS/x86_64/iso/
3. Create a new cluster and add 1 RHS node and wait till it's set to UP
4. Add a 2nd RHS node and watch the Events.

Actual results: The 2nd RHS node was set to "Non-operational" and the following messages were seen in the Events:

------------
2013-Nov-18, 16:55 State was set to Up for host Server1.
2013-Nov-18, 16:55 Network virbr0 is not attached to any interface on host Server1.
2013-Nov-18, 16:55 Host Server1 was autorecovered.
2013-Nov-18, 16:55 Network virbr0 is not attached to any interface on host Server1.
2013-Nov-18, 16:51 Gluster command [gluster peer status] failed on server Server1.
2013-Nov-18, 16:51 Failed to fetch gluster peer list from server Server1 on Cluster Cluster-CB8.
2013-Nov-18, 16:51 State was set to Up for host Server2.
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Host Server2 installed
2013-Nov-18, 16:51 Network changes were saved on host Server2
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Installing Host Server2. Stage: Termination.
------------

Expected results: Adding hosts should succeed every time without any issues.


Additional info: A log collector report will be attached soon.

Comment 1 Prasanth 2013-11-18 11:52:40 UTC
rhsc-log-collector reports can be downloaded from: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1031613/

Comment 2 Prasanth 2013-11-18 11:53:50 UTC
ifconfig output before adding the host to the cluster:

[root@vm10 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1A:4A:46:26:2D  
          inet addr:10.70.36.82  Bcast:10.70.37.255  Mask:255.255.254.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:95 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000                                                                                                                                      
          RX bytes:307109 (299.9 KiB)  TX bytes:12543 (12.2 KiB)                                                                                                            
                                                                                                                                                                            
lo        Link encap:Local Loopback                                                                                                                                         
          inet addr:127.0.0.1  Mask:255.0.0.0                                                                                                                               
          UP LOOPBACK RUNNING  MTU:16436  Metric:1                                                                                                                          
          RX packets:700 errors:0 dropped:0 overruns:0 frame:0
          TX packets:700 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:111428 (108.8 KiB)  TX bytes:111428 (108.8 KiB)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:13:24:07  
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Comment 3 Bala.FA 2013-11-20 10:59:02 UTC
I see the following traceback in vdsm on vm10.lab.eng.blr.redhat.com:

MainProcess|Thread-154::ERROR::2013-11-18 16:51:17,565::supervdsmServer::94::SuperVdsm.ServerCallback::(wrapper) Error in wrapper
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 92, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 362, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/__init__.py", line 31, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/cli.py", line 876, in peerStatus
    raise ge.GlusterHostsListFailedException(rc=e.rc, err=e.err)
GlusterHostsListFailedException: Hosts list failed
error: (null)
return code: -22
MainProcess|Thread-155::DEBUG::2013-11-18 16:51:17,898::supervdsmServer::90::SuperVdsm.ServerCallback::(wrapper) call wrapper with (None,) {}


It looks like glusterfs fails on 'gluster peer status'.
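
For context, here is a minimal sketch of how a peerStatus-style wrapper might shell out to the gluster CLI and turn a non-zero return code (such as the -22 above) into the exception seen in the traceback. This is a hypothetical, simplified illustration, not the actual vdsm gluster/cli.py code; the exception class and XML handling are assumptions.

---------------------
# Hypothetical, simplified sketch -- not the real vdsm implementation.
import subprocess
import xml.etree.ElementTree as ET

class GlusterHostsListFailedException(Exception):
    """Raised when the peer list cannot be fetched ('Hosts list failed')."""
    def __init__(self, rc, err):
        super().__init__("Hosts list failed: rc=%s err=%s" % (rc, err))
        self.rc = rc
        self.err = err

def peer_status():
    # 'gluster peer status --xml' prints the peer list as XML on success.
    proc = subprocess.Popen(["gluster", "peer", "status", "--xml"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # Failure path matching the traceback: the CLI error is wrapped and
        # re-raised, which the engine reports as "Failed to fetch gluster
        # peer list from server ...".
        raise GlusterHostsListFailedException(rc=proc.returncode, err=err)
    root = ET.fromstring(out)
    return [{"hostname": p.findtext("hostname"),
             "uuid": p.findtext("uuid"),
             "connected": p.findtext("connected")}
            for p in root.iter("peer")]
---------------------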

Comment 4 Bala.FA 2013-11-20 11:26:13 UTC
Below is the log from glusterfs. It looks like 'peer probe' and 'peer status' are called back to back, which causes 'peer status' to return an empty result while the probe handshake is still in progress. Similar behaviour is also addressed in bz#972619. (A hypothetical retry sketch follows the log below.)


[2013-11-18 11:21:17.141761] I [glusterd-handler.c:821:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req vm11.lab.eng.blr.redhat.com 24007
[2013-11-18 11:21:17.156228] I [glusterd-handler.c:2905:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: vm11.lab.eng.blr.redhat.com (24007)
[2013-11-18 11:21:17.340987] I [rpc-clnt.c:976:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-11-18 11:21:17.341045] I [socket.c:3505:socket_init] 0-management: SSL support is NOT enabled
[2013-11-18 11:21:17.341055] I [socket.c:3520:socket_init] 0-management: using system polling thread
[2013-11-18 11:21:17.347291] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2013-11-18 11:21:17.563649] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2013-11-18 11:21:17.563864] W [dict.c:778:str_to_data] (-->/usr/lib64/glusterfs/3.4.0.43.1u2rhs/xlator/mgmt/glusterd.so(__glusterd_handle_cli_list_friends+0xa2) [0x7f9adce28842] (-->/usr/lib64/glusterfs/3.4.0.43.1u2rhs/xlator/mgmt/glusterd.so(glusterd_list_friends+0x114) [0x7f9adce283f4] (-->/usr/lib64/libglusterfs.so.0(dict_set_str+0x1c) [0x7f9ae1652cfc]))) 0-dict: value is NULL
[2013-11-18 11:21:18.025020] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-11-18 11:21:18.860709] I [glusterd-rpc-ops.c:235:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da, host: vm11.lab.eng.blr.redhat.com
[2013-11-18 11:21:19.200455] I [glusterd-rpc-ops.c:307:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
[2013-11-18 11:21:19.200596] I [glusterd-rpc-ops.c:357:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da, host: vm11.lab.eng.blr.redhat.com, port: 0
[2013-11-18 11:21:19.728628] I [glusterd-handshake.c:556:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 2
[2013-11-18 11:21:20.136427] I [glusterd-handler.c:2324:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:21:20.136523] I [glusterd-handler.c:2376:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.70.36.83, op_ret: 0, op_errno: 0, ret: 0
[2013-11-18 11:21:20.428050] I [glusterd-handler.c:2028:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:21:20.428118] I [glusterd-handler.c:3059:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to vm11.lab.eng.blr.redhat.com (0), ret: 0
[2013-11-18 11:21:20.818363] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da, host: vm11.lab.eng.blr.redhat.com
[2013-11-18 11:21:21.194497] I [glusterd-handler.c:2190:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:21:21.194544] I [glusterd-handler.c:2235:__glusterd_handle_friend_update] 0-: Received uuid: 1f65487d-8859-4e70-b518-682368aaf91c, hostname:10.70.36.82
[2013-11-18 11:21:21.194553] I [glusterd-handler.c:2244:__glusterd_handle_friend_update] 0-: Received my uuid as Friend
[2013-11-18 11:21:21.336185] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:24:29.680426] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
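
The race above (the peer list being queried while the probe handshake is still completing) is the kind of problem typically handled by retrying for a short grace period. The following is a hypothetical sketch of such a retry loop; it is not the actual fix from bz#972619, and the function name, timeout, and interval values are illustrative assumptions only.

---------------------
# Hypothetical retry sketch -- not the actual bz#972619 fix.
import subprocess
import time

def wait_for_peer_list(timeout=30, interval=2):
    """Retry 'gluster peer status' until it succeeds or the timeout expires."""
    deadline = time.time() + timeout
    while True:
        proc = subprocess.Popen(["gluster", "peer", "status"],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        if proc.returncode == 0 and out.strip():
            return out  # peer list is available
        if time.time() >= deadline:
            raise RuntimeError("gluster peer status still failing after %ss: "
                               "rc=%s err=%s" % (timeout, proc.returncode, err))
        # Give glusterd time to finish the probe handshake before retrying.
        time.sleep(interval)
---------------------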

Comment 5 Bala.FA 2013-11-25 08:55:42 UTC
Moving to ON_QA as per bz#972619

Comment 6 Prasanth 2013-11-27 12:24:01 UTC
Verified in cb9 (rhsc-2.1.2-0.25.master.el6_5.noarch)

I haven't seen this issue so far during my testing with the cb9 build, and I'm able to add the 2nd node immediately after node 1 comes up.

So I'm marking it as verified and will re-open the bug if it happens again.

Comment 8 Dusmant 2014-01-07 15:16:57 UTC
As this bug is similar to https://bugzilla.redhat.com/show_bug.cgi?id=972619, it's not required to add separate doc text to this bz.

Comment 10 errata-xmlrpc 2014-02-25 08:04:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

