Created attachment 825562 [details]
Screenshot of error

Description of problem:
Adding the 2nd RHS node failed and the host went to Non-Operational.

Version-Release number of selected component (if applicable):
cb8 build
---------------------
[root@vm10 ~]# rpm -qa | grep vdsm
vdsm-gluster-4.13.0-21.el6rhs.noarch
vdsm-cli-4.13.0-21.el6rhs.noarch
vdsm-python-4.13.0-21.el6rhs.x86_64
vdsm-python-cpopen-4.13.0-21.el6rhs.x86_64
vdsm-4.13.0-21.el6rhs.x86_64
vdsm-reg-4.13.0-21.el6rhs.noarch
vdsm-xmlrpc-4.13.0-21.el6rhs.noarch

[root@vm10 ~]# rpm -qa | grep glusterfs
glusterfs-server-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.43.1u2rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.7.el6rhs.x86_64
glusterfs-rdma-3.4.0.43.1u2rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.43.1u2rhs-1.el6rhs.x86_64
---------------------

How reproducible:
Seen once in cb8, and a couple of times in older builds as well.

Steps to Reproduce:
Note: both RHS nodes were freshly installed from
http://download.lab.bos.redhat.com/composes/nightly/RHSS-2.1u2-20131117.n.0/2/RHS/x86_64/iso/
1. Install the latest cb8 build as per http://rhsm.pad.engineering.redhat.com/rhsc-build-cb8
2. Install 2 RHS nodes with http://download.lab.bos.redhat.com/composes/nightly/RHSS-2.1u2-20131117.n.0/2/RHS/x86_64/iso/
3. Create a new cluster, add 1 RHS node, and wait until it is set to Up.
4. Add a 2nd RHS node and watch the Events.

Actual results:
The 2nd RHS node was set to "Non-Operational" and the following messages are seen in the Events:
------------
2013-Nov-18, 16:55 State was set to Up for host Server1.
2013-Nov-18, 16:55 Network virbr0 is not attached to any interface on host Server1.
2013-Nov-18, 16:55 Host Server1 was autorecovered.
2013-Nov-18, 16:55 Network virbr0 is not attached to any interface on host Server1.
2013-Nov-18, 16:51 Gluster command [gluster peer status] failed on server Server1.
2013-Nov-18, 16:51 Failed to fetch gluster peer list from server Server1 on Cluster Cluster-CB8.
2013-Nov-18, 16:51 State was set to Up for host Server2.
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Host Server2 installed
2013-Nov-18, 16:51 Network changes were saved on host Server2
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Network virbr0 is not attached to any interface on host Server2.
2013-Nov-18, 16:51 Installing Host Server2. Stage: Termination.
------------

Expected results:
Adding hosts should succeed every time, without any issues.

Additional info:
A log collector report will be attached soon.
rhsc-log-collector reports can be downloaded from: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1031613/
Ifconfig output before adding the host to the cluster:

[root@vm10 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1A:4A:46:26:2D
          inet addr:10.70.36.82  Bcast:10.70.37.255  Mask:255.255.254.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:95 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:307109 (299.9 KiB)  TX bytes:12543 (12.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:700 errors:0 dropped:0 overruns:0 frame:0
          TX packets:700 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:111428 (108.8 KiB)  TX bytes:111428 (108.8 KiB)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:13:24:07
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
I see the traceback below in vdsm on vm10.lab.eng.blr.redhat.com:

MainProcess|Thread-154::ERROR::2013-11-18 16:51:17,565::supervdsmServer::94::SuperVdsm.ServerCallback::(wrapper) Error in wrapper
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 92, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 362, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/__init__.py", line 31, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/cli.py", line 876, in peerStatus
    raise ge.GlusterHostsListFailedException(rc=e.rc, err=e.err)
GlusterHostsListFailedException: Hosts list failed
error: (null)
return code: -22
MainProcess|Thread-155::DEBUG::2013-11-18 16:51:17,898::supervdsmServer::90::SuperVdsm.ServerCallback::(wrapper) call wrapper with (None,) {}

It looks like glusterfs fails on 'gluster peer status'.
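For context, the pattern the traceback points at can be sketched as follows: run the gluster CLI and surface a non-zero return code as a typed exception. This is a simplified, hypothetical stand-in, not vdsm's actual code (the real peerStatus in cli.py parses the CLI's XML output, which is elided here):

```python
import subprocess


class GlusterHostsListFailedException(Exception):
    """Simplified stand-in for vdsm's exception of the same name."""

    def __init__(self, rc, err):
        super().__init__("Hosts list failed error: %s return code: %s"
                         % (err, rc))
        self.rc = rc
        self.err = err


def peer_status():
    """Run 'gluster peer status' and raise on a non-zero return code.

    A failed CLI call surfaces exactly the fields seen in the log
    above: the stderr text ('(null)' here) and the return code (-22).
    """
    proc = subprocess.run(["gluster", "peer", "status"],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        raise GlusterHostsListFailedException(
            rc=proc.returncode,
            err=proc.stderr.decode("utf-8", "replace"))
    return proc.stdout.decode("utf-8", "replace")
```

With rc=-22 and err="(null)", the exception renders the same "Hosts list failed" message that appears in the vdsm log.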
Below is the log from glusterfs. It looks like 'peer probe' and 'peer status' are called in immediate succession, which causes 'peer status' to return an empty peer list. Similar behaviour is also addressed in bz#972619.

[2013-11-18 11:21:17.141761] I [glusterd-handler.c:821:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req vm11.lab.eng.blr.redhat.com 24007
[2013-11-18 11:21:17.156228] I [glusterd-handler.c:2905:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: vm11.lab.eng.blr.redhat.com (24007)
[2013-11-18 11:21:17.340987] I [rpc-clnt.c:976:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-11-18 11:21:17.341045] I [socket.c:3505:socket_init] 0-management: SSL support is NOT enabled
[2013-11-18 11:21:17.341055] I [socket.c:3520:socket_init] 0-management: using system polling thread
[2013-11-18 11:21:17.347291] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2013-11-18 11:21:17.563649] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2013-11-18 11:21:17.563864] W [dict.c:778:str_to_data] (-->/usr/lib64/glusterfs/3.4.0.43.1u2rhs/xlator/mgmt/glusterd.so(__glusterd_handle_cli_list_friends+0xa2) [0x7f9adce28842] (-->/usr/lib64/glusterfs/3.4.0.43.1u2rhs/xlator/mgmt/glusterd.so(glusterd_list_friends+0x114) [0x7f9adce283f4] (-->/usr/lib64/libglusterfs.so.0(dict_set_str+0x1c) [0x7f9ae1652cfc]))) 0-dict: value is NULL
[2013-11-18 11:21:18.025020] I [glusterd-handler.c:1073:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-11-18 11:21:18.860709] I [glusterd-rpc-ops.c:235:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da, host: vm11.lab.eng.blr.redhat.com
[2013-11-18 11:21:19.200455] I [glusterd-rpc-ops.c:307:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
[2013-11-18 11:21:19.200596] I [glusterd-rpc-ops.c:357:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da, host: vm11.lab.eng.blr.redhat.com, port: 0
[2013-11-18 11:21:19.728628] I [glusterd-handshake.c:556:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 2
[2013-11-18 11:21:20.136427] I [glusterd-handler.c:2324:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:21:20.136523] I [glusterd-handler.c:2376:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.70.36.83, op_ret: 0, op_errno: 0, ret: 0
[2013-11-18 11:21:20.428050] I [glusterd-handler.c:2028:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:21:20.428118] I [glusterd-handler.c:3059:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to vm11.lab.eng.blr.redhat.com (0), ret: 0
[2013-11-18 11:21:20.818363] I [glusterd-sm.c:495:glusterd_ac_send_friend_update] 0-: Added uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da, host: vm11.lab.eng.blr.redhat.com
[2013-11-18 11:21:21.194497] I [glusterd-handler.c:2190:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:21:21.194544] I [glusterd-handler.c:2235:__glusterd_handle_friend_update] 0-: Received uuid: 1f65487d-8859-4e70-b518-682368aaf91c, hostname:10.70.36.82
[2013-11-18 11:21:21.194553] I [glusterd-handler.c:2244:__glusterd_handle_friend_update] 0-: Received my uuid as Friend
[2013-11-18 11:21:21.336185] I [glusterd-rpc-ops.c:554:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: e05b29ce-8b8c-4b62-9d81-747eac6916da
[2013-11-18 11:24:29.680426] I [glusterd-handler.c:1018:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
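Given the probe/status race described above, one client-side mitigation is to poll 'gluster peer status' until the newly probed peer actually appears, instead of querying it once immediately after the probe. The sketch below is a hypothetical workaround, assuming nothing about vdsm's actual fix; the function names, retry count, and delay are all illustrative:

```python
import subprocess
import time


def peer_in_status(hostname):
    """Return True once 'gluster peer status' lists the given hostname."""
    try:
        out = subprocess.check_output(["gluster", "peer", "status"],
                                      stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # glusterd may transiently fail the request right after a probe
        # (as seen in the log above); treat any failure as "not ready yet".
        return False
    return hostname in out.decode("utf-8", "replace")


def wait_for_peer(hostname, retries=10, delay=2.0):
    """Poll peer status until the newly probed peer appears, or give up."""
    for _ in range(retries):
        if peer_in_status(hostname):
            return True
        time.sleep(delay)
    return False
```

A caller would invoke wait_for_peer("vm11.lab.eng.blr.redhat.com") right after 'peer probe' and only then proceed, which absorbs the window in which glusterd returns an empty or failed peer list.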
Moving to ON_QA as per bz#972619
Verified in cb9 (rhsc-2.1.2-0.25.master.el6_5.noarch). I haven't seen this issue so far during my testing with the cb9 build, and I was able to add the 2nd node immediately after node 1 came up, so I am marking this as verified. I will re-open the bug if it happens again.
As this bug is similar to https://bugzilla.redhat.com/show_bug.cgi?id=972619, it is not required to add a separate doc text to this bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html