Description of problem:

The latest image of the gluster container fails to launch: the glusterd service fails to start.

docker images
REPOSITORY                                                                      TAG       IMAGE ID       CREATED        SIZE
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7    latest    d670e1d14e3a   20 hours ago   281.9 MB

Snippet of etc-glusterfs-glusterd.vol.log:

[2016-11-21 08:00:10.318162] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-glusterd: Started running glusterd version 3.7.9 (args: glusterd --xlator-option *.upgrade=on -N)
[2016-11-21 08:00:10.327302] I [graph.c:269:gf_add_cmdline_options] 0-management: adding option 'upgrade' for volume 'management' with value 'on'
[2016-11-21 08:00:10.327361] I [MSGID: 106478] [glusterd.c:1383:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-11-21 08:00:10.327377] I [MSGID: 106479] [glusterd.c:1432:init] 0-management: Using /var/lib/glusterd as working directory
[2016-11-21 08:00:10.334289] E [rpc-transport.c:292:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.7.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2016-11-21 08:00:10.334303] W [rpc-transport.c:296:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2016-11-21 08:00:10.334310] W [rpcsvc.c:1627:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2016-11-21 08:00:10.334318] E [MSGID: 106243] [glusterd.c:1656:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2016-11-21 08:00:10.335084] E [MSGID: 101032] [store.c:435:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2016-11-21 08:00:10.335102] E [MSGID: 101032] [store.c:435:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2016-11-21 08:00:10.335106] I [MSGID: 106514] [glusterd-store.c:2093:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30712
[2016-11-21 08:00:10.335167] I [MSGID: 106194] [glusterd-store.c:3565:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2016-11-21 08:00:10.335213] E [MSGID: 101032] [store.c:435:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/options. [No such file or directory]

Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.socket.listen-backlog 128
  8:     option upgrade on
  9:     option event-threads 1
 10:     option ping-timeout 0
 11:     option transport.socket.read-fail-log off
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-time 10
 14:     option transport-type rdma
 15:     option working-directory /var/lib/glusterd
 16: end-volume
 17:
+------------------------------------------------------------------------------+

[2016-11-21 08:00:10.340245] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff48729adc5] -->glusterd(glusterfs_sigwaiter+0xe5) [0x7ff488912915] -->glusterd(cleanup_and_exit+0x6b) [0x7ff48891278b] ) 0-: received signum (15), shutting down
[2016-11-21 08:00:10.340269] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-11-21 08:00:10.340840] E [rpcsvc.c:1390:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2016-11-21 08:00:10.340855] E [rpcsvc.c:1516:rpcsvc_program_unregister] 0-rpc-service: portmap unregistration of program failed
[2016-11-21 08:00:10.340860] E [rpcsvc.c:1554:rpcsvc_program_unregister] 0-rpc-service: Program unregistration failed: GlusterD svc cli, Num: 1238463, Ver: 2, Port: 0
[2016-11-21 08:00:10.340959] E [rpcsvc.c:1390:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2016-11-21 08:00:10.340967] E [rpcsvc.c:1516:rpcsvc_program_unregister] 0-rpc-service: portmap unregistration of program failed
[2016-11-21 08:00:10.340971] E [rpcsvc.c:1554:rpcsvc_program_unregister] 0-rpc-service: Program unregistration failed: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0

Version-Release number of selected component (if applicable):
rhgs3/rhgs-server-rhel7:latest image, i.e., image id d670e1d14e3a

How reproducible:
100%

Steps to Reproduce (a command-level sketch follows at the end of this description):
1. Pull the latest gluster image (d670e1d14e3a).
2. Try to spin up a container from this image.

Actual results:
glusterd fails to start and hence the gluster pod fails to come up.

Expected results:
The gluster pod should come up successfully.

Additional info:
sosreports and other logs will be attached shortly.
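For reference, the reproduction boils down to roughly the following. This is only a sketch: the registry path is the one listed above, the oc template and parameter match the deployment used later in this bug, and <node-hostname> is a placeholder.

~~~
# pull the affected image
docker pull brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7:latest

# deploy the gluster pod from the template onto the target node
oc process glusterfs -v GLUSTERFS_NODE=<node-hostname> | oc create -f -
~~~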
Is something wrong with the "/var/lib/glusterd" path?

~~~
[store.c:435:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2016-11-21 08:00:10.335102] E [MSGID: 101032] [store.c:435:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
~~~

Or is portmap going wrong?

--snip--
[2016-11-21 08:00:10.340840] E [rpcsvc.c:1390:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2016-11-21 08:00:10.340855] E [rpcsvc.c:1516:rpcsvc_program_unregister] 0-rpc-service: portmap unregistration of program failed
[2016-11-21 08:00:10.340860] E [rpcsvc.c:1554:rpcsvc_program_unregister] 0-rpc-service: Program unregistration failed: GlusterD svc cli, Num: 1238463, Ver: 2, Port: 0
--/snip--
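To narrow down the first question, one quick check (a sketch only; <container> is a placeholder for the gluster container id, and docker exec works only while the container is still running) is to confirm whether /var/lib/glusterd exists and is populated inside the container:

~~~
docker ps -a | grep rhgs-server-rhel7
docker exec <container> ls -la /var/lib/glusterd
~~~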
Created attachment 1222568 [details]
output of 'journalctl -xe'

Atin helped figure out that there seem to be some dependency issues preventing glusterd from starting. Hence, attaching the output of journalctl -xe.
Thanks for the logs. It looks like rpcbind is the one in trouble, and glusterd fails as a consequence:

--
-- Unit sysinit.target has begun starting up.
Nov 22 04:07:25 dhcp42-40.lab.eng.blr.redhat.com systemd[1]: rpcbind.socket failed to listen on sockets: Address already in use
Nov 22 04:07:25 dhcp42-40.lab.eng.blr.redhat.com systemd[1]: Failed to listen on RPCbind Server Activation Socket.
-- Subject: Unit rpcbind.socket has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rpcbind.socket has failed.
--
-- The result is failed.
Nov 22 04:07:25 dhcp42-40.lab.eng.blr.redhat.com systemd[1]: Dependency failed for RPC bind service.
-- Subject: Unit rpcbind.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rpcbind.service has failed.
--
-- The result is dependency.
Nov 22 04:07:25 dhcp42-40.lab.eng.blr.redhat.com systemd[1]: Dependency failed for GlusterFS, a clustered file-system server.
-- Subject: Unit glusterd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit glusterd.service has failed.
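For further triage, the failing chain (rpcbind.socket -> rpcbind.service -> glusterd.service) can be inspected inside the container with standard systemd tooling; just a sketch, nothing image-specific assumed:

~~~
systemctl status rpcbind.socket rpcbind.service glusterd.service
journalctl -u rpcbind.socket -u rpcbind.service -u glusterd.service --no-pager
systemctl list-dependencies glusterd.service
~~~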
Ashiq, look at this angle https://bugzilla.redhat.com/show_bug.cgi?id=1357023
This looks to be the exact cause of the error:

"rpcbind.socket failed to listen on sockets: Address already in use"

and, IIUC, rpcbind listens on port 111. Is this port already in use?
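A quick way to answer that (a minimal sketch, run on the host) is to see what, if anything, is already bound to port 111:

~~~
ss -tulpn | grep ':111'
# or, with the older net-tools:
netstat -tulpn | grep ':111'
~~~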
Karthick, please attach the netstat output from this system.
Created attachment 1222651 [details]
netstat output
I disabled the rpcbind.socket service and then deployed the gluster pod:

[root@dhcp42-16 ~]# oc process glusterfs -v GLUSTERFS_NODE=dhcp42-40.lab.eng.blr.redhat.com | oc create -f -
deploymentconfig "glusterfs-dc-dhcp42-40.lab.eng.blr.redhat.com" created

'oc get pods' says it is running, however it has not updated the READY status:

[root@dhcp42-16 ~]# oc get pods
NAME                                                      READY     STATUS    RESTARTS   AGE
glusterfs-dc-dhcp42-40.lab.eng.blr.redhat.com-1-54mr8     0/1       Running   0          8m
glusterfs-dc-dhcp42-40.lab.eng.blr.redhat.com-1-deploy    1/1       Running   0          8m
glusterfs-dc-dhcp47-72.lab.eng.blr.redhat.com-1-deploy    0/1       Error     0          7h
storage-project-router-1-btkke                            1/1       Running   0          1d

But I can exec into the pod and run the glusterd status command:

[root@dhcp42-40 /]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-11-22 11:01:08 UTC; 8min ago
  Process: 159 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 160 (glusterd)
   CGroup: /system.slice/docker-e0ff7fb62a21e62c316ef7fd7d00b05d4640794f430fee5b65c0885f1713024a.scope/system.slice/glusterd.service
           └─160 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 22 11:01:06 dhcp42-40.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Nov 22 11:01:08 dhcp42-40.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.

[root@dhcp42-40 /]# glusterd --version
glusterfs 3.7.9 built on Aug 24 2016 02:50:44
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
[root@dhcp42-40 /]#
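For reference, the disable step mentioned above amounts to roughly the following on the host (a sketch of the step described, not a transcript of the exact commands run):

~~~
systemctl stop rpcbind.socket
systemctl disable rpcbind.socket
~~~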
Hi,

rpcbind.socket in release rpcbind-0.2.0-38.el7.x86_64 explicitly binds to port 111 on the host's IPs. In the previous release it bound only on request. If we had used the host's (OpenShift node's) rpcbind for the NFS server, we would have hit this before. This happens because we use --net=host.

The new image will work if we stop rpcbind.socket and rpcbind.service on the host.
http://pastebin.test.redhat.com/432506

There is another bug [1] in rpcbind which is hit when only rpcbind is updated. I faced this when I updated only on the host.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1396291

--Ashiq
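A quick way to tell whether a given host carries the eagerly-binding build mentioned above, and whether both rpcbind units are currently active there, is roughly:

~~~
rpm -q rpcbind
systemctl is-active rpcbind.socket rpcbind.service
~~~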
As we are not using the gluster NFS service in the container, we could *try* to disable it using a sed command in the Dockerfile before glusterd starts. But this needs to be tested explicitly.
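If that route is taken, one possible shape for it, purely as a hypothetical sketch, is a Dockerfile fragment that drops the rpcbind dependency from the glusterd unit file. The unit-file path and the exact Requires=/After= lines are assumptions here and would have to be verified against the actual image:

~~~
# Hypothetical Dockerfile fragment -- assumes glusterd.service lives at this
# path and references rpcbind.service in its Requires=/After= lines.
RUN sed -i \
      -e '/^Requires=rpcbind.service/d' \
      -e 's/ rpcbind.service//g' \
      /usr/lib/systemd/system/glusterd.service
~~~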
(In reply to Humble Chirammal from comment #14)
> As we are not using the gluster NFS service in the container, we could *try*
> to disable it using a sed command in the Dockerfile before glusterd starts.
> But this needs to be tested explicitly.

We also need to see how this affects the container upgrade, as the unit file will fall back to its previous state if we upgrade from within the container, which is not really recommended. If we container upgrade it will be taken care.
(In reply to Humble Chirammal from comment #15)
> (In reply to Humble Chirammal from comment #14)
> > As we are not using the gluster NFS service in the container, we could *try*
> > to disable it using a sed command in the Dockerfile before glusterd starts.
> > But this needs to be tested explicitly.
>
> We also need to see how this affects the container upgrade, as the unit file
> will fall back to its previous state if we upgrade from within the container,
> which is not really recommended. If we container upgrade it will be taken
> care.

"If we container upgrade it will be taken care."

The above should have been: "If we do a container upgrade, it will be taken care of."
Verified with the following docker image:

REPOSITORY                                                                      TAG       IMAGE ID       CREATED      SIZE
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7    latest    f019b0d1abd5   4 days ago   281.9 MB

Other build info:

[root@dhcp47-115 ~]# openshift version
openshift v3.4.0.29+ca980ba
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
[root@dhcp47-115 ~]# heketi-cli version
Error: unknown command "version" for "heketi-cli"
Run 'heketi-cli --help' for usage.
[root@dhcp47-115 ~]# heketi-cli --version
heketi-cli 3.1.0
[root@dhcp47-115 ~]# rpm -qa | grep 'heketi'
heketi-client-3.1.0-3.el7rhgs.x86_64
heketi-templates-3.1.0-3.el7rhgs.x86_64
[root@dhcp47-115 ~]# docker images

The gluster cluster setup is successful using the above image and builds. Moving the bug to verified.
(In reply to krishnaram Karthick from comment #21)
> Verified with the following docker image:
>
> REPOSITORY                                                                      TAG       IMAGE ID       CREATED      SIZE
> brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7    latest    f019b0d1abd5   4 days ago   281.9 MB
>
> Other build info:
>
> [root@dhcp47-115 ~]# openshift version
> openshift v3.4.0.29+ca980ba
> kubernetes v1.4.0+776c994
> etcd 3.1.0-rc.0
> [root@dhcp47-115 ~]# heketi-cli version
> Error: unknown command "version" for "heketi-cli"
> Run 'heketi-cli --help' for usage.
> [root@dhcp47-115 ~]# heketi-cli --version
> heketi-cli 3.1.0
> [root@dhcp47-115 ~]# rpm -qa | grep 'heketi'
> heketi-client-3.1.0-3.el7rhgs.x86_64
> heketi-templates-3.1.0-3.el7rhgs.x86_64
> [root@dhcp47-115 ~]# docker images
>
> The gluster cluster setup is successful using the above image and builds.
> Moving the bug to verified.

Awesome! Thanks. :)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0149