Bug 1009889

Summary: libgfapi: "Transport endpoint is not connected" when images are created in a loop
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: samba
Version: 2.1
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Reporter: Gowrishankar Rajaiyan <grajaiya>
Assignee: Raghavendra Talur <rtalur>
QA Contact: SATHEESARAN <sasundar>
CC: grajaiya, ira, nlevinki, pgurusid, rjoseph, rtalur, sasundar, ssaha, vagarwal, vbellur
Hardware: Unspecified
OS: Unspecified
Whiteboard: gfapi
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-12-03 17:10:37 UTC

Description Gowrishankar Rajaiyan 2013-09-19 12:37:21 UTC
Description of problem:
Running qemu-img create against a gluster:// URL in a loop logs "Transport endpoint is not connected" errors from the gluster client (libgfapi) between create operations.

Version-Release number of selected component (if applicable):
qemu-img-0.12.1.2-2.404.el6.x86_64
qemu-kvm-0.12.1.2-2.404.el6.x86_64
glusterfs-api-3.4.0.33rhs-1.el6_4.x86_64
glusterfs-rdma-3.4.0.33rhs-1.el6_4.x86_64
glusterfs-debuginfo-3.4.0.33rhs-1.el6_4.x86_64
glusterfs-libs-3.4.0.33rhs-1.el6_4.x86_64
glusterfs-fuse-3.4.0.33rhs-1.el6_4.x86_64
glusterfs-api-devel-3.4.0.33rhs-1.el6_4.x86_64
glusterfs-3.4.0.33rhs-1.el6_4.x86_64
glusterfs-devel-3.4.0.33rhs-1.el6_4.x86_64


How reproducible: Always


Steps to Reproduce:
1. Run qemu-img create -f qcow2 against a gluster:// path on the volume several times in a loop (see the sketch below).
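
A generic form of the loop; <server> and <volume> are placeholders (this report uses 10.65.201.142 and vmstore, as shown under Actual results):

for i in {1..5}; do qemu-img create -f qcow2 gluster://<server>/<volume>/image$i 10G; done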


Actual results:
[root@dhcp201-162 ~]# for i in {1..5}; do qemu-img create -f qcow2 gluster://10.65.201.142/vmstore/image$i 10G; done;
Formatting 'gluster://10.65.201.142/vmstore/image1', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
Formatting 'gluster://10.65.201.142/vmstore/image2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-09-19 12:31:53.291975] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7fcc325ec0f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcc325ebc33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcc325ebb4e]))) 0-vmstore-client-1: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-09-19 12:31:53.291775 (xid=0x15x)
[2013-09-19 12:31:53.292023] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-09-19 12:31:53.292046] I [client.c:2103:client_rpc_notify] 0-vmstore-client-1: disconnected from 10.65.201.191:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
Formatting 'gluster://10.65.201.142/vmstore/image3', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-09-19 12:31:53.637314] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7fea898b90f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fea898b8c33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fea898b8b4e]))) 0-vmstore-client-0: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-09-19 12:31:53.637072 (xid=0x16x)
[2013-09-19 12:31:53.637348] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-09-19 12:31:53.637367] I [client.c:2103:client_rpc_notify] 0-vmstore-client-0: disconnected from 10.65.201.142:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
Formatting 'gluster://10.65.201.142/vmstore/image4', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-09-19 12:31:53.870752] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7f05c5fba0f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f05c5fb9c33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f05c5fb9b4e]))) 0-vmstore-client-1: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-09-19 12:31:53.870537 (xid=0x15x)
[2013-09-19 12:31:53.870801] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-09-19 12:31:53.870823] I [client.c:2103:client_rpc_notify] 0-vmstore-client-1: disconnected from 10.65.201.191:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
Formatting 'gluster://10.65.201.142/vmstore/image5', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[root@dhcp201-162 ~]# 



Expected results: All images should be created successfully, with no "Transport endpoint is not connected" errors logged by the client.


Additional info:

Comment 2 Raghavendra Talur 2013-09-19 13:17:44 UTC
Hi Gowrishankar,

This might be because the client process is running out of trusted ports (ports below 1024) when connecting to glusterd.

Can you check if it is the same issue? If not, I will look into it next week.
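
A quick way to check this (a rough sketch; not something run as part of this report) is to watch the client's local source ports while the qemu-img loop executes, for example:

# Run on the client while the loop is executing. 24007 is glusterd's default
# port and 49153 is the brick port seen in the logs above. If the local ports
# shown are all below 1024, each new libgfapi connection is drawing from the
# reserved ("trusted") range, which can run out.
netstat -tnp | grep -E ':(24007|49153)'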

To test whether it is the same issue:

1. Set server.allow-insecure to on using the following command:
gluster volume set <VOLNAME> server.allow-insecure on

2. On each Red Hat Storage node, add the following setting to the glusterd volfile, /etc/glusterfs/glusterd.vol:
option rpc-auth-allow-insecure on
Restart the glusterd service on each RHS node.

3. Run the same loop again and see whether the problem recurs (a consolidated sketch of steps 1-3 follows).
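
Consolidated, with the volume name and default paths taken from this report, the sequence is roughly:

# On one of the RHS nodes: allow client connections from non-privileged ports.
gluster volume set vmstore server.allow-insecure on

# On every RHS node: add the line below inside the "volume management" block of
# /etc/glusterfs/glusterd.vol (before "end-volume"), then restart glusterd.
#     option rpc-auth-allow-insecure on
service glusterd restart

# Back on the client: rerun the loop.
for i in {1..5}; do qemu-img create -f qcow2 gluster://10.65.201.142/vmstore/image$i 10G; done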

Comment 3 Gowrishankar Rajaiyan 2013-09-20 05:37:32 UTC
[root@dhcp201-142 ~]# gluster volume set vmstore server.allow-insecure on
volume set: success
[root@dhcp201-142 ~]#

[root@dhcp201-142 ~]# gluster vol info vmstore
 
Volume Name: vmstore
Type: Distribute
Volume ID: 044845d1-5b3b-4958-94a5-3d7ffad358b2
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.65.201.142:/rhs/vmstore
Brick2: 10.65.201.191:/rhs/vmstore
Options Reconfigured:
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
server.allow-insecure: on
[root@dhcp201-142 ~]# 

On both RHS nodes:
[root@dhcp201-142 ~]# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option rpc-auth-allow-insecure on
end-volume
[root@dhcp201-142 ~]# 

[root@dhcp201-142 ~]# service glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]
[root@dhcp201-142 ~]#


On client:
[root@dhcp201-162 ~]# for i in {1..5}; do qemu-img create -f qcow2 gluster://10.65.201.142/vmstore/image$i 10G; done;
Formatting 'gluster://10.65.201.142/vmstore/image1', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
Formatting 'gluster://10.65.201.142/vmstore/image2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
gluster://10.65.201.142/vmstore/image2: error while creating qcow2: No such file or directory
Formatting 'gluster://10.65.201.142/vmstore/image3', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
Formatting 'gluster://10.65.201.142/vmstore/image4', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
gluster://10.65.201.142/vmstore/image4: error while creating qcow2: No such file or directory
Formatting 'gluster://10.65.201.142/vmstore/image5', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
gluster://10.65.201.142/vmstore/image5: error while creating qcow2: No such file or directory
[root@dhcp201-162 ~]#


Second try:
[root@dhcp201-162 ~]# for i in {1..5}; do qemu-img create -f qcow2 gluster://10.65.201.142/vmstore/image$i 10G; done;
Formatting 'gluster://10.65.201.142/vmstore/image1', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
Formatting 'gluster://10.65.201.142/vmstore/image2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-09-20 05:31:31.089059] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7f0bba6c00f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f0bba6bfc33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f0bba6bfb4e]))) 0-vmstore-client-1: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-09-20 05:31:31.088841 (xid=0x13x)
[2013-09-20 05:31:31.089102] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-09-20 05:31:31.089123] I [client.c:2103:client_rpc_notify] 0-vmstore-client-1: disconnected from 10.65.201.191:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
Formatting 'gluster://10.65.201.142/vmstore/image3', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-09-20 05:31:31.270948] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7f0c6636b0f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f0c6636ac33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f0c6636ab4e]))) 0-vmstore-client-0: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-09-20 05:31:31.270702 (xid=0x14x)
[2013-09-20 05:31:31.270988] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-09-20 05:31:31.271010] I [client.c:2103:client_rpc_notify] 0-vmstore-client-0: disconnected from 10.65.201.142:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
Formatting 'gluster://10.65.201.142/vmstore/image4', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-09-20 05:31:31.450305] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7fc39a27a0f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fc39a279c33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc39a279b4e]))) 0-vmstore-client-1: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-09-20 05:31:31.450093 (xid=0x13x)
[2013-09-20 05:31:31.450348] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-09-20 05:31:31.450369] I [client.c:2103:client_rpc_notify] 0-vmstore-client-1: disconnected from 10.65.201.191:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
Formatting 'gluster://10.65.201.142/vmstore/image5', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[root@dhcp201-162 ~]#

Comment 4 Raghavendra Talur 2013-09-20 14:33:57 UTC
Thanks Gowrishankar,

The above comment confirms that this is a different issue.
It needs more investigation from someone in dev.

Comment 5 Raghavendra Talur 2015-03-25 11:12:52 UTC
Sas,

We are not able to reproduce this issue. It must have been fixed as part of other gfapi fixes.
Can you please try to reproduce it, and close this bug if it does not reproduce?

Comment 6 SATHEESARAN 2015-04-13 10:22:51 UTC
(In reply to Raghavendra Talur from comment #5)
> Sas,
> 
> We are not able to reproduce this issue. It must have been fixed as part of
> other gfapi fixes.
> Can you please try to reproduce it, and close this bug if it does not reproduce?

I will test it as part of 3.1 testing and will close it if it is no longer relevant.

Comment 9 Vivek Agarwal 2015-12-03 17:10:37 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.