Bug 1017014

Summary: Creating a qcow2 image on a gluster volume reports errors
Product: Red Hat Enterprise Linux 6
Reporter: mazhang <mazhang>
Component: glusterfs
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: Sudhir D <sdharane>
Severity: high
Docs Contact:
Priority: urgent
Version: 6.5
CC: aavati, acathrow, areis, asias, barumuga, bsarathy, chayang, flang, grajaiya, juzhang, kparthas, mazhang, michen, mkenneth, nsathyan, qzhang, tlavigne, virt-maint
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: glfs_fini() would return -1 even when libgfapi had successfully completed all of its resource cleanup. Consequence: the qemu-img create command, which uses libgfapi, failed when glfs_fini() returned -1. Fix: glfs_fini() now returns 0 when the function completes successfully and -1 otherwise. Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 12:01:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments: log of glusterfs server (flags: none)
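
The Doc Text above traces the failure to glfs_fini()'s return value. Below is a minimal caller-side sketch of that contract, assuming glusterfs-api 3.4's public libgfapi interface; the volume name and server address are taken from the reproduction steps in this report, and the program is illustrative rather than qemu-img's actual code:

/* Build with: gcc -o gfini-check gfini-check.c -lgfapi */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("gv0");    /* volume from this report */
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "10.66.4.216", 24007);
    if (glfs_init(fs) != 0) {
        fprintf(stderr, "glfs_init failed\n");
        glfs_fini(fs);
        return 1;
    }

    /* ... the image would be created and written here ... */

    /* Callers such as qemu-img treat a nonzero return as failure.
     * Before the fix, glfs_fini() returned -1 even when all cleanup
     * had succeeded; after the fix it returns 0 on success and -1
     * otherwise. */
    if (glfs_fini(fs) != 0) {
        fprintf(stderr, "glfs_fini failed\n");
        return 1;
    }
    return 0;
}

Under the pre-fix behavior the final check fails even though cleanup succeeded, which matches the errors qemu-img create reports below.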

Description mazhang 2013-10-09 07:17:12 UTC
Description of problem:
Creating a qcow2 image on a gluster volume reports errors.

Version-Release number of selected component (if applicable):

host:
RHEL6.5-Snapshot-2.0
[root@m2 ~]# rpm -qa |grep gluster
glusterfs-3.4.0.34rhs-1.el6.x86_64
glusterfs-api-3.4.0.34rhs-1.el6.x86_64
glusterfs-libs-3.4.0.34rhs-1.el6.x86_64
[root@m2 ~]# rpm -qa |grep qemu
qemu-kvm-tools-0.12.1.2-2.411.el6.x86_64
qemu-kvm-0.12.1.2-2.411.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.411.el6.x86_64
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-img-0.12.1.2-2.411.el6.x86_64

rhs:
[root@rhs ~]# rpm -qa |grep gluster
glusterfs-geo-replication-3.4.0.34rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-libs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.34rhs-1.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
gluster-swift-plugin-1.8.0-6.el6rhs.noarch
vdsm-gluster-4.10.2-23.0.1.el6rhs.noarch
glusterfs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.34rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
gluster-swift-object-1.8.0-6.11.el6rhs.noarch


How reproducible:
Always

Steps to Reproduce:
1. Create raw and qcow2 images on a gluster volume:
qemu-img create gluster://10.66.4.216/gv0/test.raw 20G
qemu-img create -f qcow2 gluster://10.66.4.216/gv0/test.qcow2 20G


Actual results:
Creating a qcow2 image reports errors:

[root@m2 ~]# qemu-img create gluster://10.66.4.216/gv0/test.raw 20G
Formatting 'gluster://10.66.4.216/gv0/test.raw', fmt=raw size=21474836480 
[root@m2 ~]# qemu-img create gluster://10.66.4.216/gv0/test1.raw 20G
Formatting 'gluster://10.66.4.216/gv0/test1.raw', fmt=raw size=21474836480 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/test.qcow2 20G
Formatting 'gluster://10.66.4.216/gv0/test.qcow2', fmt=qcow2 size=21474836480 encryption=off cluster_size=65536 
[2013-10-09 07:08:17.277889] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7f6892bd80f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7f6892bd7c33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f6892bd7b4e]))) 0-gv0-client-0: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-10-09 07:08:17.277322 (xid=0x14x)
[2013-10-09 07:08:17.277963] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-10-09 07:08:17.278010] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/test1.qcow2 20G
Formatting 'gluster://10.66.4.216/gv0/test1.qcow2', fmt=qcow2 size=21474836480 encryption=off cluster_size=65536 
[2013-10-09 07:09:14.740745] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7fe3f29800f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fe3f297fc33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fe3f297fb4e]))) 0-gv0-client-0: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-10-09 07:09:14.740372 (xid=0x16x)
[2013-10-09 07:09:14.740790] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-10-09 07:09:14.740818] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 


Expected results:
No error messages are printed.

Additional info:

Comment 2 Qunfang Zhang 2013-10-09 07:58:22 UTC
Hi, Maosheng

We did not hit this problem in the last round of glusterfs testing, so please help downgrade glusterfs or qemu-kvm to see whether this is a regression, and if so, help figure out which component regressed. Thanks.

Comment 3 mazhang 2013-10-10 08:01:45 UTC
After downgrading the host glusterfs packages to glusterfs-3.4.0.19rhs-2.el6.x86_64, the problem can no longer be reproduced.

host:
RHEL6.5-Snapshot-2.0
kernel-2.6.32-422.el6.x86_64

qemu-kvm-0.12.1.2-2.411.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.411.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.411.el6.x86_64

glusterfs-libs-3.4.0.19rhs-2.el6.x86_64
glusterfs-3.4.0.19rhs-2.el6.x86_64
glusterfs-api-3.4.0.19rhs-2.el6.x86_64

rhs:
RHS-2.1-20130830.n.0
glusterfs-server-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.34rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.34rhs-1.el6rhs.x86_64

[root@m2 ~]# qemu-img create gluster://10.66.4.216/gv0/ssdfsd 10G
Formatting 'gluster://10.66.4.216/gv0/ssdfsd', fmt=raw size=10737418240 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/s12 10G
Formatting 'gluster://10.66.4.216/gv0/s12', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/s132ds2 10G
Formatting 'gluster://10.66.4.216/gv0/s132ds2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/s132ds223ds 10G
Formatting 'gluster://10.66.4.216/gv0/s132ds223ds', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/s132ds223dssdaf 10G
Formatting 'gluster://10.66.4.216/gv0/s132ds223dssdaf', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536

So, moving this bug to the glusterfs component.

Comment 4 mazhang 2013-10-10 10:13:39 UTC
Since this problem is a regression, QE suggests fixing it in RHEL 6.5. Thanks.

Comment 5 Qunfang Zhang 2013-10-13 13:59:36 UTC
Hi, Bala

Could you or someone else help check comment 4? We found this issue during qemu-kvm functional testing against glusterfs, but it should be a glusterfs issue, because after downgrading the glusterfs packages we can no longer reproduce it.

Thanks,
Qunfang

Comment 6 Anand Avati 2013-10-14 20:46:35 UTC
Can you please share the logs of all the gluster servers (/var/log/glusterfs directory)?

Comment 7 mazhang 2013-10-15 02:06:24 UTC
Created attachment 812294 [details]
log of glusterfs server

Comment 8 Bala.FA 2013-10-15 03:25:28 UTC
Assigning to Avati

Comment 9 Anand Avati 2013-10-15 04:55:17 UTC
Requested remote access (over email) from Maosheng to the systems where the issue is seen, as the logs indicate that glusterd could not bind its listen port.

Comment 10 Anand Avati 2013-10-15 08:44:30 UTC
The issue is already fixed in upstream glusterfs at http://review.gluster.org/5903. It needs to be backported.

Comment 12 krishnan parthasarathi 2013-10-21 13:22:43 UTC
Patch posted upstream (and merged) - http://review.gluster.com/#/c/6092/

Comment 13 krishnan parthasarathi 2013-10-21 13:26:53 UTC
Patches posted for review on the RHS-2.1 update stream (still pending review):

https://code.engineering.redhat.com/gerrit/#/c/14329/ - RHS-2.1 update 1
https://code.engineering.redhat.com/gerrit/#/c/14330/ - RHS-2.1 update 2

Comment 15 mazhang 2013-10-22 02:07:00 UTC
Hi krishnan,

Would you please provide the rpms here? I'd like to run a test to confirm the fix works.

thanks,
mazhang.

Comment 16 krishnan parthasarathi 2013-10-22 08:52:50 UTC
Hi mazhang.
Could you check if the following brew (scratch) build works for you?

https://brewweb.devel.redhat.com/taskinfo?taskID=6450082

thanks,
Krish

Comment 17 mazhang 2013-10-22 09:31:09 UTC
Still hit this issue with the packages provided in comment 16.

RHS:
[root@rhs ~]# rpm -qa |grep glusterfs
glusterfs-3.4rhsu1-1.el6rhs.x86_64
glusterfs-api-3.4rhsu1-1.el6rhs.x86_64
glusterfs-geo-replication-3.4rhsu1-1.el6rhs.x86_64
glusterfs-libs-3.4rhsu1-1.el6rhs.x86_64
glusterfs-server-3.4rhsu1-1.el6rhs.x86_64
glusterfs-rdma-3.4rhsu1-1.el6rhs.x86_64
glusterfs-fuse-3.4rhsu1-1.el6rhs.x86_64

HOST:
[root@m2 ~]# rpm -qa |grep qemu
qemu-guest-agent-0.12.1.2-2.414.el6.x86_64
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-img-0.12.1.2-2.414.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.414.el6.x86_64
qemu-kvm-0.12.1.2-2.414.el6.x86_64
[root@m2 ~]# rpm -qa |grep glusterfs
glusterfs-rdma-3.4.0.34rhs-1.el6.x86_64
glusterfs-3.4.0.34rhs-1.el6.x86_64
glusterfs-api-3.4.0.34rhs-1.el6.x86_64
glusterfs-libs-3.4.0.34rhs-1.el6.x86_64
glusterfs-fuse-3.4.0.34rhs-1.el6.x86_64

[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/ttt.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/ttt.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-10-22 09:25:56.246944] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-10-22 09:25:56.246988] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/ttt23.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/ttt23.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-10-22 09:26:15.164418] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x164) [0x7fd04caa80f4] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fd04caa7c33] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fd04caa7b4e]))) 0-gv0-client-0: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2013-10-22 09:26:15.163896 (xid=0x16x)
[2013-10-22 09:26:15.164449] W [client-rpc-fops.c:928:client3_3_flush_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
[2013-10-22 09:26:15.164470] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available.

If there is any problem, please let me know.

Thanks,
Mazhang.

Comment 18 krishnan parthasarathi 2013-10-22 09:52:45 UTC
Hi mazhang,
Could you also install the api rpms on the HOST, from the brew link I provided in comment 16?

Comment 19 mazhang 2013-10-22 10:05:55 UTC
Hi krishnan,

The problem no longer happens after updating the host packages.

[root@m2 glusterfs-3.4rhsu1-1.el6]# rpm -qa |grep gluster
glusterfs-api-3.4rhsu1-1.el6rhs.x86_64
glusterfs-rdma-3.4rhsu1-1.el6rhs.x86_64
glusterfs-3.4rhsu1-1.el6rhs.x86_64
glusterfs-libs-3.4rhsu1-1.el6rhs.x86_64
glusterfs-fuse-3.4rhsu1-1.el6rhs.x86_64

[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/ttt23.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/ttt23.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-10-22 10:03:06.888187] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-10-22 10:03:07.099450] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/ttt23sdf.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/ttt23sdf.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[root@m2 ~]# qemu-img create -f qcow2 gluster://10.66.4.216/gv0/ttt23sdf23.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/ttt23sdf23.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-10-22 10:03:15.883311] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 

Thanks,
Mazhang.

Comment 21 mazhang 2013-10-23 04:18:23 UTC
One more question: the messages make it look as if qemu-kvm or glusterfs is running in debug mode, which is not a good experience.
How do I disable those messages, or is this expected?
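
For reference, these messages come from the gluster client stack embedded in libgfapi, and a libgfapi consumer can redirect or lower them with glfs_set_logging(); whether this version of qemu-img exposes such a knob is part of what needs confirming. A minimal sketch, assuming glusterfs-api 3.4 (the log path and level are illustrative; level 3, ERROR, would suppress the INFO and WARNING lines shown above):

#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("gv0");
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "10.66.4.216", 24007);
    /* Route client logs to a file at level 3 (ERROR) instead of the
     * default stderr output; call this before glfs_init() so it takes
     * effect from the start of the connection. */
    glfs_set_logging(fs, "/tmp/gfapi.log", 3);
    if (glfs_init(fs) != 0)
        return 1;
    /* ... I/O ... */
    return glfs_fini(fs) ? 1 : 0;
}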

Comment 22 mazhang 2013-10-23 05:53:31 UTC
Verified this bug with glusterfs-3.4.0.36rhs-1.el6.

rhs:
[root@rhs glusterfs-3.4.0.36rhs-1.el6rhs]# rpm -qa |grep glusterfs
glusterfs-fuse-3.4.0.36rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.36rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.36rhs-1.el6rhs.x86_64
glusterfs-3.4.0.36rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.36rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.36rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.36rhs-1.el6rhs.x86_64

host:
[root@m2 ~]# rpm -qa |grep glusterfs
glusterfs-api-3.4.0.36rhs-1.el6.x86_64
glusterfs-3.4.0.36rhs-1.el6.x86_64
glusterfs-libs-3.4.0.36rhs-1.el6.x86_64
[root@m2 ~]# rpm -qa |grep qemu
qemu-guest-agent-0.12.1.2-2.414.el6.x86_64
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-kvm-tools-0.12.1.2-2.412.el6.x86_64
qemu-kvm-0.12.1.2-2.412.el6.x86_64
qemu-img-0.12.1.2-2.412.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.412.el6.x86_64

[root@m2 ~]#  qemu-img create -f qcow2 gluster://10.66.4.216/gv0/ttt.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/ttt.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-10-23 05:46:49.102718] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[2013-10-23 05:46:49.329308] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[root@m2 ~]#  qemu-img create -f qcow2 gluster://10.66.4.216/gv0/t23.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/t23.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-10-23 05:46:55.146245] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 
[root@m2 ~]#  qemu-img create -f qcow2 gluster://10.66.4.216/gv0/t23sad.qcow2 10G
Formatting 'gluster://10.66.4.216/gv0/t23sad.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 
[2013-10-23 05:46:58.763283] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.216:49152. Client process will keep trying to connect to glusterd until brick's port is available. 

1. After updating the packages, the error messages disappear, so this bug has been fixed.
2. Regarding the problem mentioned in comment 21, per shanks's suggestion I'll file a new bug so a developer can confirm whether it is a bug.

Comment 23 mazhang 2013-10-23 08:17:31 UTC
Re-tested the problem from comment 21:

1. The debug messages are displayed only while creating qcow2 images.
2. Sometimes the messages do not appear; in my test, about one run in 25 produced them.

So it is a timing issue.

Comment 24 Gowrishankar Rajaiyan 2013-10-23 08:24:10 UTC
Marking as verified per the above comments. Thank you, mazhang.
We will file another bug for the timing issue.

Comment 25 errata-xmlrpc 2013-11-21 12:01:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1641.html