| Summary: | qemu-img segfaults while creating qcow2 image on the gluster volume using libgfapi | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
| Component: | libgfapi | Assignee: | rjoseph |
| Status: | CLOSED ERRATA | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | amukherj, jcody, ndevos, rhinduja, rhs-bugs, rjoseph, sasundar, skoduri, storage-qa-internal |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-5 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | virt-gluster integration |
| Last Closed: | 2017-03-23 05:49:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1351528 | | |
Description SATHEESARAN 2016-09-26 07:19:12 UTC

I could see this issue with:
1. replica 3 volume
2. arbiter volume
3. distribute volume

The issue appears to lie in libgfapi and does not depend on the volume type. I am seeing the following backtrace; the backtrace of the thread that segfaults does not have enough information.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3c95fb6700 (LWP 3324)]
0x00007f3c963f26f3 in ?? ()
(gdb) bt
#0 0x00007f3c963f26f3 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) thread apply all bt
Thread 11 (Thread 0x7f3ca2890700 (LWP 3317)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f3ca5fd69d8 in syncenv_task (proc=proc@entry=0x7f3caeba4040) at syncop.c:603
#2 0x00007f3ca5fd7820 in syncenv_processor (thdata=0x7f3caeba4040) at syncop.c:695
#3 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3ca2890700) at pthread_create.c:308
#4 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 10 (Thread 0x7f3ca208f700 (LWP 3318)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f3ca5fd69d8 in syncenv_task (proc=proc@entry=0x7f3caeba4400) at syncop.c:603
#2 0x00007f3ca5fd7820 in syncenv_processor (thdata=0x7f3caeba4400) at syncop.c:695
#3 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3ca208f700) at pthread_create.c:308
#4 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 9 (Thread 0x7f3c95fb6700 (LWP 3324)):
#0 0x00007f3c963f26f3 in ?? ()
#1 0x0000000000000000 in ?? ()
Thread 8 (Thread 0x7f3c99765700 (LWP 3326)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f3ca5fd69d8 in syncenv_task (proc=proc@entry=0x7f3cb0d24040) at syncop.c:603
#2 0x00007f3ca5fd7820 in syncenv_processor (thdata=0x7f3cb0d24040) at syncop.c:695
#3 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c99765700) at pthread_create.c:308
#4 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 7 (Thread 0x7f3c98560700 (LWP 3327)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f3ca5fd69d8 in syncenv_task (proc=proc@entry=0x7f3cb0d24400) at syncop.c:603
#2 0x00007f3ca5fd7820 in syncenv_processor (thdata=0x7f3cb0d24400) at syncop.c:695
#3 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c98560700) at pthread_create.c:308
#4 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 6 (Thread 0x7f3c95eb5700 (LWP 3328)):
#0 0x00007f3ca8cc496d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f3ca5fab816 in gf_timer_proc (data=0x7f3caea26640) at timer.c:176
#2 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c95eb5700) at pthread_create.c:308
#3 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 5 (Thread 0x7f3c97934700 (LWP 3329)):
#0 0x00007f3ca8cbeef7 in pthread_join (threadid=139898209384192, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1 0x00007f3ca5ff83b8 in event_dispatch_epoll (event_pool=0x7f3cb0d2e040) at event-epoll.c:758
#2 0x00007f3cab349c64 in glfs_poller (data=<optimized out>) at glfs.c:612
#3 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c97934700) at pthread_create.c:308
#4 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 4 (Thread 0x7f3c97133700 (LWP 3330)):
#0 0x00007f3ca89eb2c3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f3ca5ff7e10 in event_dispatch_epoll_worker (data=0x7f3caea0ba20) at event-epoll.c:664
---Type <return> to continue, or q <return> to quit---
#2 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c97133700) at pthread_create.c:308
#3 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 3 (Thread 0x7f3c9442c700 (LWP 3331)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f3c948686f3 in iot_worker (data=0x7f3cb0382200) at io-threads.c:176
#2 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c9442c700) at pthread_create.c:308
#3 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 2 (Thread 0x7f3c9432b700 (LWP 3332)):
#0 0x00007f3ca89eb2c3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f3ca5ff7e10 in event_dispatch_epoll_worker (data=0x7f3cb0f363a0) at event-epoll.c:664
#2 0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c9432b700) at pthread_create.c:308
#3 0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 1 (Thread 0x7f3cac0058c0 (LWP 3316)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f3ca5fd866b in syncop_lookup (subvol=subvol@entry=0x7f3cb3404c40, loc=loc@entry=0x7f3caeb41a20, iatt=iatt@entry=0x7f3caeb41b50, parent=parent@entry=0x0,
xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1223
#2 0x00007f3cab35b88f in glfs_resolve_base (fs=fs@entry=0x7f3caeb46000, subvol=subvol@entry=0x7f3cb3404c40, inode=inode@entry=0x7f3cb18ec05c, iatt=iatt@entry=0x7f3caeb41b50)
at glfs-resolve.c:225
#3 0x00007f3cab35c09a in priv_glfs_resolve_at (fs=0x7f3caeb46000, subvol=0x7f3cb3404c40, at=at@entry=0x0, origpath=origpath@entry=0x7f3cab361c4e "/", loc=loc@entry=0x7f3caeb41ca0,
iatt=iatt@entry=0x7f3caeb41ce0, follow=follow@entry=1, reval=reval@entry=0) at glfs-resolve.c:404
#4 0x00007f3cab35d63c in glfs_resolve_path (fs=fs@entry=0x7f3caeb46000, subvol=subvol@entry=0x7f3cb3404c40, origpath=origpath@entry=0x7f3cab361c4e "/", loc=loc@entry=0x7f3caeb41ca0,
iatt=iatt@entry=0x7f3caeb41ce0, follow=follow@entry=1, reval=reval@entry=0) at glfs-resolve.c:530
#5 0x00007f3cab35d6d3 in priv_glfs_resolve (fs=fs@entry=0x7f3caeb46000, subvol=subvol@entry=0x7f3cb3404c40, origpath=origpath@entry=0x7f3cab361c4e "/", loc=loc@entry=0x7f3caeb41ca0,
iatt=iatt@entry=0x7f3caeb41ce0, reval=reval@entry=0) at glfs-resolve.c:557
#6 0x00007f3cab359df9 in pub_glfs_chdir (fs=fs@entry=0x7f3caeb46000, path=path@entry=0x7f3cab361c4e "/") at glfs-fops.c:3971
#7 0x00007f3cab34b144 in pub_glfs_init (fs=fs@entry=0x7f3caeb46000) at glfs.c:1003
#8 0x00007f3cac0477b3 in qemu_gluster_init (gconf=gconf@entry=0x7f3cae9fa2d0, filename=<optimized out>) at block/gluster.c:219
#9 0x00007f3cac047a03 in qemu_gluster_open (bs=<optimized out>, options=0x7f3cb0d29200, bdrv_flags=66, errp=<optimized out>) at block/gluster.c:341
#10 0x00007f3cac03d0b0 in bdrv_open_common (bs=bs@entry=0x7f3cb0406000, file=file@entry=0x0, options=options@entry=0x7f3cb0d29200, flags=flags@entry=2,
drv=drv@entry=0x7f3cac2dce80 <bdrv_gluster>, errp=0x7f3caeb41ea0) at block.c:836
#11 0x00007f3cac042194 in bdrv_file_open (pbs=pbs@entry=0x7f3caeb41f38, filename=filename@entry=0x7f3cae9fa030 "gluster://10.70.37.104/distvol/test3.img", options=0x7f3cb0d29200,
options@entry=0x0, flags=flags@entry=2, errp=errp@entry=0x7f3caeb41f40) at block.c:972
#12 0x00007f3cac057850 in qcow2_create2 (errp=0x7f3caeb41f30, version=3, prealloc=<optimized out>, cluster_size=65536, flags=0, backing_format=0x0, backing_file=0x0,
total_size=2097152, filename=0x7f3cae9fa030 "gluster://10.70.37.104/distvol/test3.img") at block/qcow2.c:1677
#13 qcow2_create (filename=0x7f3cae9fa030 "gluster://10.70.37.104/distvol/test3.img", options=<optimized out>, errp=0x7f3caeb41f90) at block/qcow2.c:1856
#14 0x00007f3cac03bbd9 in bdrv_create_co_entry (opaque=0x7fff3e460170) at block.c:393
#15 0x00007f3cac077a1a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:118
#16 0x00007f3ca893b110 in ?? () from /lib64/libc.so.6
#17 0x00007fff3e45f9e0 in ?? ()
#18 0x0000000000000000 in ?? ()
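As noted above, the crashing thread (Thread 9, LWP 3324) shows only unresolved `?? ()` frames, which usually means the crash happened in unmapped or JIT/trampoline memory, or that debuginfo is missing. A small illustrative sketch (not part of the original report) for picking such threads out of a full `thread apply all bt` dump:

```python
import re

def unresolved_threads(bt_dump):
    """Return (thread number, LWP) pairs whose top frame has no symbol ('?? ()')."""
    threads = {}
    current = None
    for line in bt_dump.splitlines():
        m = re.match(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):", line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            threads[current] = []
            continue
        if current is not None and line.startswith("#"):
            threads[current].append(line)
    return [key for key, frames in threads.items() if frames and "?? ()" in frames[0]]

# Two threads copied from the dump above: the SIGSEGV thread and a healthy one
dump = """\
Thread 9 (Thread 0x7f3c95fb6700 (LWP 3324)):
#0  0x00007f3c963f26f3 in ?? ()
#1  0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f3cac0058c0 (LWP 3316)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at pthread_cond_wait.S:185
"""
print(unresolved_threads(dump))  # [(9, 3324)]
```

This only flags candidates; resolving the frames still requires installing the matching -debuginfo packages, as requested later in this bug.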
To add to that information, this issue is not seen while creating a raw image on a gluster volume of any type.

(Jeff Cody, comment #7): You mentioned a replica-3 volume; does this occur with a replica-2 volume?
I tried reproducing this with gluster 3.8.4, and a local build of qemu-img-1.5.3-105.el7, and I did not run into this issue. However, my test gluster volume is as follows:
gluster volume info gv0
Volume Name: gv0
Type: Replicate
Volume ID: 6bcb7964-0594-4801-a60b-22dae7f871f6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.15.180:/mnt/brick1/brick
Brick2: 192.168.15.180:/mnt/brick2/brick
Options Reconfigured:
performance.readdir-ahead: on
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.15.180:/mnt/brick1/brick 49157 0 Y 7929
Brick 192.168.15.180:/mnt/brick2/brick 49158 0 Y 7930
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 7916
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
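When comparing reproduction environments, it helps to confirm mechanically that all bricks in the `volume status` output are online. A minimal sketch (the column layout is assumed from the status output above):

```python
def brick_status(status_text):
    """Parse 'gluster volume status' brick rows into {brick: online?}."""
    result = {}
    for line in status_text.splitlines():
        if not line.startswith("Brick "):
            continue  # skip NFS/self-heal daemon rows and separators
        parts = line.split()
        # Columns: 'Brick', host:/path, TCP port, RDMA port, Y/N, PID
        result[parts[1]] = parts[4] == "Y"
    return result

# Rows copied from the status output above
status = """\
Brick 192.168.15.180:/mnt/brick1/brick      49157  0  Y  7929
Brick 192.168.15.180:/mnt/brick2/brick      49158  0  Y  7930
NFS Server on localhost                     N/A    N/A  N  N/A
"""
print(brick_status(status))
```

Both bricks report online here, so the "All subvolumes are down" log messages below are about client-side connectivity, not brick state.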
Creating the image:
qemu-img create -f qcow2 gluster://192.168.15.180/gv0/test-bz.qcow2 5G
Formatting 'gluster://192.168.15.180/gv0/test-bz.qcow2', fmt=qcow2 size=5368709120 encryption=off cluster_size=65536 lazy_refcounts=off
[2016-09-28 03:29:42.693494] E [MSGID: 108006] [afr-common.c:4316:afr_notify] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-09-28 03:29:43.812529] E [MSGID: 108006] [afr-common.c:4316:afr_notify] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-09-28 03:29:44.706092] E [MSGID: 108006] [afr-common.c:4316:afr_notify] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up
Verifying the image:
qemu-img info gluster://192.168.15.180/gv0/test-bz.qcow2
[2016-09-28 03:30:19.361722] E [MSGID: 108006] [afr-common.c:4316:afr_notify] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
image: gluster://192.168.15.180/gv0/test-bz.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
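Besides `qemu-img info`, the qcow2 header can be sanity-checked directly: the first 8 bytes are a big-endian magic (0x514649FB, i.e. 'Q' 'F' 'I' 0xFB) and version field per the qcow2 specification. A minimal illustrative sketch, run here on a synthetic header rather than a real image:

```python
import struct

QCOW2_MAGIC = 0x514649FB  # 'Q' 'F' 'I' 0xFB, per the qcow2 spec

def qcow2_version(first_bytes):
    """Decode magic and version from the first 8 bytes of an image (big-endian)."""
    magic, version = struct.unpack(">II", first_bytes[:8])
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return version

# Synthetic header for a qcow2 version-3 image (reported as 'compat: 1.1' above)
hdr = struct.pack(">II", QCOW2_MAGIC, 3)
print(qcow2_version(hdr))  # 3
```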
(In reply to Jeff Cody from comment #7)
> You mentioned a replica-3 volume; does this occur with a replica-2 volume?
>
> I tried reproducing this with gluster 3.8.4, and a local build of
> qemu-img-1.5.3-105.el7, and I did not run into this issue. However, my test
> gluster volume is as follows:

I tried with a replica 2 volume as well; I am hitting the same issue.

@Jeff, are you using the upstream gluster 3.8.4? I am talking about the interim RHGS downstream build - glusterfs-3.8.4-1.el7rhgs on the server, and glusterfs-3.8.4-1.el7 on the client.

(In reply to SATHEESARAN from comment #9)
> I tried with replica 2 volume as well.
> I am hitting the same issue.
>
> @Jeff, are you using the upstream gluster 3.8.4 ?
> I am talking about the interim RHGS downstream build -
> glusterfs-3.8.4-1.el7rhgs on server, and glusterfs-3.8.4-1.el7 on client

Yes, I was using the upstream gluster 3.8.4. I will retest with glusterfs-3.8.4-1.el7 and glusterfs-3.8.4-1.el7rhgs.

This reminds me of bug 1350789, which should have been fixed with glusterfs-3.8.1 (and hence in the RHGS-3.2 packages). I am not aware of any backports that could have re-introduced this, though. The easiest might be to reproduce the problem on a volume that consists of a single brick. If someone has a system available where this happens, please let us know here so that we can debug it a little quicker (make sure that all needed -debuginfo RPMs are installed too).
Possibly also reported on the gluster-devel mailinglist, with a suggestion of the backported patch that causes the problem:
- http://www.gluster.org/pipermail/gluster-devel/2016-October/051234.html

I am not sure if http://review.gluster.org/15585 (in upstream glusterfs-3.8.5) was backported to glusterfs-3.8.4 in RHGS-3.2.

This looks like an issue with client-io-threads enabled (bug 1381830). I see iot_worker threads -

Thread 3 (Thread 0x7f3c9442c700 (LWP 3331)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f3c948686f3 in iot_worker (data=0x7f3cb0382200) at io-threads.c:176
#2  0x00007f3ca8cbddc5 in start_thread (arg=0x7f3c9442c700) at pthread_create.c:308
#3  0x00007f3ca89eaced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

It is being discussed on the gluster ML as well. I think you need to retest with client-io-threads disabled once.

(In reply to Soumya Koduri from comment #14)
> This looks like an issue with client-io-threads enabled (bug 1381830). I see
> iot_worker threads - [...]
> It is being discussed on the gluster ML as well. I think you need to retest
> with client-io-threads disabled once.

Sas - given this crash has already been addressed and fixed with the latest build (glusterfs-3.8.4-3), can we retest this behaviour (without disabling client.io-threads, of course) and close the bug if the issue doesn't persist?
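For reference, the client-io-threads retest discussed here can be done by toggling the volume option from the standard gluster CLI; `<volname>` below is a placeholder for the affected volume:

```shell
# Disable client-side io-threads on the volume, then retest qemu-img create
gluster volume set <volname> performance.client-io-threads off

# Confirm the effective value of the option
gluster volume get <volname> performance.client-io-threads
```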
(In reply to Atin Mukherjee from comment #15)
> Sas - given this crash has already been addressed and fixed with the latest
> build (glusterfs-3.8.4-3), can we retest this behaviour (without disabling
> client.io-threads, of course) and close the bug if the issue doesn't persist?

Atin, client-io-threads is enabled by default. I have tested with and without client-io-threads, and I still see the same issue. I have tested with the latest glusterfs downstream interim build - glusterfs-3.8.4-3.el7rhgs - and I still see the same issue.

All, I tested with the latest downstream RHGS 3.2.0 interim build - glusterfs-3.8.4-5.el7rhgs. I am no longer seeing this issue. Please provide the patch URL for the fix and move this bug to ON_QA with the proper fixed-in-version.

The fix for BZ 1391093 also fixes this issue. The following is the corresponding downstream patch:
https://code.engineering.redhat.com/gerrit/#/c/89229/
Therefore moving the bug to ON_QA.

Tested with the RHGS 3.2.0 interim build (glusterfs-3.8.4-5.el7rhgs):

1. Created a qcow2 image on the replica 3 gluster volume

# qemu-img create gluster://<server>/<vol-name>/vm.img 10G

I could successfully create qcow2 images.

# qemu-img check testvm.img
No errors were found on the image.
Image end offset: 262144

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html