Bug 1088817

Summary: qemu-img segfaults while creating a large number of images with a gluster backend
Product: Red Hat Enterprise Linux 6
Component: glusterfs
Version: 6.5
Reporter: mazhang <mazhang>
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: jcody, juzhang, michen, mkenneth, qzhang, rbalakri, rpacheco, virt-maint
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2017-12-06 11:12:27 UTC
Attachments:
  dmesg (flags: none)

Description mazhang 2014-04-17 08:54:06 UTC
Description of problem:
qemu-img segfaults while creating a large number of images concurrently on a gluster volume.

Version-Release number of selected component (if applicable):

Host:
amd-9600b-8-1.englab.nay.redhat.com
qemu-kvm-debuginfo-0.12.1.2-2.424.el6.x86_64
qemu-img-0.12.1.2-2.424.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.424.el6.x86_64
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-kvm-0.12.1.2-2.424.el6.x86_64
kernel-2.6.32-431.17.1.el6.x86_64
glusterfs-api-3.4.0.59rhs-1.el6.x86_64
glusterfs-libs-3.4.0.59rhs-1.el6.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6.x86_64
glusterfs-3.4.0.59rhs-1.el6.x86_64

Gluster Server:
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

How reproducible:
100%


Steps to Reproduce:
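1. Create the qcow2 images on the gluster volume with the script below:
#!/bin/bash

# Start 1000 qemu-img processes in the background, each creating a 2 MiB
# qcow2 image on volume gv0 of the gluster server "rhs".
for ((COUNT = 0; COUNT < 1000; COUNT++)); do
        qemu-img create -f qcow2 "gluster://rhs/gv0/test-$COUNT" 2M &
done
# Wait for all background qemu-img processes to exit.
wait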
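The reproducer in step 1 does not record which of the 1000 processes crashed and which merely failed. A minimal sketch, assuming bash and the same `qemu-img create` invocation, waits on each background job by PID and tallies the outcomes; a process killed by SIGSEGV (signal 11) is reported by `wait` as status 139. The function name `run_batch` and its argument order are hypothetical, not part of any existing tool.

```shell
#!/bin/bash
# Hypothetical helper: run COUNT concurrent "qemu-img create" calls, as the
# reproducer does, and classify each process by its exit status.
#   $1  qemu-img command (normally just qemu-img)
#   $2  gluster URL prefix, e.g. gluster://rhs/gv0
#   $3  number of images to create concurrently
run_batch() {
    local qemu_img=$1 target=$2 count=$3
    local -a pids=()
    local i pid status ok=0 failed=0 segv=0

    # Start every create in the background, like the original loop.
    # stdout is discarded; error messages on stderr stay visible.
    for ((i = 0; i < count; i++)); do
        "$qemu_img" create -f qcow2 "$target/test-$i" 2M >/dev/null &
        pids+=("$!")
    done

    # "wait PID" returns that child's exit status. A child killed by a
    # signal reports 128 + the signal number, so SIGSEGV (11) gives 139.
    for pid in "${pids[@]}"; do
        wait "$pid"
        status=$?
        if ((status == 0)); then
            ok=$((ok + 1))
        elif ((status == 139)); then
            segv=$((segv + 1))
        else
            failed=$((failed + 1))
        fi
    done

    echo "ok=$ok failed=$failed segfault=$segv"
}
```

Usage, on the client host: source the file, then run `run_batch qemu-img gluster://rhs/gv0 1000`. A qemu-img that deliberately exited with status 139 would also be counted as a segfault, which qemu-img is not expected to do for ordinary errors.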

Actual results:
1. Some image creations fail with I/O errors:

[2014-04-17 16:40:40.666506] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.217:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
[2014-04-17 16:40:40.666752] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.217:49153. Client process will keep trying to connect to glusterd until brick's port is available. 
gluster://10.66.4.217/gv0/test-700: error while creating qcow2: Connection reset by peer
gluster://10.66.4.217/gv0/test-655: error while creating qcow2: Input/output error
gluster://10.66.4.217/gv0/test-783: error while creating qcow2: Input/output error
gluster://10.66.4.217/gv0/test-538: error while creating qcow2: Input/output error
gluster://10.66.4.217/gv0/test-809: error while creating qcow2: No such file or directory
[2014-04-17 16:40:42.178949] I [client.c:2103:client_rpc_notify] 0-gv0-client-0: disconnected from 10.66.4.217:49153. Client process will keep trying to connect to glusterd until brick's port is available. 


2. Some qemu-img processes segfault and others hang (dmesg):

[root@amd-9600b-8-1 ~]# dmesg 
qemu-img[2086]: segfault at 20 ip 00007f408b3ac0e6 sp 00007fffc6ca9510 error 4 in libgfapi.so.0.0.0[7f408b3a0000+16000]
qemu-img[2202]: segfault at 20 ip 00007f99dc4500e6 sp 00007fffaa6bcfd0 error 4 in libgfapi.so.0.0.0[7f99dc444000+16000]
qemu-img[3178]: segfault at 20 ip 00007f5af33ac0e6 sp 00007fffd266fb50 error 4 in libgfapi.so.0.0.0[7f5af33a0000+16000]
INFO: task qemu-img:2130 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.17.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-img      D 0000000000000002     0  2130      1 0x00000080
 ffff88022938fc98 0000000000000082 0000000000000000 ffff88022938fc00
 ffff88022938fc68 ffffffff810aefd0 0000000000000202 0000000000000000
 ffff88022938bab8 ffff88022938ffd8 000000000000fbc8 ffff88022938bab8
Call Trace:
 [<ffffffff810aefd0>] ? exit_robust_list+0x90/0x160
 [<ffffffff81076745>] exit_mm+0x95/0x180
 [<ffffffff81076b8f>] do_exit+0x15f/0x870
 [<ffffffff810772f8>] do_group_exit+0x58/0xd0
 [<ffffffff8108cca6>] get_signal_to_deliver+0x1f6/0x460
 [<ffffffff8100a265>] do_signal+0x75/0x800
 [<ffffffff810b186b>] ? sys_futex+0x7b/0x170
 [<ffffffff8100aa80>] do_notify_resume+0x90/0xc0
 [<ffffffff8100b341>] int_signal+0x12/0x17
INFO: task qemu-img:2134 blocked for more than 120 seconds.
      Not tainted 2.6.32-431.17.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
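The three segfault lines above can be decoded mechanically. In each, `segfault at 20` is the faulting data address 0x20, which is typical of a read through a NULL pointer to a field at offset 0x20; `error 4` is the x86 page-fault error code (bit 2 set: user mode; bit 1 clear: read; bit 0 clear: page not present); and subtracting the bracketed mapping base from `ip` gives the crashing instruction's offset inside libgfapi. A minimal sketch, assuming the x86-64 kernel message format shown here; the function name `decode_segfault` is hypothetical.

```shell
#!/bin/bash
# Decode one kernel "segfault at ..." line (x86-64 format, as in dmesg above).
decode_segfault() {
    local line=$1 re
    re='segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp [0-9a-f]+ error ([0-9a-f]+) in ([^[]+)\[([0-9a-f]+)\+'
    if [[ ! $line =~ $re ]]; then
        echo "unrecognized line" >&2
        return 1
    fi
    # All fields are printed in hex by the kernel.
    local addr=$((16#${BASH_REMATCH[1]}))
    local ip=$((16#${BASH_REMATCH[2]}))
    local err=$((16#${BASH_REMATCH[3]}))
    local lib=${BASH_REMATCH[4]}
    local base=$((16#${BASH_REMATCH[5]}))

    # x86 page-fault error code bits: 1 = protection violation (else page
    # not present), 2 = write (else read), 4 = user mode (else kernel),
    # 16 = instruction fetch.
    local cause="page not present" access="read" mode="kernel"
    ((err & 1)) && cause="protection violation"
    ((err & 2)) && access="write"
    ((err & 4)) && mode="user"
    ((err & 16)) && access="instruction fetch"

    printf 'fault address: %#x\n' "$addr"
    printf 'offset in %s: %#x\n' "$lib" "$((ip - base))"
    printf 'error %d: %s-mode %s, %s\n' "$err" "$mode" "$access" "$cause"
}
```

For the first line this prints `fault address: 0x20`, `offset in libgfapi.so.0.0.0: 0xc0e6` and `error 4: user-mode read, page not present`; the other two lines give the same offset, 0xc0e6, so all three crashes hit the same instruction in libgfapi. Assuming the bracketed mapping is the library's text segment loaded at file offset 0, as is typical for shared libraries, that offset should be resolvable with `gdb -batch -ex 'info symbol 0xc0e6' -ex 'list *0xc0e6' /usr/lib64/libgfapi.so.0` once glusterfs-debuginfo is installed on the host.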


Expected results:
All images are created and no qemu-img process crashes.

Additional info:

Comment 1 mazhang 2014-04-17 08:54:53 UTC
Created attachment 887104 [details]
dmesg

Comment 2 mazhang 2014-04-17 08:57:07 UTC
The gluster volume is already configured to allow connections from unprivileged ports; this was done on the gluster server by running:
# gluster volume set gv0 server.allow-insecure on

gluster> volume info
 
Volume Name: gv0
Type: Distribute
Volume ID: 6e6a0709-0dc0-4500-ae55-81bc062c0d6c
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.4.217:/home/brick
Options Reconfigured:
server.allow-insecure: on

Comment 4 Ademar Reis 2014-04-18 15:19:51 UTC
(In reply to mazhang from comment #0)
> 2. qemu-img segfault
> 

Do you have the backtrace (from the coredump) of this segfault? The call trace below is from the kernel.

> [root@amd-9600b-8-1 ~]# dmesg 
> qemu-img[2086]: segfault at 20 ip 00007f408b3ac0e6 sp 00007fffc6ca9510 error 4 in libgfapi.so.0.0.0[7f408b3a0000+16000]
> qemu-img[2202]: segfault at 20 ip 00007f99dc4500e6 sp 00007fffaa6bcfd0 error 4 in libgfapi.so.0.0.0[7f99dc444000+16000]
> qemu-img[3178]: segfault at 20 ip 00007f5af33ac0e6 sp 00007fffd266fb50 error 4 in libgfapi.so.0.0.0[7f5af33a0000+16000]
> INFO: task qemu-img:2130 blocked for more than 120 seconds.
>       Not tainted 2.6.32-431.17.1.el6.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> qemu-img      D 0000000000000002     0  2130      1 0x00000080
>  ffff88022938fc98 0000000000000082 0000000000000000 ffff88022938fc00
>  ffff88022938fc68 ffffffff810aefd0 0000000000000202 0000000000000000
>  ffff88022938bab8 ffff88022938ffd8 000000000000fbc8 ffff88022938bab8
> Call Trace:
>  [<ffffffff810aefd0>] ? exit_robust_list+0x90/0x160
>  [<ffffffff81076745>] exit_mm+0x95/0x180
>  [<ffffffff81076b8f>] do_exit+0x15f/0x870
>  [<ffffffff810772f8>] do_group_exit+0x58/0xd0
>  [<ffffffff8108cca6>] get_signal_to_deliver+0x1f6/0x460
>  [<ffffffff8100a265>] do_signal+0x75/0x800
>  [<ffffffff810b186b>] ? sys_futex+0x7b/0x170
>  [<ffffffff8100aa80>] do_notify_resume+0x90/0xc0
>  [<ffffffff8100b341>] int_signal+0x12/0x17
> INFO: task qemu-img:2134 blocked for more than 120 seconds.
>       Not tainted 2.6.32-431.17.1.el6.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
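The kernel messages only record where each process died; a user-space backtrace like the one requested above needs a core file. A minimal sketch, assuming a root shell in a scratch directory on the client host, gdb installed, qemu-img at /usr/bin/qemu-img, and a kernel core_pattern that writes plain core* files into the current directory (if abrt is active it may collect the cores instead). Installing glusterfs-debuginfo alongside qemu-kvm-debuginfo should also give symbols for the libgfapi frames. Both function names are hypothetical.

```shell
#!/bin/bash
# Sketch: collect "bt full" output for qemu-img crashes from core files.

# Print a full backtrace for every core file found in directory $1.
dump_backtraces() {
    local dir=${1:-.} core
    for core in "$dir"/core*; do
        [ -e "$core" ] || continue      # the glob matched nothing
        echo "=== $core ==="
        gdb -batch -ex 'bt full' /usr/bin/qemu-img "$core"
    done
}

# Re-run the original reproducer with core dumps enabled. The body runs in
# a subshell so the raised core-size limit does not leak into the caller.
reproduce_with_cores() (
    ulimit -c unlimited
    for ((count = 0; count < 1000; count++)); do
        qemu-img create -f qcow2 "gluster://rhs/gv0/test-$count" 2M &
    done
    wait                                # let every background job finish
    dump_backtraces .
)
```

Usage: source the file and call `reproduce_with_cores`; if cores were already produced elsewhere, `dump_backtraces /path/to/dir` prints their backtraces without re-running the reproducer.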

Comment 5 mazhang 2014-04-22 01:51:01 UTC
(gdb) bt full
#0  0x00007f1b64a620e6 in glfs_lseek () from /usr/lib64/libgfapi.so.0
No symbol table info available.
#1  0x00007f1b65e21568 in qemu_gluster_getlength (bs=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/block/gluster.c:514
        s = <value optimized out>
        ret = <value optimized out>
#2  0x00007f1b65df489f in refresh_total_sectors (bs=<value optimized out>, hint=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:532
        length = <value optimized out>
        drv = <value optimized out>
#3  0x00007f1b65dfac5f in bdrv_open_common (bs=0x7f1b671f8af0, filename=0x7fff11738776 "gluster://10.66.65.117/gv1/test-718", flags=<value optimized out>, drv=0x7f1b66053200)
    at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:626
        ret = 0
        open_flags = 2
        __PRETTY_FUNCTION__ = "bdrv_open_common"
#4  0x00007f1b65dfadeb in bdrv_file_open (pbs=0x7fff11736c80, filename=0x7fff11738776 "gluster://10.66.65.117/gv1/test-718", flags=2) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:664
        bs = 0x7f1b671f8af0
        drv = 0x7f1b66053200
        ret = <value optimized out>
#5  0x00007f1b65e1492a in qcow2_create2 (filename=0x7fff11738776 "gluster://10.66.65.117/gv1/test-718", total_size=4096, backing_file=0x0, backing_format=0x0, flags=0, 
    cluster_size=<value optimized out>, prealloc=0, options=0x7f1b671cb570) at /usr/src/debug/qemu-kvm-0.12.1.2/block/qcow2.c:1072
        cluster_bits = 16
        bs = <value optimized out>
        header = {magic = 0, version = 0, backing_file_offset = 219043332111, backing_file_size = 91, cluster_bits = 124, size = 472446402679, crypt_method = 1709013400, l1_size = 
    32539, l1_table_offset = 224, refcount_table_offset = 224, refcount_table_clusters = 1682046592, nb_snapshots = 32539, snapshots_offset = 140733486172022}
        refcount_table = <value optimized out>
        ret = <value optimized out>
        drv = <value optimized out>
        __PRETTY_FUNCTION__ = "qcow2_create2"
#6  0x00007f1b65e14e5f in qcow2_create (filename=0x7fff11738776 "gluster://10.66.65.117/gv1/test-718", options=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/block/qcow2.c:1214
        backing_file = <value optimized out>
        backing_fmt = <value optimized out>
        sectors = <value optimized out>
        flags = <value optimized out>
        cluster_size = <value optimized out>
        prealloc = <value optimized out>
#7  0x00007f1b65dfb50d in bdrv_img_create (filename=0x7fff11738776 "gluster://10.66.65.117/gv1/test-718", fmt=0x7fff11738770 "qcow2", base_filename=<value optimized out>, base_fmt=
    0x0, options=<value optimized out>, img_size=2097152, flags=64, errp=0x7fff11737018) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:4678
        param = 0x7f1b671cb4b0
        create_options = 0x7f1b671cb3a0
        backing_fmt = <value optimized out>
        backing_file = <value optimized out>
        bs = 0x0
        drv = 0x7f1b66051740
        proto_drv = <value optimized out>
        backing_drv = 0x0
        ret = <value optimized out>
#8  0x00007f1b65dec365 in img_create (argc=<value optimized out>, argv=0x7fff11737140) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-img.c:390
        c = <value optimized out>
        img_size = <value optimized out>
        fmt = 0x7fff11738770 "qcow2"
        base_fmt = 0x0
        filename = 0x7fff11738776 "gluster://10.66.65.117/gv1/test-718"
        base_filename = 0x0
        options = 0x0
        local_err = 0x0
#9  0x00007f1b640aed1d in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#10 0x00007f1b65deb659 in _start ()
No symbol table info available.

Comment 6 Jeff Cody 2014-06-11 17:44:14 UTC
I believe this is a libglusterfs issue, rather than a QEMU issue.  Moving to gluster to investigate.

Comment 9 Jan Kurik 2017-12-06 11:12:27 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/