Bug 1425293
Summary: | qemu_gluster_co_get_block_status gets SIGABRT when doing blockcommit continually
---|---
Product: | Red Hat Enterprise Linux 7
Component: | glusterfs
Version: | 7.4
Status: | CLOSED WONTFIX
Severity: | high
Priority: | unspecified
Reporter: | Han Han <hhan>
Assignee: | Niels de Vos <ndevos>
QA Contact: | SATHEESARAN <sasundar>
CC: | aliang, chayang, chhu, coli, dyuan, hhan, juzhang, knoel, lmiksik, ndevos, qzhang, rcyriac, sasundar, virt-maint, xuzhang, yanqzhan
Keywords: | Regression
Target Milestone: | rc
Target Release: | ---
Hardware: | Unspecified
OS: | Linux
URL: | http://lists.nongnu.org/archive/html/qemu-block/2017-05/msg00905.html
Type: | Bug
Last Closed: | 2021-01-15 07:32:04 UTC
Bug Depends On: | 1454558
Bug Blocks: | 1425296
Description
Han Han
2017-02-21 06:11:03 UTC
I didn't encounter this bug when testing with libvirt-2.5.0-1.el7 and qemu-kvm-rhev-2.6.0-29.el7 one month ago. Moreover, it blocks a libvirt test case. Marked as a regression and test blocker.

From the backtrace, this looks to be aborting when trying to do an lseek via the gluster API. From qemu block/gluster.c:

    1270 /*
    1271  * SEEK_DATA cases:
    1272  * D1. offs == start: start is in data
    1273  * D2. offs > start: start is in a hole, next data at offs
    1274  * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
    1275  *     or start is beyond EOF
    1276  *     If the latter happens, the file has been truncated behind
    1277  *     our back since we opened it.  All bets are off then.
    1278  *     Treating like a trailing hole is simplest.
    1279  * D4. offs < 0, errno != ENXIO: we learned nothing
    1280  */
    1281 offs = glfs_lseek(s->fd, start, SEEK_DATA);
    1282 if (offs < 0) {
    1283     return -errno;          /* D3 or D4 */
    1284 }
    1285 assert(offs >= start);
    1286
    1287 if (offs > start) {
    1288     /* D2: in hole, next data at offs */
    1289     *hole = start;
    1290     *data = offs;
    1291     return 0;
    1292 }

Gluster indicated an error attempting the seek, from the logs:

    [2017-02-21 06:00:15.089021] W [MSGID: 114031] [client-rpc-fops.c:2211:client3_3_seek_cbk] 0-gluster-vol1-client-0: remote operation failed [No such device or address]

A failure of glfs_lseek() should mean a value of -1 is returned, with errno set appropriately. But if for some reason glfs_lseek() silently failed (returning the last offset, or 0), this could be the path that triggered the assert. The fact that there was a seek failure right before the assertion failure leads me to believe there is indeed a path somewhere in the gluster library that returns a bogus value on glfs_lseek() failure.

Assuming that glfs_lseek() mimics Linux lseek() behavior, there should be no way the assert can fire, as the only two value ranges that can be returned are offs >= start (success) or -1 (failure):

    LSEEK(2)
    [...]
    SEEK_DATA
        Adjust the file offset to the next location in the file greater than
        or equal to offset containing data.  If offset points to data, then
        the file offset is set to offset.
    [...]
    RETURN VALUE
        Upon successful completion, lseek() returns the resulting offset
        location as measured in bytes from the beginning of the file.  On
        error, the value (off_t) -1 is returned and errno is set to indicate
        the error.

Based on the above, re-assigning to the gluster team.

Please provide the output of 'gluster volume info gluster-vol1' and, if possible, the logs from the bricks at the time of the problem (note that time is in UTC in Gluster logs). If we know how the volume is configured, we may be able to reproduce this.
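For reference, the allocation-probing pattern discussed above can be exercised against any local file with the same SEEK_DATA/SEEK_HOLE contract. The following is a minimal standalone sketch (not QEMU's code and not a proposed fix) that walks a file's data and hole extents and, instead of asserting, reports an offset smaller than the requested start as invalid; it assumes POSIX lseek() semantics and a filesystem that supports SEEK_DATA/SEEK_HOLE.

```c
/*
 * Minimal sketch (not QEMU code, not the eventual fix): probe the data/hole
 * layout of a plain file with lseek(SEEK_DATA)/lseek(SEEK_HOLE), treating a
 * returned offset smaller than the requested start as an error instead of
 * asserting. File name is taken from argv[1].
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    off_t end = lseek(fd, 0, SEEK_END);
    off_t start = 0;
    while (start < end) {
        off_t data = lseek(fd, start, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO) {      /* trailing hole up to EOF */
                printf("hole  %lld..%lld\n", (long long)start, (long long)end);
                break;
            }
            perror("lseek(SEEK_DATA)");
            break;
        }
        if (data < start) {            /* bogus value: report, do not assert */
            fprintf(stderr, "invalid SEEK_DATA result %lld < start %lld\n",
                    (long long)data, (long long)start);
            break;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            perror("lseek(SEEK_HOLE)");
            break;
        }
        if (data > start)
            printf("hole  %lld..%lld\n", (long long)start, (long long)data);
        printf("data  %lld..%lld\n", (long long)data, (long long)hole);
        start = hole;
    }
    close(fd);
    return 0;
}
```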
Hi Niels, here is my configuration for setting up the glusterfs server:

    # mkdir -p /br1
    # cat /etc/glusterfs/glusterd.vol
    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        option transport-type socket,rdma
        option transport.socket.keepalive-time 10
        option transport.socket.keepalive-interval 2
        option transport.socket.read-fail-log off
        option ping-timeout 0
        option event-threads 1
        # manually added "rpc-auth-allow-insecure"
        option rpc-auth-allow-insecure on
    #   option transport.address-family inet6
    #   option base-port 49152
    end-volume
    # service glusterd restart
    # gluster volume create gluster-vol1 xx.xx.xx.xx:/br1
    # gluster volume set gluster-vol1 server.allow-insecure on
    # gluster volume start gluster-vol1
    # gluster volume set gluster-vol1 nfs.disable on

The gluster volume info:

    # gluster volume info gluster-vol1
    Volume Name: gluster-vol1
    Type: Distribute
    Volume ID: 93004af1-e4bc-4ac6-a105-dfed6ec10b62
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: xx.xx.xx.xx:/br1
    Options Reconfigured:
    server.allow-insecure: on
    transport.address-family: inet
    performance.readdir-ahead: on
    nfs.disable: on

Prepare an image with an OS installed on the glusterfs server:

    # qemu-img convert xxx.qcow2 gluster://xx.xx.xx.xx/gluster-vol1/c16572.qcow2 -O qcow2

Created attachment 1259422 [details]
The log of script and bricks log
These are the log of the script and the brick log; you can correlate them by timestamp while debugging.
Note that you should remove the following XML from the c16572.xml file in case you cannot create the domain:
<interface type='bridge'>
  <mac address='52:54:00:ad:71:b5'/>
  <source bridge='br0'/>
  <target dev='vnet1'/>
  <model type='rtl8139'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
Version:
libvirt-3.0.0-2.el7.x86_64
glusterfs-3.8.4-15.el7rhgs.x86_64
qemu-kvm-rhev-2.8.0-4.el7.x86_64

When looking through bugfixes that went into Gluster Community 3.8, I came across https://review.gluster.org/15943 . This change does not seem to have been backported to RHGS. Without this change, I think it is possible that the offset is incorrectly reset to 0, and that would cause the assertion that Jeff pointed out in comment #4 (assuming 'start' is > 0):

    1281 offs = glfs_lseek(s->fd, start, SEEK_DATA);
    1282 if (offs < 0) {
    1283     return -errno;          /* D3 or D4 */
    1284 }
    1285 assert(offs >= start);

Han, is it easy for you to verify this workflow with a test package that contains that particular patch? It is a change that is only relevant in the glusterfs-server (brick) environment. If you are able to find time for a test, I can provide you with the packages.

No problem. Please provide me the scratch build.

(In reply to Niels de Vos from comment #9)
> When looking through bugfixes that went into Gluster Community 3.8, I came
> across https://review.gluster.org/15943 . This change does not seem to have
> been backported to RHGS. Without this change, I think it is possible that
> the offset is incorrectly reset to 0, and that would cause the assertion
> that Jeff pointed out in comment #4 (assuming 'start' is > 0):
>
>     1281 offs = glfs_lseek(s->fd, start, SEEK_DATA);
>     1282 if (offs < 0) {
>     1283     return -errno;          /* D3 or D4 */
>     1284 }
>     1285 assert(offs >= start);
>
> Han, is it easy for you to verify this workflow with a test package that
> contains that particular patch? It is a change that is only relevant in the
> glusterfs-server (brick) environment. If you are able to find time for a
> test, I can provide you with the packages.

Hi, I am able to reproduce this (I have a duplicate bug, BZ #1451191, that I have not re-assigned or closed yet). What is being returned is not offs = 0, but rather (offs > 0 && offs < start). This does not seem to be a legitimate return value for lseek with SEEK_DATA or SEEK_HOLE. Here is an example of a bad return:

    start == 7608336384
    offs  == 7607877632

I am able to reproduce this easily by using qemu-img convert with a larger image size (> 6GB or so). For instance:

    qemu-img convert -f qcow2 -O raw gluster://192.168.15.180/gv0/stock-fed-i686.qcow2 convert.img

Moving this back to the 'qemu-kvm' component; Jeff sent a patch to prevent QEMU from segfaulting, and I suggest getting this change included in the RHEL/RHV package(s). Assigning to Jeff for now, hope that's OK.

The missing backport (mentioned in comment #9) in glusterfs-server will be included through bug 1454558. Note that glusterfs-server is not part of RHEL, but only of the Red Hat Gluster Storage layered product.

(In reply to Niels de Vos from comment #16)
> Moving this back to the 'qemu-kvm' component; Jeff sent a patch to prevent
> QEMU from segfaulting, and I suggest getting this change included in the
> RHEL/RHV package(s). Assigning to Jeff for now, hope that's OK.
>
> The missing backport (mentioned in comment #9) in glusterfs-server will be
> included through bug 1454558. Note that glusterfs-server is not part of
> RHEL, but only of the Red Hat Gluster Storage layered product.

The bugfix mentioned in comment #9 references 3.8. I am able to reproduce this bug with a gluster server-side version of 3.11.0rc0.
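To exercise glfs_lseek() directly, without QEMU in the loop, a small libgfapi probe along the following lines could be used to scan the image and flag any SEEK_DATA result smaller than the requested offset. This is a diagnostic sketch, not part of the original report; the volume name, server address, and image path are placeholders taken from the reproducer above, and it builds with `gcc probe.c -lgfapi`.

```c
/*
 * Diagnostic sketch (not from the original report): scan a file on a Gluster
 * volume with glfs_lseek(SEEK_DATA) and flag any result smaller than the
 * requested offset, i.e. the bogus value that trips QEMU's assertion.
 * Volume, server, and path below are placeholders.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("gv0");                       /* placeholder volume */
    if (!fs || glfs_set_volfile_server(fs, "tcp", "192.168.15.180", 24007) ||
        glfs_init(fs)) {
        fprintf(stderr, "failed to connect to volume\n");
        return 1;
    }

    glfs_fd_t *fd = glfs_open(fs, "stock-fed-i686.qcow2", O_RDONLY);  /* placeholder path */
    if (!fd) {
        perror("glfs_open");
        glfs_fini(fs);
        return 1;
    }

    struct stat st;
    if (glfs_fstat(fd, &st) < 0) {
        perror("glfs_fstat");
        glfs_close(fd);
        glfs_fini(fs);
        return 1;
    }

    off_t start = 0;
    while (start < st.st_size) {
        off_t offs = glfs_lseek(fd, start, SEEK_DATA);
        if (offs < 0) {
            if (errno != ENXIO)                         /* ENXIO = trailing hole */
                perror("glfs_lseek(SEEK_DATA)");
            break;
        }
        if (offs < start) {
            fprintf(stderr, "BUG: SEEK_DATA returned %lld for start %lld\n",
                    (long long)offs, (long long)start);
            break;
        }
        /* skip past this data extent and continue probing */
        off_t hole = glfs_lseek(fd, offs, SEEK_HOLE);
        if (hole < 0)
            break;
        start = hole;
    }

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}
```

If such a probe reported an invalid result outside of QEMU, that would point at the gluster client/server seek path rather than at block/gluster.c.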
I cannot reproduce the problem mentioned in comment #9; this is what I have:

    # rpm -q qemu-img glusterfs
    qemu-img-2.9.0-1.fc27.x86_64
    glusterfs-3.11.0-0.4.rc1.fc25.x86_64

The configured volume consists of a single brick (default volume options). The .qcow2 image that I use for testing was created as follows:

- download a Fedora cloud image (in .raw format)
- convert the .raw to .qcow2
- resize the 1st partition of the image, adding 8GB
- copy a 6GB randomly filled file into the image

Running "qemu-img convert -f qcow2 -O raw gluster://..." does not cause crashes. Inspection with "ltrace -x glfs_lseek ..." does not show a return of 'offset < start'. No segfaults occur. Could you provide the steps and Gluster configuration that you use to reproduce this problem?

I built glusterfs from git, commit id 787d224:

    [root@localhost ~]# /usr/local/sbin/glusterfsd --version
    glusterfs 3.11.0rc0
    Repository revision: git://git.gluster.org/glusterfs.git
    Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
    GlusterFS comes with ABSOLUTELY NO WARRANTY.
    It is licensed to you under your choice of the GNU Lesser General Public
    License, version 3 or any later version (LGPLv3 or later), or the GNU
    General Public License, version 2 (GPLv2), in all cases as published by
    the Free Software Foundation.

    [root@localhost ~]# /usr/local/sbin/glusterd --version
    glusterfs 3.11.0rc0
    Repository revision: git://git.gluster.org/glusterfs.git
    Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
    GlusterFS comes with ABSOLUTELY NO WARRANTY.
    It is licensed to you under your choice of the GNU Lesser General Public
    License, version 3 or any later version (LGPLv3 or later), or the GNU
    General Public License, version 2 (GPLv2), in all cases as published by
    the Free Software Foundation.

    [root@localhost ~]# ps auxww|grep gluster
    root  1006  ...  /usr/local/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
    root  1141  ...  /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd ...
    root  1156  ...  /usr/local/sbin/glusterfsd -s 192.168.15.180 --volfile-id gv0.192.168.15.180.mnt-brick1-brick ...
    root  1162  ...  /usr/local/sbin/glusterfsd -s 192.168.15.180 --volfile-id gv0.192.168.15.180.mnt-brick2-brick ...
    root  1170  ...  /usr/local/sbin/glusterfsd -s 192.168.15.180 --volfile-id gv1.192.168.15.180.mnt-gv1-brick-small-1-brick ...
    root  1176  ...  /usr/local/sbin/glusterfsd -s 192.168.15.180 --volfile-id gv1.192.168.15.180.mnt-gv1-brick-small-2-brick ...

    [root@localhost ~]# gluster volume info gv0
    Volume Name: gv0
    Type: Replicate
    Volume ID: 6bcb7964-0594-4801-a60b-22dae7f871f6
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: 192.168.15.180:/mnt/brick1/brick
    Brick2: 192.168.15.180:/mnt/brick2/brick
    Options Reconfigured:
    performance.readdir-ahead: on

    # qemu-img info gluster://192.168.15.180/gv0/stock-fed-i686.qcow2
    image: gluster://192.168.15.180/gv0/stock-fed-i686.qcow2
    file format: qcow2
    virtual size: 256G (274877906944 bytes)
    disk size: 7.5G
    cluster_size: 65536
    Format specific information:
        compat: 1.1
        lazy refcounts: false
        refcount bits: 16
        corrupt: false

    # ./qemu-img convert -f qcow2 -O raw gluster://192.168.15.180/gv0/stock-fed-i686.qcow2 convert.img
    qemu-img: block/gluster.c:1278: find_allocation: Assertion `offs >= start' failed.
    Aborted (core dumped)

I don't always get the abort on each run of the convert; the larger the qcow2 image file is (in actual disk size), the more likely I am to hit it.

Re-assigning back to glusterfs; there already exists BZ #1451191 (now on POST) for the QEMU workaround, so this BZ is just for the glusterfs component. Go ahead and set it to POST (or the appropriate status) if it is fixed in glusterfs. Thanks!

Now it works with libvirt-3.2.0-10.el7.x86_64, qemu-kvm-rhev-2.9.0-10.el7.x86_64, and glusterfs-3.8.4-28.el7rhgs.x86_64, and it also works with an older glusterfs-server on the server side, so I am removing the TestBlocker flag.

The invalid lseek return value is also seen on Gluster FUSE mounts, as reported in BZ #1536636.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.