Bug 1255471
Summary: | [libgfapi] crash when NFS-Ganesha volume is 100% full | | | |
---|---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Harold Miller <hamiller> | |
Component: | glusterfs | Assignee: | Bipin Kunal <bkunal> | |
Status: | CLOSED ERRATA | QA Contact: | Saurabh <saujain> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | rhgs-3.1 | CC: | asrivast, bkunal, byarlaga, divya, mzywusko, ndevos, nlevinki, rcyriac, saujain, skoduri, sreber, vagarwal, vbellur | |
Target Milestone: | --- | Keywords: | Patch, ZStream | |
Target Release: | RHGS 3.1.1 | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.7.1-13 | Doc Type: | Bug Fix | |
Doc Text: |
Previously, in certain situations, libgfapi returned incorrect errors. NFS-Ganesha handled these incorrect errors by retrying the procedures, even though the file descriptor involved should have been marked as bad and no longer used. Reusing such a bad file descriptor accessed memory that had already been freed, causing NFS-Ganesha to segfault. With this fix, libgfapi returns the correct errors and marks a file descriptor as bad when it must not be used again. NFS-Ganesha no longer tries to reuse bad file descriptors, which prevents the segmentation faults.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1262798 (view as bug list) | Environment: | ||
Last Closed: | 2015-10-05 07:24:21 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1218535, 1240920, 1263094 | |||
Bug Blocks: | 1251815, 1262798 |
Description
Harold Miller
2015-08-20 16:32:15 UTC
Gluster version: glusterfs-3.7.1-11.el6rhs.x86_64

In order to test the fix, I used the following steps:

1. Create a volume of type 6x2.
2. Enable quota on the volume.
3. Set a quota limit of 2 GB.
4. Configure nfs-ganesha.
5. Mount the volume using vers=3.
6. Use dd to create a 3 GB file.

Result: nfs-ganesha dumps core on only one node, with this backtrace:

```
(gdb) bt
#0  0x00007f74c81c6b22 in pub_glfs_pwritev (glfd=0x7f74a832b930, iovec=iovec@entry=0x7f74c97f87f0, iovcnt=iovcnt@entry=1, offset=2352373760, flags=0) at glfs-fops.c:936
#1  0x00007f74c81c6e7a in pub_glfs_pwrite (glfd=<optimized out>, buf=<optimized out>, count=<optimized out>, offset=<optimized out>, flags=<optimized out>) at glfs-fops.c:1051
#2  0x00007f74c85ebbe0 in file_write () from /usr/lib64/ganesha/libfsalgluster.so
#3  0x00000000004d458e in cache_inode_rdwr_plus ()
#4  0x00000000004d53a9 in cache_inode_rdwr ()
#5  0x000000000045db41 in nfs3_write ()
#6  0x0000000000453a01 in nfs_rpc_execute ()
#7  0x00000000004545ad in worker_run ()
#8  0x000000000050afeb in fridgethr_start_routine ()
#9  0x00007f74d94f4df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f74d901a1ad in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007f74c81c6b22 in pub_glfs_pwritev (glfd=0x7f74a832b930, iovec=iovec@entry=0x7f74c97f87f0, iovcnt=iovcnt@entry=1, offset=2352373760, flags=0) at glfs-fops.c:936
936             __GLFS_ENTRY_VALIDATE_FD (glfd, invalid_fs);
(gdb) p *glfd
$1 = {openfds = {next = 0x0, prev = 0x7f74a000ce90}, fs = 0x7f74a8324f20, offset = 140139014803232, fd = 0x7f74a8324f20, entries = {next = 0x78, prev = 0x78}, next = 0x7800000001, readdirbuf = 0x10200000002000}
(gdb) p *glfd->fd
$2 = {pid = 0, flags = -1473070784, refcount = 32628, inode_list = {next = 0x1, prev = 0x7f74a832baa0}, inode = 0x7800000078, lock = 8192, _ctx = 0x0, xl_count = 0, lk_ctx = 0x0, anonymous = _gf_false}
(gdb) p *glfd->fd->inode
Cannot access memory at address 0x7800000078
```

The glfd structure is corrupted: glfd->fd->inode points at inaccessible memory, consistent with the use-after-free described in the doc text.

```
# gluster volume info vol1

Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 3176319c-c033-4d81-a1c2-e46d92a94e9c
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.44.108:/rhs/brick1/d1r11
Brick2: 10.70.44.109:/rhs/brick1/d1r21
Brick3: 10.70.44.110:/rhs/brick1/d2r11
Brick4: 10.70.44.111:/rhs/brick1/d2r21
Brick5: 10.70.44.108:/rhs/brick1/d3r11
Brick6: 10.70.44.109:/rhs/brick1/d3r21
Brick7: 10.70.44.110:/rhs/brick1/d4r11
Brick8: 10.70.44.111:/rhs/brick1/d4r21
Brick9: 10.70.44.108:/rhs/brick1/d5r11
Brick10: 10.70.44.109:/rhs/brick1/d5r21
Brick11: 10.70.44.110:/rhs/brick1/d6r11
Brick12: 10.70.44.111:/rhs/brick1/d6r21
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
```

Bipin, can you confirm whether this is the same backtrace you saw before the fix as well? I think Jiffin is looking into this kind of segfault (quota related?) in bug 1263084.

Saurabh, I am not aware of any backtraces during the crash. The customer has not tested with quota enabled; he tested with a small volume and then created a file bigger than the volume. Please have a look at the steps in the BZ description.
Thanks, Bipin Kunal.

Found the issue on the 3.1.1 build as well:

```
#0  0x00007f5763a83b22 in pub_glfs_pwritev (glfd=0x7f5708011f20, iovec=iovec@entry=0x7f57277fc7f0, iovcnt=iovcnt@entry=1, offset=2103709696, flags=0) at glfs-fops.c:936
936             __GLFS_ENTRY_VALIDATE_FD (glfd, invalid_fs);
(gdb) bt
#0  0x00007f5763a83b22 in pub_glfs_pwritev (glfd=0x7f5708011f20, iovec=iovec@entry=0x7f57277fc7f0, iovcnt=iovcnt@entry=1, offset=2103709696, flags=0) at glfs-fops.c:936
#1  0x00007f5763a83e7a in pub_glfs_pwrite (glfd=<optimized out>, buf=<optimized out>, count=<optimized out>, offset=<optimized out>, flags=<optimized out>) at glfs-fops.c:1051
#2  0x00007f5763ea8be0 in file_write () from /usr/lib64/ganesha/libfsalgluster.so
#3  0x00000000004d458e in cache_inode_rdwr_plus ()
#4  0x00000000004d53a9 in cache_inode_rdwr ()
#5  0x000000000045db41 in nfs3_write ()
#6  0x0000000000453a01 in nfs_rpc_execute ()
#7  0x00000000004545ad in worker_run ()
#8  0x000000000050afeb in fridgethr_start_routine ()
#9  0x00007f5765bb9df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f57656df1ad in clone () from /lib64/libc.so.6
```

Fix for the latest crash is available in nfs-ganesha-2.2.0-9.

From the client:

```
# cd /mnt
[root@rhsauto019 mnt]# time dd if=/dev/urandom of=f.1 bs=1024 count=4194304
dd: error writing ‘f.1’: Input/output error
2299955+0 records in
2299954+0 records out
2355152896 bytes (2.4 GB) copied, 451.26 s, 5.2 MB/s
```

From the server:

```
# sleep 10; df -hk | grep rhs
/dev/mapper/vg_vdb-thinp1   2086400 2086380       20 100% /rhs/brick1
# time bash /usr/libexec/ganesha/ganesha-ha.sh --status
Online: [ nfs11 nfs12 nfs13 nfs15 ]
nfs11-cluster_ip-1 nfs11
nfs11-trigger_ip-1 nfs11
nfs12-cluster_ip-1 nfs12
nfs12-trigger_ip-1 nfs12
nfs13-cluster_ip-1 nfs13
nfs13-trigger_ip-1 nfs13
nfs15-cluster_ip-1 nfs15
nfs15-trigger_ip-1 nfs15

real    0m3.086s
user    0m0.678s
sys     0m0.235s
```

Niels, please review and sign off on the edited doc text.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html