Bug 1452513
Summary: | [Stress] : Client process crashed during finds/rm from a single client. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> |
Component: | glusterfs | Assignee: | Csaba Henk <csaba> |
Status: | CLOSED ERRATA | QA Contact: | Ambarish <asoman> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.3 | CC: | amukherj, asoman, bturner, ksandha, rgowdapp, rhinduja, rhs-bugs, vbellur |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | RHGS 3.3.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | 3.3.0-devel-freeze-exception | ||
Fixed In Version: | glusterfs-3.8.4-33 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-09-21 04:43:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1464327 | ||
Bug Blocks: | 1417151 |
Description
Ambarish
2017-05-19 06:14:05 UTC
rms and finds are part of our regular testing, and I am positive that this passed on 3.2. Marking as Regression.

*BT with the missing debuginfo packages installed*:

    (gdb) bt
    #0 frame_fill_groups (frame=frame@entry=0x7fa864080ac0) at fuse-helpers.c:158
    #1 0x00007fa89fcea1d6 in get_groups (frame=0x7fa864080ac0, priv=0x7fa8a96e2040) at fuse-helpers.c:321
    #2 get_call_frame_for_req (state=state@entry=0x7fa87c0065e0) at fuse-helpers.c:366
    #3 0x00007fa89fcf27d0 in fuse_unlink_resume (state=0x7fa87c0065e0) at fuse-bridge.c:1631
    #4 0x00007fa89fcec5c5 in fuse_resolve_done (state=<optimized out>) at fuse-resolve.c:663
    #5 fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:690
    #6 0x00007fa89fcec2d8 in fuse_resolve (state=0x7fa87c0065e0) at fuse-resolve.c:654
    #7 0x00007fa89fcec60e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:686
    #8 0x00007fa89fceb8f3 in fuse_resolve_continue (state=state@entry=0x7fa87c0065e0) at fuse-resolve.c:706
    #9 0x00007fa89fcebae7 in fuse_resolve_entry_cbk (frame=<optimized out>, cookie=<optimized out>, this=0x7fa8a96dbef0, op_ret=0, op_errno=0, inode=0x7fa8880465f0, buf=0x7fa892ffcc60, xattr=0x0, postparent=0x7fa892ffccd0) at fuse-resolve.c:76
    #10 0x00007fa899645069 in io_stats_lookup_cbk (frame=0x7fa87d40e630, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, inode=0x7fa8880465f0, buf=0x7fa892ffcc60, xdata=0x0, postparent=0x7fa892ffccd0) at io-stats.c:2190
    #11 0x00007fa8a890c4d1 in default_lookup_cbk (frame=frame@entry=0x7fa87c055060, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=0, op_errno=op_errno@entry=0, inode=0x7fa8880465f0, buf=buf@entry=0x7fa892ffcc60, xdata=0x0, postparent=postparent@entry=0x7fa892ffccd0) at defaults.c:1265
    #12 0x00007fa899a70933 in mdc_lookup (frame=0x7fa864080ac0, this=<optimized out>, loc=0x7fa87c17cfa0, xdata=<optimized out>) at md-cache.c:1123
    #13 0x00007fa8a8920b92 in default_lookup_resume (frame=0x7fa87c055060, this=0x7fa89401d280, loc=0x7fa87c17cfa0, xdata=0x0) at defaults.c:1872
    #14 0x00007fa8a88b0b25 in call_resume (stub=0x7fa87c17cf50) at call-stub.c:2508
    #15 0x00007fa89985b957 in iot_worker (data=0x7fa89402c900) at io-threads.c:220
    #16 0x00007fa8a76eddc5 in start_thread (arg=0x7fa892ffd700) at pthread_create.c:308
    #17 0x00007fa8a703273d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
    (gdb)

(In reply to Ambarish from comment #0)
> Description of problem:
> ------------------------
>
> The problem is fairly reproducible on EC 96*(4+2) and on a small 2*2 volume
> as well, with millions and millions of files.
>
> Ran find from a single mount.
>
> Karan could repro this on 2*2 as well during rm.

The crash during rm is filed under bz 1454416 and is fixed. However, Ambarish mentions that he hit this crash without any rm -rf operations involved, so this is a different issue.

Steps to Repro:

Create a data set with lots of files on a FUSE mount:

    python /small-files/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --files-per-dir 100000 --top /gluster-mount --host-set <client list>

Drop caches, then run ll -R, find . -mindepth 1 -type f, or rm -rf * on the mount.

(In reply to Ambarish from comment #7)
> *BT with missing debug infos installed* :
>
> (gdb) bt
> #0 frame_fill_groups (frame=frame@entry=0x7fa864080ac0) at
> fuse-helpers.c:158
> [...]

Listing the code at the point of crash:

    158         char *saveptr = NULL;
    (gdb) l
    153         char line[4096];
    154         char *ptr = NULL;
    155         FILE *fp = NULL;
    156         int idx = 0;
    157         long int id = 0;
    158         char *saveptr = NULL;
    159         char *endptr = NULL;
    160         int ret = 0;
    161         int ngroups = FUSE_MAX_AUX_GROUPS;
    162         gid_t mygroups[GF_MAX_AUX_GROUPS];

I.e., the crash is reported in the declaration boilerplate of frame_fill_groups(). This matches what can be observed in Bug 1464327 (see the analysis in Comment 2 there), so we conclude that it is the same stack overflow issue. In Bug 1464327 we identified change I7ede90d0e41bcf55755cced5747fa0fb1699edb2 (https://review.gluster.org/#/q/I7ede90d0e41bcf55755cced5747fa0fb1699edb2) as the culprit. That change was backported to RHGS 3.1.2, so all RHGS versions from 3.1.2 onward are affected.

upstream patch: https://review.gluster.org/17706
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/111305/

Verified on 3.8.4-35. The client process did not crash on multiple runs of single- and multi-threaded rm and find workloads from various FUSE mounts.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
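The find leg of the reproducer (find . -mindepth 1 -type f over a very large tree) can also be driven from a small C program. This can be useful when the same metadata-heavy traversal has to run on the FUSE mount without a shell. The code below is a sketch only. It assumes a POSIX system with nftw(). count_regular_files() is a hypothetical helper, not part of glusterfs or smallfile, and it reproduces only the traversal, not the data-set creation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() has no user-data argument, so the count lives at file scope;
 * this makes the helper non-reentrant, which is acceptable for a
 * single-threaded reproducer sketch. */
static long regular_files_seen;

static int count_cb(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
    (void)path;
    /* Mirror "find . -mindepth 1 -type f": skip the starting point
     * (level 0) and count only regular files. With FTW_PHYS, symlinks
     * are reported as FTW_SL and are therefore not counted. */
    if (typeflag == FTW_F && ftwbuf->level >= 1 && S_ISREG(sb->st_mode))
        regular_files_seen++;
    return 0; /* keep walking */
}

/* Hypothetical helper: returns the number of regular files below root,
 * or -1 if the walk fails (for example, if root does not exist). */
static long count_regular_files(const char *root)
{
    assert(root != NULL);
    regular_files_seen = 0;
    /* FTW_PHYS: do not follow symlinks, like find's default behaviour.
     * 64 is an arbitrary cap on simultaneously open directories. */
    if (nftw(root, count_cb, 64, FTW_PHYS) != 0) {
        perror("nftw");
        return -1;
    }
    return regular_files_seen;
}
```

For example, after dropping caches, count_regular_files("/gluster-mount") issues a lookup for every entry on the mount, in the same way as the find command in the steps above.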
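The listing helps explain why a stack overflow would surface in the declarations of frame_fill_groups(). The frame reserves space for char line[4096] and gid_t mygroups[GF_MAX_AUX_GROUPS]. The first store into that frame, which is plausibly the saveptr = NULL at line 158, can then land past the end of an undersized thread stack. Frame #15 also shows that this FUSE code was running on an io-threads worker.

The arithmetic below is a sketch under two assumptions that this report does not state:

- GF_MAX_AUX_GROUPS is 65535, which is believed to be its upstream value at the time; verify it against your tree.
- The worker thread has a 256 KiB stack; this is a hypothetical figure.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* ASSUMPTIONS, for illustration only: 65535 stands in for
 * GF_MAX_AUX_GROUPS and 256 KiB for the worker-thread stack size.
 * Check both against the actual source tree and thread attributes. */
#define ASSUMED_MAX_AUX_GROUPS 65535
#define ASSUMED_WORKER_STACK   (256 * 1024)

/* Bytes taken by the two large locals from the listing:
 * char line[4096] and gid_t mygroups[max_aux_groups]. */
static size_t large_locals_bytes(size_t max_aux_groups)
{
    return 4096 + max_aux_groups * sizeof(gid_t);
}

/* Non-zero if a frame of frame_bytes fits in stack_bytes while still
 * leaving headroom_bytes for the callers already on that stack. */
static int frame_fits(size_t frame_bytes, size_t stack_bytes,
                      size_t headroom_bytes)
{
    assert(headroom_bytes <= stack_bytes);
    return frame_bytes <= stack_bytes - headroom_bytes;
}
```

With a 4-byte gid_t, the two arrays alone come to 4096 + 65535 * 4 = 266236 bytes. That already exceeds the assumed 262144-byte stack before any caller frames are counted. A 32-entry array would take only 4224 bytes; 32 is an assumed value of FUSE_MAX_AUX_GROUPS, which the listing uses only for ngroups.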
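The remaining locals in the listing (fp, line, saveptr, endptr, id) suggest that frame_fill_groups() reads the Groups: line of /proc/<pid>/status using strtok_r() and strtol(). This report does not describe what the upstream and downstream patches change. The code below is therefore only a hypothetical sketch of one common way to remove this kind of stack pressure: keep the parsing, but allocate the group array on the heap so that its size no longer counts against the calling thread's stack. parse_groups_line() is an invented name. To stay self-contained, it parses a line supplied by the caller rather than opening /proc itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: parse a /proc/<pid>/status style line such as
 * "Groups:\t10 20 30\n" into a heap buffer holding at most max_groups
 * ids. Returns the buffer (caller frees it) and stores the number of
 * parsed ids in *count; returns NULL if the line is not a Groups: line
 * or if allocation fails. */
static gid_t *parse_groups_line(const char *line, int max_groups, int *count)
{
    static const char prefix[] = "Groups:";
    assert(line != NULL && count != NULL && max_groups > 0);

    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return NULL;

    /* Heap instead of stack: a 65535-entry array costs the same here
     * whether the caller runs on a large or a small thread stack. */
    gid_t *groups = calloc((size_t)max_groups, sizeof(*groups));
    char *copy = strdup(line + sizeof(prefix) - 1); /* strtok_r writes */
    if (groups == NULL || copy == NULL) {
        free(groups);
        free(copy);
        return NULL;
    }

    int n = 0;
    char *saveptr = NULL;
    for (char *tok = strtok_r(copy, " \t\n", &saveptr);
         tok != NULL && n < max_groups;
         tok = strtok_r(NULL, " \t\n", &saveptr)) {
        char *endptr = NULL;
        errno = 0;
        long id = strtol(tok, &endptr, 10);
        if (errno != 0 || endptr == tok || *endptr != '\0' || id < 0)
            break; /* stop at the first malformed token */
        groups[n++] = (gid_t)id;
    }

    free(copy);
    *count = n;
    return groups;
}
```

Ids beyond max_groups are silently dropped, in the same way that a fixed-size array would cap them. Whether the real code truncates or reports an error in that case is not stated here.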