Bug 1671603
Summary: | flooding of "dict is NULL" logging & crash of client process | |||
---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Amar Tumballi <atumball> | |
Component: | core | Assignee: | bugs <bugs> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
Severity: | low | Docs Contact: | ||
Priority: | low | |||
Version: | 5 | CC: | amgad.saleh, archon810, bugs, pasik, vpvainio | |
Target Milestone: | --- | Keywords: | Triaged, ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-6.x | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | 1313567 | |||
: | 1674225 | Environment: | ||
Last Closed: | 2019-07-10 06:13:47 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1313567 | |||
Bug Blocks: | 1667103, 1674225 |
Description
Amar Tumballi
2019-02-01 03:17:34 UTC
The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.

monit check:

check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program = "/bin/mount /mnt/glusterfs_data1"
  stop program = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
    then alert else if succeeded for 10 cycles then alert

stack trace:

[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash: 2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]

This was fixed with system flags that made the spinlock work properly. Also, the "dict is NULL" logs no longer appear in the deployments.
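
For anyone who wants the mount watchdog workaround without monit, here is a minimal Python sketch of the same idea. It is an illustration, not part of GlusterFS or of the reporter's setup: the function names (mount_is_stale, remount, check_and_recover) are hypothetical, and it assumes that a FUSE mount whose client process has died fails stat() with ENOTCONN ("Transport endpoint is not connected") and that the mount point has an /etc/fstab entry so that "/bin/mount <path>" works, as in the monit check.

```python
import errno
import os
import subprocess


def mount_is_stale(path, stat=os.stat):
    """Return True if stat() on the mount point fails with ENOTCONN."""
    try:
        stat(path)
    except OSError as exc:
        # Assumption: a dead FUSE client surfaces as ENOTCONN
        # ("Transport endpoint is not connected") on the mount point.
        return exc.errno == errno.ENOTCONN
    return False


def remount(path, run=subprocess.run):
    """Unmount and mount again, using the same commands as the monit check."""
    # The unmount may fail if the mount is already gone, so do not raise on it.
    run(["/bin/umount", path], check=False)
    # Assumption: an /etc/fstab entry exists for this path.
    run(["/bin/mount", path], check=True)


def check_and_recover(path, stat=os.stat, run=subprocess.run):
    """Remount path if it is stale; return True when a remount was attempted."""
    if mount_is_stale(path, stat=stat):
        remount(path, run=run)
        return True
    return False
```

Calling check_and_recover("/mnt/glusterfs_data1") periodically (for example from cron, as root) would approximate the monit behaviour. The stat and run parameters exist only so the logic can be exercised without touching a real mount.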