Description of problem:

After running the fs-drift.py mixed-workload generator for about half an hour, glusterfs core dumped and I got the error "Transport endpoint is not connected" from "df". Details below.

Version-Release number of selected component (if applicable):

glusterfs-5.2-1.el7.x86_64
kernel-3.10.0-693.2.2.el7.x86_64

How reproducible:

Not sure yet.

Steps to Reproduce:

I'm using Kaleb's longevity cluster; I don't know how that was installed. All I did was create glusterfs mountpoints on 6 of the nodes rhs-srv-{08,09,10,11,12,13,14} and launch fs-drift.py from -09 using the script below. I just happened to notice the loss of a mountpoint on rhs-srv-10; I'm not sure whether it happened anywhere else, or exactly when it happened, only that the mountpoint was functioning when the test started. A coredump URL will be provided.

Actual results:

[root@rhs-srv-10 ~]# ls -l /mnt/longevity anaconda-ks.cfg libntirpc-1.6.2-1.el7.x86_64.rpm
[root@rhs-srv-10 ~]# ls -l /core*
-rw------- 1 root root 346918912 Jan 7 18:37 /core.12648
-rw------- 1 root root 26152960 Nov 14 2016 /core.27656
-rw------- 1 root root 26152960 Nov 14 2016 /core.27681
-rw------- 1 root root 26152960 Nov 14 2016 /core.27704

Expected results:

glusterfs should never core dump, no matter how heavy the load.

Additional info:

Just before that, I had paused the benchmark (see below) and done a find command to count the files created by the benchmark so far; only about 1000. Then I resumed the benchmark, and that's when I noticed the mountpoint was gone.

The fs-drift.py mixed-workload benchmark is described here:
https://github.com/parallel-fs-utils/fs-drift/blob/master/README.md

The workload was this script:

[root@rhs-srv-09 fs-drift]# more runit.sh
#!/bin/bash -x
./fs-drift.py \
  --host-set rhs-srv-08,rhs-srv-10,rhs-srv-11,rhs-srv-12,rhs-srv-13,rhs-srv-14 \
  --top /mnt/longevity/bene \
  --max-files 1000000 \
  --max-file-size-kb 1024 \
  --max-record-size-kb 256 \
  --max-random-writes 5 \
  --max-random-reads 10 \
  --threads 16 \
  --dirs-per-level 50 \
  --duration 36000 \
  --tolerate-stale-file-handles true \
  --output-json

It is still running AFAIK; it should be done tomorrow morning.

To pause the workload: touch /mnt/longevity/bene/network_shared/pause.tmp
To unpause it, just rm that file.
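One more note: a simple way to catch the exact moment a mountpoint drops during a run like this is to poll df from each client. A minimal sketch (this loop is not part of fs-drift, just a hypothetical helper that could be run alongside it on each client):

while sleep 60; do
    if ! df /mnt/longevity >/dev/null 2>&1; then
        # df fails once the fuse mount dies ("Transport endpoint is not connected")
        date
        echo "mountpoint /mnt/longevity is gone"
        break
    fi
done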
Hi Ben, can you please share backtraces or make the cores accessible? Or, if you can reproduce the phenomenon, is it even worth bothering with them at all?

Thanks,
Csaba
Sorry Csaba, I thought the cores were accessible by virtue of being in the longevity cluster. This most likely is reproducible. Can you log in to rhs_srv_10.storage-dev.lab.eng.bos.redhat.com and access the core dump above? If not, I'll post the core dump. Kaleb Keithley is the owner of this cluster and can further assist.

My main purpose here is to demonstrate enhancements to the fs-drift tool that can be used to verify that Gluster holds up under a heavy workload for a long period of time, and that it functions correctly under realistic but extreme conditions, such as a near-full filesystem. This benchmark is also being used on Cephfs, but it should work on any POSIX-like filesystem, distributed or not.

I'll try it one more time next week (with newer fs-drift) and post the core dump. My preference would be to help someone in the Gluster community use the workload generator and have them reproduce the problem and analyze it. Of course I'll do my best to resolve any problems you find with the workload generator. HTH -ben
Sorry again, it was rhs-srv-10, not rhs_srv_10 :-(
I see the crash happens on the fuse_forget -> inode_unref -> inode_table_prune code path. That part of the code has been significantly rewritten after 5.2 by https://review.gluster.org/c/glusterfs/+/19778. Would it be possible to recheck with that patch applied?
I don't know how to apply the patch, but if you or someone else does it, then I can re-run fs-drift, which I planned to do anyway.
Created attachment 1521159 [details]
patched glusterfs 5.2 rpms (glusterfs-{fuse,libs})

Here we ship glusterfs-{libs,fuse}-5.2-0.2.git477643b.el7.x86_64.rpm in a tarball. The rpms were built from 5.2 with the following changes backported:

https://review.gluster.org/c/glusterfs/+/21305
https://review.gluster.org/c/glusterfs/+/19778

They are expected to be usable with the rest of a glusterfs 5.2 installation on a RHEL 7.* system. Please retest with these.
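For reference, one possible way to get these in place (a sketch only; the tarball name glusterfs-5.2-patched.tar is hypothetical, substitute whatever the attachment is saved as):

# on each affected node, with the attachment copied to /tmp
cd /tmp && tar xf glusterfs-5.2-patched.tar
# --oldpackage because 5.2-0.2.git477643b sorts below the installed 5.2-1
rpm -Uvh --oldpackage glusterfs-libs-5.2-0.2.git477643b.el7.x86_64.rpm \
                      glusterfs-fuse-5.2-0.2.git477643b.el7.x86_64.rpm
# remount the fuse clients afterwards so the new bits are actually loaded
umount /mnt/longevity && mount -t glusterfs rhs-srv-09:/longevity /mnt/longevity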
Citing the backtrace from /core.12648 for further reference.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=rhs-srv-09 --volfile-i'.
Program terminated with signal 11, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=0x800b07f4d388) at ../nptl/pthread_mutex_lock.c:66
66        unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex);
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-5.2-1.el7.x86_64
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=0x800b07f4d388) at ../nptl/pthread_mutex_lock.c:66
#1  0x00007fcdd96ca7ec in __gf_free () from /lib64/libglusterfs.so.0
#2  0x00007fcdcb8d16ab in afr_inode_ctx_destroy () from /usr/lib64/glusterfs/5.2/xlator/cluster/replicate.so
#3  0x00007fcdcb8d46a7 in afr_forget () from /usr/lib64/glusterfs/5.2/xlator/cluster/replicate.so
#4  0x00007fcdd96ae77e in __inode_ctx_free () from /lib64/libglusterfs.so.0
#5  0x00007fcdd96afbb2 in inode_table_prune () from /lib64/libglusterfs.so.0
#6  0x00007fcdd96afe94 in inode_unref () from /lib64/libglusterfs.so.0
#7  0x00007fcdd0a81292 in fuse_forget () from /usr/lib64/glusterfs/5.2/xlator/mount/fuse.so
#8  0x00007fcdd0a98c5a in fuse_thread_proc () from /usr/lib64/glusterfs/5.2/xlator/mount/fuse.so
#9  0x00007fcdd8502e25 in start_thread (arg=0x7fcdc8a17700) at pthread_create.c:308
#10 0x00007fcdd7dcf34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
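For anyone who wants to pull the same information from the other cores, a sketch of the gdb invocation (the glusterfs-libs debuginfo package name is an assumption based on the installed versions above; gdb itself suggests the glusterfs-fuse one):

[root@rhs-srv-10 ~]# debuginfo-install glusterfs-fuse-5.2-1.el7.x86_64 glusterfs-libs-5.2-1.el7.x86_64
[root@rhs-srv-10 ~]# gdb /usr/sbin/glusterfs /core.12648
(gdb) bt full                # crashing thread, with locals
(gdb) thread apply all bt    # all threads, to see what else was going on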
Created attachment 1521358 [details]
backport of https://review.gluster.org//21305 to 5.3

As Kaleb pointed out, 5.3 is out now. However, it still does not have one of the changes referred to in comment #7 (https://review.gluster.org//21305), so I'm posting a backport of it to 5.3.
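In case it helps, a rough sketch of applying the attached backport to a 5.3 source tree before building (the patch filename is hypothetical; the build steps are the usual autotools ones for a glusterfs checkout):

git clone https://github.com/gluster/glusterfs.git
cd glusterfs
git checkout v5.3
# apply the attachment (saved here as ../backport-21305-5.3.patch);
# "git am" works if it is a git-formatted patch, otherwise use "patch -p1 < ..."
git am ../backport-21305-5.3.patch
./autogen.sh && ./configure && make -j$(nproc)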
I am re-running the test. I think Gluster has been updated on the longevity cluster to v5.2.1 on all hosts, plus whatever was done by Kaleb. Kaleb said in e-mail:

"I ended up extracting the libglusterfs shlib and the fuse xlator from the rpms and copying them into place on all the nodes. It probably would have been easier to send me the diff of the backport (or a src.rpm) and then I'd have built a complete set of packages."

He also said that it might be easier to upgrade to v5.3 and retry the test.

To run this test, log in to rhs-srv-09 and:

- if not mounted, mount the filesystem on all clients:
  # export ANSIBLE_INVENTORY=~/bene/host.inv
  # ansible -m shell -a 'mount -t glusterfs rhs-srv-09:/longevity /mnt/longevity' all

- kick off the test:
  # cd ~/bene/fs-drift
  # ./runit.sh

I see these fairly regularly; I think I saw them before:

tail -f glusterd.log
...
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 39 times between [2019-01-21 18:27:30.613805] and [2019-01-21 18:29:27.741999]
[2019-01-21 18:29:30.745368] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 39 times between [2019-01-21 18:29:30.745368] and [2019-01-21 18:31:27.873675]
[2019-01-21 18:31:30.876996] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 39 times between [2019-01-21 18:31:30.876996] and [2019-01-21 18:33:28.005059]
[2019-01-21 18:33:31.008389] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 46 times between [2019-01-21 18:33:31.008389] and [2019-01-21 18:35:28.136950]
[2019-01-21 18:35:31.140260] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 39 times between [2019-01-21 18:35:31.140260] and [2019-01-21 18:37:28.270902]
[2019-01-21 18:37:31.274049] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

Are these real errors or not?
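To gauge how widespread the "Failed to dispatch handler" messages are across the nodes, something like this could be run (just a sketch, assuming the logs live in the standard /var/log/glusterfs location):

[root@rhs-srv-09 ~]# ansible -i ~/bene/host.inv -m shell -a 'grep -c "Failed to dispatch handler" /var/log/glusterfs/glusterd.log /var/log/glusterfs/mnt-longevity.log 2>/dev/null' all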
mnt-longevity.log:

[2019-01-21 18:41:40.349391] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-longevity-dht: renaming /bene/d0029/d0047/f000000336 (04407899-c84c-4d72-b1f3-c6d31fbb38d9) (hash=longevity-replicate-0/cache=longevity-replicate-0) => /bene/d0010/d0036/f000000183 ((null)) (hash=longevity-replicate-3/cache=<nul>)
[2019-01-21 18:41:40.540286] W [MSGID: 101159] [inode.c:1330:__inode_unlink] 0-inode: bb4302c7-2c54-432d-9949-d6a3c9a28c54/f000000179: dentry not found in c71679b1-a915-4c7d-8419-3c4e5d16f0cf
[2019-01-21 18:41:40.703773] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-21 18:41:40.703994] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-21 18:41:40.704323] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-longevity-dht: renaming /bene/d0045/d0025/f000000128 (4e3e43a6-bf32-4260-8fc3-692991d67484) (hash=longevity-replicate-2/cache=longevity-replicate-2) => /bene/d0011/d0011/f000000370 ((null)) (hash=longevity-replicate-0/cache=<nul>)
[2019-01-21 18:41:40.705841] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 159 times between [2019-01-21 18:41:40.705841] and [2019-01-21 18:41:40.746764]
[2019-01-21 18:41:40.962160] I [MSGID: 109066] [dht-rename.c:1922:dht_rename] 0-longevity-dht: renaming /bene/d0015/d0020/f000000268 (0a99892f-6a4b-46c4-af73-10b7882542c3) (hash=longevity-replicate-0/cache=longevity-replicate-0) => /bene/d0027/d0020/f000000012 ((null)) (hash=longevity-replicate-1/cache=<nul>)
[2019-01-21 18:41:41.012307] E [MSGID: 133021] [shard.c:3569:shard_delete_shards] 0-longevity-shard: Failed to clean up shards of gfid b1123a96-44b0-47f7-97aa-5d536b3308ec [No such file or directory]
[2019-01-21 18:41:41.039286] E [MSGID: 133021] [shard.c:3569:shard_delete_shards] 0-longevity-shard: Failed to clean up shards of gfid f4a4c3f3-8fa1-45ce-ba94-052ff3ce79c9 [No such file or directory]
[2019-01-21 18:41:41.134390] E [fuse-bridge.c:219:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f19cafd7fbb] (--> /usr/lib64/glusterfs/5.2/xlator/mount/fuse.so(+0x7f81)[0x7f19c23b6f81] (--> /usr/lib64/glusterfs/5.2/xlator/mount/fuse.so(+0x8e38)[0x7f19c23b7e38] (--> /lib64/libpthread.so.0(+0x7e25)[0x7f19c9e39e25] (--> /lib64/libc.so.6(clone+0x6d)[0x7f19c970634d] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
[2019-01-21 18:41:41.316802] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 179 times between [2019-01-21 18:41:41.316802] and [2019-01-21 18:41:41.388848]

bricks/home-bricks-longevity.log:

[2019-01-21 18:43:03.641697] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 9630 times between [2019-01-21 18:43:03.641697] and [2019-01-21 18:43:05.598147]
[2019-01-21 18:43:05.618000] I [MSGID: 115062] [server-rpc-fops_v2.c:1133:server4_link_cbk] 0-longevity-server: 85655: LINK /bene/d0009/d0049/f000000296 (e568165b-c337-4168-9583-c8b02a54257a) -> bb08e1f6-3828-45c6-8c75-aa475a3eeacc/f000000037, client: CTX_ID:20d0af45-d4ba-48a7-8898-d131ea327af7-GRAPH_ID:0-PID:15008-HOST:rhs-srv-08.storage-dev.lab.eng.bos.redhat.com-PC_NAME:longevity-client-3-RECON_NO:-0, error-xlator: - [File exists]
[2019-01-21 18:43:05.618883] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 139 times between [2019-01-21 18:43:05.618883] and [2019-01-21 18:43:05.637162]
[2019-01-21 18:43:05.731700] E [MSGID: 113039] [posix-inode-fd-ops.c:1457:posix_open] 0-longevity-posix: open on /home/bricks/longevity/.glusterfs/82/3c/823c46d5-91fc-401a-a77d-b35543ebccbe, flags: 1 [No such file or directory]
[2019-01-21 18:43:06.744458] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 21 times between [2019-01-21 18:43:06.744458] and [2019-01-21 18:43:06.750397]
[2019-01-21 18:43:06.750419] E [MSGID: 113001] [posix-inode-fd-ops.c:3266:posix_getxattr] 0-longevity-posix: getxattr failed on /home/bricks/longevity/.glusterfs/82/3c/823c46d5-91fc-401a-a77d-b35543ebccbe: trusted.glusterfs.dht.linkto [No such file or directory]
[2019-01-21 18:43:06.750491] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 3669 times between [2019-01-21 18:43:06.750491] and [2019-01-21 18:43:07.539439]
and just as before, about 1/2 hour later...

    fd = os.open(fn, os.O_WRONLY)
OSError: [Errno 107] Transport endpoint is not connected: './d0042/d0045/f000000317'
thrd.05 2019-01-21 14:00:13,333 - ERROR - append: ./d0042/d0045/f000000317 syscall errno 107(Transport endpoint is not connected)
thrd.05 2019-01-21 14:00:13,333 - ERROR - [Errno 107] Transport endpoint is not connected: './d0015'
Traceback (most recent call last):
  File "/root/bene/fs-drift/fsop.py", line 314, in op_create
    os.makedirs(subdir)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 107] Transport endpoint is not connected: './d0015'
thrd.05 2019-01-21 14:00:13,334 - ERROR - dir create: ./d0015/d0016/f000000368 syscall errno 107(Transport endpoint is not connected)
thrd.05 2019-01-21 14:00:13,334 - ERROR - [Errno 107] Transport endpoint is not connected: './d0001'
Traceback (most recent call last):

[root@rhs-srv-09 fs-drift]# ansible -i ../host.inv -m shell -a 'ls -ltr /core*' all
rhs-srv-10 | CHANGED | rc=0 >>
...
-rw------- 1 root root 164057088 Jan 21 14:00 /core.6658

I'll try to fix fs-drift so that it doesn't keep logging errors like this one if you tell it to ignore stale file handles:

OSError: [Errno 116] Stale file handle
thrd.01 2019-01-21 13:44:23,008 - ERROR - close: ./d0020/d0039/f000000103 syscall errno 116(Stale file handle)
thrd.02 2019-01-21 13:45:14,640 - ERROR - [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/root/bene/fs-drift/fsop.py", line 199, in try_to_close
    os.close(closefd)
Here is a new core dump, so this is definitely reproducible.

[root@rhs-srv-09 ~]# ansible -i ~/bene/host.inv -m shell -a 'ls -ltr /core* | grep "Jan 21"' all
...
rhs-srv-14 | CHANGED | rc=0 >>
-rw------- 1 root root 206307328 Jan 21 17:16 /core.6676

The fs-drift logging problem mentioned above is fixed.
Raising priority to high because the entire cluster locked up: all the fs-drift processes on all hosts enter "D" state (ps awux shows "D") and can't be killed, even with -9. Example:

root 21762 0.0 0.1 274592 20772 ? D 17:12 0:00 python /root/bene/fs-drift/fs-drift-remote.py --network-sync-dir /mnt/longevity/bene/network-shared --as-host rhs-srv-08

And unmount fails:

[root@rhs-srv-08 ~]# umount -f /mnt/longevity
umount: /mnt/longevity: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))

Literally the only thing I can do at that point is reboot!

One next step would be to change the workload mix, removing operation types until this stops happening, to isolate the operation mix that makes it happen. You could also remove the sharding translator and see if it still happens. I'm not going to do that; I have to return to my day job. But if someone wants to run fs-drift and try to isolate this problem further, I can provide assistance with fs-drift. Here's some documentation I wrote about it:

https://docs.google.com/presentation/d/10DpxUCVC4dvZKwcEov4TEKbbS26DdiZ2L-EMR5-6SGo/edit#slide=id.p
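A couple of things that may help the next person who hits the hang (a sketch only; whether they work at all depends on how wedged the fuse client is):

# a lazy unmount sometimes detaches the dead mountpoint without a reboot
[root@rhs-srv-08 ~]# umount -l /mnt/longevity

# the kernel stacks of the D-state processes show where they are blocked (run as root)
[root@rhs-srv-08 ~]# for pid in $(ps -eo pid,stat | awk '$2 ~ /^D/ {print $1}'); do
>   echo "== pid $pid =="; cat /proc/$pid/stack
> done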
clarifying bz title
Status?