Created attachment 1324790 [details]
/var/log/messages logs

Description of problem:
The ceph-fuse service stopped while running multiple IOs on a CephFS mount. CephFS was mounted on multiple clients: two clients using FUSE and two more using the kernel mount. The failure was observed on one particular client; the other clients' mounts are fine. The IO workloads differ among the clients. Two IOs were running:

1. "sudo rm -rf dir" -- the directory held 15G of data in a tree of breadth 10 and depth 10.

2. fio with the profile below:

[random-writes]
ioengine=posixaio
iodepth=32
directory=/mnt/cephfs/fio
rw=randwrite
bs=4k
direct=1
time_based
runtime=30m
size=5G
numjobs=32

[random-read]
rw=randread
iodepth=32

During this IO, fio failed with the error "file:filesetup.c:278, func=fstat, error=Transport endpoint is not connected". The fio error summary is available at this PrivateBin link:
https://privatebin-it-iso.int.open.paas.redhat.com/?f22cce7d7395a69c#61MtmNpehwOZTqX02Ay07iggyEwyRr/ask5slHOQM2g=

Memory issues were observed in /var/log/messages. The complete dmesg logs are attached to this BZ.
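For reference, a directory tree of the shape used by the "rm -rf" workload (fixed breadth per level, fixed depth) can be generated with a small script like the following. This is a sketch for illustration only; `make_tree` and its parameters are hypothetical, not the tool actually used to set up the test, and the example uses a small 3x3 tree rather than the 10x10 tree (plus ~15G of file data) that triggered the bug.

```python
import os
import tempfile

def make_tree(root, breadth, depth):
    """Recursively create `breadth` subdirectories per level, `depth`
    levels deep. Returns the total number of directories created."""
    if depth == 0:
        return 0
    count = 0
    for i in range(breadth):
        sub = os.path.join(root, "dir%d" % i)
        os.mkdir(sub)
        count += 1 + make_tree(sub, breadth, depth - 1)
    return count

# The bug was hit with breadth=10, depth=10; a small tree is enough
# to show the shape of the workload.
root = tempfile.mkdtemp()
total = make_tree(root, breadth=3, depth=3)
print(total)  # 3 + 9 + 27 = 39 directories
```

Note that a breadth-10/depth-10 tree grows geometrically (10 + 100 + ... directories), so removing it recursively generates a large burst of metadata operations on the client.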
Sep 12 08:56:50 magna070 kernel: bash invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Sep 12 08:56:50 magna070 kernel: bash cpuset=/ mems_allowed=0
Sep 12 08:56:50 magna070 kernel: CPU: 9 PID: 5773 Comm: bash Not tainted 3.10.0-693.1.1.el7.x86_64 #1
Sep 12 08:56:50 magna070 kernel: Hardware name: Supermicro SYS-F627R3-RTB+/X9DRFR, BIOS 3.0b 04/24/2014
Sep 12 08:56:50 magna070 kernel: ffff88079cb05ee0 000000009f368ec1 ffff8806a9e27920 ffffffff816a3d91
Sep 12 08:56:50 magna070 kernel: ffff8806a9e279b0 ffffffff8169f186 ffff8806a9e279b8 ffffffff812b7e5b
Sep 12 08:56:50 magna070 kernel: ffff88084b22c618 0000000000000206 ffffffff00000206 fffeefff00000000
Sep 12 08:56:50 magna070 kernel: Call Trace:
Sep 12 08:56:50 magna070 kernel: [<ffffffff816a3d91>] dump_stack+0x19/0x1b
Sep 12 08:56:50 magna070 kernel: [<ffffffff8169f186>] dump_header+0x90/0x229
Sep 12 08:56:50 magna070 kernel: [<ffffffff812b7e5b>] ? cred_has_capability+0x6b/0x120
Sep 12 08:56:50 magna070 kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Sep 12 08:56:50 magna070 kernel: [<ffffffff812b803e>] ? selinux_capable+0x2e/0x40
Sep 12 08:56:50 magna070 kernel: [<ffffffff81186bd6>] out_of_memory+0x4b6/0x4f0
Sep 12 08:56:50 magna070 kernel: [<ffffffff8169fc8a>] __alloc_pages_slowpath+0x5d6/0x724
Sep 12 08:56:50 magna070 kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Sep 12 08:56:50 magna070 kernel: [<ffffffff811d1108>] alloc_pages_current+0x98/0x110
Sep 12 08:56:50 magna070 kernel: [<ffffffff81182917>] __page_cache_alloc+0x97/0xb0
Sep 12 08:56:50 magna070 kernel: [<ffffffff81184eb0>] filemap_fault+0x170/0x410
Sep 12 08:56:50 magna070 kernel: [<ffffffffc025b156>] ext4_filemap_fault+0x36/0x50 [ext4]
Sep 12 08:56:50 magna070 kernel: [<ffffffff811ad162>] __do_fault+0x52/0xe0
Sep 12 08:56:50 magna070 kernel: [<ffffffff811ad60b>] do_read_fault.isra.44+0x4b/0x130
Sep 12 08:56:50 magna070 kernel: [<ffffffff811b1f11>] handle_mm_fault+0x691/0xfa0
Sep 12 08:56:50 magna070 kernel: [<ffffffff81036509>] ? __restore_xstate_sig+0x99/0x5c0
Sep 12 08:56:50 magna070 kernel: [<ffffffff8102a5e0>] ? do_signal+0x1d0/0x6c0
Sep 12 08:56:50 magna070 kernel: [<ffffffff816affb4>] __do_page_fault+0x154/0x450
Sep 12 08:56:50 magna070 kernel: [<ffffffff816b02e5>] do_page_fault+0x35/0x90
Sep 12 08:56:50 magna070 kernel: [<ffffffff816ac508>] page_fault+0x28/0x30
Sep 12 08:56:50 magna070 kernel: Mem-Info:
Sep 12 08:56:50 magna070 kernel: active_anon:8014350 inactive_anon:21059 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:6818 slab_unreclaimable:13338
 mapped:18968 shmem:21143 pagetables:17129 bounce:0
 free:53730 free_pcp:3553 free_cma:0
Sep 12 08:56:50 magna070 kernel: Node 0 DMA free:15800kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15972kB managed:15800kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes

Version-Release number of selected component (if applicable):
ceph: 12.2.0-1.el7cp (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)

How reproducible:
1/1

Steps to Reproduce:
NA

Actual results:
NA

Expected results:
NA

Additional info:
NA
Looks like some kind of leak or runaway cache. We need the debug logs for the client to diagnose.
Created attachment 1325770 [details] Client logs. Attaching Client logs.
I reproduced this locally. The reason is that fio issues 1-byte random writes, and for each one ceph-fuse adds 1 byte of data to the object cacher. The data structures used to track each 1 byte of dirty data are tens of bytes in size. By default, the maximum size of the object cacher is configured to 200M, so the actual memory usage is amplified by tens of times.
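Back-of-the-envelope arithmetic for the amplification described above. The 64-byte per-extent overhead is an assumed, illustrative figure (the comment says only "tens of bytes"), not a measured ceph-fuse number; the 200 MiB limit corresponds to the object cacher's default maximum size mentioned in the comment.

```python
# Illustrative only: overhead_per_extent is an assumption, not measured.
cache_limit = 200 * 1024 * 1024   # default object cacher limit (~200M of dirty data)
write_size = 1                    # fio issuing 1-byte writes
overhead_per_extent = 64          # assumed bookkeeping bytes per tracked dirty extent

extents = cache_limit // write_size
actual_bytes = extents * (write_size + overhead_per_extent)
amplification = actual_bytes / cache_limit
print(amplification)  # 65.0 under these assumed numbers
```

In other words, the cache limit bounds the dirty *data*, not the tracking structures, so tiny writes inflate real memory use far beyond the configured limit, which matches the OOM kill seen on the client.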
@Zheng - are we saying this behavior is just a function of the test?
Yes, writing 1 byte at a time is a function of the test, but the test reveals a flaw in the object cacher's memory management.
@Zheng, @Patrick - please decide for sure, and then move BZ to Rados, if that's the correct target.
PR: https://github.com/ceph/ceph/pull/18183
Moving this bug to Verified state. Verified in ceph version 12.2.1-20.el7cp (a545c917eb9f08c5e8dec406dac5e123accea3a3) luminous (stable).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387