Bug 1490814 - [Ceph-Fuse]: fuse service stopped and mount became unavailable during client I/Os.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 3.0
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 3.0
Assignee: Patrick Donnelly
QA Contact: Ramakrishnan Periyasamy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-12 09:45 UTC by Ramakrishnan Periyasamy
Modified: 2017-12-05 23:42 UTC
CC List: 7 users

Fixed In Version: RHEL: ceph-12.2.1-16.el7cp Ubuntu: ceph_12.2.1-18redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-05 23:42:56 UTC
Embargoed:


Attachments (Terms of Use)
/var/log/message logs (189.97 KB, text/plain), 2017-09-12 09:45 UTC, Ramakrishnan Periyasamy
Client logs. (2.90 MB, text/plain), 2017-09-14 06:08 UTC, Ramakrishnan Periyasamy


Links
Ceph Project Bug Tracker 21726 (last updated 2017-10-09 15:13:50 UTC)
Red Hat Product Errata RHBA-2017:3387, SHIPPED_LIVE: Red Hat Ceph Storage 3.0 bug fix and enhancement update (last updated 2017-12-06 03:03:45 UTC)

Description Ramakrishnan Periyasamy 2017-09-12 09:45:48 UTC
Created attachment 1324790 [details]
/var/log/message logs

Description of problem:
The ceph-fuse service stopped while running multiple I/O workloads on a CephFS mount.
CephFS was mounted on multiple clients: 2 clients using FUSE and 2 more clients using the kernel mount.

The failure was observed on one particular client; the other client mounts are fine. The I/O workloads differ among the clients.

There were 2 I/O workloads running:
1. "sudo rm -rf dir" -- the directory held 15G of data, in a tree of directories 10 in breadth and 10 in depth.

2. fio with below profile.
[random-writes]
ioengine=posixaio
iodepth=32
directory=/mnt/cephfs/fio
rw=randwrite
bs=4k
direct=1
time_based
runtime=30m
size=5G
numjobs=32

[random-read]
rw=randread
iodepth=32

During this I/O, fio failed with the error "file:filesetup.c:278, func=fstat, error=Transport endpoint is not connected".

The fio summary errors are available at this privatebin link:
https://privatebin-it-iso.int.open.paas.redhat.com/?f22cce7d7395a69c#61MtmNpehwOZTqX02Ay07iggyEwyRr/ask5slHOQM2g=
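
For context, "Transport endpoint is not connected" is the ENOTCONN error the kernel returns for any operation on a FUSE mount whose userspace daemon has exited, which matches fio's fstat() failure above. A minimal sketch of how this surfaces to an application (the path is taken from the fio profile above; everything else is illustrative):

import errno
import os

try:
    os.stat("/mnt/cephfs/fio")  # directory used by the fio job above
except OSError as exc:
    if exc.errno == errno.ENOTCONN:
        # ceph-fuse has died; the mount must be cleaned up and remounted
        print("Transport endpoint is not connected:", exc)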

Memory issues were observed in /var/log/messages.
The complete /var/log/messages log is attached to this BZ.

Sep 12 08:56:50 magna070 kernel: bash invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Sep 12 08:56:50 magna070 kernel: bash cpuset=/ mems_allowed=0
Sep 12 08:56:50 magna070 kernel: CPU: 9 PID: 5773 Comm: bash Not tainted 3.10.0-693.1.1.el7.x86_64 #1
Sep 12 08:56:50 magna070 kernel: Hardware name: Supermicro SYS-F627R3-RTB+/X9DRFR, BIOS 3.0b 04/24/2014
Sep 12 08:56:50 magna070 kernel: ffff88079cb05ee0 000000009f368ec1 ffff8806a9e27920 ffffffff816a3d91
Sep 12 08:56:50 magna070 kernel: ffff8806a9e279b0 ffffffff8169f186 ffff8806a9e279b8 ffffffff812b7e5b
Sep 12 08:56:50 magna070 kernel: ffff88084b22c618 0000000000000206 ffffffff00000206 fffeefff00000000
Sep 12 08:56:50 magna070 kernel: Call Trace:
Sep 12 08:56:50 magna070 kernel: [<ffffffff816a3d91>] dump_stack+0x19/0x1b
Sep 12 08:56:50 magna070 kernel: [<ffffffff8169f186>] dump_header+0x90/0x229
Sep 12 08:56:50 magna070 kernel: [<ffffffff812b7e5b>] ? cred_has_capability+0x6b/0x120
Sep 12 08:56:50 magna070 kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Sep 12 08:56:50 magna070 kernel: [<ffffffff812b803e>] ? selinux_capable+0x2e/0x40
Sep 12 08:56:50 magna070 kernel: [<ffffffff81186bd6>] out_of_memory+0x4b6/0x4f0
Sep 12 08:56:50 magna070 kernel: [<ffffffff8169fc8a>] __alloc_pages_slowpath+0x5d6/0x724
Sep 12 08:56:50 magna070 kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Sep 12 08:56:50 magna070 kernel: [<ffffffff811d1108>] alloc_pages_current+0x98/0x110
Sep 12 08:56:50 magna070 kernel: [<ffffffff81182917>] __page_cache_alloc+0x97/0xb0
Sep 12 08:56:50 magna070 kernel: [<ffffffff81184eb0>] filemap_fault+0x170/0x410
Sep 12 08:56:50 magna070 kernel: [<ffffffffc025b156>] ext4_filemap_fault+0x36/0x50 [ext4]
Sep 12 08:56:50 magna070 kernel: [<ffffffff811ad162>] __do_fault+0x52/0xe0
Sep 12 08:56:50 magna070 kernel: [<ffffffff811ad60b>] do_read_fault.isra.44+0x4b/0x130
Sep 12 08:56:50 magna070 kernel: [<ffffffff811b1f11>] handle_mm_fault+0x691/0xfa0
Sep 12 08:56:50 magna070 kernel: [<ffffffff81036509>] ? __restore_xstate_sig+0x99/0x5c0
Sep 12 08:56:50 magna070 kernel: [<ffffffff8102a5e0>] ? do_signal+0x1d0/0x6c0
Sep 12 08:56:50 magna070 kernel: [<ffffffff816affb4>] __do_page_fault+0x154/0x450
Sep 12 08:56:50 magna070 kernel: [<ffffffff816b02e5>] do_page_fault+0x35/0x90
Sep 12 08:56:50 magna070 kernel: [<ffffffff816ac508>] page_fault+0x28/0x30
Sep 12 08:56:50 magna070 kernel: Mem-Info:
Sep 12 08:56:50 magna070 kernel: active_anon:8014350 inactive_anon:21059 isolated_anon:0#012 active_file:0 inactive_file:0 isolated_file:0#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:6818 slab_unreclaimable:13338#012 mapped:18968 shmem:21143 pagetables:17129 bounce:0#012 free:53730 free_pcp:3553 free_cma:0
Sep 12 08:56:50 magna070 kernel: Node 0 DMA free:15800kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15972kB managed:15800kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes

Version-Release number of selected component (if applicable):
ceph: 12.2.0-1.el7cp (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)

How reproducible:
1/1

Steps to Reproduce:
NA

Actual results:
NA

Expected results:
NA

Additional info:
NA

Comment 2 Patrick Donnelly 2017-09-13 21:29:57 UTC
Looks like some kind of leak or runaway cache. We need the debug logs for the client to diagnose.

Comment 3 Ramakrishnan Periyasamy 2017-09-14 06:08:37 UTC
Created attachment 1325770 [details]
Client logs.

Attaching Client logs.

Comment 6 Yan, Zheng 2017-09-29 11:21:55 UTC
I reproduced this locally. The reason is that fio does 1-byte random writes, and ceph-fuse adds each 1 byte of data to the object cacher. The data structures used to track that 1 byte of dirty data are tens of bytes in size. By default, the max size of the object cacher is configured to 200M, so the actual memory usage is amplified by tens of times.
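
To put numbers on that amplification, a rough sketch (the 200M cacher limit and the "tens of bytes" of per-buffer tracking overhead come from the comment above; the exact 64-byte overhead figure is an assumed illustrative value, not taken from the code):

# The object cacher limit is accounted in payload bytes, so 1-byte dirty
# writes are the worst case: the bookkeeping dwarfs the data being tracked.
oc_limit = 200 * 1024 * 1024   # default object cacher size, ~200M
payload  = 1                   # dirty bytes per write (fio doing 1-byte writes)
overhead = 64                  # assumed bookkeeping bytes per dirty buffer

buffers = oc_limit // payload              # dirty buffers the cacher will hold
actual  = buffers * (payload + overhead)   # approximate real memory consumed
print("amplification ~%dx, ~%.1f GiB resident" % (actual // oc_limit, actual / 2**30))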

Comment 7 Brett Niver 2017-10-03 18:49:58 UTC
@Zheng - are we saying this behavior is just a function of the test?

Comment 8 Yan, Zheng 2017-10-04 00:23:22 UTC
Yes, writing 1 byte is a function of the test. The test reveals a flaw in the memory management of the object cacher.

Comment 9 Brett Niver 2017-10-05 12:26:33 UTC
@Zheng, @Patrick - please decide for sure, and then move BZ to Rados, if that's the correct target.

Comment 11 Yan, Zheng 2017-10-09 10:30:25 UTC
PR: https://github.com/ceph/ceph/pull/18183

Comment 15 Ramakrishnan Periyasamy 2017-10-18 10:07:07 UTC
Moving this bug to Verified state.

Verified in ceph version 12.2.1-20.el7cp (a545c917eb9f08c5e8dec406dac5e123accea3a3) luminous (stable)

Comment 18 errata-xmlrpc 2017-12-05 23:42:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

