Bug 1554593

Summary: Kernel and fuse client mount hangs forever
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: liuwei <wliu>
Component: CephFS    Assignee: Yan, Zheng <zyan>
Status: CLOSED ERRATA QA Contact: Ramakrishnan Periyasamy <rperiyas>
Severity: high Docs Contact: Aron Gunn <agunn>
Priority: high    
Version: 3.0    CC: agunn, bhubbard, ceph-eng-bugs, ceph-qe-bugs, dzafman, jbiao, john.spray, jquinn, kchai, kdreyer, linuxkidd, pdonnell, rperiyas, tchandra, tserlin, vumrao, wliu, zyan
Target Milestone: z3   
Target Release: 3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.4-10.el7cp Ubuntu: ceph_12.2.4-14redhat1xenial Doc Type: Bug Fix
Doc Text:
.Sending large amounts of metadata to another MDS no longer causes the exporting MDS to fail
Previously, sending large amounts of metadata to another MDS caused client mounts to fail and heartbeat beacons sent to the Ceph Monitors to be missed. The Ceph Monitors would mark the exporting MDS as laggy/unavailable and remove it, allowing the standby MDS to take over. With this release, the MDS limits the time it spends exporting metadata, allowing mounts to be processed promptly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-15 18:20:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description liuwei 2018-03-13 00:53:05 UTC
Description of problem:


Kernel mount/FUSE mount to CephFS hangs intermittently. This started happening in the last few hours. Tried using all monitor servers in the mount command; the problem persists.
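
For reference, mount invocations of the shape that was being retried look roughly like the following; the monitor names, mount point, and credentials here are placeholders, not values from this cluster:

# kernel client, listing several monitors
mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# FUSE client equivalent
ceph-fuse -m mon1:6789,mon2:6789,mon3:6789 /mnt/cephfs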

A segment of the strace output is below:

read(8, "# /etc/security/limits.conf\n#\n#T"..., 4096) = 2422
read(8, "", 4096)                       = 0
close(8)                                = 0
munmap(0x7f5c84084000, 4096)            = 0
openat(AT_FDCWD, "/etc/security/limits.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
brk(NULL)                               = 0x7f5c861d1000
brk(0x7f5c861f5000)                     = 0x7f5c861f5000
getdents(8, /* 3 entries */, 32768)     = 88
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=26254, ...}) = 0
mmap(NULL, 26254, PROT_READ, MAP_SHARED, 9, 0) = 0x7f5c8407e000
close(9)                                = 0
futex(0x7f5c8360ba80, FUTEX_WAKE_PRIVATE, 2147483647) = 0
getdents(8, /* 0 entries */, 32768)     = 0
brk(NULL)                               = 0x7f5c861f5000
brk(NULL)                               = 0x7f5c861f5000
brk(0x7f5c861ed000)                     = 0x7f5c861ed000
brk(NULL)                               = 0x7f5c861ed000
close(8)                                = 0
open("/etc/security/limits.d/20-nproc.conf", O_RDONLY) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=191, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5c8407d000
read(8, "# Default limit for number of us"..., 4096) = 191
read(8, "", 4096)                       = 0
close(8)                                = 0
munmap(0x7f5c8407d000, 4096)            = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=29182, rlim_max=29182}) = 0
setpriority(PRIO_PROCESS, 0, 0)         = 0
socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) = 8
fcntl(8, F_SETFD, FD_CLOEXEC)           = 0
sendto(8, "\230\0\0\0Q\4\5\0\3\0\0\0\0\0\0\0op=PAM:session_o"..., 152, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 152
poll([{fd=8, events=POLLIN}], 1, 500)   = 1 ([{fd=8, revents=POLLIN}])
recvfrom(8, "$\0\0\0\2\0\0\0\3\0\0\0D\357\377\377\0\0\0\0\230\0\0\0Q\4\5\0\3\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
recvfrom(8, "$\0\0\0\2\0\0\0\3\0\0\0D\357\377\377\0\0\0\0\230\0\0\0Q\4\5\0\3\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
close(8)                                = 0
getrlimit(RLIMIT_NPROC, {rlim_cur=29182, rlim_max=29182}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5c84078b10) = 18728
close(4)                                = 0
select(7, [3 6], [], NULL, NULL


Version-Release number of selected component (if applicable):


cat BaseOS-3.1.3-2017-11-22/installed-debs | grep ceph 
ii  ceph-common                          12.2.1-42redhat1xenial                     amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                            12.2.1-42redhat1xenial                     amd64        FUSE-based client for the Ceph distributed file system
ii  libcephfs2                           12.2.1-42redhat1xenial                     amd64        Ceph distributed file system client library
ii  python-cephfs                        12.2.1-42redhat1xenial                     amd64        Python 2 libraries for the Ceph libcephfs library


 cat BaseOS-3.1.3-2017-11-22/uname 
Linux BaseOS-3.1.3-2017-11-22 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

100% reproducible

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 9 Brad Hubbard 2018-03-15 08:47:53 UTC
Search the logs for when the first PG went to state "unknown" and compare that to entries in the other logs (cluster, mon, etc.) around that time. Something major happened for all PGs to go to state "unknown", and we should have logged something about it.

pgs:     100.000% pgs unknown
         2176 unknown
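
One way to narrow down that timestamp (commands and paths below are generic examples, not specific to this environment):

grep -m1 'unknown' /var/log/ceph/ceph.log      # first time the cluster log mentioned unknown pgs
ceph -s                                        # confirm whether pgs are still reported unknown
ceph health detail                             # per-pg detail for the current state

Then read the mon and mgr logs around that timestamp for restarts, elections, or connectivity errors.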

Comment 19 Yan, Zheng 2018-03-16 01:07:40 UTC
The first issue (not being able to mount CephFS) can be related to the second issue. To mount a client, the MDS needs to record session information in the object store. The second issue prevents the OSDs from handling any request, so the mount hangs.
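
If that is the case, the stall should be visible as outstanding OSD operations issued by the MDS. A rough way to check, assuming admin-socket access on the MDS host (<id> is a placeholder for the MDS name):

ceph daemon mds.<id> objecter_requests     # in-flight requests from the MDS to the OSDs; a stuck write of the session information would sit here
ceph daemon mds.<id> dump_ops_in_flight    # MDS operations (including the new client session) that have not completed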

Comment 20 Brad Hubbard 2018-03-16 02:08:27 UTC
(In reply to Yan, Zheng from comment #19)
> The first issue (not being able to mount CephFS) can be related to the
> second issue. To mount a client, the MDS needs to record session
> information in the object store. The second issue prevents the OSDs from
> handling any request, so the mount hangs.

Hi Yan,

The timeline we have says the first issue (not being able to mount CephFS) was seen well before the second issue (the MGR cannot communicate with the OSDs).

"1.	The original issue, where the cephFS was unmount able / intermittent mounts even when the ceph status showed OK ie all GOOD."

Comment 27 Yan, Zheng 2018-03-20 02:45:03 UTC
Yes, it looks like a balancer issue. It exports too much at a time.
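
Until a build with the fix is installed, a possible way to keep the balancer from exporting large subtrees is to pin metadata to one rank; the file system name, rank, and directory below are examples only:

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/some/dir   # pin this subtree to rank 0 so the balancer does not migrate it
ceph fs set cephfs max_mds 1                         # or fall back to a single active MDS (on this release the extra rank must also be deactivated)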

Comment 100 errata-xmlrpc 2018-05-15 18:20:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1563