Bug 1652464

Summary: [CephFS] ceph health in ERROR state with damaged MDS
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Persona non grata <nobody+410372>
Component: CephFS
Assignee: Patrick Donnelly <pdonnell>
Status: CLOSED ERRATA
QA Contact: Persona non grata <nobody+410372>
Severity: high
Docs Contact: Aron Gunn <agunn>
Priority: high
Version: 3.2
CC: ceph-eng-bugs, edonnell, jbrier, pasik, pdonnell, rperiyas, sweil, tchandra, tserlin
Target Milestone: z1
Keywords: Automation
Target Release: 3.2
Hardware: All
OS: All
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.8-64.el7cp Ubuntu: ceph_12.2.8-49redhat1
Doc Type: Bug Fix
Doc Text:
.When Monitors cannot reach an MDS, they no longer incorrectly mark its rank as damaged
Previously, when the Monitors evicted and fenced an unreachable Metadata Server (MDS), the MDS signaled that its rank was damaged because of improper handling of blacklist errors. Consequently, the Monitors incorrectly marked the rank as damaged, and the file system became unavailable because of one or more damaged ranks. In this release, the Monitors set the correct rank state, so the rank is no longer incorrectly marked as damaged.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-07 15:51:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1629656    

Description Persona non grata 2018-11-22 07:41:52 UTC
Description of problem:
Ceph health is in ERROR state with a damaged MDS while running the CephFS sanity suite on Ubuntu with an LVM configuration and BlueStore.

Version-Release number of selected component (if applicable):
ceph-ansible 3.2.0~rc3-2redhat1
ceph version 12.2.8-31redhat1xenial

How reproducible:
1/1 

Steps to Reproduce:
1. Set up 4 MDSs (2 active + 2 standby) and 4 clients (2 FUSE + 2 kernel), then create 1000 directories.
2. Pin the directories to the active MDSs (one MDS carrying most of the pinned directories, the other fewer) and start client I/O on the directories.
3. Fail over the active MDSs (see the command sketch below).
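
A minimal command sketch of these steps, assuming the file system cephfs_new is mounted at /mnt/cephfs; the directory name and rank numbers are illustrative, not taken from the report:

  # allow two active MDS ranks
  ceph fs set cephfs_new max_mds 2
  # pin a directory's subtree to MDS rank 1 (unpinned directories stay on rank 0)
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/dir_0001
  # fail over an active MDS by rank so a standby takes over
  ceph mds fail 0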

Actual results:
===================
Filesystem 'cephfs_new' (5)
fs_name cephfs_new
epoch   855
flags   c
created 2018-11-20 20:53:31.865953
modified        2018-11-20 23:52:44.334002
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
last_failure    0
last_failure_osd_epoch  222
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
max_mds 2
in      0,1
up      {0=5594}
failed
damaged 1
stopped
data_pools      [5]
metadata_pool   2
inline_data     disabled
balancer
standby_count_wanted    1
5594:   172.16.115.40:6800/3369263764 'ceph-xeniallvm-1542736791764-node6-mds' mds.0.843 up:active seq 64


Standby daemons:

5467:   172.16.115.76:6800/128262811 'ceph-xeniallvm-1542736791764-node3-mds' mds.-1.0 up:standby seq 109
5476:   172.16.115.66:6800/4128937827 'ceph-xeniallvm-1542736791764-node4-mds' mds.-1.0 up:standby seq 2
5883:   172.16.115.29:6800/1229698542 'ceph-xeniallvm-1542736791764-node12-mds' mds.-1.0 up:standby seq 2

==============

  cluster:
    id:     71618eef-ed70-45ff-a191-a2255c5904e2
    health: HEALTH_ERR
            1 filesystem is degraded
            1 mds daemon damaged
            clock skew detected on mon.ceph-xeniallvm-1542736791764-node15-monmgr, mon.ceph-xeniallvm-1542736791764-node14-monmgr
 
  services:
    mon: 3 daemons, quorum ceph-xeniallvm-1542736791764-node1-monmgrinstaller,ceph-xeniallvm-1542736791764-node15-monmgr,ceph-xeniallvm-1542736791764-node14-monmgr
    mgr: ceph-xeniallvm-1542736791764-node15-monmgr(active), standbys: ceph-xeniallvm-1542736791764-node14-monmgr, ceph-xeniallvm-1542736791764-node1-monmgrinstaller
    mds: cephfs_new-1/2/2 up  {0=ceph-xeniallvm-1542736791764-node6-mds=up:resolve}, 3 up:standby, 1 damaged
    osd: 33 osds: 33 up, 33 in
 
  data:
    pools:   6 pools, 384 pgs
    objects: 28.81k objects, 6.47GiB
    usage:   66.8GiB used, 923GiB / 990GiB avail
    pgs:     384 active+clean

==================
cephfs_new - 2 clients
==========
+------+---------+----------------------------------------+----------+-------+-------+
| Rank |  State  |                  MDS                   | Activity |  dns  |  inos |
+------+---------+----------------------------------------+----------+-------+-------+
|  0   | resolve | ceph-xeniallvm-1542736791764-node6-mds |          |    0  |    0  |
|  1   |  failed |                                        |          |       |       |
+------+---------+----------------------------------------+----------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  367M |  287G |
|    data_pool    |   data   | 6258M |  627G |
+-----------------+----------+-------+-------+

+-----------------------------------------+
|               Standby MDS               |
+-----------------------------------------+
|  ceph-xeniallvm-1542736791764-node3-mds |
|  ceph-xeniallvm-1542736791764-node4-mds |
| ceph-xeniallvm-1542736791764-node12-mds |
+-----------------------------------------+
=======================
Expected results:

No MDS rank should be marked as damaged, and no damage message should appear in the cluster health output.

Additional info:
Logs kept in: magna002.ceph.redhat.com:/home/sshreeka/bz_logs/ceph-xeniallvm-1542736791764

In the leader MON log, the following was found:
 2018-11-20 23:45:47.331407 7f5959fa4700  1 mds.0.736 skipping upkeep work because connection to Monitors appears laggy
 2018-11-20 23:45:48.873857 7f595c7a9700  5 mds.0.736  laggy, deferring client_session(request_renewcaps seq 79) v1
 2018-11-20 23:45:49.266075 7f595c7a9700  5 mds.0.736  laggy, deferring client_session(request_renewcaps seq 79) v1
 2018-11-20 23:45:49.302020 7f59597a3700  5 mds.beacon.ceph-xeniallvm-1542736791764-node6-mds Sending beacon up:active seq 859
 2018-11-20 23:45:51.530410 7f595779f700 -1 mds.0.journaler.pq(rw) _finish_write_head got (108) Cannot send after transport endpoint shutdown
 2018-11-20 23:45:51.530428 7f595779f700 -1 mds.0.journaler.pq(rw) handle_write_error (108) Cannot send after transport endpoint shutdown
 2018-11-20 23:45:51.530468 7f595779f700  5 mds.beacon.ceph-xeniallvm-1542736791764-node6-mds set_want_state: up:active -> down:damaged
 2018-11-20 23:45:51.532007 7f595779f700  5 mds.beacon.ceph-xeniallvm-1542736791764-node6-mds Sending beacon down:damaged seq 860
 2018-11-20 23:45:55.530367 7f59597a3700  5 mds.beacon.ceph-xeniallvm-1542736791764-node6-mds Sending beacon down:damaged seq 861
 2018-11-20 23:45:56.282268 7f595779f700  1 mds.ceph-xeniallvm-1542736791764-node6-mds respawn!
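
As a general recovery note, once the underlying connectivity issue is resolved, a rank that was incorrectly marked damaged can usually be inspected and cleared with the standard CephFS commands; the file system name below matches this cluster, but the exact rank to repair should be taken from the ceph fs get output:

  # confirm which rank is flagged as damaged
  ceph fs get cephfs_new
  # mark rank 1 as repaired so a standby MDS can take it over
  ceph mds repaired cephfs_new:1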

Comment 14 errata-xmlrpc 2019-03-07 15:51:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475