Bug 1501958
Summary: | [CephFS]: Cluster ended up in "damaged" MDS when subtree pinning is in progress and an MDS failover is attempted | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | shylesh <shmohan> |
Component: | CephFS | Assignee: | Patrick Donnelly <pdonnell> |
Status: | CLOSED ERRATA | QA Contact: | Ramakrishnan Periyasamy <rperiyas> |
Severity: | urgent | Docs Contact: | |
Priority: | high | ||
Version: | 3.0 | CC: | ceph-eng-bugs, hnallurv, john.spray, kdreyer, pdonnell, rperiyas, shmohan, tserlin, zyan |
Target Milestone: | z2 | ||
Target Release: | 3.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-12.2.4-4.el7cp Ubuntu: ceph_12.2.4-5redhat1xenial | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-26 17:38:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
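For context on the scenario in the summary: in Ceph Luminous, subtree pinning is done by setting the `ceph.dir.pin` extended attribute on a directory, and a failover can be forced with `ceph mds fail`. A minimal sketch; the mount point `/mnt/cephfs`, directory name, and rank numbers below are illustrative, not taken from this report.

```shell
# Pin a directory subtree to MDS rank 1 (path and rank are illustrative).
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/pinned_dir

# Verify the pin took effect.
getfattr -n ceph.dir.pin /mnt/cephfs/pinned_dir

# Force a failover of MDS rank 0, as in the reported test.
ceph mds fail 0
```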
Comment 5
Yan, Zheng
2017-10-16 13:25:58 UTC
I wrongly interpreted the log. It looks like two MDS daemons wrote to object 200.00004273 at the same time, so something must be wrong with blacklisting.

In osd.3.log at magna103:/var/log/ceph:

<pre>
2017-10-13 14:10:16.312400 7f308a412700 10 osd.3 pg_epoch: 849 pg[2.3( v 849'910079 (841'908563,849'910079] local-lis/les=668/669 n=76028 ec=3/3 lis/c 668/668 les/c/f 669/670/0 668/668/371) [3,0,8] r=0 lpr=668 luod=849'910050 lua=849'910055 crt=849'910079 lcod 848'910049 mlcod 845'910047 active+clean] sending reply on osd_op(mds.0.2269:12216 2.3 2:c78e7855:::200.00004273:head [write 842784~1373 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e849) v8 0x9914830a80
...
2017-10-13 14:11:10.061530 7f309ac33700 10 osd.3 pg_epoch: 851 pg[2.3( v 851'910221 (841'908663,851'910221] local-lis/les=668/669 n=76028 ec=3/3 lis/c 668/668 les/c/f 669/670/0 668/668/371) [3,0,8] r=0 lpr=668 luod=851'910216 lua=851'910215 crt=851'910221 lcod 851'910215 mlcod 851'910214 active+clean] sending reply on osd_op(mds.0.2207:27831 2.3 2:c78e7855:::200.00004273:head [write 842784~2354 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e846) v8 0x99126e2700
</pre>

mds.0.2269 first wrote a log entry at offset 842784, then mds.0.2207 wrote another log entry at the same offset. mds.0.2207 was the laggy MDS, which should have been blacklisted.

<pre>
sudo ceph -c /etc/ceph/cfs.conf daemon mon.magna023 config get mds_blacklist_interval
{
    "mds_blacklist_interval": "5.000000"
}
</pre>

5 seconds is far too short; you should use the default value. The issue was caused by a wrong config. mds_blacklist_interval is only used on monitor daemons, and you do not need to change it from the default. Setting a short blacklist interval is effectively the same as preventing the monitors from blacklisting failed MDSs, and it will break the system.
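The remediation implied above can be sketched as follows. The config path and monitor name reuse those from the report; the assumption that `mds_blacklist_interval` was overridden in the `[mon]` section of ceph.conf is mine, not stated in the report.

```shell
# Check the value currently applied on a monitor
# (in this report it was 5.000000, far too short).
sudo ceph -c /etc/ceph/cfs.conf daemon mon.magna023 config get mds_blacklist_interval

# Remove any mds_blacklist_interval override from the [mon] section of
# /etc/ceph/cfs.conf on the monitor hosts, then restart the monitors so
# the built-in default applies again.

# After the next MDS failover, confirm the failed daemon's address
# actually appears in the OSD blacklist:
ceph -c /etc/ceph/cfs.conf osd blacklist ls
```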
Provided QA_ACK, clearing needinfo. Please move the bug to ON_QA.

Ken, could you please move this bug to ON_QA?

(In reply to Ramakrishnan Periyasamy from comment #21)
> Ken, could you please move this bug to ON_QA

Done.

Thomas

Moving this bug to the verified state; updated the command output in comment 20. Tested in ceph version ceph-12.2.4-4.el7cp.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1259