Bug 1593322 - load might become zero on a MDS in multi-MDS setup
Summary: load might become zero on a MDS in multi-MDS setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: CephFS
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: z5
Target Release: 3.0
Assignee: Patrick Donnelly
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-20 14:39 UTC by Ram Raja
Modified: 2018-08-09 18:27 UTC (History)
4 users

Fixed In Version: RHEL: ceph-12.2.4-32 Ubuntu: ceph_12.2.4-36redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-09 18:27:11 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 24538 None None None 2018-06-20 14:39:38 UTC
Red Hat Product Errata RHBA-2018:2375 None None None 2018-08-09 18:27:42 UTC

Description Ram Raja 2018-06-20 14:39:39 UTC
Description of problem:

Reported by a community developer,
"
Recently we found that an MDS's load might become zero on another MDS in a multi-MDS scenario. The Ceph version is Luminous.

In the log below, MDS.1 computed its load and sent it to MDS.0.
```
2018-05-13 17:20:31.252804 7ffa471d7700  0 mds.1.bal mds.1 epoch 17 load mdsload<[0,6245.74 12491.5]/[0,33581.5 67163], req 13440, hr 0, qlen 18, cpu 0.04>
``` 

When MDS.0 handled the heartbeat and read MDS.1's load from the message, the load had become zero.
```
2018-05-13 17:20:30.988828 7f65d8b45700  0 mds.0.bal mds.0 epoch 17 load mdsload<[5580.31,18907.2 43394.8]/[33213.8,109221 251656], req 42543, hr 0, qlen 36, cpu 0.75>
2018-05-13 17:20:31.280096 7f65db34a700  0 mds.0.bal   mds.0 mdsload<[5580.31,18907.2 43394.8]/[33213.8,109221 251656], req 42543, hr 0, qlen 36, cpu 0.75> = 127950 ~ 43394.8
2018-05-13 17:20:31.280113 7f65db34a700  0 mds.0.bal   mds.1 mdsload<[0,0 0]/[0,0 0], req 13440, hr 0, qlen 18, cpu 0.04> = 37045.8 ~ 12564.2
```

We found that last_decay in this message is 0 (utime_t()), so the elapsed time is very large and the original value is decayed to zero. I think we should not decay any value if last_decay is utime_t()."

Comment 14 errata-xmlrpc 2018-08-09 18:27:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2375

