Description of problem:

Reported by a community developer:

"Recently we found that the MDS load might become zero on another MDS in a multi-MDS scenario. The Ceph version is Luminous.

From the log below, MDS.1 got its load and would send it to MDS.0.

```
2018-05-13 17:20:31.252804 7ffa471d7700 0 mds.1.bal mds.1 epoch 17 load mdsload<[0,6245.74 12491.5]/[0,33581.5 67163], req 13440, hr 0, qlen 18, cpu 0.04>
```

When MDS.0 handled the heartbeat and read MDS.1's load from the message, the load became zero.

```
2018-05-13 17:20:30.988828 7f65d8b45700 0 mds.0.bal mds.0 epoch 17 load mdsload<[5580.31,18907.2 43394.8]/[33213.8,109221 251656], req 42543, hr 0, qlen 36, cpu 0.75>
2018-05-13 17:20:31.280096 7f65db34a700 0 mds.0.bal mds.0 mdsload<[5580.31,18907.2 43394.8]/[33213.8,109221 251656], req 42543, hr 0, qlen 36, cpu 0.75> = 127950 ~ 43394.8
2018-05-13 17:20:31.280113 7f65db34a700 0 mds.0.bal mds.1 mdsload<[0,0 0]/[0,0 0], req 13440, hr 0, qlen 18, cpu 0.04> = 37045.8 ~ 12564.2
```

We found that last_decay in this message is 0 (utime_t()), so the elapsed time is very large and the original value is decayed to 0. I think we should not decay any value if last_decay is utime_t()."
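The failure mode described in the report can be illustrated with a minimal Python sketch. This is not Ceph's code: the `decay` function, the `guard` flag, the 5-second half-life, and the approximate wall-clock value are all assumptions made for illustration. It only shows why an exponential decay computed from a zero `last_decay` timestamp collapses the value to 0, and how skipping the decay in that case preserves it.

```python
import math

# Hypothetical half-life in seconds; the real balancer setting is not given in the report.
HALF_LIFE = 5.0


def decay(value, last_decay, now, half_life=HALF_LIFE, guard=False):
    """Exponentially decay `value` over the time elapsed since `last_decay`.

    With guard=True, a zero `last_decay` (standing in for a default-constructed
    timestamp) is treated as "never decayed" and the value is returned unchanged.
    """
    if guard and last_decay == 0.0:
        return value
    elapsed = now - last_decay
    return value * math.exp(-elapsed * math.log(2) / half_life)


# Approximate seconds since the epoch for mid-May 2018 (illustrative only).
now = 1.526e9
load = 12491.5  # one of the MDS.1 load values from the log

# A zero timestamp makes the elapsed time enormous, so the load decays to 0.
unguarded = decay(load, last_decay=0.0, now=now)

# Skipping the decay when the timestamp is unset keeps the received value.
guarded = decay(load, last_decay=0.0, now=now, guard=True)

# With a sane timestamp, one half-life halves the value as expected.
normal = decay(load, last_decay=now - HALF_LIFE, now=now)

print(unguarded, guarded, normal)
```

The guard mirrors the reporter's suggestion of not decaying when `last_decay` is unset; whether the actual fix takes this form is not stated in this report.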
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2375