Bug 1340004
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | Seeing lots of "heartbeat_map" messages when stopping an MDS Server | | |
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Tanay Ganguly <tganguly> |
| Component: | CephFS | Assignee: | John Spray <john.spray> |
| Status: | CLOSED ERRATA | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | medium | Priority: | low |
| Version: | 2.0 | Target Release: | 2.1 |
| Target Milestone: | rc | Keywords: | Rebase |
| Hardware: | x86_64 | OS: | Linux |
| CC: | ceph-eng-bugs, hnallurv, john.spray, kdreyer, kurs, nlevine, pdonnell, rperiyas, uboppana | | |
| Fixed In Version: | RHEL: ceph-10.2.3-2.el7cp; Ubuntu: ceph_10.2.3-3redhat1xenial | Doc Type: | If docs needed, set a value |
| Last Closed: | 2016-11-22 19:26:02 UTC | Type: | Bug |
Description
Tanay Ganguly 2016-05-26 10:36:01 UTC
How are you starting and stopping the MDS (what command?) Please set debug mds = 20, and capture the log from the point at which you ask the MDS to stop to the point where it eventually stops (so that we can see what else is going on while the heartbeat_map messages are coming).

For starting/stopping I am using the commands below (a fuller sketch follows this comment):

systemctl stop ceph-mds
systemctl start ceph-mds

I had debug enabled, but I was unable to attach the logs as the file size was huge. As discussed over IRC, I chopped the file down to 100000 lines and attached it.

Steps:
1. I had 3 healthy MDS daemons running.
2. Started IO (performing an rsync to the mounted directory).
3. Stopped mds2 (which was currently active) and saw the heartbeat_map message in mds2.log. So we have 2 MDS running.
4. Stopped mds0 (which was currently active) and again saw the same message in mds0.log. So we have 2 MDS running.

All through this, IO continued.
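A minimal sketch of the two steps discussed above, raising the MDS debug level and stopping/starting the daemon. The daemon id `mds0` is an assumption, and on systemd-managed nodes the MDS typically runs as the templated unit `ceph-mds@<id>.service` rather than a bare `ceph-mds` unit:

```
# Raise MDS log verbosity: either add this to the [mds] section of
# /etc/ceph/ceph.conf (assumed default path) and restart the daemon:
#   debug mds = 20
# or inject it into a running daemon (mds0 is a hypothetical id):
ceph tell mds.mds0 injectargs '--debug-mds 20'

# Stop and start the daemon through its per-instance systemd unit:
systemctl stop ceph-mds@mds0
systemctl start ceph-mds@mds0
```

Whether a bare `systemctl stop ceph-mds` resolves to a daemon instance depends on how the host's units are set up, so the instance form above is the safer way to reproduce the stop.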
Created attachment 1161946 [details]
72K Lines of the log file
Tanay: can you confirm that the Ceph cluster in use did not have any customisation of the "ms type" config setting? We have a separate report upstream of a similar issue when "ms type = async" was set: http://tracker.ceph.com/issues/16396

John: No, I didn't have any customization.

Hi Tanay, I'm looking into this bug and wanted to let you know I've been able to reproduce it: http://tracker.ceph.com/issues/16042#note-8 I'll keep you updated with our progress.

Tanay, the fix has been merged into master. A backport of the fix to Jewel is pending. The fix will be in upstream's v10.2.3.

Bug verified; not seeing any issue while restarting MDS services (a sketch of the checks follows at the end of this report).

ceph version: ceph version 10.2.3-4.el7cp (852125d923e43802a51f681ca2ae9e721eec91ca)
RHEL version: Red Hat Enterprise Linux Server release 7.3 (Maipo)
Kernel version: Linux node2 3.10.0-511.el7.x86_64 #1 SMP Wed Sep 28 12:25:44 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html
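For anyone re-running the verification above, a hedged sketch of the checks described; `mds0` is a hypothetical daemon id, and the exact `ceph mds stat` output format varies by release:

```
# Confirm the installed build carries the fix:
ceph --version

# Note which MDS is active and which are standby:
ceph mds stat

# Stop the active MDS; a standby should take over, client IO should
# continue, and the log should stay free of heartbeat_map messages:
systemctl stop ceph-mds@mds0
ceph mds stat
systemctl start ceph-mds@mds0
```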