Bug 1347664 - Seeing a continuous error messge "librbd::ObjectWatcher: rbd_mirroring: handle_rewatch: error encountered during re-watch: (108) Cannot send after transport endpoint shutdown"
Summary: Seeing a continuous error messge "librbd::ObjectWatcher: rbd_mirroring: handl...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 2.0
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: rc
: 2.1
Assignee: Jason Dillaman
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-06-17 11:12 UTC by Tanay Ganguly
Modified: 2017-07-30 15:34 UTC (History)
5 users (show)

Fixed In Version: RHEL: ceph-10.2.3-2.el7cp Ubuntu: ceph_10.2.3-3redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 19:26:45 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 16349 0 None None None 2016-06-17 13:17:22 UTC
Red Hat Product Errata RHSA-2016:2815 0 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update 2017-03-22 02:06:33 UTC

Description Tanay Ganguly 2016-06-17 11:12:15 UTC
Description of problem:
While syncing/re-syncing of multiple snapshot i am seeing these error message continuously, which is flooding my log file. It went upto 4 GB in size.

librbd::ObjectWatcher: rbd_mirroring: handle_rewatch: error encountered during re-watch: (108) Cannot send after transport endpoint shutdow

Version-Release number of selected component (if applicable):
rbd-mirror-10.2.2-1.el7cp.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Create an Image with Journal Enabled.
2. Create 25 snapshots.
3. While snapshot was getting synced, disable Journal.
4. After a while again enable Journal, the remaining snapshot gets synced.
5. Then delete 15 snapshots, while deletion was happening disable Journal.
6. After a while again enable Journal, the remaining snapshot gets synced.

While it was syncing (step 6), i saw the Journal state showing as Disabling for a while on the Slave Node

But later it got corrected and properly sycned to the Master Node
But it generated a Core, logs and rbd-mirror status was continiously showing as:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2016-06-17 09:44:59.999928 7fa7dd7fa700 -1 librbd::ObjectWatcher: rbd_mirroring: handle_rewatch: error encountered during re-watch: (108) Cannot send after transport endpoint shutdown
2016-06-17 09:44:59.999933 7fa7dd7fa700 20 librbd::ObjectWatcher: rbd_mirroring: rewatch
2016-06-17 09:45:00.000255 7fa7dd7fa700 20 librbd::ObjectWatcher: rbd_mirroring: handle_rewatch: r=-108
2016-06-17 09:45:00.000263 7fa7dd7fa700 -1 librbd::ObjectWatcher: rbd_mirroring: handle_rewatch: error encountered during re-watch: (108) Cannot send after transport endpoint shutdown
2016-06-17 09:45:00.000270 7fa7dd7fa700 20 librbd::ObjectWatcher: rbd_mirroring: rewatch


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2016-06-16 20:33:09.917456 7efddcff9700 -1 librbd::ImageWatcher: 0x7efd84030e80 image watch failed: 139627306619536, (107) Transport endpoint is not connected
2016-06-16 20:33:11.649410 7efddcff9700 -1 librbd::ImageWatcher: 0x7efd84022530 image watch failed: 139627306695968, (107) Transport endpoint is not connected
2016-06-16 20:33:11.887698 7efddcff9700 -1 librbd::ImageWatcher: 0x7efd84047a50 image watch failed: 139627306561744, (107) Transport endpoint is not connected
2016-06-16 20:33:39.077626 7efddcff9700 -1 librbd::ImageWatcher: 0x7efd84048470 image watch failed: 139627306601040, (107) Transport endpoint is not connected
2016-06-16 20:33:40.255308 7efddcff9700 -1 librbd::ImageWatcher: 0x7efd84030940 image watch failed: 139627306546144, (107) Transport endpoint is not connected
2016-06-16 20:33:40.533680 7efddcff9700 -1 librbd::ImageWatcher: 0x7efd84031d70 image watch failed: 139627306549744, (107) Transport endpoint is not connected


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Actual results:
Seeing lots of Error messages.

Expected results:
There should not be any Error message

Additional info:

Has to restart the rbd-mirror to stop the message flood.

Comment 3 Tanay Ganguly 2016-06-17 11:15:58 UTC
@Jason:

As the Log file were huge to attach.
Please login to 

Slave: magna003

Log File : /var/log/ceph/qemu-guest-20365.log
Core:      /tmp/cores/core.tp_librbd.10600.magna003.1466156585

Master: 10.40.44.44

root/redhat

Comment 5 Jason Dillaman 2016-06-17 13:16:33 UTC
Cluster overloaded and unable to service watch heartbeats within the default 30 second timeout period.

Comment 6 Jason Dillaman 2016-08-10 19:17:28 UTC
Upstream pull request: https://github.com/ceph/ceph/pull/9970

Comment 11 Rachana Patel 2016-10-31 14:31:41 UTC
Verified with 10.2.3-10.el7cp.x86_64

- no core generated. error messages have been observed but its not flooding log and rbd-mirror daemon is not showing faulty status. Hence moving to verified.

Comment 13 errata-xmlrpc 2016-11-22 19:26:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html


Note You need to log in before you can comment on or make changes to this bug.