Bug 1490296

Summary: [iSCSI]; rbd-target-api crashed on one of the GWs during disk creation
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tejas <tchandra>
Component: iSCSI
Assignee: Mike Christie <mchristi>
Status: CLOSED ERRATA
QA Contact: Tejas <tchandra>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.0
CC: ceph-eng-bugs, ceph-qe-bugs, jdillama
Target Milestone: rc
Target Release: 3.0
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-05 23:42:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 3 Jason Dillaman 2017-09-11 13:45:17 UTC
Sep 11 12:42:43 X rbd-target-api[26872]: ceph version 12.2.0-1.el7cp (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)
Sep 11 12:42:43 X rbd-target-api[26872]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fd76c9ccd10]
Sep 11 12:42:43 X rbd-target-api[26872]: 2: (EventCenter::create_file_event(int, int, EventCallback*)+0x893) [0x7fd76cb4fde3]
Sep 11 12:42:43 X rbd-target-api[26872]: 3: (EventCenter::set_owner()+0x5d9) [0x7fd76cb52219]
Sep 11 12:42:43 X rbd-target-api[26872]: 4: (()+0x44bde2) [0x7fd76cb53de2]
Sep 11 12:42:43 X rbd-target-api[26872]: 5: (()+0xb52b0) [0x7fd76bbee2b0]
Sep 11 12:42:43 X rbd-target-api[26872]: 6: (()+0x7e25) [0x7fd78458be25]
Sep 11 12:42:43 X rbd-target-api[26872]: 7: (clone()+0x6d) [0x7fd783bb034d]
Sep 11 12:42:43 X rbd-target-api[26872]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 11 12:42:43 X systemd[1]: rbd-target-api.service: main process exited, code=killed, status=6/ABRT
Sep 11 12:42:43 X systemd[1]: Unit rbd-target-api.service entered failed state.
Sep 11 12:42:43 X systemd[1]: rbd-target-api.service failed.

This traceback comes from the assertion at [1], which can only fire if the epoll_ctl syscall fails [2]. The code should log an error message when this failure occurs so that the original errno can be determined.

[1] https://github.com/ceph/ceph/blob/v12.2.0/src/msg/async/Event.cc#L233
[2] https://github.com/ceph/ceph/blob/v12.2.0/src/msg/async/EventEpoll.cc#L67
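
As an illustration of the suggestion above (log the errno rather than abort), here is a minimal sketch in Python, not Ceph's C++ code. The helper name `try_add_event` is hypothetical and not part of Ceph. It assumes Linux, since `select.epoll` is only available there. The failing case uses a descriptor that has already been closed, which makes the underlying `epoll_ctl` call fail with EBADF.

```python
import errno
import logging
import os
import select

log = logging.getLogger("epoll-sketch")


def try_add_event(ep, fd):
    """Register fd for read events; return 0 on success or the errno on failure.

    Hypothetical helper for illustration only; it is not Ceph code.
    """
    try:
        ep.register(fd, select.EPOLLIN)
    except OSError as exc:
        # Keep the original errno so the root cause shows up in the log.
        log.error("epoll_ctl: add fd=%d failed: (%d) %s",
                  fd, exc.errno, os.strerror(exc.errno))
        return exc.errno
    return 0


def demo():
    """Return (result for a valid fd, result for a closed fd)."""
    ep = select.epoll()
    r, w = os.pipe()
    try:
        ok = try_add_event(ep, r)  # valid descriptor: succeeds
        ep.unregister(r)
    finally:
        os.close(r)
        os.close(w)
    # r is closed now, so the kernel rejects it with EBADF.
    bad = try_add_event(ep, r)
    ep.close()
    return ok, bad
```

With a log line of this kind in place, the errno is captured at the point of failure instead of being lost when the process aborts.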

Comment 4 Jason Dillaman 2017-09-11 16:27:55 UTC
2017-09-11 21:45:10.161964 7f9a8dffb700 -1 Event(0x7f9a18070b60 nevent=5000 time_id=1).init can't create notify pipe
2017-09-11 21:45:10.162027 7f9a8dffb700 -1 EpollDriver.init unable to do epoll_create: (24) Too many open files
2017-09-11 21:45:10.162038 7f9a8dffb700 -1 Event(0x7f9a1809db60 nevent=0 time_id=1).init failed to init event driver.
2017-09-11 21:45:10.162112 7f9a8dffb700 -1 EpollDriver.init unable to do epoll_create: (24) Too many open files
2017-09-11 21:45:10.162119 7f9a8dffb700 -1 Event(0x7f9a180ad730 nevent=0 time_id=1).init failed to init event driver.
2017-09-11 21:45:10.162207 7f96265bc700 -1 EpollDriver.add_event epoll_ctl: add fd=-1 failed. (9) Bad file descriptor

At a minimum, the open file limit in the systemd service file needs to be increased. We should also double-check that cluster connections aren't leaking.
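
Both checks mentioned above can be approximated from inside a Python process. The following is a rough sketch, assuming Linux (it reads /proc/self/fd); the function names are hypothetical and do not come from rbd-target-api. It reports the number of open descriptors against the RLIMIT_NOFILE soft and hard limits, and can raise the soft limit up to the hard limit. For a systemd service, the limit itself is normally set with the `LimitNOFILE=` directive in the unit file; this bug does not state which value was chosen.

```python
import os
import resource


def count_open_fds():
    """Count this process's open descriptors via /proc/self/fd (Linux only)."""
    # Listing the directory briefly opens one descriptor itself; subtract it.
    return len(os.listdir("/proc/self/fd")) - 1


def fd_headroom():
    """Return (open_count, soft_limit, hard_limit) for RLIMIT_NOFILE."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_open_fds(), soft, hard


def raise_soft_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft != hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Sampling `count_open_fds()` periodically while creating disks is one way to look for a leak: a count that keeps growing and never drops back suggests descriptors (for example, cluster connections) are not being released.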

Comment 11 errata-xmlrpc 2017-12-05 23:42:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387