Bug 1155335

Summary: ceph mon_status hangs
Product: Fedora
Reporter: Pete Zaitcev <zaitcev>
Component: ceph
Assignee: Boris Ranto <branto>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 21
CC: bkabrda, branto, david, dmalcolm, fedora, harm, ivazqueznet, jonathansteffan, kkeithle, linuxkidd, mstuchli, ncoghlan, rkuska, steve.capper, steve, tomspur, tradej
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-0.80.7-3.fc21
Doc Type: Bug Fix
Last Closed: 2015-01-26 02:31:57 UTC
Type: Bug

Description Pete Zaitcev 2014-10-21 22:51:36 UTC
Description of problem:

All ceph commands that read something from the monitors or OSDs
hang at the end. I only singled out "mon_status" for ease of
reproducing. This is most noticeable with "ceph auth get-or-create",
because it breaks ceph-deploy, but really every reading command
hangs. The commands read and print whatever they receive before
hanging.

Version-Release number of selected component (if applicable):

ceph-0.80.7-1.fc21.x86_64

How reproducible:

Should be 100%, unless it is something specific to a particular installation.

Steps to Reproduce:
1. Install ceph with "yum install ceph"
2. Use ceph-deploy or start a monitor manually.
   Just do whatever is needed to get a monitor running.
3. ceph mon_status

Actual results:

[root@kvm-ichi ~]# ceph mon_status
{"name":"kvm-ichi","rank":0,"state":"leader","election_epoch":1,"quorum":[0],"outside_quorum":[],"extra_probe_peers":[],"sync_provider":[],"monmap":{"epoch":1,"fsid":"36c0bece-bdda-462e-9f2c-29ac2caad9a9","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"kvm-ichi","addr":"[fd2d:acfb:74cc:3::3]:6789\/0"}]}}
^C
[root@kvm-ichi ~]# 
 <======= MUST HIT ^C TO EXIT

Expected results:

[root@simbelmyne ~]# ceph mon_status
{"name":"simbelmyne","rank":0,"state":"leader","election_epoch":1,"quorum":[0],"outside_quorum":[],"extra_probe_peers":[],"sync_provider":[],"monmap":{"epoch":1,"fsid":"886c774d-fc5f-49f6-8c8b-49ebb012898d","modified":"2014-10-06 21:37:15.190270","created":"2014-10-06 21:37:15.190270","mons":[{"rank":0,"name":"simbelmyne","addr":"192.168.128.10:6789\/0"}]}}
[root@simbelmyne ~]# 
 <------- exits to shell prompt once done

Additional info:

This may be specific to Fedora 21. Fedora 20 works as expected.

Comment 1 Pete Zaitcev 2014-10-22 04:27:14 UTC
The hang happens at the end, where sys.exit() is called. Something gums
up the handling of SystemExit, evidently.

Comment 2 Pete Zaitcev 2014-10-23 05:25:36 UTC
Not 100% sure about it, but it looks like the server fails to close
the connection, and the client depends on that. I have not looked at
the code yet, but here are the straces.

Good:

8303  recvfrom(3, "\350\202o\177\0\0\0\0\201\336[\214\0\0\0\0\0\0\0\0\1", 21, MSG_DONTWAIT, NULL, NULL) = 21
8303  poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished ...>

8300  sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\10", 1}, {"\f\0\0\0\0\0\0\0", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE) = 9

8280  write(1, "{\"name\":\"simbelmyne\",\"rank\":0,\"s"..., 360) = 360

8303  <... poll resumed> )              = 1 ([{fd=3, revents=POLLIN|POLLHUP|0x2000}])

Bad:

29096 recvfrom(3, "\350\202o\177\0\0\0\0\316\373\4*\0\0\0\0\0\0\0\0\1", 21, MSG_DONTWAIT, NULL, NULL) = 21
29096 poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished ...>

29093 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\10", 1}, {"\f\0\0\0\0\0\0\0", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished ...>
29093 <... sendmsg resumed> )           = 9

29075 write(1, "{\"name\":\"kvm-ichi\",\"rank\":0,\"sta"..., 327) = 327

They both send the \f message, but in the hang case there's no POLLHUP
in response.

Comment 3 Pete Zaitcev 2014-10-24 18:05:02 UTC
On the other hand... if we make clients talk to other servers with
"ceph -m host", then the fault seems to lie with the client. A client
on the "bad" machine talking to a server on the "good" machine hangs.
A client on the "good" machine talking to a server on the "bad"
machine works okay.

Comment 4 Pete Zaitcev 2014-10-25 05:48:41 UTC
Never mind, comment #2 was terribly misleading; something else is
really going on. The problem is that we have a __del__ invoking
shutdown(). When the interpreter exits, it runs all the destructors,
so we end up in shutdown() while inside sys.exit(). shutdown() then
uses run_in_thread() to call the C function rados_shutdown(), but
run_in_thread() hangs:

class RadosThread(threading.Thread):
    def __init__(self, target, args=None):
        self.args = args
        self.target = target
        threading.Thread.__init__(self)
    def run(self):
        self.retval = self.target(*self.args)

def run_in_thread(target, args, timeout=0):
    t = RadosThread(target, args)
    t.start()   # <========= HANGS HERE

It hangs like this:

29075 clone(child_stack=0x7fbf56a5eff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fbf56a5f9d0, tls=0x7fbf56a5f700, child_tidptr=0x7fbf56a5f9d0) = 29103
29075 futex(0x20e3880, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
29103 set_robust_list(0x7fbf56a5f9e0, 24) = 0
29103 madvise(0x7fbf5625f000, 8368128, MADV_DONTNEED) = 0
29103 _exit(0)                          = ?

Basically, the threading module spawns a new thread (the clone()
above) and waits on a futex, but the new thread exits immediately
(_exit(0)) without doing anything instead of waking our futex. And
so we hang.
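
For reference, here is a minimal, self-contained sketch of the same
pattern (hypothetical names, not the actual rados.py code): an object
whose destructor starts a thread, destroyed while the interpreter is
shutting down. On the affected Python build the new thread exits
immediately and start() never returns (exact behaviour also depends
on the interpreter's teardown order).

import sys
import threading

class Handle(object):
    # Stands in for the Rados binding object: its destructor does its
    # cleanup on a helper thread, like shutdown()/run_in_thread().
    def __del__(self):
        t = threading.Thread(target=lambda: None)
        t.start()   # on the broken interpreter this never returns
        t.join()

handle = Handle()   # module-level object, destroyed at interpreter exit
sys.exit(0)

Here a thread created during interpreter shutdown exits right away
without signalling that it has started, so start() waits forever,
which matches the clone()/futex()/_exit(0) sequence in the strace
above.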

Comment 5 Boris Ranto 2014-10-30 07:28:56 UTC
Hi,

this was reported upstream a while back [1] and it turned out to be a regression in python [2].

Hence, I'm reassigning this to python. The fix currently seems to be either to revert the commit that caused the regression or to rebase python once it is fixed there (see [2] for details).

[1] http://tracker.ceph.com/issues/8797
[2] http://bugs.python.org/issue21963

Comment 6 Steve Capper 2014-12-05 18:50:44 UTC
Hi,
I've cherry-picked the following from https://hg.python.org/cpython:
changeset:   93526:4ceca79d1c63
branch:      2.7
parent:      93521:8bc29f5ebeff
user:        Antoine Pitrou <solipsis>
date:        Fri Nov 21 02:04:21 2014 +0100
summary:     Issue #21963: backout issue #1856 patch (avoid crashes and lockups when

and rebuilt the package (for AArch64 running Fedora 21).

This fixed the issues I had experienced with Ceph.

Could this please either be cherry-picked into the Python package, or could the base revision be updated to one after 93526:4ceca79d1c63?

Cheers,
--
Steve

Comment 7 Michael J. Kidd 2015-01-14 00:23:49 UTC
This is fixed in Ceph code.  Details at:
http://tracker.ceph.com/issues/8797

I just manually applied the patch detailed in that tracker and have been able to completely deploy my ceph cluster on Fedora 21.

Shifting the component from python back to ceph. Two files/RPMs will need to be updated:
ceph:/usr/bin/ceph
python-rados:/usr/lib/python2.7/site-packages/rados.py

Without these changes, Ceph is practically useless on Fedora 21.
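
Until the updated packages land, here is a hedged sketch of the kind
of workaround involved (illustrative only, not necessarily the actual
patch from the tracker): shut the librados handle down explicitly
while the interpreter is still fully alive, so __del__ has nothing
left to do and no thread gets started during sys.exit().

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
try:
    cluster.connect()
    # ... issue mon/osd commands here ...
finally:
    # Explicit shutdown instead of relying on __del__ during exit.
    cluster.shutdown()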

Thanks,
Michael

Comment 8 Fedora Update System 2015-01-14 10:40:58 UTC
ceph-0.80.7-3.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/ceph-0.80.7-3.fc21

Comment 9 Fedora Update System 2015-01-14 23:59:41 UTC
Package ceph-0.80.7-3.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing ceph-0.80.7-3.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-0723/ceph-0.80.7-3.fc21
then log in and leave karma (feedback).

Comment 10 Fedora Update System 2015-01-26 02:31:57 UTC
ceph-0.80.7-3.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.