Description of problem:
All rbd commands hang indefinitely in the latest build of ceph. The repos can be found here:
http://puddle.ceph.redhat.com/puddles/rhscon/2/latest/
http://puddle.ceph.redhat.com/puddles/ceph/2/latest/
The cluster install goes through fine, but the cluster is always stuck in HEALTH_ERR.

Version-Release number of selected component (if applicable):
ceph 10.1.1

How reproducible:
Always

Steps to Reproduce:
1. Install a ceph cluster.

Additional info:

[root@magna009 ~]# ceph osd tree
2016-04-19 10:22:07.690321 7f8c3f7dd700 -1 asok(0x7f8c38001680) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-clients/ceph-client.admin.3095.140240211679152.asok': (2) No such file or directory
ID WEIGHT   TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 13.49236 root default
-2  2.69847     host magna052
 0  0.89949         osd.0          up  1.00000          1.00000
 9  0.89949         osd.9          up  1.00000          1.00000
13  0.89949         osd.13         up  1.00000          1.00000
-3  2.69847     host magna077
 2  0.89949         osd.2          up  1.00000          1.00000
 6  0.89949         osd.6          up  1.00000          1.00000
11  0.89949         osd.11         up  1.00000          1.00000
-4  2.69847     host magna046
 1  0.89949         osd.1          up  1.00000          1.00000
 3  0.89949         osd.3          up  1.00000          1.00000
 7  0.89949         osd.7          up  1.00000          1.00000
-5  2.69847     host magna080
 5  0.89949         osd.5          up  1.00000          1.00000
12  0.89949         osd.12         up  1.00000          1.00000
14  0.89949         osd.14         up  1.00000          1.00000
-6  2.69847     host magna058
 4  0.89949         osd.4          up  1.00000          1.00000
 8  0.89949         osd.8          up  1.00000          1.00000
10  0.89949         osd.10         up  1.00000          1.00000

[root@magna009 ~]# ceph -s
2016-04-19 10:31:53.320634 7ffa9aa85700 -1 asok(0x7ffa94001680) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-clients/ceph-client.admin.3127.140714201585584.asok': (2) No such file or directory
    cluster aab37108-8f14-49fe-b581-d5c9bd63b5ac
     health HEALTH_ERR
            23 pgs are stuck inactive for more than 300 seconds
            23 pgs peering
            23 pgs stuck inactive
     monmap e1: 3 mons at {magna009=10.8.128.9:6789/0,magna031=10.8.128.31:6789/0,magna046=10.8.128.46:6789/0}
            election epoch 12, quorum 0,1,2 magna009,magna031,magna046
     osdmap e92: 15 osds: 15 up, 15 in; 7 remapped pgs
            flags sortbitwise
      pgmap v275: 192 pgs, 2 pools, 0 bytes data, 0 objects
            525 MB used, 13815 GB / 13815 GB avail
                 169 active+clean
                  11 peering
                   7 remapped+peering
                   5 creating+peering

[root@magna046 ~]# rbd create Tejas/img --size 5G
2016-04-19 10:40:51.530122 7f00370fad80 -1 asok(0x7f0041fe7e60) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-clients/ceph-client.admin.4070.139639083925648.asok': (2) No such file or directory
^C
[root@magna046 ~]#

[root@magna031 ~]# rbd ls -l Tejas
2016-04-19 10:34:07.905834 7f157ea7bd80 -1 asok(0x7f1589de3e30) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-clients/ceph-client.admin.2833.139730484084880.asok': (2) No such file or directory
2016-04-19 10:34:07.908651 7f1560557700  0 -- 10.8.128.31:0/3899757158 >> 10.8.128.46:6808/3521 pipe(0x7f1589e4a620 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1589e4b8e0).fault
^C
[root@magna031 ~]#
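The repeated asok errors above are a separate symptom: librbd is trying to bind its admin socket under /var/run/ceph/rbd-clients/, and that directory does not exist on these nodes. A minimal workaround sketch, assuming the path from the error messages and that the ceph user should own it:

```shell
# Workaround sketch for the asok bind failure only; pre-creating the
# directory clears the error message but does not by itself unblock the
# hanging rbd commands (the hang tracks the stuck/peering pgs).
mkdir -p /var/run/ceph/rbd-clients
chown ceph:ceph /var/run/ceph/rbd-clients
```

The asok error is cosmetic relative to the hang: rbd can still block on unreachable OSDs even with the socket directory in place.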
I saw the same thing yesterday and thought it was something to do with my network configuration. I checked the OSD logs, and the OSDs were reporting connectivity issues. After I flushed the iptables rules, it started working for me on the same build.

[CEPH-2]
name=CEPH-2
baseurl=http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-16.1/CEPH-2/$basearch/os
gpgcheck=0
enabled=1

[CEPH-2-debug]
name=CEPH-2 Debuginfo
baseurl=http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-16.1/CEPH-2/$basearch/debuginfo
gpgcheck=0
enabled=0

[CEPH-2-sources]
name=CEPH-2 Sources
baseurl=http://puddle.ceph.redhat.com/puddles/ceph/2/2016-04-16.1/CEPH-2/source
gpgcheck=0
enabled=0

Note: for the earlier build, I didn't need to flush iptables.
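For reference, the iptables flush described above amounts to the following; a rough sketch only, since flushing removes ALL rules and is only appropriate on a disposable test cluster:

```shell
# Inspect the INPUT chain first; a REJECT/DROP rule here would explain the
# OSD connectivity errors and the "pipe ... .fault" messages in the report.
iptables -L INPUT -n --line-numbers

# Flush all rules in all chains (destructive; test clusters only).
iptables -F
```

Opening the specific mon/OSD ports, as discussed in the following comments, is the durable fix; flushing just confirms the firewall is the culprit.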
It's my understanding that you must open the ports in the firewalls on your cluster nodes prior to running the installer. Did you do that?
Tejas, Would you please answer Ken's query "It's my understanding that you must open the ports in the firewalls on your cluster nodes prior to running the installer. Did you do that?"
For example, on the monitors, I use the following (Ansible):

- firewalld:
    port: 6789/tcp
    immediate: true
    permanent: true
    state: enabled

And on the OSDs:

- firewalld:
    port: 6800-7300/tcp
    immediate: true
    permanent: true
    state: enabled
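The same rules can be applied by hand with firewall-cmd, assuming firewalld is the active firewall as the Ansible tasks imply:

```shell
# Monitors: open the mon port.
firewall-cmd --permanent --add-port=6789/tcp

# OSDs: open the OSD port range.
firewall-cmd --permanent --add-port=6800-7300/tcp

# Load the permanent rules into the running firewall.
firewall-cmd --reload
```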
Hi Ken,

We do have the same firewall ports open when running the installer. However, this issue was seen when we upgraded our setup to a different ceph repo. We are not seeing this issue now, so I will go ahead and close this bug.

Thanks,
Tejas