Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 1219818

Summary: [rbd-openstack] Cannot create nova instances with firewall enabled on ceph cluster running on RHEL 7.1
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shilpa <smanjara>
Component: RBD
Assignee: Josh Durgin <jdurgin>
Status: CLOSED DUPLICATE
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Docs Contact: John Wilkins <jowilkin>
Priority: unspecified
Version: 1.3.0
CC: ceph-eng-bugs, flucifre, kdreyer, smanjara, vumrao
Target Milestone: rc
Target Release: 1.3.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-05-22 14:16:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description shilpa 2015-05-08 10:56:24 UTC
Description of problem:
On RHEL 7.1, the firewall was enabled on the Ceph cluster with the mon and OSD ports opened. OpenStack VMs fail to boot.


Version-Release number of selected component (if applicable):
# ceph -v
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)


How reproducible:
Tried once

Steps to Reproduce:
1. Create a Ceph cluster (3 mons and 5 OSDs) and an OpenStack setup.
2. Configure RBD for Nova, Glance, and Cinder.
3. With the firewall disabled, create a Glance image and boot a Nova instance. Ensure the setup is working correctly.
4. Now enable the firewall on the Ceph cluster, with port 6789 opened on all mon nodes and ports 6800-7100 opened for OSD traffic.
5. Disable SELinux on both the Ceph and OpenStack clusters.
6. Create a new Nova VM instance.
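Steps 4 and 5 above can be sketched as shell commands (a minimal sketch; the firewalld zone name is assumed to be the default "public", and the reload step reflects how the rules were applied later in this report):

```shell
# On each mon node: open the monitor port in the permanent config
firewall-cmd --zone=public --add-port=6789/tcp --permanent

# On each OSD node: open the OSD port range used in this reproduction
firewall-cmd --zone=public --add-port=6800-7100/tcp --permanent

# Apply the permanent rules to the running firewall
firewall-cmd --reload

# Put SELinux into permissive mode (temporary, for testing only)
setenforce 0
```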

Actual results:
The instance fails to boot, although Cinder volumes and Glance images can still be created. It looks like a problem with the libvirt connection.


Expected results:
Client traffic should not be blocked if the ports are opened.

Additional info:

From nova-compute logs:

2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5983, in update_available_resource
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     rt.update_available_resource(context)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 313, in update_available_resource
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     resources = self.driver.get_available_resource(self.nodename)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5095, in get_available_resource
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     stats = self.get_host_stats(refresh=True)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6158, in get_host_stats
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     return self.host_state.get_host_stats(refresh=refresh)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6732, in get_host_stats
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     self.update_status()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6755, in update_status
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     disk_info_dict = self.driver._get_local_gb_info()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4688, in _get_local_gb_info
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     info = LibvirtDriver._get_rbd_driver().get_pool_info()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/rbd_utils.py", line 286, in get_pool_info
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     with RADOSClient(self) as client:
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/rbd_utils.py", line 86, in __init__
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     self.cluster, self.ioctx = driver._connect_to_rados(pool)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/rbd_utils.py", line 110, in _connect_to_rados
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     client.connect()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/rados.py", line 429, in connect
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     raise make_ex(ret, "error connecting to the cluster")
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task TimedOut: error connecting to the cluster


Ceph health was OK, and there were no errors in the Ceph logs on the OSD or mon nodes. Once the firewall is disabled, VMs boot successfully.
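The traceback ends in a plain TCP connect timeout from the RADOS client, so a quick client-side check (a hypothetical diagnostic, not part of the original report; hostnames are placeholders) is to verify that the mon port is reachable from the compute node:

```python
import socket

def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder hostname; 6789 is the default ceph-mon port):
#   port_reachable("mon1.example.com", 6789)
```

If this returns False from the Nova compute node while the firewall is up, the block is in the firewall rules rather than in Ceph itself.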

Comment 2 Ken Dreyer (Red Hat) 2015-05-08 13:46:23 UTC
Josh, would you mind taking a look at this one (or re-assigning as appropriate)?

Comment 3 Josh Durgin 2015-05-08 16:00:24 UTC
What firewall rules are used exactly? It sounds like they need updating to let clients connect to monitors at least, if that's a persistent issue and not a result of temporarily full firewalling before the ports were opened.

Comment 4 shilpa 2015-05-12 06:14:40 UTC
On the mon nodes:
firewall-cmd --zone=public --add-port=6789/tcp --permanent 

On all osd nodes:
firewall-cmd --zone=public --add-port=6800-6811/tcp --permanent

This is when the client started complaining of "error connecting to the cluster", and the error was persistent until I disabled the firewall.

Comment 5 Ken Dreyer (Red Hat) 2015-05-12 18:58:26 UTC
Hi shilpa, does the problem still occur when you run the commands a second time without the --permanent flag?

"--permanent" only writes firewalld's configuration to disk, and it doesn't actually effect a change in the "live" firewall rules until "firewall-cmd --reload" (or a full OS reboot). This is a problem in the current documentation, tracked in bug 1220793 .

Comment 6 shilpa 2015-05-13 04:35:29 UTC
Hi Ken,

I should have mentioned: I did run "firewall-cmd --reload". The test was done after running the reload.

Comment 7 Ken Dreyer (Red Hat) 2015-05-20 03:44:23 UTC
Shilpa, when you re-activate your firewall and open a wider port range on the OSDs (TCP ports 6800-7300) as discussed in bz 1219493, does the problem go away?

Comment 8 shilpa 2015-05-21 05:38:41 UTC
Hi Ken,

I have not tried opening the entire port range. Will try that and update.

Comment 9 shilpa 2015-05-21 14:55:27 UTC
Tried opening the 6800-7300 port range on the OSD nodes. I don't see the problem anymore.
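For reference, the working configuration can be sketched as (zone name assumed; run on each OSD node):

```shell
# Open the wider port range used by Ceph OSD daemons, then apply it
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --reload
```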

Comment 10 Ken Dreyer (Red Hat) 2015-05-21 20:40:39 UTC
shilpa, thanks for confirming!

I'm thinking we should close this bug and open a new one for the needed doc changes (https://github.com/ceph/ceph/pull/4740). Do you agree?

Comment 11 Ken Dreyer (Red Hat) 2015-05-21 22:53:41 UTC
FYI I've filed bz 1223992 for the firewall docs change.

Comment 12 shilpa 2015-05-22 13:09:05 UTC
Yes we can close this bug.

Comment 13 Ken Dreyer (Red Hat) 2015-05-22 14:16:52 UTC

*** This bug has been marked as a duplicate of bug 1223992 ***