Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 1219818

Summary: [rbd-openstack] Cannot create nova instances with firewall enabled on ceph cluster running on RHEL 7.1
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shilpa <smanjara>
Component: RBD
Assignee: Josh Durgin <jdurgin>
Status: CLOSED DUPLICATE
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Docs Contact: John Wilkins <jowilkin>
Priority: unspecified
Version: 1.3.0
CC: ceph-eng-bugs, flucifre, kdreyer, smanjara, vumrao
Target Milestone: rc
Target Release: 1.3.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-05-22 14:16:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description shilpa 2015-05-08 10:56:24 UTC
Description of problem:
On RHEL 7.1, the firewall was enabled on the Ceph cluster with the mon and OSD ports opened. OpenStack VMs fail to boot.


Version-Release number of selected component (if applicable):
# ceph -v
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)


How reproducible:
Tried once

Steps to Reproduce:
1. Create a Ceph cluster (3 mons and 5 OSDs) and an OpenStack setup.
2. Configure RBD for Nova, Glance, and Cinder.
3. With the firewall disabled, create a Glance image and boot a Nova instance. Ensure the setup is working correctly.
4. Now enable the firewall on the Ceph cluster, with port 6789 opened on all mon nodes and ports 6800-7100 opened for OSD traffic.
5. Disable SELinux on both the Ceph and OpenStack clusters.
6. Create a new Nova VM instance.
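Steps 4 and 5 above can be sketched as shell commands (a minimal sketch; the firewalld zone name is assumed to be the default "public", and the reload step reflects how the rules were applied later in this report):

```shell
# On each mon node: open the monitor port in the permanent config
firewall-cmd --zone=public --add-port=6789/tcp --permanent

# On each OSD node: open the OSD port range used in this reproduction
firewall-cmd --zone=public --add-port=6800-7100/tcp --permanent

# Apply the permanent rules to the running firewall
firewall-cmd --reload

# Put SELinux into permissive mode (temporary, for testing only)
setenforce 0
```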

Actual results:
The instance fails to boot, although Cinder volumes and Glance images can still be created. It looks like a problem with the libvirt connection.


Expected results:
Client traffic should not be blocked if the ports are opened.

Additional info:

From nova-compute logs:

2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5983, in update_available_resource
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     rt.update_available_resource(context)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 313, in update_available_resource
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     resources = self.driver.get_available_resource(self.nodename)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5095, in get_available_resource
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     stats = self.get_host_stats(refresh=True)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6158, in get_host_stats
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     return self.host_state.get_host_stats(refresh=refresh)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6732, in get_host_stats
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     self.update_status()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6755, in update_status
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     disk_info_dict = self.driver._get_local_gb_info()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4688, in _get_local_gb_info
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     info = LibvirtDriver._get_rbd_driver().get_pool_info()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/rbd_utils.py", line 286, in get_pool_info
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     with RADOSClient(self) as client:
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/rbd_utils.py", line 86, in __init__
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     self.cluster, self.ioctx = driver._connect_to_rados(pool)
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/rbd_utils.py", line 110, in _connect_to_rados
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     client.connect()
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/rados.py", line 429, in connect
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task     raise make_ex(ret, "error connecting to the cluster")
2015-05-07 07:11:12.992 4875 TRACE nova.openstack.common.periodic_task TimedOut: error connecting to the cluster


Ceph health was OK, and there were no errors in the Ceph logs on the OSD or mon nodes. Once the firewall is disabled, VMs boot successfully.
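The traceback ends in a plain TCP connect timeout from the RADOS client, so a quick client-side check (a hypothetical diagnostic, not part of the original report; hostnames are placeholders) is to verify that the mon port is reachable from the compute node:

```python
import socket

def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder hostname; 6789 is the default ceph-mon port):
#   port_reachable("mon1.example.com", 6789)
```

If this returns False from the Nova compute node while the firewall is up, the block is in the firewall rules rather than in Ceph itself.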

Comment 2 Ken Dreyer (Red Hat) 2015-05-08 13:46:23 UTC
Josh, would you mind taking a look at this one (or re-assigning as appropriate)?

Comment 3 Josh Durgin 2015-05-08 16:00:24 UTC
What firewall rules are used exactly? It sounds like they need updating to let clients connect to monitors at least, if that's a persistent issue and not a result of temporarily full firewalling before the ports were opened.

Comment 4 shilpa 2015-05-12 06:14:40 UTC
On the mon nodes:
firewall-cmd --zone=public --add-port=6789/tcp --permanent 

On all osd nodes:
firewall-cmd --zone=public --add-port=6800-6811/tcp --permanent

This is when the client started complaining of "error connecting to the cluster", and the error was persistent until I disabled the firewall.

Comment 5 Ken Dreyer (Red Hat) 2015-05-12 18:58:26 UTC
Hi shilpa, does the problem still occur when you run the commands a second time without the --permanent flag?

"--permanent" only writes firewalld's configuration to disk, and it doesn't actually effect a change in the "live" firewall rules until "firewall-cmd --reload" (or a full OS reboot). This is a problem in the current documentation, tracked in bug 1220793 .

Comment 6 shilpa 2015-05-13 04:35:29 UTC
Hi Ken,

I should have mentioned: I did run "firewall-cmd --reload". The test was done after running the reload.

Comment 7 Ken Dreyer (Red Hat) 2015-05-20 03:44:23 UTC
Shilpa, when you re-activate your firewall and open a wider port range on the OSDs (TCP ports 6800-7300) as discussed in bz 1219493, does the problem go away?

Comment 8 shilpa 2015-05-21 05:38:41 UTC
Hi Ken,

I have not tried opening the entire port range. Will try that and update.

Comment 9 shilpa 2015-05-21 14:55:27 UTC
Tried opening the 6800-7300 port range on the OSD nodes. I don't see the problem anymore.
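For reference, the working configuration can be sketched as (zone name assumed; run on each OSD node):

```shell
# Open the wider port range used by Ceph OSD daemons, then apply it
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --reload
```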

Comment 10 Ken Dreyer (Red Hat) 2015-05-21 20:40:39 UTC
shilpa, thanks for confirming!

I'm thinking we should close this bug and open a new one for the needed doc changes (https://github.com/ceph/ceph/pull/4740). Do you agree?

Comment 11 Ken Dreyer (Red Hat) 2015-05-21 22:53:41 UTC
FYI I've filed bz 1223992 for the firewall docs change.

Comment 12 shilpa 2015-05-22 13:09:05 UTC
Yes we can close this bug.

Comment 13 Ken Dreyer (Red Hat) 2015-05-22 14:16:52 UTC

*** This bug has been marked as a duplicate of bug 1223992 ***