Bug 1413745
| Summary: | Read timeout at beginning of CFME deployment | | |
| --- | --- | --- | --- |
| Product: | Red Hat Quickstart Cloud Installer | Reporter: | James Olin Oden <joden> |
| Component: | Installation - CloudForms | Assignee: | John Matthews <jmatthew> |
| Status: | CLOSED WORKSFORME | QA Contact: | Sudhir Mallamprabhakara <smallamp> |
| Severity: | unspecified | Docs Contact: | Dan Macpherson <dmacpher> |
| Priority: | unspecified | | |
| Version: | 1.1 | CC: | bthurber, jmontleo, joden, qci-bugzillas |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 1.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-17 20:23:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description: James Olin Oden, 2017-01-16 20:59:35 UTC
If the installation hangs, the overcloud deployment can't communicate with the Ceph storage cluster for some reason. I agree it's not ideal, but failure to connect doesn't cause the deployment to fail or images to fail uploading outright. It just blocks until the storage becomes available again.

Make sure you're using Ceph Storage 2.0, not Ceph 1.3 as we were using in OSP 8; they are not compatible. If I recall correctly, Ceph Storage 1.3 packages have a version of roughly 0.97 and Ceph Storage 2.0 packages have a version of roughly 10.x. Also make sure no firewall rules on your host would prevent the overcloud hosts from communicating with the Ceph host if you're using a virt environment to test, and try again.

Concerning the version of Ceph, here is what I have:

```
[vagrant@ceph ~]$ rpm -qa | grep ceph
ceph-selinux-10.2.3-17.el7cp.x86_64
ceph-deploy-1.5.36-20.el7cp.noarch
python-cephfs-10.2.3-17.el7cp.x86_64
ceph-base-10.2.3-17.el7cp.x86_64
ceph-mon-10.2.3-17.el7cp.x86_64
ceph-mds-10.2.3-17.el7cp.x86_64
libcephfs1-10.2.3-17.el7cp.x86_64
ceph-common-10.2.3-17.el7cp.x86_64
ceph-osd-10.2.3-17.el7cp.x86_64
ceph-radosgw-10.2.3-17.el7cp.x86_64
```

And here are the firewall rules in question:

```
1340K 4168M ACCEPT  all  --  *      virbr9  0.0.0.0/0         192.168.175.0/24  ctstate RELATED,ESTABLISHED
3542K  227M ACCEPT  all  --  virbr9 *       192.168.175.0/24  0.0.0.0/0
```

These basically allow connections on the 192.168.175.0/24 network, which is the provisioning network for the fusor and where the Ceph server lives (its address falls within the range the provisioning network uses to hand out addresses via DHCP).

My second run of the same deployment fails the same way, so I guess this is reproducible. `ceph status` reports:

```
ceph status
    cluster bc5e37bc-3cf8-4823-a7c1-5109ddeefe63
     health HEALTH_ERR
            ...
            no osds
```

I'd guess this is why it is failing.

Mine reads:

```
[root@ceph20 ~]# ceph status
    cluster 42e66d84-53f0-4283-9d63-7077e1b566f4
     health HEALTH_OK
     ...
     osdmap e18: 3 osds: 3 up, 3 in
```

Please double-check your changes to hostnames, etc., and review the output for errors when vagrant runs the setup.sh script.

Set up a Ceph host. It started with:

```
/dev/vdb1  1.0T  5.1G  1019G  1%  /storage1
/dev/vdc1  1.0T  5.1G  1019G  1%  /storage2
/dev/vdd1  1.0T  5.1G  1019G  1%  /storage3
```

After the entire deployment was complete:

```
/dev/vdb1  1.0T  9.1G  1015G  1%  /storage1
/dev/vdc1  1.0T  9.1G  1015G  1%  /storage2
/dev/vdd1  1.0T  9.1G  1015G  1%  /storage3

[root@ceph20 ~]# ceph status
    cluster 42e66d84-53f0-4283-9d63-7077e1b566f4
     health HEALTH_OK
     monmap e1: 1 mons at {ceph20=192.168.240.254:6789/0}
            election epoch 3, quorum 0 ceph20
     osdmap e23: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v530: 124 pgs, 4 pools, 4083 MB data, 916 objects
            27756 MB used, 3043 GB / 3070 GB avail
                 124 active+clean
```

I will look at adding a validation so we can catch inaccessible and unhealthy clusters in 1.2.
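A validation like the one proposed could, in principle, look something like the following minimal sketch in Python. It is an illustration only, not QCI's actual implementation: the function name `validate_ceph_cluster` and the thresholds are hypothetical, and it assumes the installer host can run the `ceph` CLI against the cluster and that the JSON layout of `ceph status --format json` matches the Jewel-era (10.x) releases shown in this bug.

```python
import json
import subprocess


def validate_ceph_cluster(conf="/etc/ceph/ceph.conf", timeout=30):
    """Fail fast when the Ceph cluster is unreachable or unhealthy.

    Hypothetical pre-deployment check, not QCI's actual code: it runs
    `ceph status --format json` and verifies that the cluster reports a
    usable health state and has at least one OSD up and in, so the
    deployment does not silently block on storage.
    """
    try:
        out = subprocess.check_output(
            ["ceph", "-c", conf, "status", "--format", "json"],
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        raise RuntimeError("Ceph cluster unreachable: `ceph status` timed out")
    except (subprocess.CalledProcessError, OSError) as err:
        raise RuntimeError("`ceph status` failed: %s" % err)

    status = json.loads(out.decode("utf-8"))

    # Jewel (10.x) reports health under health.overall_status; newer
    # releases use health.status. HEALTH_ERR is what the failing run
    # above showed when the cluster had no OSDs.
    health = status.get("health", {})
    overall = health.get("overall_status") or health.get("status", "HEALTH_ERR")
    if overall == "HEALTH_ERR":
        raise RuntimeError("Ceph cluster is unhealthy: %s" % overall)

    # Jewel nests the OSD counts as osdmap.osdmap in the JSON output.
    osdmap = status.get("osdmap", {}).get("osdmap", {})
    if osdmap.get("num_up_osds", 0) < 1 or osdmap.get("num_in_osds", 0) < 1:
        raise RuntimeError("no OSDs are up/in; deployment would block on storage")

    return status


if __name__ == "__main__":
    validate_ceph_cluster()
    print("Ceph cluster reachable and healthy")
```

Run before image upload begins, a check along these lines would surface the "no osds" condition from the failing run as an immediate error instead of letting the deployment block until it hits a read timeout.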