Bug 1507888
Summary: Deployment with ceph and TLS everywhere fails with: "WorkflowTasks_Step2_Execution: ERROR "cannot stat '/var/run/ceph/ceph-mon.overcloud-controller-2.asok': No such file or directory""

| Field | Value |
|---|---|
| Product | Red Hat OpenStack |
| Component | openstack-tripleo-heat-templates |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 12.0 (Pike) |
| Target Milestone | z2 |
| Target Release | 12.0 (Pike) |
| Hardware | x86_64 |
| OS | Linux |
| Keywords | Triaged, ZStream |
| Fixed In Version | openstack-tripleo-heat-templates-7.0.3-21.el7ost |
| Last Closed | 2018-03-28 17:14:53 UTC |
| Type | Bug |
| Bug Depends On | 1554444 |
| Reporter | Artem Hrechanychenko <ahrechan> |
| Assignee | Giulio Fidente <gfidente> |
| QA Contact | Yogev Rabl <yrabl> |
| Docs Contact | Derek <dcadzow> |
| CC | adeza, agurenko, aschoen, ceph-eng-bugs, dcritch, derli, gfidente, gmeno, jbiao, jomurphy, josorior, mariel, mburns, mcornea, michele, michele, nthomas, pgrist, pmorey, rchincho, rhel-osp-director-maint, sankarshan, sasha, scohen, shan, slinaber, yprokule |
Description
Artem Hrechanychenko
2017-10-31 11:44:41 UTC
I have the same issue, though I'm not using TLS. It seems to be caused by setting an overcloud domain: all other things being equal, setting an overcloud domain produces the same error. The ceph-mon processes are started with an FQDN, so /var/run/ceph/ceph-mon.overcloud-controller-2.asok does not exist, but /var/run/ceph/ceph-mon.overcloud-controller-2.my.domain.asok does. Without an overcloud domain, the short-hostname version of the file is there and ceph-ansible is happy.

Hi David, thanks for your help. I wonder if passing mon_use_fqdn to ceph-ansible helps. Could you try creating a heat environment file with the following contents:

    parameter_defaults:
      CephAnsibleExtraConfig:
        mon_use_fqdn: true

and deploy the overcloud with it to see if it passes?

Yup, that totally did it. Thanks Giulio!

David, thanks a lot for helping. I am trying to figure out if we can enable the parameter from comment #5 conditionally when a cloud domain is set. Can you tell me which parameters you use to set the cloud domain? Is it CloudDomain only?

No. With OSP12, you can set an 'overcloud_domain_name' in undercloud.conf. Prior to 12, I had CloudDomain in one of my env files (along with making the nova/neutron changes and restarts), but I took it out since undercloud.conf now covers all that.

The workaround from https://bugzilla.redhat.com/show_bug.cgi?id=1507888#c5 works for me.

FWIW, I tried `mon_use_fqdn: true` on a fresh deploy without setting the domain and that failed for the same reason. It would have been sweet if you could just set it and not worry about it. Whether you've configured a domain the old or the new way, the net result will include a dhcp_domain entry in nova.conf and a dns_domain entry in neutron.conf.
So you could conditionally enable it if

    crudini --get /etc/neutron/neutron.conf DEFAULT dns_domain

or

    crudini --get /etc/nova/nova.conf DEFAULT dhcp_domain

is not 'localdomain'.

(In reply to David Critch from comment #11)
> FWIW, I tried `mon_use_fqdn: true` on a fresh deploy without setting the
> domain and that failed for the same reason. Would have been sweet you could
> just set it and not worry about it.

Yeah, it doesn't guess the configuration; it expects the user to set the boolean when necessary.

> Whether you've configured a domain the old or the new way, the net result
> will include a dhcp_domain entry in nova.conf and a dns_domain entry in
> neutron.conf. So you could conditionally enable it if:
> crudini --get /etc/neutron/neutron.conf DEFAULT dns_domain
> or
> crudini --get /etc/nova/nova.conf DEFAULT dhcp_domain
> is not 'localdomain'

It's actually quickstart passing, via CloudDomain, the same domain set in undercloud.conf, so I think we should be okay enabling the boolean based on CloudDomain.

*** Bug 1508038 has been marked as a duplicate of this bug. ***

Hasn't https://review.openstack.org/#/c/523375/ made things worse for everyone now? On a fresh deploy that worked until yesterday, I now get errors because the short hostname is:

    overcloud-novacompute-0.novalocal

And this happens even when I explicitly set my CloudDomain to localdomain in my env files. With our current undercloud configuration and the default way hostnames are assigned, if dhcp_domain on the undercloud is set to something different from the CloudDomain, short hostnames suddenly become FQDNs. Another side effect is that a deployment from master now has overcloud nodes with hostnames ending in .novalocal even when .localdomain is specified as CloudDomain.

(In reply to Michele Baldessari from comment #20)
> Hasn't https://review.openstack.org/#/c/523375/ made things worse for
> everyone now?
>
> Now on a fresh deploy that worked until yesterday I get errors because
> the short hostname is:
> overcloud-novacompute-0.novalocal
>
> And this even when I explicitly set my CloudDomain to localdomain in my env
> files. In our current undercloud configuration in the default way hostnames
> are assigned if dhcp_domain on the undercloud is set to something different
> than the CloudDomain, short hostnames become FQDNs all of a sudden.

I suppose that is because nova deliberately uses .novalocal when the setting is left at its default. A better fix would probably have been to set it to '' instead, as we do for the overcloud [1]. What do you think?

1. https://github.com/openstack/tripleo-heat-templates/blob/107b610923ba5d39f90c3a6a63bf2d3642e1b35d/puppet/services/nova-base.yaml#L223

Failed to deploy the overcloud with the error:

    Error: /Stage[main]/Tripleo::Certmonger::Ca::Crl/File[tripleo-ca-crl]: Could not evaluate: Could not retrieve file metadata for http://ipa-ca/ipa/crl/MasterCRL.bin: getaddrinfo: Name or service not known

which is documented in https://bugzilla.redhat.com/show_bug.cgi?id=1554444. Once that bug is backported, I'll be able to verify this one.

I am the assignee, so I am not sure if I can move the BZ to VERIFIED myself, but I tested this with ceph-ansible-3.0.27-1.el7cp.noarch, using a custom domain name (example.com) in neutron.conf and the following heat parameters:

    CloudDomain: example.com
    CloudName: overcloud.example.com
    CloudNameInternal: overcloud.internalapi.example.com
    CloudNameStorage: overcloud.storage.example.com
    CloudNameStorageManagement: overcloud.storagemgmt.example.com
    CloudNameCtlplane: overcloud.ctlplane.example.com

Verified on openstack-tripleo-heat-templates-7.0.9-8.el7ost.noarch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0602
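The socket-name mismatch discussed in this report can be illustrated with a minimal Python sketch. It assumes the ceph-mon daemon names its admin socket after the node's hostname (the FQDN once a custom domain is set), and that the deployment looks for `ceph-mon.<short>.asok` unless `mon_use_fqdn` is true, in which case it looks for `ceph-mon.<short>.<domain>.asok`. The helper names (`actual_asok`, `expected_asok`, `is_custom_domain`, `read_default`, `render_env`) are hypothetical and are not part of ceph-ansible or tripleo-heat-templates. `read_default` only approximates the `crudini --get ... DEFAULT ...` check suggested in comment #11. Treating an empty value or `localdomain` as "no custom domain" follows that same suggestion.

```python
"""Hypothetical sketch of the ceph-mon admin socket mismatch in this report.

Assumptions (illustrative only, not taken from ceph-ansible or
tripleo-heat-templates):
  * the ceph-mon daemon names its socket after the node's hostname,
    which is the FQDN once a custom overcloud domain is set;
  * the deployment looks for ceph-mon.<name>.asok, where <name> is
    <short>.<domain> when mon_use_fqdn is true and <short> otherwise;
  * an empty domain or 'localdomain' means "no custom domain".
"""
import configparser
from typing import Optional

RUN_DIR = "/var/run/ceph"  # directory taken from the error message


def is_custom_domain(domain: Optional[str]) -> bool:
    """True when a non-default cloud domain is configured (assumed rule)."""
    return bool(domain) and domain != "localdomain"


def node_hostname(short: str, domain: str) -> str:
    """Hostname the node runs with: the FQDN only for a custom domain."""
    return f"{short}.{domain}" if is_custom_domain(domain) else short


def actual_asok(short: str, domain: str) -> str:
    """Socket the ceph-mon daemon creates (assumed to follow the hostname)."""
    return f"{RUN_DIR}/ceph-mon.{node_hostname(short, domain)}.asok"


def expected_asok(short: str, domain: str, mon_use_fqdn: bool) -> str:
    """Socket the deployment tries to stat (assumed naming rule)."""
    name = f"{short}.{domain}" if mon_use_fqdn else short
    return f"{RUN_DIR}/ceph-mon.{name}.asok"


def read_default(conf_text: str, key: str) -> Optional[str]:
    """Roughly what `crudini --get FILE DEFAULT KEY` reports for INI text."""
    # interpolation=None: don't treat '%' in values specially;
    # strict=False: tolerate duplicate keys/sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.defaults().get(key)


def render_env(mon_use_fqdn: bool) -> str:
    """Heat environment file carrying the workaround parameter."""
    value = "true" if mon_use_fqdn else "false"
    return (
        "parameter_defaults:\n"
        "  CephAnsibleExtraConfig:\n"
        f"    mon_use_fqdn: {value}\n"
    )


if __name__ == "__main__":
    neutron_conf = "[DEFAULT]\ndns_domain = my.domain\n"
    domain = read_default(neutron_conf, "dns_domain") or "localdomain"
    use_fqdn = is_custom_domain(domain)
    short = "overcloud-controller-2"
    # Socket created vs. socket looked up without and with the workaround.
    print(actual_asok(short, domain))
    print(expected_asok(short, domain, mon_use_fqdn=False))
    print(expected_asok(short, domain, mon_use_fqdn=use_fqdn))
    print(render_env(use_fqdn), end="")
```

Run as a script, the sketch prints three paths and the environment file contents. The first path is the socket the daemon creates. The second is the short-name path looked up without the workaround. The third is the matching path once `mon_use_fqdn` is enabled. The decision is keyed off the domain, rather than always setting the boolean, because the report found that `mon_use_fqdn: true` breaks deployments that have no custom domain.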