| Summary: | [Backwards Compatibility] SSL enabled UC10-OC9 deployment fails with Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6+SSL | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Dan Yasny <dyasny> |
| Component: | documentation | Assignee: | RHOS Documentation Team <rhos-docs> |
| Status: | CLOSED DUPLICATE | QA Contact: | RHOS Documentation Team <rhos-docs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | ||
| Version: | 10.0 (Newton) | CC: | dbecker, dyasny, jcoufal, josorior, jslagle, lbopf, mandreou, mburns, mcornea, morazi, rhel-osp-director-maint, srevivo |
| Target Milestone: | ga | Keywords: | Documentation, Triaged |
| Target Release: | 10.0 (Newton) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-02 23:33:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Dan Yasny
2016-10-17 21:27:59 UTC
original deployment command:

openstack overcloud deploy --templates /home/stack/tht --control-scale 3 --compute-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server clock.redhat.com --timeout 90 -e /home/stack/tht/environments/puppet-pacemaker.yaml -e /home/stack/tht/environments/storage-environment.yaml -e /home/stack/tht/environments/network-isolation.yaml -e network-environment.yaml -e ~/ssl-heat-templates/environments/enable-tls.yaml -e ~/ssl-heat-templates/environments/inject-trust-anchor.yaml --ceph-storage-scale 1

/home/stack/tht holds a copy of THT from the openstack-tripleo-heat-templates-compat

marios, can someone from lifecycle take a look at this one?

OK assigned to apetrich since he's looking at the backwards compat - let's see if there was any info/triage from Sofer as per comment #5 too

Adriano can you please sync with Dan and have a look at this?

So it seems that the issue is that the VIP is going to 192.168.200.188 instead of 192.168.200.180 and the cert is for 192.168.200.180

some evidence of that:

Notice: /Stage[main]/Main/Pacemaker::Resource::Ip[public_vip]/Pcmk_resource[ip-192.168.200.188]/ensure: created

here is the error that causes the newton error

SSL exception connecting to https://192.168.200.188:13000/v3/auth/tokens: hostname '192.168.200.188' doesn't match u'192.168.200.180'
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]: Could not evaluate: Execution of '/usr/bin/openstack token issue --format value' returned 1: Certificate did not match expected hostname: 192.168.200.188.
the network-environment.yaml:

ExternalAllocationPools: [{'start': '192.168.200.180', 'end': '192.168.200.200'}]

and a netstat in the controller-0

[root@overcloud-controller-0 keystone]# netstat -anp | grep 188
tcp 0 0 192.168.200.188:13386 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13003 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13004 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13292 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13773 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13357 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13774 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13808 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:80 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13776 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13041 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13777 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13042 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13080 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:443 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13696 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13000 0.0.0.0:* LISTEN 18534/haproxy
tcp 0 0 192.168.200.188:13800 0.0.0.0:* LISTEN 18534/haproxy

So, I got into the node and the actual error message is:
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]: Could not evaluate: Execution of '/usr/bin/openstack token issue --format value' returned 1: Certificate did not match expected hostname: 192.168.200.185. Certificate: {'notBefore': u'Oct 20 15:43:59 2016 GMT', 'serialNumber': u'9D54725C4D116EB7', 'notAfter': 'Oct 20 15:43:59 2017 GMT', 'version': 3L, 'subject': ((('countryName', u'US'),), (('stateOrProvinceName', u'NC'),), (('localityName', u'Raleigh'),), (('organizationName', u'Red HAt'),), (('organizationalUnitName', u'QE'),), (('commonName', u'192.168.200.180'),)), 'issuer': ((('countryName', u'US'),), (('stateOrProvinceName', u'NC'),), (('localityName', u'Raleigh'),), (('organizationName', u'Red HAt'),), (('organizationalUnitName', u'QE'),), (('commonName', u'192.168.200.180'),))}
SSL exception connecting to https://192.168.200.185:13000/v3/auth/tokens: hostname '192.168.200.185' doesn't match u'192.168.200.180' (tried 40, for a total of 170 seconds)
which indicates that the certificate's CN or SubjectAltName does not match the VIP. It was assumed that the VIP would be 192.168.200.180; however, this cannot be assured unless we set the FixedIPs for the Public network (which can be done via the PublicVirtualFixedIPs parameter).
I checked the Fixed IPs and they're not set:
"StorageVirtualFixedIPs": "[]",
"PublicVirtualFixedIPs": "[]",
"StorageMgmtVirtualFixedIPs": "[]",
"ControlFixedIPs": "[]",
"InternalApiVirtualFixedIPs": "[]",
Setting FixedIPs to match the certificate would fix the issue.
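
For illustration, here is a minimal Python sketch of the mismatch described above. It assumes a certificate dictionary shaped like the one printed in the error message (the format Python's ssl module uses for peer certificates) and checks whether a given VIP appears as the commonName or in a subjectAltName entry. The function names are hypothetical, and this is a simplified comparison, not the exact check the OpenStack client performs.

```python
import ipaddress


def cert_names(cert):
    """Collect commonName and subjectAltName values from a certificate dict.

    The dict layout is assumed to match the one shown in the error output.
    """
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    for _kind, value in cert.get("subjectAltName", ()):
        names.append(value)
    return names


def cert_covers_vip(cert, vip):
    """Return True if any name in the certificate is the same IP as the VIP."""
    vip_addr = ipaddress.ip_address(vip)
    for name in cert_names(cert):
        try:
            if ipaddress.ip_address(name) == vip_addr:
                return True
        except ValueError:
            # The name is a DNS name, not an IP address; skip it.
            continue
    return False


# Trimmed copy of the subject from the error output in this bug.
cert = {
    "subject": (
        (("countryName", "US"),),
        (("commonName", "192.168.200.180"),),
    ),
}

print(cert_covers_vip(cert, "192.168.200.180"))
print(cert_covers_vip(cert, "192.168.200.185"))
```

With the values from this bug, the first call reports a match and the second does not, which is the same disagreement the deployment hit once the VIP landed on 192.168.200.185.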
Could you try that Dan?

Also I think it only affects multiple controllers. On a 1 controller 1 compute env it worked fine without the extra params.

(In reply to Adriano Petrich from comment #9)
> Could you try that Dan?
>
> Also I think it only affects multiple controllers on a 1 controller 1
> compute env it worked fine without the extra params

I can try it, but this is still something new, since this sort of setup worked on previous puddles without any issues. With just one controller the VIP will not be able to change, and the cert signed for a specific IP will work, of course.

I can definitely try to assign fixed IPs, but this will be a workaround, not a solution, and will bring us no closer to the root cause of this issue.

dan, can you confirm that this is a new deployment of an osp9 overcloud with ssl using an osp10 undercloud?

I think we need to identify why the cert generation process assumed the VIP was 192.168.200.180 when it is actually 192.168.200.185. Juan, is this something you can look into? Using FixedIPs is a workaround, but it shouldn't be necessary if we are automatically generating the certs during the deployment process.

James,

Yes, it is a new deployment of an overcloud osp9 using an osp10 undercloud.

I think it is the other way around: we were expecting the VIP to be the first of the ExternalAllocationPools, as it has been on the previous versions. This script is based on what was working before.

It looks like SSL is not the issue: the cert points to the expected VIP, and the breakage is just showing now. Apart from SSL, the endpoints are mapped to the new IPs and everything in the overcloud is still working, although not on the first external IP of the allocation pool.

So far we are not sure what prompted this change, but I can see two possibilities:

* What we assumed was "the usual behaviour" was a glitch and we used that as the default. This is a tangible possibility since we are not defining anything in PublicVirtualFixedIPs to force that IP to be 192.168.200.180.
* The "usual behaviour" is the correct one, and now it has changed accidentally (or not). There are going to be breakages for clients and users.

Outputs: in the first case we might need more documentation on this; in the second we need to find where the change happened.

Anyway, I don't know where to go from here besides what I'm doing right now, which is to try setting those values as a workaround.

Dan,
adding PublicVirtualFixedIPs: [{'ip_address':'192.168.200.180'}] to the network-settings.yaml fixed the issue.
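
As a rough sketch of what that addition could look like on disk, the Python helper below renders the parameter as an environment-file fragment. The helper name is hypothetical, and placing the parameter under a `parameter_defaults:` section is an assumption about the file layout, which is not shown in this bug; adapt it to the environment file actually in use.

```python
import json


def render_fixed_vip_environment(vip):
    """Render a Heat environment fragment that pins the public VIP.

    Assumption: the parameter lives under parameter_defaults. The list is
    written in JSON flow style, which is also valid YAML.
    """
    fixed_ips = [{"ip_address": vip}]
    return (
        "parameter_defaults:\n"
        "  PublicVirtualFixedIPs: " + json.dumps(fixed_ips) + "\n"
    )


# The VIP must be the address the certificate was issued for.
print(render_fixed_vip_environment("192.168.200.180"), end="")
```

The VIP passed in should be the same address used as the certificate's CN, 192.168.200.180 in this deployment.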
(In reply to Adriano Petrich from comment #15)
> Dan,
>
>
> adding PublicVirtualFixedIPs: [{'ip_address':'192.168.200.180'}] to the
> network-settings.yaml fixed the issue.

That sounds good, but we need to understand whether this workaround needs to become the default for all new deployments (or just mixed version deployments?) and then this needs to be documented, or the old behaviour is correct and we need to fix whatever broke it in the current and previous puddles.

James, can your team help with that? I realize the easiest solution is to just document it, but leaving a regression alone can cause additional grief down the line, I think.

(In reply to Dan Yasny from comment #16)
> (In reply to Adriano Petrich from comment #15)
> > Dan,
> >
> >
> > adding PublicVirtualFixedIPs: [{'ip_address':'192.168.200.180'}] to the
> > network-settings.yaml fixed the issue.
>
> That sounds good, but we need to understand whether this workaround needs to
> become the default for all new deployments (or just mixed version
> deployments?) and then this needs to be documented, or the old behaviour is
> correct and we need to fix whatever broke it in the current and previous
> puddles.
>
> James, can your team help with that? I realize the easiest solution is to
> just document it, but leaving a regression alone can cause additional grief
> down the line, I think.

Setting PublicVirtualFixedIPs is required when deploying with SSL and using the VIP as the CN of the certificate. This is because Neutron no longer guarantees that the first IP allocated in a DHCP subnet range will be the first (lowest) IP in the range, so the VIP is not predictable. Setting the PublicVirtualFixedIPs parameter makes it predictable.

This is not a regression. It's still possible to deploy with SSL and do everything that was previously possible. It is, however, a change in the documented instructions on how you need to deploy with SSL.
This is documented in tripleo-docs:
http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/ssl.html#overcloud-ssl

I think the action here for this bugzilla is to make it into a docs bug to make sure that same change is reflected in the product docs.

With the suggested workaround in place, the mixed version deployment with SSL enabled works manually. I have also tested deployments of clean SSL enabled 7, 8 and 9 setups with a FixedIP parameter set, and it might be a good idea to recommend this parameter to be included in the documentation for all versions, since it causes no damage and allows for consistency between versions

I think this requirement is covered in bug 1357688. I'm closing this one as a duplicate. Please reopen if this is incorrect, or add any additional requirements in bug 1357688.

*** This bug has been marked as a duplicate of bug 1357688 ***