Bug 1632745

Summary: OVN metadata agent fails when using TLS Everywhere
Product: Red Hat OpenStack Reporter: Gregory Charot <gcharot>
Component: openstack-tripleo-heat-templates Assignee: Daniel Alvarez Sanchez <dalvarez>
Status: CLOSED ERRATA QA Contact: Roman Safronov <rsafrono>
Severity: medium Docs Contact:
Priority: medium    
Version: 13.0 (Queens) CC: dalvarez, ekuris, mburns, mschuppe, nchandek, rmeillon, tfreger
Target Milestone: zstream Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.0.7-29.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1666617 Environment:
Last Closed: 2019-03-14 13:54:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1666617    

Description Gregory Charot 2018-09-25 12:28:56 UTC
Description of problem:

When OSP13 is deployed with TLS Everywhere, the OVN metadata agent fails to serve metadata to instances.

Version-Release number of selected component (if applicable):

13

How reproducible:

Always

Steps to Reproduce:
1. Deploy OSP13 with TLS Everywhere and OVN-DVR
2. Spawn instances
3. Observe that instances get their network configuration but cannot reach the metadata service.

Actual results:

Inside the guest (cloud-init.log)
2018-09-24 12:02:44,916 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/2009-04-04/meta-data/instance-id' with {'url': 'http://169.254.169.254/2009-04-04/meta-data/instance-id', 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'allow_redirects': True, 'method': 'GET', 'timeout': 50.0} configuration
2018-09-24 12:02:44,930 - url_helper.py[DEBUG]: Read from http://169.254.169.254/2009-04-04/meta-data/instance-id (500, 207b) after 1 attempts
2018-09-24 12:02:44,931 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: bad status code [500]
2018-09-24 12:02:44,931 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again


ovn-metadata-agent.log (on the compute)
2018-09-24 12:02:45.498 24 INFO eventlet.wsgi.server [-] 192.168.1.22,<local> "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 500  len: 362 time: 0.0078270
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server [-] Unexpected error.: BadStatusLine: ''
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server Traceback (most recent call last):
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/server.py", line 68, in __call__
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     return self._proxy_request(instance_id, project_id, req)
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/server.py", line 119, in _proxy_request
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     body=req.body)
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib/python2.7/site-packages/httplib2/__init__.py", line 1621, in request
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib/python2.7/site-packages/httplib2/__init__.py", line 1363, in _request
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     (response, content) = self._conn_request(conn, request_uri, method, body, headers)
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib/python2.7/site-packages/httplib2/__init__.py", line 1319, in _conn_request
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     response = conn.getresponse()
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     response.begin()
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib64/python2.7/httplib.py", line 444, in begin
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     version, status, reason = self._read_status()
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server   File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server     raise BadStatusLine(line)
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server BadStatusLine: ''
2018-09-24 12:02:46.511 24 ERROR networking_ovn.agent.metadata.server
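An empty status line (BadStatusLine: '') means the HTTP client got no response bytes at all before the peer closed the connection. One plausible cause, in line with the diagnosis under "Additional info", is a plain-HTTP request sent to an endpoint that expects TLS. The following is a minimal Python 3 sketch of the symptom only. It is not the agent's code: the local server that hangs up is a hypothetical stand-in for the Nova endpoint, and the function names are invented for illustration. On Python 3 the error is typically raised as http.client.RemoteDisconnected, a subclass of BadStatusLine.

```python
import http.client
import socket
import threading


def _accept_and_hang_up(server_sock):
    """Stand-in peer: read the request, then close without replying."""
    conn, _ = server_sock.accept()
    conn.recv(4096)
    conn.close()


def probe_plain_http():
    """Send a plain-HTTP GET to a peer that never answers.

    Returns the BadStatusLine-family exception raised by the client,
    or None if a response was unexpectedly received.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # any free local port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(
        target=_accept_and_hang_up, args=(server,), daemon=True
    )
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/2009-04-04/meta-data/instance-id")
        client.getresponse()
        return None
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join(timeout=5)
        server.close()


if __name__ == "__main__":
    print(type(probe_plain_http()).__name__)
```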

Expected results:

Instances successfully reach the metadata service and configure themselves (hostname, SSH key, etc.)

Additional info:

The OVN metadata agent configuration does not include TLS parameters, so the agent cannot reach the Nova internal API VIP.

The following fixes the issue:

Open /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/networking-ovn/networking-ovn-metadata-agent.ini

Replace
nova_metadata_ip=172.17.1.150 # internal API VIP
with
nova_metadata_host=overcloud.internalapi.redhat.local # Internal API VIP FQDN

Also add the following:
nova_metadata_protocol = https
auth_ca_cert=/etc/pki/tls/cert.pem

Finally, restart the service:
docker restart ovn_metadata_agent
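The manual edit above can also be sketched as a small script. This is only an illustration with the Python standard library, not the fix that shipped in tripleo-heat-templates. It assumes the three options live in the [DEFAULT] section of the agent's INI file, and the function name apply_tls_workaround is hypothetical. Note that configparser drops comments when it rewrites a file, so keep a backup of the original.

```python
import configparser
import io


def apply_tls_workaround(ini_text, vip_fqdn, ca_cert="/etc/pki/tls/cert.pem"):
    """Return ini_text with the TLS workaround from this report applied.

    Assumption: the options belong to the [DEFAULT] section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    defaults = parser["DEFAULT"]

    # Drop the IP-based setting and point the agent at the VIP's FQDN,
    # so the server certificate can be checked against a host name.
    defaults.pop("nova_metadata_ip", None)
    defaults["nova_metadata_host"] = vip_fqdn

    # Use HTTPS towards Nova and trust the given CA bundle.
    defaults["nova_metadata_protocol"] = "https"
    defaults["auth_ca_cert"] = ca_cert

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    before = "[DEFAULT]\nnova_metadata_ip=172.17.1.150\n"
    print(apply_tls_workaround(before, "overcloud.internalapi.redhat.local"))
```

After writing the result back to the file, the agent container still has to be restarted as shown above for the change to take effect.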

We can see these options are not included in the core THT:
https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/puppet/services/ovn-metadata.yaml#L84

As a side note, this case is covered for ML2-OVS (not tested):
https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/puppet/services/neutron-metadata.yaml#L116

Comment 5 Daniel Alvarez Sanchez 2018-09-26 12:09:15 UTC
Patch sent to THT master. Once merged, I'll backport it to stable/queens and handle it downstream.

Comment 11 Daniel Alvarez Sanchez 2019-01-16 08:46:38 UTC
Patches posted downstream, awaiting reviews for both 14 and 13. Once merged, I'll build the packages.

Comment 29 Roman Safronov 2019-02-26 16:31:46 UTC
Verified on 2019-02-25.2/RH7-RHOS-13.0/ with TLS Everywhere enabled.

Link to the verified build: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-13_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-tls/14/

Verified that instances are able to retrieve data from metadata service.
Verified also after moving nova_metadata_ip to another controller host.
Verified also after shutting down and restarting instances and creating new instances after moving the IP.
Verified with cirros and rhel images.
Verified that traffic between metadata agent and nova uses TLS.

Comment 31 errata-xmlrpc 2019-03-14 13:54:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0448