Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1589849

Summary: [OVN] Stopping metadata agent on a compute node will leave running/new instances without metadata
Product: Red Hat OpenStack
Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: openstack-tripleo-heat-templates
Assignee: Daniel Alvarez Sanchez <dalvarez>
Status: CLOSED NOTABUG
QA Contact: Eran Kuris <ekuris>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: amuller, cjeanner, ekuris, emacchi, jschluet, jslagle, mburns, rchincho, rzaleski, tfreger
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-8.0.7-2.el7ost
Doc Type: Release Note
Doc Text:
When the OVN metadata agent is stopped on a Compute node, the VMs on that node lose access to the metadata service. As a result, if a new VM is spawned or an existing VM is rebooted, the VM will fail to access metadata until the OVN metadata agent is brought back up.
Clones: 1589851 (view as bug list)
Last Closed: 2018-11-10 10:45:05 UTC
Type: Bug
Bug Blocks: 1589851

Description Daniel Alvarez Sanchez 2018-06-11 14:14:35 UTC
Description of problem:
When OVN metadata agent (controlplane) is stopped, any VM being restarted on that compute node won't be able to access the metadata service (dataplane) as all the haproxy processes will go away with the agent container.

How reproducible:
100%

Steps to Reproduce:
1. Spawn a VM on a compute node
2. Stop the agent in that compute node
3. Restart the VM

Actual results:

The VM will fail to fetch metadata at boot.

Expected results:

Access to metadata should still be possible.


Additional info:
We need to apply the same approach used in other neutron containers: create a sidecar container for the haproxy instances, which keeps running after the agent is stopped.
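As a sketch of that approach, the wrapper would launch each haproxy in its own container rather than as a child of the agent. The function below only builds the docker command such a wrapper could run; the sidecar name pattern and config path match the console outputs later in this bug, while the image name, netns name, and flags are assumptions, not the actual TripleO wrapper.

```shell
# Illustrative sketch only - not the actual TripleO haproxy wrapper.
# Builds the docker command that would start a per-network haproxy sidecar,
# so the proxy survives a stop/restart of ovn_metadata_agent itself.
build_sidecar_cmd() {
    network_id="$1"
    image="$2"
    echo "docker run --detach --net host --privileged" \
         "--name neutron-haproxy-ovnmeta-${network_id}" \
         "${image}" \
         "ip netns exec ovnmeta-${network_id}" \
         "haproxy -f /var/lib/neutron/ovn-metadata-proxy/${network_id}.conf"
}

# Example: network ID taken from comment 6 below; image name is a placeholder.
build_sidecar_cmd 2f7d8747-bce6-4afd-8f20-cff29e531ff4 openstack-neutron-metadata-agent-ovn
```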

Comment 5 Daniel Alvarez Sanchez 2018-10-03 15:52:16 UTC
Patches merged upstream

Comment 6 Daniel Alvarez Sanchez 2018-10-04 08:50:45 UTC
For clarification, and to help QA verify this BZ, this is what we expect on a compute node hosting a VM:

[heat-admin@compute-1 ~]$ sudo docker ps | grep metadata-agent
3d4b63f13a8e        192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent-ovn:2018-10-01.1   "ip netns exec ovn..."   19 minutes ago      Up 19 minutes                               neutron-haproxy-ovnmeta-2f7d8747-bce6-4afd-8f20-cff29e531ff4
aea63c8cb334        192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent-ovn:2018-10-01.1   "kolla_start"            15 hours ago        Up 15 hours (healthy)                       ovn_metadata_agent



The docker container running the OVN metadata agent plus a sidecar container running the haproxy instance.

If we switch the VM off, that sidecar container is expected to go away:

(overcloud) [stack@undercloud-0 ~]$ openstack server stop cirros1

[heat-admin@compute-1 ~]$ sudo docker ps | grep metadata-agent
aea63c8cb334        192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent-ovn:2018-10-01.1   "kolla_start"       16 hours ago        Up 16 hours (healthy)                       ovn_metadata_agent
 

[heat-admin@compute-1 ~]$ ps -aef | grep haproxy
[heat-admin@compute-1 ~]$
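The expectation above can be scripted as a small check. This is a sketch, assuming container names are captured with `docker ps --format '{{.Names}}'` (the helper name is hypothetical):

```shell
# Given newline-separated container names, verify the expected state:
# the agent container must be running, and (while a VM is up on the node)
# at least one per-network haproxy sidecar must exist.
check_metadata_containers() {
    names="$1"
    printf '%s\n' "$names" | grep -qx 'ovn_metadata_agent' || return 1
    printf '%s\n' "$names" | grep -q '^neutron-haproxy-ovnmeta-' || return 1
}

# VM running: both containers present, check passes.
check_metadata_containers 'neutron-haproxy-ovnmeta-2f7d8747-bce6-4afd-8f20-cff29e531ff4
ovn_metadata_agent' && echo "state OK"

# VM stopped: only the agent remains, so the sidecar check (rightly) fails.
check_metadata_containers 'ovn_metadata_agent' || echo "no sidecar"
```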

Comment 7 Daniel Alvarez Sanchez 2018-10-04 08:55:57 UTC
*** Bug 1633594 has been marked as a duplicate of this bug. ***

Comment 21 Eran Kuris 2018-10-31 08:06:51 UTC
The issue still exists:

After stopping the agent container, the haproxy sidecar container is still there, but it is not serving metadata:
()[root@compute-0 /]# ps -aef | grep haproxy
neutron   111054  111034  0 Oct29 ?        00:00:00 /usr/sbin/haproxy -Ds -f /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
neutron   111075  111054  0 Oct29 ?        00:00:05 /usr/sbin/haproxy -Ds -f /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
root      863630  863578  0 07:54 ?        00:00:00 grep --color=auto haproxy
()[root@compute-0 /]# ls /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
/var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
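For context, the per-network file listed above is the haproxy configuration the sidecar runs with. An illustrative sketch of its shape (not the contents of the actual file, which was not attached to this bug): the proxy binds the metadata IP inside the network namespace, forwards requests to the agent's unix socket, and tags them with the originating network.

```
# Illustrative sketch only - not the real file from this environment.
listen listener
    bind 169.254.169.254:80
    http-request add-header X-OVN-Network-ID 056ca522-7e19-44c0-985c-c9599872ceb1
    server metadata unix@/var/lib/neutron/metadata_proxy
```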


         Starting Execute cloud user/final scripts...
[   63.407748] cloud-init[979]: Cloud-init v. 0.7.9 running 'modules:final' at Wed, 31 Oct 2018 07:14:15 +0000. Up 63.25 seconds.
[   63.521554] cloud-init[979]: Cloud-init v. 0.7.9 finished at Wed, 31 Oct 2018 07:14:15 +0000. Datasource DataSourceOpenStack [net,ver=2].  Up 63.50 seconds
[  OK  ] Started Execute cloud user/final scripts.
[  OK  ] Reached target Multi-User System.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Red Hat Enterprise Linux Server 7.5 (Maipo)
Kernel 3.10.0-860.el7.x86_64 on an x86_64

net-64-1-vm-1 login: root
Password: 
Last login: Wed Oct 31 03:11:44 on ttyS0
[root@net-64-1-vm-1 ~]# curl http://169.254.169.254/
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.


re-opened on:
openstack-tripleo-heat-templates-8.0.7-4.el7ost.noarch

OSP13z3 2018-10-24.1

Comment 22 Eran Kuris 2018-10-31 10:51:50 UTC
re-open

Comment 23 Daniel Alvarez Sanchez 2018-10-31 13:24:33 UTC
(In reply to Eran Kuris from comment #21)
> The issue still exists:
> 
> After stopping the container, there is still haproxy sidecar container
>  but it is not serving the metadata
> )[root@compute-0 /]# ps -aef | grep haproxy
> neutron   111054  111034  0 Oct29 ?        00:00:00 /usr/sbin/haproxy -Ds -f
> /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> neutron   111075  111054  0 Oct29 ?        00:00:05 /usr/sbin/haproxy -Ds -f
> /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> root      863630  863578  0 07:54 ?        00:00:00 grep --color=auto haproxy
> ()[root@compute-0 /]# ls
> /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> 
> 

Can you show the sidecar containers through docker ps and not ps? Like:

[heat-admin@compute-1 ~]$ sudo docker ps | grep metadata-agent
3d4b63f13a8e        192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent-ovn:2018-10-01.1   "ip netns exec ovn..."   19 minutes ago      Up 19 minutes                               neutron-haproxy-ovnmeta-2f7d8747-bce6-4afd-8f20-cff29e531ff4
aea63c8cb334        192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent-ovn:2018-10-01.1   "kolla_start"            15 hours ago        Up 15 hours (healthy)                       ovn_metadata_agent

Comment 24 Eran Kuris 2018-10-31 13:29:24 UTC
(In reply to Daniel Alvarez Sanchez from comment #23)
> (In reply to Eran Kuris from comment #21)
> > The issue still exists:
> > 
> > After stopping the container, there is still haproxy sidecar container
> >  but it is not serving the metadata
> > )[root@compute-0 /]# ps -aef | grep haproxy
> > neutron   111054  111034  0 Oct29 ?        00:00:00 /usr/sbin/haproxy -Ds -f
> > /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> > neutron   111075  111054  0 Oct29 ?        00:00:05 /usr/sbin/haproxy -Ds -f
> > /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> > root      863630  863578  0 07:54 ?        00:00:00 grep --color=auto haproxy
> > ()[root@compute-0 /]# ls
> > /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> > /var/lib/neutron/ovn-metadata-proxy/056ca522-7e19-44c0-985c-c9599872ceb1.conf
> > 
> > 
> 
> Can you show the sidecar containers through docker ps and not ps? Like:
> 
> [heat-admin@compute-1 ~]$ sudo docker ps | grep metadata-agent
> 3d4b63f13a8e       
> 192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent-ovn:2018-10-01.1 
> "ip netns exec ovn..."   19 minutes ago      Up 19 minutes                  
> neutron-haproxy-ovnmeta-2f7d8747-bce6-4afd-8f20-cff29e531ff4
> aea63c8cb334       
> 192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent-ovn:2018-10-01.1 
> "kolla_start"            15 hours ago        Up 15 hours (healthy)          
> ovn_metadata_agent

Yes, I can:
[root@compute-0 ~]# docker ps -a | grep meta
194feb06385f        192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent-ovn:2018-10-24.1   "ip netns exec ovn..."   42 hours ago        Up 42 hours                                    neutron-haproxy-ovnmeta-056ca522-7e19-44c0-985c-c9599872ceb1
788d1b1310b9        192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent-ovn:2018-10-24.1   "kolla_start"            44 hours ago        Exited (0) 5 seconds ago                       ovn_metadata_agent
8c5a2aa9780b        192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent-ovn:2018-10-24.1   "/docker_puppet_ap..."   44 hours ago        Exited (0) 44 hours ago                        setup_ovs_manager
fad957a1a538        192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent-ovn:2018-10-24.1   "/docker_puppet_ap..."   45 hours ago        Exited (0) 45 hours ago                        create_haproxy_wrapper

Comment 26 Daniel Alvarez Sanchez 2018-11-10 10:45:05 UTC
I am closing this bug because, with the current situation (i.e. the haproxy sidecar container), the behaviour does not represent a regression with regard to older versions of OSP. In those versions, when the metadata proxy was down, instances couldn't reach the Nova metadata API server either, and got an error from haproxy saying that the server is unavailable.

With the sidecar containers, this exact same behaviour is kept, so we're not hitting any regressions.