
Bug 1577945

Summary: FFU: metadata service http://169.254.169.254 becomes unavailable for overcloud instances during ffwd-upgrade run
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: rhosp-director
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED DUPLICATE
QA Contact: Amit Ugol <augol>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 13.0 (Queens)
CC: dbecker, lbezdick, mburns, morazi
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-14 16:52:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marius Cornea 2018-05-14 13:33:59 UTC
Description of problem:

FFU: metadata service http://169.254.169.254 becomes unavailable for overcloud instances during ffwd-upgrade run.

This means that overcloud instances rebooted during the upgrade cannot boot successfully if they rely on data provided by the metadata service at boot time.

Moreover, the metadata service remains unreachable even after the fast forward upgrade process has finished.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.2-17.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 
2. Upgrade undercloud to 11/12/13
3. Run openstack overcloud ffwd-upgrade prepare
4. Run openstack overcloud ffwd-upgrade run
5. While the upgrade is running, log in to an instance running on the overcloud and try to reach the metadata service via curl http://169.254.169.254

Actual results:

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>


Expected results:
The metadata service is reachable for existing instances during and after upgrade. 
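
To watch for this condition over the whole upgrade rather than with a single curl, step 5 could be looped from inside an instance. The following is only a minimal sketch, assuming Python 3 is available in the guest; the names fetch_status, describe and poll are made up for this illustration and are not part of any OpenStack tooling.

```python
import time
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254"  # address taken from this report


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no HTTP reply arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies (such as the 503 shown above) are raised as HTTPError.
        return err.code
    except OSError:
        # Connection refused, timeout, no route and similar socket failures.
        return None


def describe(status):
    """Map a status code to a short label for the log line."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    return "error %d" % status


def poll(url=METADATA_URL, interval=5.0, count=None, fetch=fetch_status):
    """Print one timestamped line per probe; count=None polls until interrupted."""
    done = 0
    while count is None or done < count:
        label = describe(fetch(url))
        print("%s %s %s" % (time.strftime("%Y-%m-%d %H:%M:%S"), url, label))
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

Calling poll() in the guest before starting "openstack overcloud ffwd-upgrade run" and leaving it running should give a timeline of when the service stopped answering; with the behaviour reported here, the lines would be expected to switch to "error 503".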

Additional info:
Attaching sosreports.

Comment 2 Lukas Bezdicka 2018-05-14 15:16:30 UTC
$ traceroute 169.254.169.254
traceroute to 169.254.169.254 (169.254.169.254), 30 hops max, 46 byte packets
 1  host-192-168-0-1.openstacklocal (192.168.0.1)  0.467 ms  0.314 ms  0.505 ms
 2  10.0.0.1 (10.0.0.1)  0.684 ms  0.561 ms  0.634 ms
 3  10.9.76.254 (10.9.76.254)  13.433 ms  18.293 ms  21.940 ms
 4  10.10.191.120 (10.10.191.120)  1.442 ms  1.098 ms  2.427 ms
 5  10.11.0.150 (10.11.0.150)  14.990 ms  13.242 ms  22.287 ms
 6  10.11.0.157 (10.11.0.157)  16.262 ms  17.254 ms  19.904 ms
 7  66.187.233.253 (66.187.233.253)  19.192 ms  66.187.233.252 (66.187.233.252)  21.013 ms  66.187.233.253 (66.187.233.253)  15.096 ms
 8  209.132.190.198 (209.132.190.198)  0.865 ms  0.810 ms  0.834 ms

Comment 3 Marius Cornea 2018-05-14 15:56:02 UTC
Looks like a permission issue related to moving the neutron haproxy processes to containers (note that https://review.openstack.org/#/c/567655/ is already included when this issue is observed):

[root@controller-0 heat-admin]# ip netns exec qrouter-c3e0daee-e103-40cd-951a-9ac4afdd19a5 netstat -tupan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:9697            0.0.0.0:*               LISTEN      192689/haproxy      
tcp        0      0 192.168.0.1:9697        192.168.0.5:40316       TIME_WAIT   -                   
tcp        0      0 192.168.0.1:9697        192.168.0.5:40317       TIME_WAIT   -                   
[root@controller-0 heat-admin]# strace -p 192689
strace: Process 192689 attached
epoll_wait(0, [], 200, 1000)            = 0
epoll_wait(0, [{EPOLLIN, {u32=4, u64=4}}], 200, 1000) = 1
accept4(4, {sa_family=AF_INET, sin_port=htons(40318), sin_addr=inet_addr("192.168.0.5")}, [16], SOCK_NONBLOCK) = 1
setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
accept4(4, 0x7ffe0e3360c0, 0x7ffe0e3360bc, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(1, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 8192, 0, NULL, NULL) = 142
getsockname(1, {sa_family=AF_INET, sin_port=htons(9697), sin_addr=inet_addr("192.168.0.1")}, [16]) = 0
getsockopt(1, SOL_IP, 0x50 /* IP_??? */, "\2\0\0P\251\376\251\376\0\0\0\0\0\0\0\0", [16]) = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
epoll_wait(0, [], 200, 0)               = 0
sendto(1, "HTTP/1.0 503 Service Unavailable"..., 212, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 212
shutdown(1, SHUT_WR)                    = 0
close(1)                                = 0
sendto(5, "<134>May 14 15:25:50 haproxy[192"..., 170, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 170
epoll_wait(0, [], 200, 1000)            = 0
epoll_wait(0, ^Cstrace: Process 192689 detached
 <detached ...>

[root@controller-0 heat-admin]# ps axu | grep 192689
neutron   192689  0.0  0.0  47932  1296 ?        Ss   00:32   0:02 haproxy -f /var/lib/neutron/ns-metadata-proxy/c3e0daee-e103-40cd-951a-9ac4afdd19a5.conf
root      777159  0.0  0.0 112708   980 pts/1    S+   15:55   0:00 grep --color=auto 192689
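
In the strace output above, every connect() to the metadata_proxy socket fails with EACCES before haproxy answers with the 503. When a longer trace has been saved to a file, the failed UNIX-socket connects can be tallied with a short script. This is only a sketch: the function name failed_unix_connects is hypothetical, and the pattern assumes the line layout shown above (it also accepts AF_UNIX, which other strace versions may print instead of AF_LOCAL).

```python
import re
from collections import Counter

# Matches failed connect() calls on a UNIX domain socket path.
CONNECT_FAIL = re.compile(
    r'connect\(\d+, \{sa_family=AF_(?:LOCAL|UNIX), sun_path="(?P<path>[^"]+)"\}, \d+\)'
    r'\s*=\s*-1 (?P<errno>[A-Z]+)'
)


def failed_unix_connects(lines):
    """Count failed connects per (socket path, errno name)."""
    counts = Counter()
    for line in lines:
        match = CONNECT_FAIL.search(line)
        if match:
            counts[(match.group("path"), match.group("errno"))] += 1
    return counts


# Three lines copied from the trace above.
sample = [
    'socket(AF_LOCAL, SOCK_STREAM, 0)        = 2',
    'connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)',
    'close(2)                                = 0',
]

for (path, errno_name), n in failed_unix_connects(sample).items():
    print(path, errno_name, n)
```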

[root@controller-0 heat-admin]# ls -l /var/lib/neutron/metadata_proxy
srw-r--r--+ 1 42435 42435 0 May 14 02:53 /var/lib/neutron/metadata_proxy

[root@controller-0 heat-admin]# getfacl /var/lib/neutron/metadata_proxy
getfacl: Removing leading '/' from absolute path names
# file: var/lib/neutron/metadata_proxy
# owner: 42435
# group: 42435
user::rw-
user:neutron:rw-		#effective:r--
group::r-x			#effective:r--
mask::r--
other::r--
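
The getfacl output explains the EACCES. The haproxy process runs as the host user neutron, while the socket is owned by UID 42435, so the named-user ACL entry applies. That entry grants rw-, but the mask of r-- reduces it to r--, and connecting to a UNIX domain socket requires write permission on the socket file. The small sketch below only reproduces that arithmetic with the values shown above; the helper names parse_perm and effective are made up for illustration.

```python
def parse_perm(text):
    """Convert a string like 'rw-' or 'r-x' into a set of permission letters."""
    return {ch for ch in text if ch in "rwx"}


def effective(entry, mask):
    """POSIX ACL rule: named-user and group entries are limited by the mask."""
    return parse_perm(entry) & parse_perm(mask)


def fmt(perms):
    """Render a permission set back into the 'rwx' notation used by getfacl."""
    return "".join(p if p in perms else "-" for p in "rwx")


# Values copied from the getfacl output above.
mask = "r--"
neutron_entry = "rw-"  # user:neutron
group_entry = "r-x"    # group::

neutron_effective = effective(neutron_entry, mask)
group_effective = effective(group_entry, mask)

# connect(2) on a UNIX domain socket path needs write permission on the file.
can_connect = "w" in neutron_effective

print("user:neutron effective:", fmt(neutron_effective))
print("group effective:", fmt(group_effective))
print("can connect:", can_connect)
```

Both effective values come out as r--, matching the #effective annotations printed by getfacl, and can_connect is False.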

Comment 4 Marius Cornea 2018-05-14 16:52:32 UTC
Fix for bug 1563443 is going to address this issue as well.

*** This bug has been marked as a duplicate of bug 1563443 ***