Bug 1577945 - FFU: metadata service http://169.254.169.254 becomes unavailable for overcloud instances during ffwd-upgrade run
Keywords:
Status: CLOSED DUPLICATE of bug 1563443
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-14 13:33 UTC by Marius Cornea
Modified: 2018-05-14 16:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-14 16:52:32 UTC
Target Upstream Version:
Embargoed:



Description Marius Cornea 2018-05-14 13:33:59 UTC
Description of problem:

FFU: metadata service http://169.254.169.254 becomes unavailable for overcloud instances during ffwd-upgrade run.

This means that overcloud instances that get rebooted during the upgrade process cannot boot successfully if they rely on data provided by the metadata service at boot time.

Moreover, the metadata service remains unreachable even after the fast forward upgrade process has finished.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.2-17.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 
2. Upgrade undercloud to 11/12/13
3. Run openstack overcloud ffwd-upgrade prepare
4. Run openstack overcloud ffwd-upgrade run
5. While the upgrade is running, log in to an instance running on the overcloud and try to reach the metadata service with: curl http://169.254.169.254
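
Step 5 can also be run as a repeated probe, so that the moment the service starts returning 503 is visible in the output. The following is a minimal sketch, not part of the original report: the function names (probe, describe, watch), the sample count and the interval are all hypothetical, and it assumes Python 3 is available inside the instance.

```python
import time
import urllib.error
import urllib.request

# Metadata endpoint from the bug report.
METADATA_URL = "http://169.254.169.254"


def probe(url=METADATA_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers (such as the 503 seen in this bug) land here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, no route, and similar failures.
        return None


def describe(status):
    """Turn a probe result into a short human-readable label."""
    if status is None:
        return "unreachable"
    if status == 503:
        return "503 Service Unavailable"
    return f"HTTP {status}"


def watch(samples, interval=10.0, probe_fn=probe, sleep=time.sleep):
    """Probe `samples` times, printing a timestamped line per attempt."""
    results = []
    for i in range(samples):
        status = probe_fn()
        print(time.strftime("%H:%M:%S"), describe(status))
        results.append(status)
        if i + 1 < samples:
            sleep(interval)
    return results
```

Calling watch(samples=60, interval=10) inside the instance while ffwd-upgrade run is in progress would print one line every ten seconds for roughly ten minutes.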

Actual results:

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>


Expected results:
The metadata service is reachable for existing instances during and after upgrade. 

Additional info:
Attaching sosreports.

Comment 2 Lukas Bezdicka 2018-05-14 15:16:30 UTC
$ traceroute 169.254.169.254
traceroute to 169.254.169.254 (169.254.169.254), 30 hops max, 46 byte packets
 1  host-192-168-0-1.openstacklocal (192.168.0.1)  0.467 ms  0.314 ms  0.505 ms
 2  10.0.0.1 (10.0.0.1)  0.684 ms  0.561 ms  0.634 ms
 3  10.9.76.254 (10.9.76.254)  13.433 ms  18.293 ms  21.940 ms
 4  10.10.191.120 (10.10.191.120)  1.442 ms  1.098 ms  2.427 ms
 5  10.11.0.150 (10.11.0.150)  14.990 ms  13.242 ms  22.287 ms
 6  10.11.0.157 (10.11.0.157)  16.262 ms  17.254 ms  19.904 ms
 7  66.187.233.253 (66.187.233.253)  19.192 ms  66.187.233.252 (66.187.233.252)  21.013 ms  66.187.233.253 (66.187.233.253)  15.096 ms
 8  209.132.190.198 (209.132.190.198)  0.865 ms  0.810 ms  0.834 ms

Comment 3 Marius Cornea 2018-05-14 15:56:02 UTC
Looks like a permission issue related to moving the neutron haproxy processes to containers (note that https://review.openstack.org/#/c/567655/ is already included when this issue is observed):

[root@controller-0 heat-admin]# ip netns exec qrouter-c3e0daee-e103-40cd-951a-9ac4afdd19a5 netstat -tupan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:9697            0.0.0.0:*               LISTEN      192689/haproxy      
tcp        0      0 192.168.0.1:9697        192.168.0.5:40316       TIME_WAIT   -                   
tcp        0      0 192.168.0.1:9697        192.168.0.5:40317       TIME_WAIT   -                   
[root@controller-0 heat-admin]# strace -p 192689
strace: Process 192689 attached
epoll_wait(0, [], 200, 1000)            = 0
epoll_wait(0, [{EPOLLIN, {u32=4, u64=4}}], 200, 1000) = 1
accept4(4, {sa_family=AF_INET, sin_port=htons(40318), sin_addr=inet_addr("192.168.0.5")}, [16], SOCK_NONBLOCK) = 1
setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
accept4(4, 0x7ffe0e3360c0, 0x7ffe0e3360bc, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(1, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 8192, 0, NULL, NULL) = 142
getsockname(1, {sa_family=AF_INET, sin_port=htons(9697), sin_addr=inet_addr("192.168.0.1")}, [16]) = 0
getsockopt(1, SOL_IP, 0x50 /* IP_??? */, "\2\0\0P\251\376\251\376\0\0\0\0\0\0\0\0", [16]) = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
socket(AF_LOCAL, SOCK_STREAM, 0)        = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2)                                = 0
epoll_wait(0, [], 200, 0)               = 0
sendto(1, "HTTP/1.0 503 Service Unavailable"..., 212, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 212
shutdown(1, SHUT_WR)                    = 0
close(1)                                = 0
sendto(5, "<134>May 14 15:25:50 haproxy[192"..., 170, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 170
epoll_wait(0, [], 200, 1000)            = 0
epoll_wait(0, ^Cstrace: Process 192689 detached
 <detached ...>

[root@controller-0 heat-admin]# ps axu | grep 192689
neutron   192689  0.0  0.0  47932  1296 ?        Ss   00:32   0:02 haproxy -f /var/lib/neutron/ns-metadata-proxy/c3e0daee-e103-40cd-951a-9ac4afdd19a5.conf
root      777159  0.0  0.0 112708   980 pts/1    S+   15:55   0:00 grep --color=auto 192689

[root@controller-0 heat-admin]# ls -l /var/lib/neutron/metadata_proxy
srw-r--r--+ 1 42435 42435 0 May 14 02:53 /var/lib/neutron/metadata_proxy

[root@controller-0 heat-admin]# getfacl /var/lib/neutron/metadata_proxy
getfacl: Removing leading '/' from absolute path names
# file: var/lib/neutron/metadata_proxy
# owner: 42435
# group: 42435
user::rw-
user:neutron:rw-		#effective:r--
group::r-x			#effective:r--
mask::r--
other::r--
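
The getfacl output above is consistent with the EACCES seen in strace: the named entry user:neutron grants rw-, but the ACL mask is r--, so the effective permission is read-only, and on Linux connecting to a Unix stream socket requires write permission on the socket file. The following is a minimal sketch of that calculation, assuming the standard POSIX ACL rule that the mask limits named-user, named-group and owning-group entries but not the owner or other entries. The helper names are hypothetical and the sample text is copied from the output above.

```python
SAMPLE = """\
# file: var/lib/neutron/metadata_proxy
# owner: 42435
# group: 42435
user::rw-
user:neutron:rw-    #effective:r--
group::r-x          #effective:r--
mask::r--
other::r--
"""


def parse_getfacl(text):
    """Map (tag, qualifier) -> permission string, ignoring comments."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        tag, qualifier, perms = line.split(":")
        entries[(tag, qualifier)] = perms
    return entries


def apply_mask(perms, mask):
    """Keep a permission bit only if the mask also grants it."""
    return "".join(p if p == m else "-" for p, m in zip(perms, mask))


def effective_acl(entries):
    """Compute effective permissions for every non-mask entry."""
    mask = entries.get(("mask", ""))
    result = {}
    for (tag, qualifier), perms in entries.items():
        if tag == "mask":
            continue
        # The mask limits named users and all group entries only.
        masked = mask is not None and (
            tag == "group" or (tag == "user" and qualifier != "")
        )
        result[(tag, qualifier)] = apply_mask(perms, mask) if masked else perms
    return result


def can_connect(effective_perms):
    """A Unix stream socket needs write permission to connect (Linux)."""
    return "w" in effective_perms


acl = effective_acl(parse_getfacl(SAMPLE))
for (tag, qualifier), perms in acl.items():
    print(f"{tag}:{qualifier}:{perms} connect_ok={can_connect(perms)}")
```

Under these assumptions only the owner entry (uid 42435) keeps write access, which would explain why the haproxy process running as the host neutron user cannot connect to /var/lib/neutron/metadata_proxy.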

Comment 4 Marius Cornea 2018-05-14 16:52:32 UTC
The fix for bug 1563443 will address this issue as well.

*** This bug has been marked as a duplicate of bug 1563443 ***

