Description of problem:
FFU: the metadata service at http://169.254.169.254 becomes unavailable for overcloud instances during the ffwd-upgrade run. As a result, overcloud instances that are rebooted during the upgrade cannot boot successfully if they rely on data provided by the metadata service at boot time. Moreover, the metadata service remains unreachable even after the fast forward upgrade has finished.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.2-17.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10
2. Upgrade undercloud to 11/12/13
3. Run openstack overcloud ffwd-upgrade prepare
4. Run openstack overcloud ffwd-upgrade run
5. While the upgrade is running, log in to an instance running on the overcloud and try to reach the metadata service via curl http://169.254.169.254

Actual results:
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>

Expected results:
The metadata service is reachable for existing instances during and after the upgrade.

Additional info:
Attaching sosreports.
$ traceroute 169.254.169.254
traceroute to 169.254.169.254 (169.254.169.254), 30 hops max, 46 byte packets
 1  host-192-168-0-1.openstacklocal (192.168.0.1)  0.467 ms  0.314 ms  0.505 ms
 2  10.0.0.1 (10.0.0.1)  0.684 ms  0.561 ms  0.634 ms
 3  10.9.76.254 (10.9.76.254)  13.433 ms  18.293 ms  21.940 ms
 4  10.10.191.120 (10.10.191.120)  1.442 ms  1.098 ms  2.427 ms
 5  10.11.0.150 (10.11.0.150)  14.990 ms  13.242 ms  22.287 ms
 6  10.11.0.157 (10.11.0.157)  16.262 ms  17.254 ms  19.904 ms
 7  66.187.233.253 (66.187.233.253)  19.192 ms  66.187.233.252 (66.187.233.252)  21.013 ms  66.187.233.253 (66.187.233.253)  15.096 ms
 8  209.132.190.198 (209.132.190.198)  0.865 ms  0.810 ms  0.834 ms
This looks like a permission issue related to moving the neutron haproxy processes to containers (note that https://review.openstack.org/#/c/567655/ is already included when this issue is observed):

[root@controller-0 heat-admin]# ip netns exec qrouter-c3e0daee-e103-40cd-951a-9ac4afdd19a5 netstat -tupan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9697            0.0.0.0:*               LISTEN      192689/haproxy
tcp        0      0 192.168.0.1:9697        192.168.0.5:40316       TIME_WAIT   -
tcp        0      0 192.168.0.1:9697        192.168.0.5:40317       TIME_WAIT   -

[root@controller-0 heat-admin]# strace -p 192689
strace: Process 192689 attached
epoll_wait(0, [], 200, 1000) = 0
epoll_wait(0, [{EPOLLIN, {u32=4, u64=4}}], 200, 1000) = 1
accept4(4, {sa_family=AF_INET, sin_port=htons(40318), sin_addr=inet_addr("192.168.0.5")}, [16], SOCK_NONBLOCK) = 1
setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
accept4(4, 0x7ffe0e3360c0, 0x7ffe0e3360bc, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(1, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 8192, 0, NULL, NULL) = 142
getsockname(1, {sa_family=AF_INET, sin_port=htons(9697), sin_addr=inet_addr("192.168.0.1")}, [16]) = 0
getsockopt(1, SOL_IP, 0x50 /* IP_??? */, "\2\0\0P\251\376\251\376\0\0\0\0\0\0\0\0", [16]) = 0
socket(AF_LOCAL, SOCK_STREAM, 0) = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2) = 0
socket(AF_LOCAL, SOCK_STREAM, 0) = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2) = 0
socket(AF_LOCAL, SOCK_STREAM, 0) = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2) = 0
socket(AF_LOCAL, SOCK_STREAM, 0) = 2
fcntl(2, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(2, {sa_family=AF_LOCAL, sun_path="/var/lib/neutron/metadata_proxy"}, 110) = -1 EACCES (Permission denied)
close(2) = 0
epoll_wait(0, [], 200, 0) = 0
sendto(1, "HTTP/1.0 503 Service Unavailable"..., 212, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 212
shutdown(1, SHUT_WR) = 0
close(1) = 0
sendto(5, "<134>May 14 15:25:50 haproxy[192"..., 170, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 170
epoll_wait(0, [], 200, 1000) = 0
epoll_wait(0, ^Cstrace: Process 192689 detached
 <detached ...>

[root@controller-0 heat-admin]# ps axu | grep 192689
neutron   192689  0.0  0.0  47932  1296 ?        Ss   00:32   0:02 haproxy -f /var/lib/neutron/ns-metadata-proxy/c3e0daee-e103-40cd-951a-9ac4afdd19a5.conf
root      777159  0.0  0.0 112708   980 pts/1    S+   15:55   0:00 grep --color=auto 192689

[root@controller-0 heat-admin]# ls -l /var/lib/neutron/metadata_proxy
srw-r--r--+ 1 42435 42435 0 May 14 02:53 /var/lib/neutron/metadata_proxy

[root@controller-0 heat-admin]# getfacl /var/lib/neutron/metadata_proxy
getfacl: Removing leading '/' from absolute path names
# file: var/lib/neutron/metadata_proxy
# owner: 42435
# group: 42435
user::rw-
user:neutron:rw-	#effective:r--
group::r-x	#effective:r--
mask::r--
other::r--
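
For anyone reading the getfacl output above: on Linux, connect() on a pathname Unix socket needs write permission on the socket file. The haproxy process runs as the host user neutron, which is not the socket owner (uid 42435), so the named entry user:neutron:rw- applies, and POSIX ACLs limit named-user and group entries by the mask (r-- here). The following is only a minimal Python sketch of that masking rule, using the ACL text pasted above as input; the function names are made up for illustration and this is not code from neutron or tripleo.

```python
# Sketch: recompute the effective ACL permissions shown by getfacl above.
# Assumption: standard POSIX ACL semantics, where the mask limits named-user
# and group entries but not the owner (user::) or other:: entries.

GETFACL_OUTPUT = """\
# file: var/lib/neutron/metadata_proxy
# owner: 42435
# group: 42435
user::rw-
user:neutron:rw- #effective:r--
group::r-x #effective:r--
mask::r--
other::r--
"""


def parse_acl(text):
    """Return {(tag, qualifier): perms}, ignoring comments and blank lines."""
    entries = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '# ...' comments
        if not line:
            continue
        tag, qualifier, perms = line.split(":")
        entries[(tag, qualifier)] = perms
    return entries


def apply_mask(perms, mask):
    """Keep a permission bit only if the mask grants it too."""
    return "".join(p if p == m else "-" for p, m in zip(perms, mask))


def effective_perms(entries, tag, qualifier):
    """Effective permissions for one ACL entry after masking."""
    perms = entries[(tag, qualifier)]
    unmasked = (tag == "user" and qualifier == "") or tag == "other"
    mask = entries.get(("mask", ""))
    if unmasked or mask is None:
        return perms
    return apply_mask(perms, mask)


def can_connect(entries, user):
    """connect() on the socket needs the write bit for this named user."""
    return "w" in effective_perms(entries, "user", user)


acl = parse_acl(GETFACL_OUTPUT)
print(effective_perms(acl, "user", "neutron"))  # r--
print(can_connect(acl, "neutron"))              # False
```

With the mask at r--, the neutron entry loses its write bit, which is consistent with the EACCES seen on connect() in the strace output.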
The fix for bug 1563443 is going to address this issue as well.

*** This bug has been marked as a duplicate of bug 1563443 ***