Bug 1460116
Summary: Cannot upgrade OSP11 to OSP12: overcloud upgrade failed because the masquerade rules for the br-ctlplane network were deleted, and overcloud nodes cannot reach external repos
| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Artem Hrechanychenko <ahrechan> |
| Component: | rhosp-director | Assignee: | Lee Yarwood <lyarwood> |
| Status: | CLOSED DUPLICATE | QA Contact: | Marius Cornea <mcornea> |
| Severity: | high | Priority: | high |
| Version: | 12.0 (Pike) | Target Release: | 12.0 (Pike) |
| Hardware: | x86_64 | OS: | Linux |
| Target Milestone: | ga | Keywords: | AutomationBlocker, Reopened, Triaged |
| Last Closed: | 2017-09-06 12:09:53 UTC | Type: | Bug |
| Bug Depends On: | 1481207 | Bug Blocks: | 1399762 |
| CC: | afazekas, ahrechan, amuller, bhaley, dbecker, emacchi, ihrachys, mbultel, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang, yprokule | | |
Description (Artem Hrechanychenko, 2017-06-09 07:33:11 UTC)
This bugzilla has been removed from the release and needs to be reviewed and triaged for another target release.

lbezdick, here is the information that you asked for:

```
[heat-admin@compute-0 ~]$ sudo cat /var/log/yum.log
[heat-admin@compute-0 ~]$
[heat-admin@controller-0 ~]$ sudo cat /var/log/yum.log
[heat-admin@controller-0 ~]$

(undercloud) [stack@undercloud-0 ~]$ systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
```

Logs: http://pastebin.test.redhat.com/492653

Created attachment 1286449 [details]
journalctl for net manager from controller node
Created attachment 1286450 [details]
journalctl for net manager from compute node
This looks like a conflict between an upstream installation using quickstart and an Infrared installation of OSP11 running at the same time. It does not reproduce now.

```
(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.Compute.0.NovaComputeUpgradeInitDeployment:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: eab7a5fe-ba64-4da3-885b-3e711bbf179f
  status: UPDATE_FAILED
  status_reason: |
    Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server
    failed: deploy_status_code : Deployment exited with non-zero status code: 7
  deploy_stdout: |
  deploy_stderr: |
    ...
    [curl progress meter elided, 0:01:58 through 0:02:07]
    curl: (7) Failed connect to rhos-release.virt.bos.redhat.com:80; Connection timed out
    (truncated, view all with --long)
overcloud.Compute.0.UpdateDeployment:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: f960ab75-cd04-4dbb-8a5b-acc0310a2115
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted
  deploy_stdout: |
    Started yum_update.sh on server 74d45f6a-6c2c-4ff1-b6f3-5896e33f34a3 at Tue Jun 27 10:19:32 UTC 2017
    Not running due to unset update_identifier
  deploy_stderr: |
```

During the major upgrade, the masquerade rules for the br-ctlplane network were deleted:

```
(undercloud) [stack@undercloud-0 ~]$ sudo iptables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source            destination
nova-api-PREROUTING  all  --  0.0.0.0/0   0.0.0.0/0
DOCKER     all  --  0.0.0.0/0         0.0.0.0/0     ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source            destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source            destination
nova-api-OUTPUT  all  --  0.0.0.0/0   0.0.0.0/0
DOCKER     all  --  0.0.0.0/0        !127.0.0.0/8   ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source            destination
MASQUERADE  all  --  172.17.0.0/16    0.0.0.0/0
nova-api-POSTROUTING  all  --  0.0.0.0/0   0.0.0.0/0
nova-postrouting-bottom  all  --  0.0.0.0/0   0.0.0.0/0

Chain DOCKER (2 references)
target     prot opt source            destination
RETURN     all  --  0.0.0.0/0         0.0.0.0/0

Chain nova-api-OUTPUT (1 references)
target     prot opt source            destination

Chain nova-api-POSTROUTING (1 references)
target     prot opt source            destination

Chain nova-api-PREROUTING (1 references)
target     prot opt source            destination

Chain nova-api-float-snat (1 references)
target     prot opt source            destination

Chain nova-api-snat (1 references)
target     prot opt source            destination
nova-api-float-snat  all  --  0.0.0.0/0   0.0.0.0/0

Chain nova-postrouting-bottom (1 references)
target     prot opt source            destination
nova-api-snat  all  --  0.0.0.0/0     0.0.0.0/0
```

But it must be:

```
[stack@undercloud-0 ~]$ sudo iptables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source            destination
nova-api-PREROUTING  all  --  0.0.0.0/0   0.0.0.0/0
REDIRECT   tcp  --  0.0.0.0/0         169.254.169.254   tcp dpt:80 redir ports 8775
DOCKER     all  --  0.0.0.0/0         0.0.0.0/0     ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source            destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source            destination
nova-api-OUTPUT  all  --  0.0.0.0/0   0.0.0.0/0
DOCKER     all  --  0.0.0.0/0        !127.0.0.0/8   ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source            destination
RETURN     all  --  192.168.122.0/24  224.0.0.0/24
RETURN     all  --  192.168.122.0/24  255.255.255.255
MASQUERADE  tcp  --  192.168.122.0/24  !192.168.122.0/24   masq ports: 1024-65535
MASQUERADE  udp  --  192.168.122.0/24  !192.168.122.0/24   masq ports: 1024-65535
MASQUERADE  all  --  192.168.122.0/24  !192.168.122.0/24
MASQUERADE  all  --  172.17.0.0/16    0.0.0.0/0
nova-api-POSTROUTING  all  --  0.0.0.0/0   0.0.0.0/0
BOOTSTACK_MASQ  all  --  0.0.0.0/0    0.0.0.0/0
nova-postrouting-bottom  all  --  0.0.0.0/0   0.0.0.0/0

Chain BOOTSTACK_MASQ (1 references)
target     prot opt source            destination
MASQUERADE  all  --  192.168.24.0/24  !192.168.24.0/24

(remaining DOCKER and nova-api-* chains identical to the previous listing)
```

Workaround: add the deleted rules back to the nat table.

Hi, I want to reinforce the idea here that we hit this error only on certain virtual setups. We have had this issue in every release I am aware of when using certain test environments. Doing something similar to:

```
if ! /usr/sbin/ip a | grep vlan10; then
    sudo ifup ifcfg-vlan10
fi
if ! sudo /usr/sbin/iptables -L BOOTSTACK_MASQ -nvx -t nat | grep 10.0.0.0; then
    sudo iptables -t nat -A BOOTSTACK_MASQ -o eth0 -s 10.0.0.0/24 -j MASQUERADE
fi
```

works around the problem. It would be nice to get to the bottom of it, though, as time permits. Artem, in your particular case, can you post which command exactly you use to work around it?

I'm seeing the same issue with one of my tests. The masquerade rules get applied based on the undercloud.conf setting (the masquerade_network option) for use cases where the undercloud acts as a router for the ctlplane network:

```
[stack@undercloud-0 ~]$ cat undercloud.conf
[DEFAULT]
# Network interface on the Undercloud that will be handling the PXE
# boots and DHCP for Overcloud instances. (string value)
local_interface = eth0
local_ip = 192.168.24.1/24
network_gateway = 192.168.24.1
undercloud_public_vip = 192.168.24.2
undercloud_admin_vip = 192.168.24.3
network_cidr = 192.168.24.0/24
masquerade_network = 192.168.24.0/24
dhcp_start = 192.168.24.5
dhcp_end = 192.168.24.24
inspection_iprange = 192.168.24.100,192.168.24.120
undercloud_service_certificate = /etc/pki/instack-certs/undercloud.pem
```

Resulting iptables rules:

```
[stack@undercloud-0 ~]$ sudo iptables -nL -t nat
Chain POSTROUTING (policy ACCEPT)
target     prot opt source            destination
MASQUERADE  all  --  172.17.0.0/16    0.0.0.0/0
nova-api-POSTROUTING  all  --  0.0.0.0/0   0.0.0.0/0
nova-postrouting-bottom  all  --  0.0.0.0/0   0.0.0.0/0

(other chains identical to the broken listing above; the BOOTSTACK_MASQ chain
and its 192.168.24.0/24 MASQUERADE rule are missing)
```

It looks like the iptables service failed to start at some point:

```
[root@undercloud-0 ~]# systemctl status iptables
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2017-07-11 18:21:51 EDT; 4min 40s ago
 Main PID: 29669 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/iptables.service

Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 18:21:51 undercloud-0.redhat.local iptables.init[29669]: iptables: Applying firewall rules: [ OK ]
Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Started IPv4 firewall with iptables.

[root@undercloud-0 ~]# journalctl -l -u iptables
-- Logs begin at Tue 2017-07-11 12:54:23 EDT, end at Tue 2017-07-11 18:26:38 EDT. --
Jul 11 12:57:50 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 12:57:50 undercloud-0.redhat.local iptables.init[13982]: iptables: Applying firewall rules: [ OK ]
Jul 11 12:57:50 undercloud-0.redhat.local systemd[1]: Started IPv4 firewall with iptables.
Jul 11 14:39:21 undercloud-0.redhat.local systemd[1]: Stopping IPv4 firewall with iptables...
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: iptables: Setting chains to policy ACCEPT: raw mangle Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: nat Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: filter Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: [FAILED]
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: iptables: Flushing firewall rules: [ OK ]
Jul 11 14:39:22 undercloud-0.redhat.local iptables.init[26705]: iptables: Unloading modules: iptable_raw iptable_mangle iptable_nat iptable_filter iptable_filter iptable_mangle iptable_nat iptable_raw ip_tables[FAILED]
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: iptables.service: control process exited, code=exited status=9
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: Stopped IPv4 firewall with iptables.
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: Unit iptables.service entered failed state.
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: iptables.service failed.
-- Reboot --
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 14:39:28 undercloud-0.redhat.local iptables.init[562]: iptables: Applying firewall rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:28 undercloud-0.redhat.local iptables.init[562]: [FAILED]
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: iptables.service: main process exited, code=exited, status=1/FAILURE
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: Failed to start IPv4 firewall with iptables.
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: Unit iptables.service entered failed state.
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: iptables.service failed.
Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 18:21:51 undercloud-0.redhat.local iptables.init[29669]: iptables: Applying firewall rules: [ OK ]
Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Started IPv4 firewall with iptables.
```
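The `[FAILED]` entries in the journal above all trace back to concurrent iptables invocations fighting over the xtables lock. Modern iptables can simply be passed `-w` so it waits for the lock itself (as the error message suggests). As an illustration only, and not something from this report, a caller without `-w` support could retry on the lock message; the helper below is a hypothetical sketch:

```shell
#!/bin/sh
# Hypothetical retry helper (not part of the bug report): rerun a command a
# few times when it fails with the xtables-lock message quoted in the logs.
# With a recent iptables, prefer "iptables -w ..." instead of retrying.
retry_on_xtables_lock() {
    tries="$1"; shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Capture stderr too, since iptables prints the lock message there.
        err=$("$@" 2>&1) && { [ -n "$err" ] && echo "$err"; return 0; }
        case "$err" in
            *"holding the xtables lock"*) i=$((i + 1)); sleep 1 ;;
            *) echo "$err" >&2; return 1 ;;   # unrelated failure: give up
        esac
    done
    echo "gave up after $tries tries" >&2
    return 1
}

# Demo with a harmless command; prints "rules applied"
retry_on_xtables_lock 3 echo "rules applied"
```

The same pattern is what the `-w` flag implements inside iptables itself, which is why the init-script messages recommend it.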
After the undercloud upgrade:

```
(undercloud) [stack@undercloud-0 ~]$ systemctl status iptables
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-07-12 09:25:12 EDT; 2h 20min ago
  Process: 612 ExecStart=/usr/libexec/iptables/iptables.init start (code=exited, status=1/FAILURE)
 Main PID: 612 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
```

Workaround: `systemctl restart iptables`

I had a similar issue at OSP12 install time.

This seems to only happen in certain test environments. I'm adding the Neutron DFG to see if we can flush out what exactly is causing this behavior.

So in the log from comment #8 there is a failure running iptables:

```
Jul 11 14:39:28 undercloud-0.redhat.local iptables.init[562]: iptables: Applying firewall rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
```

Can we get more information on the environment, and on what other iptables commands might be running?

Lee, can you take a look at this one as well? It seems to fall under the same set of iptables issues.

Please see the attached Launchpad bug. Ihar is fixing a Neutron issue caused by an incompatible change introduced to iptables in RHEL 7.4, and it looks like it is the issue in this context as well.

This is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1481207. My belief is that we should close as duplicates all OSP bugs that are due to the iptables service failing because of the xtables lock error. We still need a new bug for Neutron to track the functional test failure that also results in an xtables lock, but it is not director related, and not what we witness here.

*** Bug 1463227 has been marked as a duplicate of this bug. ***

*** Bug 1465382 has been marked as a duplicate of this bug. ***

*** This bug has been marked as a duplicate of bug 1481207 ***
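For completeness, the other workaround mentioned in the comments ("add deleted rules to NAT table") can be sketched as a script that only prints the restore commands, so they can be reviewed before being applied with `sudo sh`. The chain name (`BOOTSTACK_MASQ`) and CIDR are taken from the expected nat table quoted in this report; the helper function itself is hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch of the "add deleted rules to NAT table" workaround.
# It prints the commands instead of running them; review the output, then
# pipe it to "sudo sh" on the undercloud. The chain name and rule shape
# match the "must be" nat table shown in the report.
print_masq_rules() {
    net="$1"   # ctlplane CIDR, i.e. masquerade_network from undercloud.conf
    cat <<EOF
iptables -w -t nat -N BOOTSTACK_MASQ
iptables -w -t nat -A POSTROUTING -j BOOTSTACK_MASQ
iptables -w -t nat -A BOOTSTACK_MASQ -s $net ! -d $net -j MASQUERADE
EOF
}

print_masq_rules "192.168.24.0/24"
```

The `-w` flag makes each command wait for the xtables lock rather than fail the way the iptables init script did in the logs above.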