Bug 1460116

Summary: Cannot upgrade OSP11 to OSP12: overcloud upgrade failed because the masquerade rules for the br-ctlplane network were deleted, so overcloud nodes cannot reach external repos
Product: Red Hat OpenStack
Reporter: Artem Hrechanychenko <ahrechan>
Component: rhosp-director
Assignee: Lee Yarwood <lyarwood>
Status: CLOSED DUPLICATE
QA Contact: Marius Cornea <mcornea>
Severity: high
Priority: high
Version: 12.0 (Pike)
CC: afazekas, ahrechan, amuller, bhaley, dbecker, emacchi, ihrachys, mbultel, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang, yprokule
Target Milestone: ga
Keywords: AutomationBlocker, Reopened, Triaged
Target Release: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Last Closed: 2017-09-06 12:09:53 UTC
Type: Bug
Bug Depends On: 1481207
Bug Blocks: 1399762
Attachments:
  journalctl for net manager from controller node
  journalctl for net manager from compute node

Description Artem Hrechanychenko 2017-06-09 07:33:11 UTC
Description of problem:
Cannot upgrade OSP11 to OSP12: during the overcloud upgrade, the compute node is unable to resolve DNS addresses:
    2017-06-09 06:50:09Z [overcloud-Compute-uhltm6xf3cuy-0-e7lm6iyleegz.NovaComputeUpgradeInitDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment 6172f71e-8160-44bc-a432-8b902580d044 failed (6)
    2017-06-09 06:50:10Z [overcloud-Compute-uhltm6xf3cuy-0-e7lm6iyleegz.NovaComputeUpgradeInitDeployment]: UPDATE_FAILED  Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
    2017-06-09 06:50:10Z [overcloud-Compute-uhltm6xf3cuy-0-e7lm6iyleegz.UpdateDeployment]: UPDATE_FAILED  UPDATE aborted
    2017-06-09 06:50:10Z [overcloud-Compute-uhltm6xf3cuy-0-e7lm6iyleegz]: UPDATE_FAILED  Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
    2017-06-09 06:50:10Z [overcloud-Compute-uhltm6xf3cuy-0-e7lm6iyleegz.UpdateDeployment]: SIGNAL_FAILED  Signal: deployment 41542ebf-7b7d-4c2b-b9ac-fd896fac1191 succeeded
    2017-06-09 06:50:11Z [overcloud-Compute-uhltm6xf3cuy.0]: UPDATE_FAILED  resources[0]: Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
    2017-06-09 06:50:11Z [overcloud-Compute-uhltm6xf3cuy]: UPDATE_FAILED  resources[0]: Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
    2017-06-09 06:50:11Z [Compute]: UPDATE_FAILED  resources.Compute: resources[0]: Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
    2017-06-09 06:50:11Z [Controller]: UPDATE_FAILED  UPDATE aborted
    2017-06-09 06:50:11Z [overcloud]: UPDATE_FAILED  resources.Compute: resources[0]: Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
    2017-06-09 06:50:12Z [overcloud-Controller-v23m5qhmo6g3.0]: UPDATE_FAILED  UPDATE aborted
    2017-06-09 06:50:12Z [overcloud-Controller-v23m5qhmo6g3]: UPDATE_FAILED  Operation cancelled
    2017-06-09 06:50:12Z [overcloud-Controller-v23m5qhmo6g3-0-mmjapthsi6hj.ControllerUpgradeInitDeployment]: UPDATE_FAILED  UPDATE aborted
    2017-06-09 06:50:12Z [overcloud-Controller-v23m5qhmo6g3-0-mmjapthsi6hj.UpdateDeployment]: UPDATE_FAILED  UPDATE aborted
    2017-06-09 06:50:12Z [overcloud-Controller-v23m5qhmo6g3-0-mmjapthsi6hj]: UPDATE_FAILED  Operation cancelled
     
     Stack overcloud UPDATE_FAILED
     
    overcloud.Controller.0.ControllerUpgradeInitDeployment:
      resource_type: OS::Heat::SoftwareDeployment
      physical_resource_id: 1e0acead-f88b-4439-8da5-731c853ed918
      status: UPDATE_FAILED
      status_reason: |
        UPDATE aborted
      deploy_stdout: |
     
      deploy_stderr: |
        ...
          0     0    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0curl: (6) Could not resolve host: rhos-release.virt.bos.redhat.com; Unknown error
        (truncated, view all with --long)
    overcloud.Controller.0.UpdateDeployment:
      resource_type: OS::Heat::SoftwareDeployment
      physical_resource_id: 2cf95db9-161f-468c-b121-7e380fbf69cf
      status: UPDATE_FAILED
      status_reason: |
        UPDATE aborted
      deploy_stdout: |
        Started yum_update.sh on server 41048952-0aef-4732-8fb0-9610ff912c65 at Fri Jun  9 06:50:28 UTC 2017
        Not running due to unset update_identifier
      deploy_stderr: |
     
    overcloud.Compute.0.NovaComputeUpgradeInitDeployment:
      resource_type: OS::Heat::SoftwareDeployment
      physical_resource_id: 6172f71e-8160-44bc-a432-8b902580d044
      status: UPDATE_FAILED
      status_reason: |
        Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
      deploy_stdout: |
     
      deploy_stderr: |
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed
       
          0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: rhos-release.virt.bos.redhat.com; Unknown error
    overcloud.Compute.0.UpdateDeployment:
      resource_type: OS::Heat::SoftwareDeployment
      physical_resource_id: 41542ebf-7b7d-4c2b-b9ac-fd896fac1191
      status: UPDATE_FAILED
      status_reason: |
        UPDATE aborted
      deploy_stdout: |
        Started yum_update.sh on server afda508b-fd02-46be-9297-e02c7ba6469c at Fri Jun  9 06:50:09 UTC 2017
        Not running due to unset update_identifier
      deploy_stderr: |




[heat-admin@compute-0 ~]$ ping google.com
ping: google.com: Name or service not known
[heat-admin@compute-0 ~]$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=38 time=18.1 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=38 time=18.1 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 18.102/18.145/18.189/0.141 ms

[heat-admin@compute-0 ~]$ cat /etc/resolv.conf 
# Generated by NetworkManager
search localdomain

[heat-admin@compute-0 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:3d:13:37 brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.16/24 brd 192.168.24.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe3d:1337/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 52:54:00:5d:9a:ea brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe5d:9aea/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:ef:44:69 brd ff:ff:ff:ff:ff:ff
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 86:af:b8:18:5d:5e brd ff:ff:ff:ff:ff:ff
6: br-isolated: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 52:54:00:5d:9a:ea brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe5d:9aea/64 scope link 
       valid_lft forever preferred_lft forever
7: vlan20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 66:6e:28:40:4a:07 brd ff:ff:ff:ff:ff:ff
    inet 172.17.1.14/24 brd 172.17.1.255 scope global vlan20
       valid_lft forever preferred_lft forever
    inet6 fe80::646e:28ff:fe40:4a07/64 scope link 
       valid_lft forever preferred_lft forever
8: vlan30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether a6:fd:9c:a1:14:c9 brd ff:ff:ff:ff:ff:ff
    inet 172.17.3.13/24 brd 172.17.3.255 scope global vlan30
       valid_lft forever preferred_lft forever
    inet6 fe80::a4fd:9cff:fea1:14c9/64 scope link 
       valid_lft forever preferred_lft forever
9: vlan50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 06:40:51:a8:73:e2 brd ff:ff:ff:ff:ff:ff
    inet 172.17.2.19/24 brd 172.17.2.255 scope global vlan50
       valid_lft forever preferred_lft forever
    inet6 fe80::440:51ff:fea8:73e2/64 scope link 
       valid_lft forever preferred_lft forever
10: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether f2:48:e7:ca:4b:47 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f048:e7ff:feca:4b47/64 scope link 
       valid_lft forever preferred_lft forever
11: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 52:d6:4b:e4:eb:42 brd ff:ff:ff:ff:ff:ff
12: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 7a:4c:e8:c8:a5:4a brd ff:ff:ff:ff:ff:ff
13: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65470 qdisc noqueue master ovs-system state UNKNOWN qlen 1000

[heat-admin@compute-0 ~]$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    0      0        0 eth0
169.254.169.254 gateway         255.255.255.255 UGH   0      0        0 eth0
172.17.1.0      0.0.0.0         255.255.255.0   U     0      0        0 vlan20
172.17.2.0      0.0.0.0         255.255.255.0   U     0      0        0 vlan50
172.17.3.0      0.0.0.0         255.255.255.0   U     0      0        0 vlan30
192.168.24.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0

Version-Release number of selected component (if applicable):
OSP12

How reproducible:


Steps to Reproduce:
1) Deploy OSP11 Undercloud & Overcloud
2) Perform a minor update of the undercloud node:
>sudo rhos-release 11 -r 7.4
>sudo systemctl stop 'openstack-*' 'neutron-*' httpd
>sudo yum update -y instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade
>sudo reboot

3) Upgrade the OSP11 undercloud to OSP12:

>sudo rhos-release 12-director
>sudo systemctl stop openstack-*
>sudo systemctl stop neutron-*
>sudo systemctl stop httpd
>sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade

# Overcloud upgrade
source stackrc

export IMAGE_TAG=`bash /usr/bin/puddle-version http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest_containers`

 sudo wget -O /home/stack/overcloud_containers.yaml http://file.rdu.redhat.com/~ohochman/containers/overcloud_containers.yaml
    sudo chown stack:stack /home/stack/overcloud_containers.yaml
    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /home/stack/overcloud_containers.yaml
    sudo sed -i 's/--insecure-registry 192.168.24.3:8787/--insecure-registry docker-registry.engineering.redhat.com/' /etc/sysconfig/docker
    sudo service docker restart
    source /home/stack/stackrc && openstack overcloud container image upload --verbose --config-file /home/stack/overcloud_containers.yaml 
    
sudo wget -O /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml http://file.rdu.redhat.com/~ohochman/containers/docker-osp12.yaml
    sudo chown stack:stack /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml

    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml


# use master repos
cat > ~/containers-upgrade-repos.yaml <<EOEF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    curl -LO http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rpm -ivh rhos-release-latest.noarch.rpm || true
    rhos-release 12-director
    yum clean all
EOEF

export THT=/usr/share/openstack-tripleo-heat-templates

Apply workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456458
Apply workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456562

# upgrade overcloud
openstack overcloud deploy --templates --libvirt-type kvm --ntp-server clock.redhat.com -e /home/stack/virt/network/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/virt/hostnames.yml -e /home/stack/virt/debug.yaml -e /home/stack/virt/nodes_data.yaml -e $THT/environments/low-memory-usage.yaml -e $THT/environments/docker.yaml -e $THT/environments/major-upgrade-composable-steps-docker.yaml -e ~/containers-upgrade-repos.yaml -e $THT/environments/docker-osp12.yaml --log-file overcloud_deployment_26.log




Actual results:
Overcloud upgrade failed.


Expected results:
Overcloud is upgraded successfully.

Comment 1 Red Hat Bugzilla Rules Engine 2017-06-09 07:33:16 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 2 Artem Hrechanychenko 2017-06-09 15:30:51 UTC
lbezdick: the information that you asked for:


[heat-admin@compute-0 ~]$ sudo cat /var/log/yum.log 
[heat-admin@compute-0 ~]$ 



[heat-admin@controller-0 ~]$ sudo cat /var/log/yum.log 
[heat-admin@controller-0 ~]$ 


(undercloud) [stack@undercloud-0 ~]$ systemctl status dnsmasq.service
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

logs: http://pastebin.test.redhat.com/492653

Comment 3 Artem Hrechanychenko 2017-06-09 15:31:38 UTC
Created attachment 1286449 [details]
journalctl for net manager from controller node

Comment 4 Artem Hrechanychenko 2017-06-09 15:32:21 UTC
Created attachment 1286450 [details]
journalctl for net manager from compute node

Comment 5 Artem Hrechanychenko 2017-06-10 08:52:54 UTC
Looks like a conflict between an upstream installation using quickstart and an infrared installation of OSP11 running at the same time. It does not reproduce now.

Comment 6 Artem Hrechanychenko 2017-06-27 10:53:07 UTC
(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.Compute.0.NovaComputeUpgradeInitDeployment:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: eab7a5fe-ba64-4da3-885b-3e711bbf179f
  status: UPDATE_FAILED
  status_reason: |
    Error: resources.NovaComputeUpgradeInitDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 7
  deploy_stdout: |

  deploy_stderr: |
    ...
      0     0    0     0    0     0      0      0 --:--:--  0:02:07 --:--:--     0curl: (7) Failed connect to rhos-release.virt.bos.redhat.com:80; Connection timed out
    (truncated, view all with --long)
overcloud.Compute.0.UpdateDeployment:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: f960ab75-cd04-4dbb-8a5b-acc0310a2115
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted
  deploy_stdout: |
    Started yum_update.sh on server 74d45f6a-6c2c-4ff1-b6f3-5896e33f34a3 at Tue Jun 27 10:19:32 UTC 2017
    Not running due to unset update_identifier
  deploy_stderr: |



During the major upgrade, the masquerade rules for the br-ctlplane network were deleted:

(undercloud) [stack@undercloud-0 ~]$ sudo iptables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
nova-api-PREROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
nova-api-OUTPUT  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0           
nova-api-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
nova-postrouting-bottom  all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER (2 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain nova-api-OUTPUT (1 references)
target     prot opt source               destination         

Chain nova-api-POSTROUTING (1 references)
target     prot opt source               destination         

Chain nova-api-PREROUTING (1 references)
target     prot opt source               destination         

Chain nova-api-float-snat (1 references)
target     prot opt source               destination         

Chain nova-api-snat (1 references)
target     prot opt source               destination         
nova-api-float-snat  all  --  0.0.0.0/0            0.0.0.0/0           

Chain nova-postrouting-bottom (1 references)
target     prot opt source               destination         
nova-api-snat  all  --  0.0.0.0/0            0.0.0.0/0 


But it must be:
[stack@undercloud-0 ~]$ sudo iptables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
nova-api-PREROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
REDIRECT   tcp  --  0.0.0.0/0            169.254.169.254      tcp dpt:80 redir ports 8775
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
nova-api-OUTPUT  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
RETURN     all  --  192.168.122.0/24     224.0.0.0/24        
RETURN     all  --  192.168.122.0/24     255.255.255.255     
MASQUERADE  tcp  --  192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
MASQUERADE  udp  --  192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
MASQUERADE  all  --  192.168.122.0/24    !192.168.122.0/24    
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0           
nova-api-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
BOOTSTACK_MASQ  all  --  0.0.0.0/0            0.0.0.0/0           
nova-postrouting-bottom  all  --  0.0.0.0/0            0.0.0.0/0           

Chain BOOTSTACK_MASQ (1 references)
target     prot opt source               destination         
MASQUERADE  all  --  192.168.24.0/24     !192.168.24.0/24     

Chain DOCKER (2 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain nova-api-OUTPUT (1 references)
target     prot opt source               destination         

Chain nova-api-POSTROUTING (1 references)
target     prot opt source               destination         

Chain nova-api-PREROUTING (1 references)
target     prot opt source               destination         

Chain nova-api-float-snat (1 references)
target     prot opt source               destination         

Chain nova-api-snat (1 references)
target     prot opt source               destination         
nova-api-float-snat  all  --  0.0.0.0/0            0.0.0.0/0           

Chain nova-postrouting-bottom (1 references)
target     prot opt source               destination         
nova-api-snat  all  --  0.0.0.0/0            0.0.0.0/0


Workaround: add the deleted rules back to the NAT table.
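
A minimal sketch of that workaround, assuming the ctlplane CIDR (192.168.24.0/24) and the BOOTSTACK_MASQ chain shown in the expected rules above; adjust to match your undercloud.conf:

# Recreate the BOOTSTACK_MASQ chain if it is gone (-N fails if it already exists)
sudo iptables -t nat -N BOOTSTACK_MASQ 2>/dev/null || true

# Masquerade ctlplane traffic leaving the undercloud, matching the expected rules
sudo iptables -t nat -A BOOTSTACK_MASQ -s 192.168.24.0/24 ! -d 192.168.24.0/24 -j MASQUERADE

# Hook the chain into POSTROUTING if it is not already referenced there
sudo iptables -t nat -C POSTROUTING -j BOOTSTACK_MASQ 2>/dev/null || \
    sudo iptables -t nat -A POSTROUTING -j BOOTSTACK_MASQ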

Comment 7 Sofer Athlan-Guyot 2017-06-30 16:28:01 UTC
Hi,

I want to reinforce the point that we hit this error only on certain virtual setups. We have had this issue in every release I'm aware of when using certain test environments.

Doing something similar to:

# Bring the external vlan interface back up if it has disappeared
if ! /usr/sbin/ip a | grep -q vlan10; then
    sudo ifup ifcfg-vlan10
fi

# Re-add the masquerade rule for the external network if it is missing
if ! sudo /usr/sbin/iptables -L BOOTSTACK_MASQ -nvx -t nat | grep -q 10.0.0.0; then
    sudo iptables -t nat -A BOOTSTACK_MASQ -o eth0 -s 10.0.0.0/24 -j MASQUERADE
fi

works around the problem.

It would be nice to get to the bottom of it though, as time permits.

Artem, in your particular case, can you post exactly which commands you used to work around it?

Comment 8 Marius Cornea 2017-07-11 22:27:27 UTC
I'm seeing the same issue with one of my tests. The masquerade rules get applied based on the undercloud.conf setting (masquerade_network option) for use cases where the undercloud acts as a router for the ctlplane network:

[stack@undercloud-0 ~]$ cat undercloud.conf 
[DEFAULT]
# Network interface on the Undercloud that will be handling the PXE
# boots and DHCP for Overcloud instances. (string value)
local_interface = eth0

# The 192.168.24.0/24 subnet is used by default since RHOS11
local_ip = 192.168.24.1/24
network_gateway = 192.168.24.1
undercloud_public_vip = 192.168.24.2
undercloud_admin_vip = 192.168.24.3
network_cidr = 192.168.24.0/24
masquerade_network = 192.168.24.0/24
dhcp_start = 192.168.24.5
dhcp_end = 192.168.24.24
inspection_iprange = 192.168.24.100,192.168.24.120
undercloud_service_certificate = /etc/pki/instack-certs/undercloud.pem


Resulting iptables rules:
[stack@undercloud-0 ~]$ sudo iptables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
nova-api-PREROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
nova-api-OUTPUT  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0           
nova-api-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0           
nova-postrouting-bottom  all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER (2 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain nova-api-OUTPUT (1 references)
target     prot opt source               destination         

Chain nova-api-POSTROUTING (1 references)
target     prot opt source               destination         

Chain nova-api-PREROUTING (1 references)
target     prot opt source               destination         

Chain nova-api-float-snat (1 references)
target     prot opt source               destination         

Chain nova-api-snat (1 references)
target     prot opt source               destination         
nova-api-float-snat  all  --  0.0.0.0/0            0.0.0.0/0           

Chain nova-postrouting-bottom (1 references)
target     prot opt source               destination         
nova-api-snat  all  --  0.0.0.0/0            0.0.0.0/0           


It looks like the iptables service failed to start at some point:

[root@undercloud-0 ~]# systemctl status iptables 
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2017-07-11 18:21:51 EDT; 4min 40s ago
 Main PID: 29669 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/iptables.service

Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 18:21:51 undercloud-0.redhat.local iptables.init[29669]: iptables: Applying firewall rules: [  OK  ]
Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Started IPv4 firewall with iptables.
[root@undercloud-0 ~]# journalctl -l -u iptables 
-- Logs begin at Tue 2017-07-11 12:54:23 EDT, end at Tue 2017-07-11 18:26:38 EDT. --
Jul 11 12:57:50 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 12:57:50 undercloud-0.redhat.local iptables.init[13982]: iptables: Applying firewall rules: [  OK  ]
Jul 11 12:57:50 undercloud-0.redhat.local systemd[1]: Started IPv4 firewall with iptables.
Jul 11 14:39:21 undercloud-0.redhat.local systemd[1]: Stopping IPv4 firewall with iptables...
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: iptables: Setting chains to policy ACCEPT: raw mangle Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: nat Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: filter Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: [FAILED]
Jul 11 14:39:21 undercloud-0.redhat.local iptables.init[26705]: iptables: Flushing firewall rules: [  OK  ]
Jul 11 14:39:22 undercloud-0.redhat.local iptables.init[26705]: iptables: Unloading modules:  iptable_raw iptable_mangle iptable_nat iptable_filter iptable_filter iptable_mangle iptable_nat iptable_raw ip_tables[FAILED]
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: iptables.service: control process exited, code=exited status=9
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: Stopped IPv4 firewall with iptables.
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: Unit iptables.service entered failed state.
Jul 11 14:39:22 undercloud-0.redhat.local systemd[1]: iptables.service failed.
-- Reboot --
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 14:39:28 undercloud-0.redhat.local iptables.init[562]: iptables: Applying firewall rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Jul 11 14:39:28 undercloud-0.redhat.local iptables.init[562]: [FAILED]
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: iptables.service: main process exited, code=exited, status=1/FAILURE
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: Failed to start IPv4 firewall with iptables.
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: Unit iptables.service entered failed state.
Jul 11 14:39:28 undercloud-0.redhat.local systemd[1]: iptables.service failed.
Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Starting IPv4 firewall with iptables...
Jul 11 18:21:51 undercloud-0.redhat.local iptables.init[29669]: iptables: Applying firewall rules: [  OK  ]
Jul 11 18:21:51 undercloud-0.redhat.local systemd[1]: Started IPv4 firewall with iptables.

Comment 9 Marius Cornea 2017-07-12 15:51:47 UTC
After the undercloud upgrade:

(undercloud) [stack@undercloud-0 ~]$ systemctl status iptables
● iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-07-12 09:25:12 EDT; 2h 20min ago
  Process: 612 ExecStart=/usr/libexec/iptables/iptables.init start (code=exited, status=1/FAILURE)
 Main PID: 612 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

Workaround: systemctl restart iptables
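
A sketch of that check-and-restart sequence, with a quick verification against the BOOTSTACK_MASQ chain from comment 6 (this assumes the saved ruleset that the service reapplies still contains the masquerade rule):

# Restart the service if it is not running, so the saved rules are reapplied
if ! systemctl is-active --quiet iptables; then
    sudo systemctl restart iptables
fi

# Verify the masquerade chain is back
sudo iptables -t nat -L BOOTSTACK_MASQ -n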

Comment 10 Attila Fazekas 2017-07-18 03:43:27 UTC
I had a similar issue at OSP12 install time.

Comment 11 Mike Orazi 2017-07-25 15:01:20 UTC
This seems to only happen in certain test environments.  I'm adding Neutron DFG to see if we can flush out what exactly is causing this behavior.

Comment 12 Brian Haley 2017-07-25 15:34:31 UTC
So in the log from comment #8 there is a failure running iptables:

Jul 11 14:39:28 undercloud-0.redhat.local iptables.init[562]: iptables: Applying firewall rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?

Can we get more info on the environment, and what other iptables commands might be running?
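
For reference, the -w option named in that message makes iptables wait for the xtables lock instead of failing immediately (a generic illustration, not specific to this environment):

# Without -w, iptables exits with "Another app is currently holding the
# xtables lock" when the lock is taken; with -w it blocks until released:
sudo iptables -w -t nat -L POSTROUTING -n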

Comment 14 mathieu bultel 2017-08-21 08:10:12 UTC
Lee,
Can you take a look at this one as well? It seems to fall under the same set of iptables issues.

Comment 15 Assaf Muller 2017-08-22 17:44:09 UTC
Please see the attached Launchpad bug; Ihar is fixing a Neutron issue caused by a new incompatible change introduced to iptables in RHEL 7.4. It looks like it is an issue in this context as well.

Comment 16 Ihar Hrachyshka 2017-08-22 17:49:02 UTC
This is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1481207

Comment 17 Ihar Hrachyshka 2017-08-22 17:52:55 UTC
My belief is that we should close as duplicates all OSP bugs that are due to the iptables service failing because of the xtables lock error; we still need a new bug for Neutron to track the functional test failures that also result in the xtables lock, but that is not director-related, and it is not what we witness here.

Comment 18 Assaf Muller 2017-08-28 13:38:12 UTC
*** Bug 1463227 has been marked as a duplicate of this bug. ***

Comment 19 Assaf Muller 2017-08-28 13:38:26 UTC
*** Bug 1465382 has been marked as a duplicate of this bug. ***

Comment 20 mathieu bultel 2017-09-06 12:09:53 UTC

*** This bug has been marked as a duplicate of bug 1481207 ***