Bug 1400384 - instances hanged in "building" status after run "openstack overcloud deploy"
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: async
Target Release: ---
Assignee: Angus Thomas
QA Contact: Omri Hochman
Reported: 2016-12-01 03:19 UTC by PURANDHAR SAIRAM MANNIDI
Modified: 2016-12-06 13:58 UTC (History)
Last Closed: 2016-12-06 13:58:40 UTC

Description PURANDHAR SAIRAM MANNIDI 2016-12-01 03:19:53 UTC
Description of problem:
Instances hang in "building" status after running "openstack overcloud deploy".

Version-Release number of selected component (if applicable):
RH OSP 9.0

How reproducible:
always

Steps to Reproduce:

openstack overcloud deploy --templates ~/templates/my-overcloud/ \
  -e ~/templates/my-overcloud/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  -e ~/templates/userdata.yaml \
  -e ~/templates/extra_config.json -e ~/templates/pre-config.yaml \
  --control-flavor control --compute-flavor compute \
  --ceph-storage-flavor ceph-storage \
  --control-scale 1 --compute-scale 1 \
  --ceph-storage-scale 0 --block-storage-scale 0 \
  --swift-storage-scale 0 \
  --neutron-network-type vlan --neutron-disable-tunneling \
  --neutron-bridge-mappings datacentre:br-ex,Date_OVS_vlan_phynet0:br-phynet0 \
  --neutron-network-vlan-ranges datacentre:1:4095,Date_OVS_vlan_phynet0:1:4095 \
  --ntp-server 172.23.85.106 --libvirt-type kvm

Actual results:
Nodes are stuck in the "building" state. Introspection is also affected after the deploy.

Expected results:
The deployment should succeed.

Additional info:

Comment 2 Chen 2016-12-01 09:55:53 UTC
Hi,

The issue is with the dnsmasq instance running inside the network namespace.

Running tcpdump on the tap device captures no packets. The tap device's port has tag=10 set on it. I changed it to tag=1, but it immediately reverted to 10.

[root@director ~]# ovs-vsctl show 
90898546-cc35-420b-9e0f-1f07469a0726
    Bridge br-int
        fail_mode: secure
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
        Port br-int
            Interface br-int
                type: internal
        Port "tap3e5b5783-26"
            tag: 10
            Interface "tap3e5b5783-26"
                type: internal
    Bridge br-ctlplane
        fail_mode: secure
        Port "enp129s0f0"
            Interface "enp129s0f0"
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
    ovs_version: "2.4.0"

In addition, I don't see any actions=mod_vlan_vid in the flows on br-int, which also looks strange to me.

[root@director ~]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0xa559d2777a5b6547, duration=54260.142s, table=0, n_packets=14362, n_bytes=1015055, idle_age=30, priority=2,in_port=1 actions=drop
 cookie=0xa559d2777a5b6547, duration=54260.243s, table=0, n_packets=0, n_bytes=0, idle_age=54260, priority=0 actions=NORMAL
 cookie=0xa559d2777a5b6547, duration=54260.238s, table=23, n_packets=0, n_bytes=0, idle_age=54260, priority=0 actions=drop
 cookie=0xa559d2777a5b6547, duration=54260.233s, table=24, n_packets=0, n_bytes=0, idle_age=54260, priority=0 actions=drop
[root@director ~]# 

Best Regards,
Chen
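
The flow check described in Comment 2 can be sketched in Python. This is only an illustration: the sample text is copied from the dump-flows output above, and the helper names (parse_flows, has_vlan_rewrite, dropped_packets) are hypothetical, not part of any OVS tooling.

```python
import re

# Sample copied from the `ovs-ofctl dump-flows br-int` output in Comment 2.
SAMPLE_FLOWS = """\
NXST_FLOW reply (xid=0x4):
 cookie=0xa559d2777a5b6547, duration=54260.142s, table=0, n_packets=14362, n_bytes=1015055, idle_age=30, priority=2,in_port=1 actions=drop
 cookie=0xa559d2777a5b6547, duration=54260.243s, table=0, n_packets=0, n_bytes=0, idle_age=54260, priority=0 actions=NORMAL
 cookie=0xa559d2777a5b6547, duration=54260.238s, table=23, n_packets=0, n_bytes=0, idle_age=54260, priority=0 actions=drop
 cookie=0xa559d2777a5b6547, duration=54260.233s, table=24, n_packets=0, n_bytes=0, idle_age=54260, priority=0 actions=drop
"""


def parse_flows(dump_text):
    """Split each flow line into its match part and its actions part."""
    flows = []
    for line in dump_text.splitlines():
        line = line.strip()
        if " actions=" not in line:
            continue  # skips the "NXST_FLOW reply" header and blank lines
        match, actions = line.rsplit(" actions=", 1)
        flows.append({"match": match, "actions": actions})
    return flows


def has_vlan_rewrite(flows):
    """True if any flow rewrites the VLAN ID with mod_vlan_vid."""
    return any("mod_vlan_vid" in flow["actions"] for flow in flows)


def dropped_packets(flows):
    """Sum the n_packets counters of all flows whose only action is drop."""
    total = 0
    for flow in flows:
        if flow["actions"] != "drop":
            continue
        counter = re.search(r"n_packets=(\d+)", flow["match"])
        if counter:
            total += int(counter.group(1))
    return total


if __name__ == "__main__":
    flows = parse_flows(SAMPLE_FLOWS)
    print("flows parsed:", len(flows))
    print("mod_vlan_vid present:", has_vlan_rewrite(flows))
    print("packets hitting drop rules:", dropped_packets(flows))
```

On the output above this reports no mod_vlan_vid action, and the only non-zero packet counter belongs to the priority=2,in_port=1 drop rule.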

Comment 3 Robin Cernin 2016-12-06 13:58:40 UTC
The issue was resolved: the ctlplane network had been configured with network type local instead of flat.
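
A minimal sketch of how the misconfiguration could be checked, assuming the network details are available as JSON with the Neutron provider attribute provider:network_type. The sample document and the check_network_type helper are hypothetical; no actual net-show output was attached to this bug.

```python
import json

# Hypothetical sample: only the fields relevant to this bug are shown.
BROKEN_CTLPLANE = '{"name": "ctlplane", "provider:network_type": "local"}'


def check_network_type(network, expected="flat"):
    """Compare the network's provider type with the expected one.

    Returns a (ok, message) tuple so callers can both test and report.
    """
    actual = network.get("provider:network_type")
    name = network.get("name", "<unnamed>")
    if actual == expected:
        return True, "%s: network type is %s" % (name, actual)
    return False, "%s: network type is %s, expected %s" % (name, actual, expected)


if __name__ == "__main__":
    ok, message = check_network_type(json.loads(BROKEN_CTLPLANE))
    print("OK" if ok else "MISCONFIGURED", "-", message)
```

On the undercloud, the input could come from something like `neutron net-show ctlplane -f json` or `openstack network show ctlplane -f json`, whichever the installed client supports; that choice is an assumption, not something stated in this bug.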

