Description Artem Hrechanychenko
2018-05-04 13:00:20 UTC
Description of problem:
I cannot deploy a new instance after rebooting the OC (overcloud) nodes:
https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DF%20Current%20release/job/DFG-df-13-deployment-7.5-virthost-3cont_3comp_3ceph-no_UC_SSL-no_OC_SSL-ceph-ipv4-vxlan-RHELOSP-31820/6/consoleFull
fault | {"message": "Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance fcf97214-2d4f-4656-8600-f27c5fccde9e. Last exception: Binding failed for port b9bc7060-303b-4994-817e-acf8e836eba7, please check neutron logs for more information.", "code": 500, "details": " File \"/usr/lib/python2.7/site-packages/nova/conductor/manager.py\", line 566, in build_instances
At the same time, I was able to start an instance that was created before the reboot of the OC nodes:
4d266054-eaac-43b4-a5d9-9de92dfeafb7 | after_deploy | ACTIVE | - | Running | tenantvxlan=192.168.32.7, 10.0.0.176 |
The most relevant entries from the logs:
sudo grep "b9bc7060-303b-4994-817e-acf8e836eba7" -R /var/log/containers/neutron/ - http://pastebin.test.redhat.com/586112
Refusing to bind port b9bc7060-303b-4994-817e-acf8e836eba7 to dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'availability_zone': None, 'heartbeat_timestamp': datetime.datetime(2018, 5, 3, 22, 59, 54), 'admin_state_up': True, 'alive': False, 'topic': u'N/A', 'host': u'compute-2.localdomain', 'agent_type': u'Open vSwitch agent', 'resource_versions': {u'Subnet': u'1.0', u'Log': u'1.0', u'SubPort': u'1.0', u'SecurityGroup': u'1.0', u'SecurityGroupRule': u'1.0', u'Trunk': u'1.1', u'QosPolicy': u'1.7', u'Port': u'1.1', u'Network': u'1.0'}, 'created_at': datetime.datetime(2018, 5, 3, 18, 2, 31), 'started_at': datetime.datetime(2018, 5, 3, 18, 27, 54), 'id': 'daa5769a-3a02-4fef-8fed-a25fe35529ad', 'configurations': {u'ovs_hybrid_plug': True, u'in_distributed_mode': False, u'datapath_type': u'system', u'arp_responder_enabled': False, u'tunneling_ip': u'172.17.2.17', u'vhostuser_socket_dir': u'/var/run/openvswitch', u'devices': 1, u'ovs_capabilities': {u'datapath_types': [u'netdev', u'system'], u'iface_types': [u'geneve', u'gre', u'internal', u'lisp', u'patch', u'stt', u'system', u'tap', u'vxlan']}, u'extensions': [u'qos'], u'l2_population': False, u'tunnel_types': [u'vxlan'], u'log_agent_heartbeats': False, u'enable_distributed_routing': False, u'bridge_mappings': {u'datacentre': u'br-ex', u'tenant': u'br-isolated'}}}
/var/log/containers/neutron/server.log:2018-05-04 09:08:47.407 27 ERROR neutron.plugins.ml2.managers [req-b911d91a-f0be-4d2e-8a89-e429f8019c0a eaba5c2057a14a0aa057859fc1eea1d1 c1d9a1aa57f149e6b4fa7eed7416daf7 - default default] Failed to bind port b9bc7060-303b-4994-817e-acf8e836eba7 on host compute-2.localdomain for vnic_type normal using segments [{'network_id': '5e19a278-c1ec-4035-aed9-e019804b65f3', 'segmentation_id': 10, 'physical_network': None, 'id': '057af317-1d4f-4807-914c-b0a22073d9e1', 'network_type': u'vxlan'}]
http://pastebin.test.redhat.com/586121
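To make the "dead agent" message easier to read, here is a rough sketch of how stale the last heartbeat was when the bind failed. It only uses the two timestamps quoted above and assumes they are in the same timezone. The 75-second threshold is an assumption (it is believed to be the default for Neutron's agent_down_time option, so check neutron.conf on the controllers), and the helper names are made up for illustration; this is not Neutron's own code.

```python
from datetime import datetime, timedelta

# Assumption: 75 s is believed to be the default agent_down_time;
# the deployed value in neutron.conf may differ.
AGENT_DOWN_TIME = timedelta(seconds=75)


def heartbeat_age(heartbeat, now):
    """Return how long ago the agent last reported in."""
    return now - heartbeat


def is_agent_alive(heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Treat the agent as alive only if its heartbeat is recent enough."""
    return heartbeat_age(heartbeat, now) <= down_time


# Timestamps copied from the log lines above (assumed same timezone).
heartbeat = datetime(2018, 5, 3, 22, 59, 54)   # 'heartbeat_timestamp'
bind_failure = datetime(2018, 5, 4, 9, 8, 47)  # server.log ERROR line

age = heartbeat_age(heartbeat, bind_failure)
print("heartbeat age at bind failure:", age)
print("agent considered alive:", is_agent_alive(heartbeat, bind_failure))
```

Under these assumptions the agent on compute-2.localdomain had not reported for roughly ten hours, which would be consistent with it never reconnecting after the reboot rather than a short heartbeat delay.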
Version-Release number of selected component (if applicable):
OSP 13 puddle - 2018-05-02.5
How reproducible:
always
Steps to Reproduce:
1. Deploy OSP13 with 3 controller + 3 compute + 3 Ceph nodes and OVN (default), using puddle 2018-05-02.5
2. Deploy an instance in the OC
3. Reboot the OC nodes one by one to simulate a rack outage
4. Start the instance from step 2 (nova start)
5. Deploy a new instance
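Between steps 4 and 5 it can help to check which hosts still have agents marked as not alive. The following is a minimal sketch that assumes the agent records have already been collected as dictionaries shaped like the one in the "Refusing to bind port" log line (keys 'host', 'binary', 'alive'). The function name is hypothetical and the compute-0 entry is invented sample data, not taken from this environment.

```python
def hosts_with_dead_agents(agents):
    """Return sorted host names that have at least one agent with alive == False."""
    return sorted({a["host"] for a in agents if not a.get("alive", False)})


agents = [
    # Taken from the log line in this report.
    {"host": "compute-2.localdomain",
     "binary": "neutron-openvswitch-agent",
     "alive": False},
    # Hypothetical healthy agent, for contrast only.
    {"host": "compute-0.localdomain",
     "binary": "neutron-openvswitch-agent",
     "alive": True},
]

print(hosts_with_dead_agents(agents))
```

If a compute host shows up in this list, a bind failure for new ports scheduled to that host would be expected.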
Actual results:
Fails at step 5: the new instance goes to ERROR with the port binding failure shown above.
Expected results:
The instance is created and reachable via its floating IP.
Additional info:
Comment 3 Artem Hrechanychenko
2018-05-04 13:57:12 UTC