Bug 1189836 - nova-compute fails to start when there is an instance with port with binding:vif_type=binding_failed
Summary: nova-compute fails to start when there is an instance with port with binding:...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 5.0 (RHEL 7)
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: z4
Target Release: 5.0 (RHEL 7)
Assignee: Artom Lifshitz
QA Contact: nlevinki
URL:
Whiteboard:
Depends On: 1199106
Blocks: 1190582
Reported: 2015-02-05 15:08 UTC by Tzach Shefi
Modified: 2022-07-09 07:19 UTC

Fixed In Version: openstack-nova-2014.1.4-1.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1190582 (view as bug list)
Environment:
Last Closed: 2015-04-16 14:35:35 UTC
Target Upstream Version:
Embargoed:


Attachments
Nova compute log and messages (800.39 KB, application/x-gzip)
2015-02-05 15:08 UTC, Tzach Shefi


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1419452 0 None None None Never
OpenStack gerrit 129158 0 None MERGED Compute: Catch binding failed exception while init host 2021-02-05 04:21:33 UTC
OpenStack gerrit 160541 0 None MERGED Compute: Catch binding failed exception while init host 2021-02-05 04:21:33 UTC
Red Hat Product Errata RHSA-2015:0843 0 normal SHIPPED_LIVE Important: openstack-nova security, bug fix, and enhancement update 2015-04-16 18:27:45 UTC

Description Tzach Shefi 2015-02-05 15:08:03 UTC
Created attachment 988535
Nova compute log and messages

Description of problem: On an HA deployment, the nova-compute service on one of my compute nodes failed. Restarting the service does not help; it fails again.
The nova-compute and messages logs are attached.

Version-Release number of selected component (if applicable):
rhel7
python-nova-2014.1.3-9.el7ost.noarch
openstack-nova-compute-2014.1.3-9.el7ost.noarch
openstack-nova-common-2014.1.3-9.el7ost.noarch
python-novaclient-2.17.0-2.el7ost.noarch
openstack-nova-novncproxy-2014.1.3-9.el7ost.noarch


How reproducible:
Unsure; this is the first time I have seen this.

Steps to Reproduce:
1. The service failed; after restarting it, it failed again.

Actual results:
Nova compute service failed

Expected results:


Additional info:
Adding compute.log and messages log.

Comment 1 Kashyap Chamarthy 2015-02-05 15:33:18 UTC
Looking at your error message (one of the most common in Nova):

    2015-02-05 16:52:30.557 8107 TRACE nova.openstack.common.threadgroup NovaException: Unexpected vif_type=binding_failed

At a quick look, this looks like a duplicate of the following bug (which was closed as an
environment issue):

    https://bugzilla.redhat.com/show_bug.cgi?id=1183253 --  Nova boot
    failed in _build_and_run_instance --- Unexpected
    vif_type=binding_failed


It (your bug #1189836) most likely is not a "bug". Why?

  - First, the error means Neutron and Nova could not communicate for 
    any number of reasons (quoting from the bug I linked, 1183253):

      - neutron (-server) was unable to find an OVS agent with the 
        appropriate hostname
      - Do you have a running openvswitch-agent log?
      - Mis-configured agent, or if you've fiddled OVS in a way that was
        unexpected

So, on the above basis, I'm tempted to close your bug as a duplicate (of
1183253) unless you can consistently reproduce this issue.
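
To narrow down which ports are involved, one option is to filter port records on the `binding:vif_type` attribute named in the error. The following is a minimal sketch, not a Nova or Neutron tool: it assumes the ports are already available as a list of dicts keyed by the Neutron attribute names `binding:vif_type` and `binding:host_id`, and the function name and sample data are hypothetical.

```python
def ports_with_failed_binding(ports):
    """Return (port id, host) pairs for ports whose binding failed.

    `ports` is assumed to be a list of dicts using Neutron attribute
    names; how the list is fetched is outside this sketch.
    """
    return [
        (port["id"], port.get("binding:host_id"))
        for port in ports
        if port.get("binding:vif_type") == "binding_failed"
    ]


# Hypothetical sample data for illustration only.
sample_ports = [
    {"id": "port-a", "binding:vif_type": "ovs", "binding:host_id": "compute-1"},
    {"id": "port-b", "binding:vif_type": "binding_failed",
     "binding:host_id": "compute-2"},
]
failed_ports = ports_with_failed_binding(sample_ports)
```

The host of each failed port indicates which compute node's agent to check first.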


Can you try restarting all services in a systematic manner (using 
`openstack-service`) and see if you can *still* reproduce?


A gentle note: Next time, please add actual error log 
messages/tracebacks in the bug instead of only tar.gz files.

Comment 2 Tzach Shefi 2015-02-08 14:43:10 UTC
With the help of Neutron QE guys Itzikb & Oblaut, we were able to resolve the issue.

This is not a duplicate: the bug you linked concerns a single instance being affected, whereas here the whole nova-compute service fails.

For details see the LP bug; we were able to reproduce the issue on another deployment:
https://bugs.launchpad.net/nova/+bug/1419452

Comment 4 Eoghan Glynn 2015-03-25 12:54:50 UTC
The upstream fix for:

  https://bugs.launchpad.net/nova/+bug/1324041

is already in the 2014.1.4 stable release, so it will be pulled in via the rebase BZ 1199106.
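
Going by the title of the linked gerrit changes ("Compute: Catch binding failed exception while init host"), the general pattern is to handle the failure per instance during host initialisation, so that one instance with a badly bound port cannot abort startup of the whole service. The sketch below only illustrates that pattern under stated assumptions; it is not the Nova code, and `VifBindingFailed`, `plug_vifs`, `init_host` and the dict layout are all hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)


class VifBindingFailed(Exception):
    """Hypothetical stand-in for the error raised on vif_type=binding_failed."""


def plug_vifs(instance):
    # Hypothetical helper: refuse to plug any port whose binding failed.
    for port in instance["ports"]:
        if port.get("binding:vif_type") == "binding_failed":
            raise VifBindingFailed(
                "Unexpected vif_type=binding_failed for port %s" % port["id"])


def init_host(instances):
    """Initialise every instance; return the names of those that failed."""
    failed = []
    for instance in instances:
        try:
            plug_vifs(instance)
        except VifBindingFailed:
            # Record the failure on this instance and keep going, instead
            # of letting the exception propagate and stop the service.
            LOG.exception("Instance %s left in error state", instance["name"])
            instance["state"] = "error"
            failed.append(instance["name"])
    return failed


# Hypothetical sample data for illustration only.
sample_instances = [
    {"name": "vm-ok", "state": "active",
     "ports": [{"id": "p1", "binding:vif_type": "ovs"}]},
    {"name": "vm-bad", "state": "active",
     "ports": [{"id": "p2", "binding:vif_type": "binding_failed"}]},
    {"name": "vm-ok2", "state": "active",
     "ports": [{"id": "p3", "binding:vif_type": "ovs"}]},
]
failed_instances = init_host(sample_instances)
```

In this sketch the loop reaches `vm-ok2` even though `vm-bad` raised, which is the behaviour that was missing when the service failed to start.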

Comment 5 Eoghan Glynn 2015-03-26 10:40:33 UTC
Picked up in 2014.1.4 rebase.

Comment 8 nlevinki 2015-04-15 12:35:52 UTC
Followed the steps as written in https://bugs.launchpad.net/nova/+bug/1419452
===============
1. Launch an instance.
2. Stop openvswitch agent on the compute node
3. Attach another interface to the instance using a second network
    # nova interface-attach --net-id <net-id> <server>

4. Restart nova-compute

The VM came up with a new interface connected to the other network.

Comment 10 errata-xmlrpc 2015-04-16 14:35:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0843.html

