Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1587880

Summary: [RFE] 0ac0102 netdev: check for NULL fields in netdev_get_addrs is insufficient and does not cover all possible situations. Need workaround urgently
Product: Red Hat OpenStack
Component: openvswitch
Version: 10.0 (Newton)
Target Milestone: zstream
Target Release: 10.0 (Newton)
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Keywords: FutureFeature, TestOnly, ZStream
Reporter: Masaki Furuta ( RH ) <mfuruta>
Assignee: Assaf Muller <amuller>
QA Contact: Ofer Blaut <oblaut>
CC: amuller, apevec, chrisw, dalvarez, fleitner, jbenc, knoha, majopela, mfuruta, ragiman, rhos-maint, rkhan, srevivo, tredaelli
Fixed In Version: openvswitch-2.9.0-62.el7fdn
Last Closed: 2019-03-01 11:39:09 UTC
Type: Bug
Attachments:
  untested more complete workaround (flags: none)

Description Masaki Furuta ( RH ) 2018-06-06 08:52:10 UTC
Description of problem:

  The customer hit an issue similar to https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335769.html / https://sourceware.org/bugzilla/show_bug.cgi?id=21812. It appears to have been partially fixed on the Open vSwitch side in Bug 1473735, but it is still unresolved on the glibc side (Bug 1472832).
  
  The customer reports the following problems with that patch (0ac0102, "netdev: check for NULL fields in netdev_get_addrs"):
  
    - With openvswitch-2.6.1-13.git20161206.el7ost.x86_64 installed, no error handling is performed when ifa_name or ifa_netmask is NULL; such entries are simply skipped.
    - As a result, the information Open vSwitch holds can diverge from the actual system state (for example, an abnormal condition is treated as if the value were simply empty).
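  A minimal sketch of the behavior described above, assuming the partial fix simply skips list entries whose ifa_name or ifa_netmask is NULL. This is not the actual OVS netdev_get_addrs code: count_kept_entries is a hypothetical helper, and its skipped counter exists only to make visible the entries that, per the customer, are dropped without any error.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <stddef.h>

/* Walks a getifaddrs() list the way the partial fix is described as
 * doing: entries with a NULL ifa_name or ifa_netmask are skipped.
 * Returns the number of entries kept; *skipped receives the number
 * dropped, which (per the customer's report) is never surfaced as an error. */
static size_t count_kept_entries(const struct ifaddrs *head, size_t *skipped)
{
    size_t kept = 0;

    assert(skipped != NULL);
    *skipped = 0;
    for (const struct ifaddrs *ifa = head; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_name == NULL || ifa->ifa_netmask == NULL) {
            (*skipped)++;   /* dropped silently: no error, no warning */
            continue;
        }
        kept++;
    }
    return kept;
}
```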
  
  To resolve this inconsistency, we are considering periodically running the commands below.
  
  We would like answers to the following questions about this countermeasure:
    1. Does this countermeasure resolve the inconsistency described above?
    2. Are there other workarounds?
    3. Are there any risks in applying it?
  
  The customer is considering the following workaround:
  
  Purpose of workaround
  ---------------------------------
  - Open vSwitch rebuilds its routing table cache when route_table_reset() is called, so triggering that rebuild periodically may clear the inconsistency.
  
  Specific measures
  ------------------
  - Create a dummy device and bring it up or down once every five minutes, on the assumption that starting or stopping a device causes the routing table cache to be rebuilt.
  
  1. Preparation
  
    # modprobe dummy
    (a dummy0 device is created automatically)
  
  2. Rebuild the routing table cache
  
    # ip link set dummy0 up
    or, if dummy0 is already up:
    # ip link set dummy0 down
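  Step 2 can also be performed from C with the SIOCGIFFLAGS/SIOCSIFFLAGS ioctls (see netdevice(7)) instead of ip(8). This sketch only flips IFF_UP on the named interface. It does not verify that Open vSwitch actually rebuilds its route cache as a result, and both function names are hypothetical.

```c
#define _DEFAULT_SOURCE      /* struct ifreq and IFF_* flags in <net/if.h> */
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the interface flags with IFF_UP inverted. */
static short flip_up_flag(short flags)
{
    return (short)(flags ^ IFF_UP);
}

/* Brings the named interface up if it is down, or down if it is up,
 * mirroring the "ip link set dummy0 up|down" step.
 * Requires CAP_NET_ADMIN. Returns 0 on success, -1 on failure (errno set). */
static int toggle_link(const char *ifname)
{
    struct ifreq ifr;
    int fd, rc;

    assert(ifname != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works for these ioctls */
    if (fd < 0) {
        return -1;
    }
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    rc = ioctl(fd, SIOCGIFFLAGS, &ifr);        /* read the current flags */
    if (rc == 0) {
        ifr.ifr_flags = flip_up_flag(ifr.ifr_flags);
        rc = ioctl(fd, SIOCSIFFLAGS, &ifr);    /* write them back, IFF_UP flipped */
    }
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

  Calling toggle_link("dummy0") from a five-minute timer would mirror the proposed schedule. Each call alternates the device between up and down, just as the two ip commands do.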
 

Version-Release number of selected component (if applicable):

  - Red Hat OpenStack Platform 10 (and above)

How reproducible:

  - Always 

Steps to Reproduce:

  - See Description of Problem section.

Comment 15 Jiri Benc 2018-06-19 08:22:44 UTC
Created attachment 1452839
untested more complete workaround

The customer is right that the workaround is incomplete.

However, instead of periodically forcing a route cache rebuild just to flush the cached addresses, I'd like to propose a more complete workaround for the glibc bug.

If ifa_name is NULL in the returned ifaddrs list, we've hit the glibc bug and the results we got are inconsistent. Instead of ignoring the inconsistent entry, we should just retry getifaddrs. The whole loop should be guarded by a counter: if we don't get a consistent dump within a reasonable number of iterations, we should give up, log a warning, and continue with the last (inconsistent) dump.
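The retry loop described above can be sketched as follows. This is an illustration, not the attached patch or the upstream commit. The helper names and the limit of 3 attempts (borrowed from Comment 18) are assumptions.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical retry budget; Comment 18 says the attached patch retries 3 times. */
#define GETIFADDRS_MAX_TRIES 3

/* True if any entry has a NULL ifa_name, which per this bug means
 * glibc handed back an inconsistent dump. */
static bool ifaddrs_is_inconsistent(const struct ifaddrs *head)
{
    for (const struct ifaddrs *ifa = head; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_name == NULL) {
            return true;
        }
    }
    return false;
}

/* Calls getifaddrs() until the dump is consistent or the budget runs out.
 * On exhaustion, warns and keeps the last (inconsistent) dump.
 * Returns 0 with *out set (free it with freeifaddrs()), or -1 if
 * getifaddrs() itself failed (errno set by getifaddrs()). */
static int getifaddrs_consistent(struct ifaddrs **out)
{
    struct ifaddrs *head = NULL;

    assert(out != NULL);
    for (int tries = 1; ; tries++) {
        if (getifaddrs(&head) != 0) {
            return -1;
        }
        if (!ifaddrs_is_inconsistent(head)) {
            break;                       /* consistent dump: use it */
        }
        if (tries >= GETIFADDRS_MAX_TRIES) {
            fprintf(stderr, "warning: getifaddrs returned inconsistent "
                    "data after %d attempts, using it anyway\n", tries);
            break;                       /* give up, keep the last dump */
        }
        freeifaddrs(head);               /* discard and retry */
    }
    *out = head;
    return 0;
}
```

A caller would still have to skip NULL entries when walking the result, because the fallback dump may be inconsistent. The retry only makes that path rare; it does not remove it.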

Daniel, what do you think? See the attached patch.

Comment 16 Jiri Benc 2018-06-19 08:24:11 UTC
(Note that the "with a warning" part is not in the patch. It probably should be.)

Comment 18 Daniel Alvarez Sanchez 2018-07-05 09:17:59 UTC
Hi Jiri,
Your patch makes sense and makes the workaround more robust by retrying up to 3 times.
Thanks!

The glibc patch got merged:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4

This one should be the proper fix for the issue.

Comment 19 Jiri Benc 2018-07-09 08:41:43 UTC
Daniel, do you plan to submit it upstream, preferably with the warning part added?

Comment 20 Daniel Alvarez Sanchez 2018-07-09 08:54:02 UTC
(In reply to Jiri Benc from comment #19)
> Daniel, do you plan to submit it upstream, preferably with the warning part
> added?

I thought you were going to do it, as it's your patch! If you want me to, I can do it and add you as a signer; or if you do it, I can ACK it... Let me know how you want to move forward :)
thanks!

Comment 21 Jiri Benc 2018-07-09 09:40:45 UTC
It would be great if you could submit it (and finish it first, I guess :-)). Thanks!

Comment 26 Daniel Alvarez Sanchez 2018-08-18 15:21:42 UTC
The patch is merged upstream in the OVS master and 2.10 branches.
https://github.com/openvswitch/ovs/commit/3434d306866d825084d2d186d1f8dd98662ff650

I'll reassign this to Jiri/OVS team for downstream backport.
Also, note that this patch shouldn't be needed for glibc >= 2.28.
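Whether a given host still needs the workaround depends on its glibc. A rough runtime check is sketched below. It assumes the 2.28 cut-off stated above, and the helper names are hypothetical. Distribution builds (for example RHEL 7, which ships glibc 2.17) may carry backported fixes, so a plain version comparison can report that a fix is missing when it is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>      /* glibc headers define __GLIBC__ via <features.h> */
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* True if glibc major.minor is at least 2.28, the version from which
 * the comment above says the OVS workaround should no longer be needed. */
static bool glibc_version_has_fix(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    return major > 2 || (major == 2 && minor >= 28);
}

/* Checks the glibc the process is running against, which can differ from
 * the headers it was built with. Returns false on non-glibc systems or if
 * the version string cannot be parsed. Vendor backports are invisible here. */
static bool running_glibc_has_fix(void)
{
#ifdef __GLIBC__
    int major = 0, minor = 0;

    if (sscanf(gnu_get_libc_version(), "%d.%d", &major, &minor) == 2) {
        return glibc_version_has_fix(major, minor);
    }
#endif
    return false;
}
```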

Comment 32 Lon Hohberger 2019-01-17 11:34:00 UTC
According to our records, this should be resolved by openvswitch-2.9.0-83.el7fdp.1.  This build is available now.