Bug 1277171

Summary: NM asserts when setting ipv6 shared method just after connection creation
Product: Red Hat Enterprise Linux 7 Reporter: Vladimir Benes <vbenes>
Component: NetworkManagerAssignee: Lubomir Rintel <lrintel>
Status: CLOSED UPSTREAM QA Contact: Desktop QE <desktop-qa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.3CC: aloughla, bgalvani, dcbw, lrintel, rkhan, thaller
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-06 11:28:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
[PATCH] don't use g_return_if_fail() for non-programming error case
none
[PATCH] don't use g_return_if_fail() for non-programming error case
none
patch from jk/ipv6-shared-SIGTRAP-rh1277171 none

Description Vladimir Benes 2015-11-02 14:49:57 UTC
Description of problem:
during test case:
    * Add connection type "ethernet" named "ethie" for device "eth1"
    * Execute "nmcli con modify ethie ipv4.method disabled ipv6.method shared"
    Then "Sharing IPv6 connections is not supported yet" is visible with command "nmcli connection up id ethie"

I get:
[ 5606.046412] traps: NetworkManager[13029] trap int3 ip:7f77c481c8c3 sp:7ffe63d48d50 error:0

when autoconnect no is added it's not happening so it's some race in between upping original connection and modified with shared ipv6 that's unsupported.

Version-Release number of selected component (if applicable):
NetworkManager-1.0.6-27.el7.x86_64

Comment 3 Jirka Klimes 2015-11-02 15:20:36 UTC
Backtrace from the testing machine:

NetworkManager[16397]: <info>  dhclient started with pid 16522
NetworkManager[16397]: <info>  (eth1): DHCPv4 client pid 16522 killed by signal 15
NetworkManager[16397]: <info>  (eth1): DHCPv4 state changed unknown -> fail
NetworkManager[16397]: <info>  (eth1): canceled DHCP transaction
NetworkManager[16397]: <info>  (eth1): DHCPv4 state changed fail -> done
NetworkManager[16397]: linklocal6_complete: assertion 'FALSE' failed

Program received signal SIGTRAP, Trace/breakpoint trap.
g_logv (log_domain=0x5555556ae4c0 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffffffddb0) at gmessages.c:1046
1046		  g_private_set (&g_log_depth, GUINT_TO_POINTER (depth));
(gdb) bt
#0  g_logv (log_domain=0x5555556ae4c0 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffffffddb0) at gmessages.c:1046
#1  0x00007ffff4a4ea3f in g_log (log_domain=<optimized out>, log_level=<optimized out>, format=<optimized out>) at gmessages.c:1079
#2  0x00005555555b4f11 in queued_ip6_config_change (user_data=<optimized out>) at devices/nm-device.c:7476
#3  0x00007ffff4a4779a in g_main_dispatch (context=0x55555593e400) at gmain.c:3109
#4  g_main_context_dispatch (context=context@entry=0x55555593e400) at gmain.c:3708
#5  0x00007ffff4a47ae8 in g_main_context_iterate (context=0x55555593e400, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3779
#6  0x00007ffff4a47dba in g_main_loop_run (loop=0x55555593e4c0) at gmain.c:3973
#7  0x00005555555970a3 in main (argc=1, argv=0x7fffffffe198) at main.c:512
(gdb) 

It seems to be a bit misleading.
Actually
  g_return_if_fail (FALSE); in linklocal6_complete() was hit because method was "shared".

Comment 4 Jirka Klimes 2015-11-02 16:12:03 UTC
Created attachment 1088627 [details]
[PATCH] don't use g_return_if_fail() for non-programming error case

Well, NetworkManager terminated because of setting G_DEBUG to fatal-warnings on the machine. But g_return_if_fail() should not be used in this case, IMO.

Comment 5 Jirka Klimes 2015-11-02 16:18:34 UTC
Created attachment 1088628 [details]
[PATCH] don't use g_return_if_fail() for non-programming error case

Correction of the patch

Comment 7 Beniamino Galvani 2015-11-03 09:12:08 UTC
I think that the original code was right, if the IPv6 method is
'shared', link-local configuration should not have been started
and if we reach that line there's a bug somewhere.

I suspect that the root cause is that we are reading the ipv6.method
from the settings_connection (which can change on the fly). Instead we
should read it from the applied_connection?

Comment 8 Thomas Haller 2015-11-03 10:45:34 UTC
(In reply to Beniamino Galvani from comment #7)
> I think that the original code was right, if the IPv6 method is
> 'shared', link-local configuration should not have been started
> and if we reach that line there's a bug somewhere.

I tend to agree with Beniamino.



> I suspect that the root cause is that we are reading the ipv6.method
> from the settings_connection (which can change on the fly). Instead we
> should read it from the applied_connection?

Vladimir testes RHEL-7.2/nm-1-0. Applied-connection split is only on master. But yes, let's investigate master too.




The commit message reads:

    The g_return family of macros (g_return_if_fail()) should only be used for
    programming errors. And this is not the case here. Turning on GLib debugging
    would terminate NetworkManager.

it is well understood that g_return_if_fail() is not supposed to affect bug-free control-flow. I added it there, because I wanted to assert that this condition doesn't happen. The commit message better should explain, why the assertion is wrong at this place.

Dropping the assertion entirely is wrong. If you really expect shared there, then let's do:

     } else if (strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_LINK_LOCAL) == 0)
          nm_device_activate_schedule_ip6_config_result (self);
+    } else if (strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_SHARED) == 0) {
+         /* handle it somehow?? */
+    } else
+         g_return_if_reached ();



(note that g_return_if_reached() is better then g_return_if_fail(FALSE). I didn't get that originally).

Comment 10 Dan Williams 2015-11-06 22:57:09 UTC
(In reply to Beniamino Galvani from comment #7)
> I think that the original code was right, if the IPv6 method is
> 'shared', link-local configuration should not have been started
> and if we reach that line there's a bug somewhere.

I think in this case, the original IPv6 method was 'auto' or 'link-local' and so LL was started, but then before LL finished it got changed to 'shared'.  So I can see why the linklocal6_complete() code would be executed.

I agree with the g_return_if_reached() though.

In the end, this is an NM bug that a user can trigger and not really their fault, so we should handle it by not crashing at least.  Maybe if the method isn't recognized from linklocal6_complete() we should fail the activation?

Comment 11 Jirka Klimes 2015-11-09 16:38:42 UTC
Pushed the updated patch to jk/ipv6-shared-SIGTRAP-rh1277171

Comment 13 Dan Williams 2015-11-10 16:29:10 UTC
jk/ipv6-shared-SIGTRAP-rh1277171 LGTM

Comment 14 Dan Williams 2016-01-05 00:10:35 UTC
jk/ipv6-shared-SIGTRAP-rh1277171 has the patch; just need to merge to master and backport?

Comment 15 Lubomir Rintel 2016-01-06 11:28:32 UTC
This is a crash due to a known race when modifying the settings connection during activation.

Fixed in master by splitting settings and applied connections.

Comment 16 Thomas Haller 2016-01-06 13:28:02 UTC
Created attachment 1112146 [details]
patch from jk/ipv6-shared-SIGTRAP-rh1277171

I attach the patch from jk/ipv6-shared-SIGTRAP-rh1277171 here for reference.

I think it should be rejected because:

- the crash happened due to a "g_return_if_fail (FALSE)". This kind of assertion is already forgiving to the user which does not cause a real crash (unless user configured fatal-criticals).
- the patch doesn't fix the real issue, it just silences a non-fatal, correct assertion.