RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1451236 - [abrt] [faf] NetworkManager: g_logv(): /usr/sbin/NetworkManager killed by 5
Summary: [abrt] [faf] NetworkManager: g_logv(): /usr/sbin/NetworkManager killed by 5
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.3
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Beniamino Galvani
QA Contact: Desktop QE
URL: http://faf.lab.eng.brq.redhat.com/faf...
Whiteboard:
: 1451306 1492660 1493140 1495112 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-05-16 08:13 UTC by Beniamino Galvani
Modified: 2021-03-11 15:13 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 13:22:08 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
[PATCH] core: don't close input fd in nm_utils_fd_get_contents() (851 bytes, patch)
2017-10-17 09:23 UTC, Beniamino Galvani
no flags Details | Diff
[PATCH v2] core: don't close input fd in nm_utils_fd_get_contents() (939 bytes, patch)
2017-10-17 09:50 UTC, Beniamino Galvani
no flags Details | Diff
[PATCH v3] core: don't close input fd in nm_utils_fd_get_contents() (1.10 KB, patch)
2017-10-17 12:33 UTC, Beniamino Galvani
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0778 0 None None None 2018-04-10 13:23:39 UTC

Description Beniamino Galvani 2017-05-16 08:13:41 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer.

Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/babb88d25ffb46540bc0d5efa5454d6c8d901bc2/

 GLib-GIO-WARNING **: Error setting close-on-exec flag on incoming fd: Bad file descriptor

 #0  g_logv
 #1  g_log
 #2  g_unix_fd_message_deserialize
 #3  g_socket_control_message_deserialize
 #4  g_socket_receive_message
 #5  _g_socket_read_with_control_messages_ready
 #6  _g_dbus_worker_do_read_unlocked
 #7  _g_dbus_worker_do_read_unlocked
 #8  _g_dbus_worker_do_read_cb
 #9  g_task_return_now
 #10 complete_in_idle_cb
 #11 g_main_context_dispatch
 #12 g_main_context_dispatch
 #13 g_main_context_iterate
 #14 g_main_loop_run
 #15 gdbus_shared_thread_func
 #16 g_thread_proxy
 #17 start_thread
 #18 clone ()

Comment 2 Thomas Haller 2017-05-16 08:21:02 UTC
I think it's of similar nature as bug 1449833.

Comment 3 Thomas Haller 2017-05-16 13:54:03 UTC
Thread 1 (Thread 0x7f53fe79b700 (LWP 1106)):
#0  0x00007f54088de643 in g_logv (log_domain=0x7f5408f1b358 "GLib-GIO", log_level=G_LOG_LEVEL_WARNING, format=<optimized out>, args=args@entry=0x7f53fe799f70) at gmessages.c:1086
        domain = 0x0
        data = 0x0
        depth = 0
        log_func = 0x7f54088ddde0 <g_log_default_handler>
        domain_fatal_mask = <optimized out>
        masquerade_fatal = <optimized out>
        test_level = <optimized out>
        was_fatal = <optimized out>
        was_recursion = <optimized out>
        msg = 0x7f53f4007ea0 "Error setting close-on-exec flag on incoming fd: Bad file descriptor"
        msg_alloc = 0x7f53f4007ea0 "Error setting close-on-exec flag on incoming fd: Bad file descriptor"
        i = 4
#1  0x00007f54088de7bf in g_log (log_domain=log_domain@entry=0x7f5408f1b358 "GLib-GIO", log_level=log_level@entry=G_LOG_LEVEL_WARNING, format=format@entry=0x7f5408f33880 "Error setting close-on-exec flag on incoming fd: %s") at gmessages.c:1119
        args = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7f53fe79a050, reg_save_area = 0x7f53fe799f90}}
#2  0x00007f5408eb3ce2 in g_unix_fd_message_deserialize (level=<optimized out>, type=<optimized out>, size=<optimized out>, data=0x7f53fe79a100) at gunixfdmessage.c:119
        message = <optimized out>
        list = <optimized out>
        n = 1
        s = <optimized out>
        i = <optimized out>
        fds = 0x7f53fe79a100
#3  0x00007f5408e942ce in g_socket_control_message_deserialize (level=1, type=1, size=4, data=0x7f53fe79a100) at gsocketcontrolmessage.c:196
        class = 0x7f53f4005fd0
        message = 0x0
        n_message_types = 2
        i = 1
#4  0x00007f5408e8e45a in g_socket_receive_message (socket=socket@entry=0x7f53f0007130 [GSocket], address=address@entry=0x0, vectors=<optimized out>, vectors@entry=0x7f53fe79aab0, num_vectors=<optimized out>, num_vectors@entry=1, messages=0x7f53f000be90, num_messages=0x7f53f000be98, flags=flags@entry=0x0, cancellable=0x7f53f0001f70 [GCancellable], error=error@entry=0x7f53fe79aaa8) at gsocket.c:4588
        message = <optimized out>
        my_messages = 0x0
        cmsg = 0x7f53fe79a0f0
        msg = {msg_name = 0x0, msg_namelen = 0, msg_iov = 0x7f53fe79aab0, msg_iovlen = 1, msg_control = 0x7f53fe79a0f0, msg_controllen = 24, msg_flags = 1073741824}
        result = <optimized out>
        one_sockaddr = {ss_family = 35344, __ss_align = 139998552661504, __ss_padding = "\000\000\000\000\000\000\000\000\264\300\275\bT\177\000\000pe\000\360S\177\000\000\230Q\215\bT\177", '\000' <repeats 11 times>, "\301\372rk\236E\365\002\000\000\000\000\000\000\000\332j\214\bT\177\000\000\020\212\357\bT\177\000\000\000\276\000\360S\177\000\000\000\f\000\364S\177\000\000\360\353\000\360S\177\000\000P4\374\rT\177\000\000`\305\000\360S\177\000"}
        one_vector = {buffer = 0x1, size = 0}
        one_byte = 0 '\000'
        __FUNCTION__ = "g_socket_receive_message"
#5  0x00007f5408ef6b9e in _g_socket_read_with_control_messages_ready (socket=socket@entry=0x7f53f0007130 [GSocket], condition=condition@entry=G_IO_IN, user_data=user_data@entry=0x7f53f4003200) at gdbusprivate.c:128
        task = 0x7f53f4003200 [GTask]
        data = 0x7f540de520e0
        error = 0x0
        result = <optimized out>
        vector = {buffer = 0x7f53f4001e00, size = 16}
        __FUNCTION__ = "_g_socket_read_with_control_messages_ready"
#6  0x00007f5408ef71bf in _g_dbus_worker_do_read_unlocked (callback=0x7f5408ef8a10 <_g_dbus_worker_do_read_cb>, io_priority=0, user_data=<optimized out>, cancellable=0x7f53f0001f70 [GCancellable], num_messages=<optimized out>, messages=<optimized out>, count=16, buffer=0x7f53f4001e00, socket=0x7f53f0007130 [GSocket]) at gdbusprivate.c:180
        task = 0x7f53f4003200 [GTask]
        data = 0x7f540de520e0
        source = <optimized out>
#7  0x00007f5408ef71bf in _g_dbus_worker_do_read_unlocked (worker=<optimized out>) at gdbusprivate.c:828
#8  0x00007f5408ef8ec2 in _g_dbus_worker_do_read_cb (input_stream=<optimized out>, res=<optimized out>, user_data=0x7f53f000be00) at gdbusprivate.c:774
        message = 0x7f540dfc3450
        worker = 0x7f53f000be00
        error = 0x0
        bytes_read = 206
#9  0x00007f5408e9af93 in g_task_return_now (task=0x7f53f40032d0 [GTask]) at gtask.c:1106
#10 0x00007f5408e9afc9 in complete_in_idle_cb (task=0x7f53f40032d0) at gtask.c:1120
#11 0x00007f54088d7d7a in g_main_context_dispatch (context=0x7f53f000c560) at gmain.c:3152
        dispatch = 0x7f54088d4a00 <g_idle_dispatch>
        prev_source = 0x0
        was_in_call = 0
        user_data = 0x7f53f40032d0
        callback = 0x7f5408e9afc0 <complete_in_idle_cb>
        cb_funcs = 0x7f5408bc38c0 <g_source_callback_funcs>
        cb_data = 0x7f53f4002ee0
        need_destroy = <optimized out>
        source = 0x7f53f4003eb0
        current = 0x7f53f000c380
        i = 0
#12 0x00007f54088d7d7a in g_main_context_dispatch (context=context@entry=0x7f53f000c560) at gmain.c:3767
#13 0x00007f54088d80b8 in g_main_context_iterate (context=0x7f53f000c560, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3838
        max_priority = 0
        timeout = 0
        some_ready = 1
        nfds = <optimized out>
        allocated_nfds = 3
        fds = 0x7f53f40010c0
#14 0x00007f54088d838a in g_main_loop_run (loop=0x7f53f000e820) at gmain.c:4032
        __FUNCTION__ = "g_main_loop_run"
#15 0x00007f5408ef6c66 in gdbus_shared_thread_func (user_data=0x7f53f000c530) at gdbusprivate.c:246
        data = 0x7f53f000c530
#16 0x00007f54088fd0f5 in g_thread_proxy (data=0x7f540dde2ed0) at gthread.c:778
        thread = 0x7f540dde2ed0
#17 0x00007f540a1e3dc5 in start_thread (arg=0x7f53fe79b700) at pthread_create.c:308
        __res = <optimized out>
        pd = 0x7f53fe79b700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139998728402688, 5735798018442000178, 0, 139998728403392, 139998728402688, 139998485595440, -5675000504269146318, -5678562126826440910}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>

Comment 4 Thomas Haller 2017-05-16 15:02:06 UTC
*** Bug 1451306 has been marked as a duplicate of this bug. ***

Comment 5 Beniamino Galvani 2017-06-07 12:35:24 UTC
From analysis of http://faf.lab.eng.brq.redhat.com/faf/reports/2955/ :

 (gdb) frame 3
 #3  0x00007f9376833f32 in g_unix_fd_message_deserialize (level=<optimized out>,        type=<optimized out>, size=<optimized out>, data=0x7f9368f3b1f0) at gunixfdmessage.c:119

 (gdb) p fds[0]
 $x = 17

 (gdb) frame 6
 #6  0x00007f93767bef36 in g_socket_receive_message_with_timeout (socket=socket@entry=0x7f935c008130, address=address@entry=0x0,        vectors=<optimized out>, vectors@entry=0x7f9368f3bb60,
     num_vectors=<optimized out>, num_vectors@entry=1, messages=0x7f935c00e6b0, num_messages=0x7f935c00e6b8, flags=flags@entry=0x0, timeout=0, cancellable=cancellable@entry=0x7f935c0023a0,
     error=error@entry=0x7f9368f3bb58) at gsocket.c:4845

 (gdb) p *socket->priv
 $x = {family = G_SOCKET_FAMILY_UNIX, type = G_SOCKET_TYPE_STREAM, protocol = G_SOCKET_PROTOCOL_DEFAULT, fd = 8, listen_backlog = 10, timeout = 0, construct_error = 0x0, remote_address = 0x557854756ef0,
       inited = 1, blocking = 0, keepalive = 0, closed = 0, connected_read = 1, connected_write = 1,    listening = 0, timed_out = 0, connect_pending = 0, recv_addr_cache = {{addr = 0x0, native = 0x0,
       native_len = 0, last_used = 0}, {addr = 0x0, native = 0x0, native_len = 0, last_used = 0}, {addr = 0x0, native = 0x0, native_len = 0, last_used = 0}, {addr = 0x0, native = 0x0, native_len = 0,
       last_used        = 0}, {addr = 0x0, native = 0x0, native_len = 0, last_used = 0}, {addr = 0x0, native = 0x0, native_len = 0, last_used = 0}, {addr = 0x0, native = 0x0, native_len = 0, last_used = 0}, {
       addr = 0x0, native = 0x0,        native_len = 0, last_used = 0}}}

 (gdb) p *((GUnixSocketAddress *)socket->priv->remote_address)->priv
 $x = {path = "/var/run/dbus/system_bus_socket", '\000' <repeats 76 times>, path_len = 31, address_type = G_UNIX_SOCKET_ADDRESS_PATH}

 $ grep ^17: open_fds
 17:/proc/sys/net/ipv6/conf/eth6/forwarding

 (gdb) thread apply 4 bt
 Thread 4 (Thread 0x7f93782d1940 (LWP 19578)):
 #0  send () from /lib64/libc.so.6
 #1  __vsyslog_chk () from /lib64/libc.so.6
 #2  __syslog_chk () from /lib64/libc.so.6
 #3  syslog (__fmt=0x557852a24914 "%s%-7s [%ld.%04ld] %s", __pri=<optimized out>) at /usr/include/bits/syslog.h:31
 #4  _nm_log_impl (file=file@entry=0x557852a238f8 "src/platform/nm-linux-platform.c", line=line@entry=2643, func=func@entry=0x0, level=level@entry=LOGL_DEBUG, domain=domain@entry=LOGD_PLATFORM,
     error=error@entry=0, ifname=ifname@entry=0x0, conn_uuid=conn_uuid@entry=0x0, fmt=fmt@entry=0x557852a23c40 "%s: sysctl: setting '%s' to '%s' (current value is identical)") at src/nm-logging.c:725
 #5  _log_dbg_sysctl_set_impl (value=0x557852a46f85 "0", path=0x557852cb15c0 <path.32401> "/proc/sys/net/ipv6/conf/eth6/forwarding", dirfd=<optimized out>,
     pathid=0x557852cb15c0 <path.32401> "/proc/sys/net/ipv6/conf/eth6/forwarding", platform=0x557854733850) at src/platform/nm-linux-platform.c:2643
 #6  sysctl_set (platform=0x557854733850, pathid=0x557852cb15c0 <path.32401> "/proc/sys/net/ipv6/conf/eth6/forwarding", dirfd=<optimized out>,
     path=0x557852cb15c0 <path.32401> "/proc/sys/net/ipv6/conf/eth6/forwarding", value=0x557852a46f85 "0") at src/platform/nm-linux-platform.c:2712
 #7  _set_state_full (self=self@entry=0x55785484ecf0, state=state@entry=NM_DEVICE_STATE_DISCONNECTED, reason=<optimized out>, quitting=quitting@entry=0) at src/devices/nm-device.c:12572
 #8  nm_device_state_changed (self=self@entry=0x55785484ecf0, state=state@entry=NM_DEVICE_STATE_DISCONNECTED, reason=<optimized out>) at src/devices/nm-device.c:12772
 #9  queued_state_set (user_data=user_data@entry=0x55785484ecf0) at src/devices/nm-device.c:12798


The process receives fd 17 as ancillary data in a message from the
D-Bus system socket, but when it tries to set FD_CLOEXEC on it, the
file descriptor is invalid. At the same time another thread opens
"/proc/sys/net/ipv6/conf/eth6/forwarding" and obtains the same file
descriptor (17).

It seems to me that there is something wrong in the lower layers if we
receive an invalid fd over D-Bus.

Comment 6 Beniamino Galvani 2017-06-08 14:27:07 UTC
I think this is related to the systemd sleep monitor, which is the only place in NM where we call a D-Bus method returning a Unix fd. Indeed, few lines above the crash:

<debug> [1495478072.2589] sleep-monitor-sd: inhibit: taking sleep inhibitor...

Comment 7 Beniamino Galvani 2017-06-09 09:49:23 UTC
I'm reassigning this to systemd, not because I believe the bug is in
systemd, but only to have someone more familiar with the inhibitor API
look at the bug and try to identify which component (dbus-daemon,
kernel, glib, ...) is causing the problem.

What happens is that when NM calls the D-Bus "Inhibit" method of
logind, the returned fd is invalid and GLib throws an assertion just
after deserializing the message when it tries to set the FD_CLOEXEC on
the file descriptor. We have multiple reports (with core dumps) of the
same issue.

Feel free to reassign to other components or back to NM if you think
we may be doing something wrong. Thanks!

Comment 9 Michal Sekletar 2017-08-22 11:56:19 UTC
I have discussed this issue with David Herrmann (as I was suspecting this is due to some DBus bug) and David is working a lot with DBus these days.

However, as of now we think this is NM bug. Most likely close() was called on the fd in one thread, while other thread receives fd via unix socket and then tries to set O_CLOEXEC on fd that was closed by first thread,

Possible parallel execution, leading to bug,

Thread #1:
   Read gdbus
   get FD 17
   <pause 1>
   cloexec on 17
   <pause 2>
   continue

Thread #2:
  <during pause 1>
  close(17) /* for *whatever* reason */
  <during pause 2>
  open /proc/...  and get 17 assigned

Comment 10 Beniamino Galvani 2017-09-13 08:41:21 UTC
This hasn't happened anymore since CI machines started using RHEL 7.4, so I guess the bug was fixed implicitly, maybe when the glib version was updated.

I'll close this if the problem doesn't appear again in few weeks.

Comment 11 Beniamino Galvani 2017-09-18 12:58:46 UTC
*** Bug 1492660 has been marked as a duplicate of this bug. ***

Comment 12 Beniamino Galvani 2017-09-18 13:00:47 UTC
(In reply to Beniamino Galvani from comment #10)
> This hasn't happened anymore since CI machines started using RHEL 7.4, so I
> guess the bug was fixed implicitly, maybe when the glib version was updated.

I take it back, it happened again. See details in bug 1492660.

Comment 13 Beniamino Galvani 2017-09-19 14:10:08 UTC
*** Bug 1493140 has been marked as a duplicate of this bug. ***

Comment 14 Beniamino Galvani 2017-09-25 20:17:16 UTC
*** Bug 1495112 has been marked as a duplicate of this bug. ***

Comment 15 Vladimir Benes 2017-10-16 22:06:50 UTC
(In reply to Beniamino Galvani from comment #12)
> (In reply to Beniamino Galvani from comment #10)
> > This hasn't happened anymore since CI machines started using RHEL 7.4, so I
> > guess the bug was fixed implicitly, maybe when the glib version was updated.
> 
> I take it back, it happened again. See details in bug 1492660.

I can reproduce this crash with https://github.com/NetworkManager/NetworkManager-ci/blob/master/nmcli/features/bond.feature#L459 running in cycle. Sooner or latter there is a crash (in less than 20 cycles). It has to run with real devices. With veth interfaces it does happen very very rarely.

Could you please create a scratch build again so I can reverify?

Comment 16 Beniamino Galvani 2017-10-17 09:23:21 UTC
Created attachment 1339634 [details]
[PATCH] core: don't close input fd in nm_utils_fd_get_contents()

Comment 18 Beniamino Galvani 2017-10-17 09:35:43 UTC
There was a double close(); usually the second close() would simply fail:

 open("/proc/sys/net/ipv6/conf/eth4/accept_ra", O_RDONLY|O_CLOEXEC) = 15
 fcntl(15, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
 close(15)       = 0
 close(15)       = -1 EBADF (Bad file descriptor)

but sometimes it closed somebody else's file descriptor:

 open("/proc/sys/net/ipv6/conf/eth1/accept_ra_defrtr", O_WRONLY|O_TRUNC|O_CLOEXEC) = 16
 open("/proc/sys/net/ipv6/conf/eth1/accept_ra_defrtr", O_RDONLY|O_CLOEXEC) = 17
 fcntl(17, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
 close(17)       = 0
 # ... receive fd 17 via D-Bus ...
 close(17)       = 0
 # try to set close-on-exec on the new fd
 fcntl(17, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)

Comment 19 Beniamino Galvani 2017-10-17 09:50:54 UTC
Created attachment 1339638 [details]
[PATCH v2] core: don't close input fd in nm_utils_fd_get_contents()

Comment 20 Beniamino Galvani 2017-10-17 12:33:32 UTC
Created attachment 1339681 [details]
[PATCH v3] core: don't close input fd in nm_utils_fd_get_contents()

v3: don't leak fd on fdopen() error.

Comment 21 Thomas Haller 2017-10-18 12:07:07 UTC
(In reply to Beniamino Galvani from comment #20)
> Created attachment 1339681 [details]
> [PATCH v3] core: don't close input fd in nm_utils_fd_get_contents()
> 
> v3: don't leak fd on fdopen() error.

lgtm

Comment 22 Beniamino Galvani 2017-10-19 07:12:17 UTC
Applied to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=597072296a19be051ea70b017fd19505ebee6530

and backported to nm-1-8, nm-1-6 branches.

Comment 26 errata-xmlrpc 2018-04-10 13:22:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0778


Note You need to log in before you can comment on or make changes to this bug.