Bug 1255092 - "Docker build" crashes wireless as soon as Dockerfile tries to access network
Status: CLOSED DUPLICATE of bug 1253949
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Lubomir Rintel
QA Contact: Fedora Extras Quality Assurance
Reported: 2015-08-19 15:01 UTC by Robert P. J. Day
Modified: 2015-08-20 14:21 UTC (History)
Doc Type: Bug Fix
Last Closed: 2015-08-20 14:21:06 UTC
Type: Bug

Attachments
setroubleshoot output (2.23 KB, text/plain)
2015-08-19 15:01 UTC, Robert P. J. Day

Description Robert P. J. Day 2015-08-19 15:01:42 UTC
Created attachment 1064913
setroubleshoot output

Scenario:

* Fully-updated Fedora 22 ASUS laptop, using "updates-testing", so Docker is currently at 1.8.1.

Trying to do a "docker build" using a Dockerfile whose first few lines are:

  FROM ubuntu:14.04
  MAINTAINER Robert P. J. Day
  ENV REFRESHED_AT 2015-08-18

  RUN apt-get -y -q update && apt-get -y -q install nginx
  ... snip ...

Symptom: The instant the "docker build" tries to execute the "RUN apt-get" instruction, wireless crashes on the laptop. This is entirely reproducible, and always coincides with the build operation attempting network access for the first time.

  During one of the incidents, I got a SELinux report that I attached as "setroubleshoot." I followed the advice there, running:

# grep abrt-hook-ccpp /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp

but that made no apparent difference. I also disabled SELinux entirely, which made no difference either.

  I captured several hundred lines of "journalctl" output and posted them at http://pastebin.com/vD6fCwgP, where you can see the docker build command being run in line 1. I'm open to suggestions.

Comment 1 Daniel Walsh 2015-08-19 17:45:10 UTC
Don't really think this is a docker issue.  Reassigning.

Comment 2 Robert P. J. Day 2015-08-19 17:56:12 UTC
I didn't think it was, but I didn't know where else to put it. Let me know if there's any further testing you want me to do.

Comment 3 Dan Williams 2015-08-19 19:29:05 UTC
NetworkManager-1.0.6-0.1.20150813git7e2caa2.fc22

Aug 19 10:45:40 localhost.localdomain NetworkManager[4138]: <info>  (docker0): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Aug 19 10:45:40 localhost.localdomain NetworkManager[4138]: <info>  (veth3891250): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Aug 19 10:45:40 localhost.localdomain NetworkManager[4138]: <warn>  (vethb10fd05): failed to disable userspace IPv6LL address handling
Aug 19 10:45:40 localhost.localdomain audit[4138]: <audit-1701> auid=4294967295 uid=0 gid=0 ses=4294967295 pid=4138 comm="NetworkManager" exe="/usr/sbin/NetworkManager" sig=11
Aug 19 10:45:40 localhost.localdomain NetworkManager[4138]: (NetworkManager:4138): GLib-GObject-CRITICAL **: g_type_instance_get_private: assertion 'instance != NULL && instance->g_class != NULL' failed
Aug 19 10:45:40 localhost.localdomain kernel: NetworkManager[4138]: segfault at 18 ip 000055719fd2cd84 sp 00007ffe412605b0 error 4 in NetworkManager[55719fc69000+1cf000]
Aug 19 10:45:40 localhost.localdomain systemd[1]: NetworkManager.service: main process exited, code=dumped, status=11/SEGV
Aug 19 10:45:40 localhost.localdomain systemd[1]: Unit NetworkManager.service entered failed state.
Aug 19 10:45:40 localhost.localdomain systemd[1]: NetworkManager.service failed.

So yeah, NM is crashing, which obviously should not be happening.
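
As a side note for anyone reading the log above: the kernel line can be decoded a little further. The following is a minimal sketch, not something from the original report. "segv_offset" is a hypothetical helper name; it subtracts the mapping base (the number before the "+" in the trailing brackets) from the "ip" value, which gives the offset of the faulting instruction inside the NetworkManager binary. The fault address 0x18 with error code 4 (a user-mode read of an unmapped page) would be consistent with reading a structure field through a NULL pointer, which fits the GLib assertion logged just before it, but that is an inference from the log and not a confirmed diagnosis.

```shell
# Hypothetical helper (an illustration, not part of any existing tool):
# read one kernel "segfault at ..." log line on stdin and print the offset
# of the faulting instruction inside the mapped binary (ip - mapping base).
segv_offset() {
    read -r segv_line || return 1
    # "ip <hex>" is the instruction pointer at the time of the fault.
    segv_ip=$(printf '%s\n' "$segv_line" | sed -nE 's/.* ip ([0-9a-f]+) .*/\1/p')
    # "[<base>+<size>]" at the end of the line is where the binary is mapped.
    segv_base=$(printf '%s\n' "$segv_line" | sed -nE 's/.*\[([0-9a-f]+)\+[0-9a-f]+\].*/\1/p')
    [ -n "$segv_ip" ] && [ -n "$segv_base" ] || return 1
    printf '0x%x\n' $(( 0x$segv_ip - 0x$segv_base ))
}
```

Piping the kernel segfault line from the log above into segv_offset prints 0xc3d84. Because the binary is loaded at a different base address on every run, this offset is the stable part of the address.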

Robert, could you get things to a working state, then:

sudo gdb -p $(pidof NetworkManager)

then in that window type 'continue'.  Then reproduce the problem, and when NM crashes you should see the window in which you are running 'gdb' sitting at the (gdb) prompt.  Then type:

backtrace

and copy & paste the terminal output into this bug or as an attachment and we can see what the problem is.
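
The same steps can probably be run without typing at the (gdb) prompt. This is only a sketch under stated assumptions: "bt_on_crash" is a hypothetical helper name, and it assumes gdb, pidof and sudo are available on the machine. The gdb options used (-p, -batch, -ex) are standard ones. gdb attaches, continues the process, prints a backtrace when the process stops on the fatal signal, and then exits, which detaches from the process.

```shell
# Sketch: attach gdb to a running process by name, let it continue, and
# print a backtrace automatically when it stops (for example on SIGSEGV).
# "bt_on_crash" is a hypothetical name used for illustration only.
bt_on_crash() {
    # pidof -s returns a single PID; fail cleanly if nothing is running.
    crash_pid=$(pidof -s "$1") || {
        echo "no running process named $1" >&2
        return 1
    }
    # -batch exits after the -ex commands have run, so no prompt is needed.
    sudo gdb -p "$crash_pid" -batch -ex continue -ex backtrace
}
```

Usage would be "bt_on_crash NetworkManager" in one terminal, then reproducing the problem in another.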

Comment 4 Robert P. J. Day 2015-08-19 19:55:01 UTC
This is kind of weird since, this time, wireless seemed to stay up. But here's the output:

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000558c67848d84 in nm_ip4_config_get_num_addresses ()
(gdb) backtrace
#0  0x0000558c67848d84 in nm_ip4_config_get_num_addresses ()
#1  0x0000558c677dd32c in _update_ip4_address ()
#2  0x0000558c677e3508 in nm_device_set_ip4_config ()
#3  0x0000558c677eb6c6 in _cleanup_generic_post ()
#4  0x0000558c677ebc51 in dispose ()
#5  0x00007f804150ea5c in g_object_unref () at /lib64/libgobject-2.0.so.0
#6  0x0000558c6785973b in remove_device ()
#7  0x0000558c6785e07a in _platform_link_cb_idle ()
#8  0x00007f8041209a8a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#9  0x00007f8041209e20 in g_main_context_iterate.isra ()
    at /lib64/libglib-2.0.so.0
#10 0x00007f804120a142 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#11 0x0000558c677c76df in main ()
(gdb) 

  I'm puzzled that host wireless wasn't blown away ... I'm going to try this again without the gdb session to see what happens, but feel free to start playing with this.

  Oh, wait ... I went to the gdb session and typed "quit" to get out, and that's when wireless was trashed.

Comment 5 Robert P. J. Day 2015-08-19 20:01:48 UTC
OK, I did it all again, just for comparison. Here are the results of the second run:

(gdb) continue
Continuing.
[New Thread 0x7f98452c3700 (LWP 7494)]

Program received signal SIGSEGV, Segmentation fault.
0x000055afa2c32d84 in nm_ip4_config_get_num_addresses ()
(gdb) backtrace
#0  0x000055afa2c32d84 in nm_ip4_config_get_num_addresses ()
#1  0x000055afa2bc732c in _update_ip4_address ()
#2  0x000055afa2bcd508 in nm_device_set_ip4_config ()
#3  0x000055afa2bd56c6 in _cleanup_generic_post ()
#4  0x000055afa2bd5c51 in dispose ()
#5  0x00007f9856d72a5c in g_object_unref () at /lib64/libgobject-2.0.so.0
#6  0x000055afa2c4373b in remove_device ()
#7  0x000055afa2c4807a in _platform_link_cb_idle ()
#8  0x00007f9856a6da8a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#9  0x00007f9856a6de20 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#10 0x00007f9856a6e142 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#11 0x000055afa2bb16df in main ()
(gdb) 

The command ran to completion and wireless is still up. So I quit and ...

(gdb) quit
A debugging session is active.

	Inferior 1 [process 7171] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/sbin/NetworkManager, process 7171

and that's when wireless disappeared. Again.
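
For comparing runs like these, the addresses get in the way because they change every time the process starts. A small sketch, assuming each backtrace has been saved to a text file: "bt_frames" is a hypothetical helper that reduces gdb backtrace output to frame number and function name, so two runs can be compared directly.

```shell
# Hypothetical helper: reduce a gdb backtrace (on stdin) to "#N function"
# lines by dropping the per-run addresses and any "at <library>" suffix.
# Wrapped continuation lines that do not start with "#N" are discarded.
bt_frames() {
    sed -nE 's/^#([0-9]+) +0x[0-9a-f]+ in ([^ ]+) \(.*$/#\1 \2/p'
}
```

With the two backtraces saved as, say, run1.txt and run2.txt (hypothetical file names), running bt_frames over each and comparing the results with diff shows whether the call chains are the same.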

Comment 6 Robert P. J. Day 2015-08-20 11:02:29 UTC
Here's what I find interesting ... this is the Dockerfile I'm using as the basis for my build (first few lines), and you can see that the last line I provide here is the first one in the Dockerfile that requires network access (to update the ubuntu base image and install nginx):

FROM ubuntu:14.04
MAINTAINER Robert P. J. Day
ENV REFRESHED_AT 2015-08-18

RUN apt-get -y -q update && apt-get -y -q install nginx
... snip ...

  Now, when I do a regular "docker build" (with no debugging), the output is:

$ sudo docker build -t jamtur01/nginx .
Sending build context to Docker daemon 4.608 kB
Step 0 : FROM ubuntu:14.04
 ---> 8251da35e7a7
Step 1 : MAINTAINER Robert P. J. Day
 ---> Running in c1cc1acbc0bc
 ---> 95434d56c8a1
Removing intermediate container c1cc1acbc0bc
Step 2 : ENV REFRESHED_AT 2015-08-18
 ---> Running in acf68ed9cffd
 ---> 2b54c1852480
Removing intermediate container acf68ed9cffd
Step 3 : RUN apt-get -y -q update && apt-get -y -q install nginx
 ---> Running in ed194f8c36b6
Err http://archive.ubuntu.com trusty InRelease

BOOM! That's when wireless goes down on the host (causing the build to fail), and I have to manually bring wireless up again.

  However, when I use gdb as you instruct, when the build hits that Dockerfile instruction, the machine becomes temporarily unresponsive (I'm guessing just from pure CPU overload), then in the gdb terminal window, I see:

(gdb) continue
Continuing.
[New Thread 0x7fd3467b3700 (LWP 9041)]

Program received signal SIGSEGV, Segmentation fault.
0x000055c2648cdd84 in nm_ip4_config_get_num_addresses ()
(gdb) 

***but*** wireless stays up and the docker build runs to completion. It's been a couple minutes now, build completed and wireless still running. So here's the gdb backtrace one more time (after letting build complete, appears to be same as before):

(gdb) backtrace
#0  0x000055c2648cdd84 in nm_ip4_config_get_num_addresses ()
#1  0x000055c26486232c in _update_ip4_address ()
#2  0x000055c264868508 in nm_device_set_ip4_config ()
#3  0x000055c2648706c6 in _cleanup_generic_post ()
#4  0x000055c264870c51 in dispose ()
#5  0x00007fd350432a5c in g_object_unref () at /lib64/libgobject-2.0.so.0
#6  0x000055c2648de73b in remove_device ()
#7  0x000055c2648e307a in _platform_link_cb_idle ()
#8  0x00007fd35012da8a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#9  0x00007fd35012de20 in g_main_context_iterate.isra ()
    at /lib64/libglib-2.0.so.0
#10 0x00007fd35012e142 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#11 0x000055c26484c6df in main ()
(gdb) 

and when I once again quit from gdb, that's when wireless gets killed:

(gdb) quit
A debugging session is active.

	Inferior 1 [process 8685] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/sbin/NetworkManager, process 8685
$

but it appears to re-establish fairly quickly.

  So it seems that, to get a successful docker build, I can just use the cheap workaround of attaching gdb to NetworkManager before starting the build. Weird.

Comment 7 Dan Williams 2015-08-20 14:21:06 UTC
There's a fix in f22-updates-testing, I believe.

*** This bug has been marked as a duplicate of bug 1253949 ***