Bug 1304062

Summary: [docker-storage-setup] Docker daemon does not start on Rawhide: core dump
Product: Fedora
Component: docker
Version: rawhide
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED DUPLICATE
Reporter: Randy Barlow <rbarlow>
Assignee: Lokesh Mandvekar <lsm5>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: adimania, admiller, amurdaca, dustymabe, dwalsh, fweimer, ichavero, jcajka, jchaloup, lsm5, marianne, miminar, vbatts, vgoyal
Type: Bug
Doc Type: Bug Fix
Last Closed: 2016-02-18 04:49:49 UTC

Attachments: Docker core file

Description Randy Barlow 2016-02-02 19:08:34 UTC
Created attachment 1120539 [details]
Docker core file

Description of problem:
On a fresh docker installation on Rawhide, I cannot start the docker daemon:

[root@boole var]# systemctl start docker
Job for docker.service failed because a fatal signal was delivered causing the control process to dump core. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@boole var]# systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: failed (Result: core-dump) since Tue 2016-02-02 14:02:29 EST; 7s ago
     Docs: http://docs.docker.com
  Process: 6836 ExecStart=/usr/bin/docker daemon $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $INSECURE_REGISTRY (code=dumped, signal=SEGV)
 Main PID: 6836 (code=dumped, signal=SEGV)

Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: Starting Docker Application Container Engine...
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker.service: Main process exited, code=dumped, status=11/SEGV
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker.service: Unit entered failed state.
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker.service: Failed with result 'core-dump'.
Feb 02 14:02:29 boole.usersys.redhat.com systemd-coredump[6839]: Process 6836 (docker) of user 0 dumped core.
                                                                 
                                                                 Stack trace of thread 6836:
                                                                 #0  0x00007fd1ee5eab85 n/a (n/a)

From journalctl -xe:

Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: Starting Docker Storage Setup...
-- Subject: Unit docker-storage-setup.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker-storage-setup.service has begun starting up.
Feb 02 14:02:29 boole.usersys.redhat.com docker-storage-setup[6802]: Rounding up size to full physical extent 20.00 MiB
Feb 02 14:02:29 boole.usersys.redhat.com docker-storage-setup[6802]: Volume group "fedora_dhcp129-24" has insufficient free space (0 extents): 5 required.
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker-storage-setup.service: Main process exited, code=exited, status=5/NOTINSTALLED
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: Failed to start Docker Storage Setup.
-- Subject: Unit docker-storage-setup.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker-storage-setup.service has failed.
-- 
-- The result is failed.
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker-storage-setup.service: Unit entered failed state.
Feb 02 14:02:29 boole.usersys.redhat.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=docker-storage-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? te
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker-storage-setup.service: Failed with result 'exit-code'.
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has begun starting up.
Feb 02 14:02:29 boole.usersys.redhat.com audit[6836]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:docker_t:s0 pid=6836 comm="docker" exe="/usr/bin/docker" sig=11
Feb 02 14:02:29 boole.usersys.redhat.com kernel: traps: docker[6836] general protection ip:7fd1ee5eab85 sp:7ffc14248448 error:0 in ld-2.22.90.so[7fd1ee5e0000+26000]
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker.service: Main process exited, code=dumped, status=11/SEGV
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has failed.
-- 
-- The result is failed.
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker.service: Unit entered failed state.
Feb 02 14:02:29 boole.usersys.redhat.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=docker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=f
Feb 02 14:02:29 boole.usersys.redhat.com systemd[1]: docker.service: Failed with result 'core-dump'.
Feb 02 14:02:29 boole.usersys.redhat.com systemd-coredump[6839]: Process 6836 (docker) of user 0 dumped core.
                                                                 
                                                                 Stack trace of thread 6836:
                                                                 #0  0x00007fd1ee5eab85 n/a (n/a)
-- Subject: Process 6836 (docker) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 6836 (docker) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.

Comment 1 Randy Barlow 2016-02-02 19:10:03 UTC
I do see this message in the logs: Volume group "fedora_dhcp129-24" has insufficient free space (0 extents): 5 required.

I don't believe Docker in F23 "requires" free extents, or otherwise depends on LVM. Is this error message the reason the core dump is happening, or is it just a harmless informational message?
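
(For reference, the free extent count of the volume group can be checked directly; a sketch, assuming the reporting field names from lvm(8):

# vgs -o vg_name,vg_free_count,vg_free fedora_dhcp129-24

A vg_free_count of 0 would match the log message above.)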

Comment 2 Randy Barlow 2016-02-02 19:19:17 UTC
And of course I forgot this important info:

$ rpm -q docker
docker-1.10.0-20.gitd3f4a34.fc24.x86_64

Comment 3 Lokesh Mandvekar 2016-02-02 19:22:46 UTC
Vivek, looks like this is something for you.

(seems to work fine on my rawhide machine)

Comment 4 Lokesh Mandvekar 2016-02-02 19:26:37 UTC
more info: docker-1.10.0-20.gitd3f4a34.fc24.x86_64 used docker-storage-setup commit#1c2b95b

Comment 5 Vivek Goyal 2016-02-02 19:43:36 UTC
I think this is a docker issue. docker-storage-setup is just saying that it can't set up an LVM thin pool because there is not sufficient free space in the root volume group. That means docker will use loop devices to set up the thin pool instead. If docker is crashing after that, something is wrong with docker itself.
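
One way to confirm which path docker will take is to look at what docker-storage-setup wrote out; a sketch, assuming it writes $DOCKER_STORAGE_OPTIONS (the variable referenced by the unit file above) to /etc/sysconfig/docker-storage:

# cat /etc/sysconfig/docker-storage

If no dm.thinpooldev= option is present there, docker sets up the devicemapper thin pool on loopback-mounted sparse files instead.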

Comment 6 Daniel Walsh 2016-02-03 08:30:28 UTC
Randy,

Do you see any AVCs?

ausearch -m avc -ts recent


If you just run 

# docker daemon

Does this blow up?
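
Since systemd-coredump caught the crash, you might also be able to pull a backtrace out of the dump; a sketch:

# coredumpctl info /usr/bin/docker
# coredumpctl gdb /usr/bin/docker
(gdb) bt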

Comment 7 Randy Barlow 2016-02-03 20:35:25 UTC
Hello Dan!

The ausearch command did return some output, but it was from ~10 minutes before I ran systemctl start docker, so I don't think it's related. The entries all seem iptables-related and look similar to this:

time->Wed Feb  3 15:22:05 2016
type=AVC msg=audit(1454530925.689:444): avc:  denied  { read write } for  pid=1218 comm="ip6tables" path=2F746D702F6666694D66306F784A202864656C6574656429 dev="tmpfs" ino=33197 scontext=system_u:system_r:iptables_t:s0 tcontext=system_u:object_r:firewalld_tmpfs_t:s0 tclass=file permissive=0
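
(Aside: the path= field in AVC records gets hex-encoded when the filename contains special characters; it can be decoded with something like:

$ echo 2F746D702F6666694D66306F784A202864656C6574656429 | xxd -r -p
/tmp/ffiMf0oxJ (deleted)

so this one is a deleted firewalld temp file, not anything of docker's.)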

If I run docker daemon directly, it segfaults:

$ sudo docker daemon
Segmentation fault

If I re-run ausearch after this (or after another invocation via systemctl), it says <no matches>, so I don't believe this is SELinux-related.

Thanks for the tips, and I'm happy to help more if I can. I'm bowlofeggs on Freenode.

Comment 8 Daniel Walsh 2016-02-04 06:28:41 UTC
Does not look like an SELinux issue.

Randy, did you have containers/images stored in /var/lib/docker?

This seems to be something to do with the content under /var/lib/docker. If there is nothing of value there, I would run rm -rf /var/lib/docker and then see if you can start the docker daemon with systemctl start docker.


Another idea would be to make sure there is no problem with your installation:

dnf -y reinstall docker
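
A quick way to verify the installed files without reinstalling, for what it's worth (no output means everything checks out):

# rpm -V docker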

Comment 9 Randy Barlow 2016-02-04 14:52:39 UTC
Since this is the first time docker has been installed on this particular system, there is nothing in /var/lib/docker:

$ sudo ls -lah /var/lib/docker/                                                         
total 8.0K
drwxr-xr-x.  2 root root 4.0K Jan 29 12:51 .
drwxr-xr-x. 34 root root 4.0K Feb  2 13:47 ..

I saw that there was an update available for Docker in Rawhide, so I updated, but the problem remains:

$ rpm -q docker
docker-1.10.0-21.gitd3f4a34.fc24.x86_64

Even though the update was probably sufficient on its own, I went ahead and did the reinstall as well, just in case that causes a different scriptlet to run or something, and that also did not solve the segfault.

Anything else I can look at?

Comment 11 Randy Barlow 2016-02-10 21:34:57 UTC
Dan and I have done a bit of sideline chatting about this issue, as neither he nor anyone else has been able to reproduce it. I also tried to reproduce it and was not able to. However, it just occurred to me that there is one difference that might matter between my reproduction machine and the machine that had the issue. The original machine I reported this issue about runs Fedora Rawhide Server and is a KVM guest. The machine I attempted to reproduce it on was an OpenStack Fedora 23 Cloud image that I then used dnf to upgrade to Rawhide. Could Fedora Server vs. Cloud be the difference that matters? Dan, which type(s) of environment(s) did you attempt to reproduce this in?

Comment 12 Jakub Čajka 2016-02-11 08:19:30 UTC
(In reply to Randy Barlow from comment #11)
> Could Fedora Server vs. Cloud be the difference that matters? Dan, which
> type(s) of environment(s) did you attempt to reproduce this in?

I think that might be caused by BZ1304591; I'm on it right now.

Could you provide the output of lscpu in the meantime? (Or, more specifically: does your CPU support AVX?)
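
(A quick way to check, for anyone following along: the avx flag shows up in /proc/cpuinfo on CPUs that support it, e.g.

$ grep -m1 -o avx /proc/cpuinfo

No output means no AVX.)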

Comment 13 Randy Barlow 2016-02-11 15:56:05 UTC
Hello Jakub!

I do not see the string AVX (or avx) in the output of lscpu. Here is lscpu on the guest that has this issue:

$ ssh boole lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Westmere E56xx/L56xx/X56xx (Nehalem-C)
Stepping:              1
CPU MHz:               2128.000
BogoMIPS:              4256.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm arat

Here is lscpu on the host:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU           E5606  @ 2.13GHz
Stepping:              2
CPU MHz:               1197.000
CPU max MHz:           2128.0000
CPU min MHz:           1197.0000
BogoMIPS:              4255.91
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm arat epb dtherm tpr_shadow vnmi flexpriority ept vpid

Comment 14 Florian Weimer 2016-02-18 04:49:49 UTC
This matches the known golang ABI issue.

*** This bug has been marked as a duplicate of bug 1304591 ***