Bug 1099375

Summary: docker fail to start when dm metadata are corrupted
Product: Red Hat Enterprise Linux 7 Reporter: Lukáš Doktor <ldoktor>
Component: kernelAssignee: LVM and device-mapper development team <lvm-team>
kernel sub component: Thin Provisioning QA Contact: Storage QE <storage-qe>
Status: CLOSED WONTFIX Docs Contact:
Severity: unspecified    
Priority: unspecified CC: agk, dwalsh, mclasen, msnitzer, thornber, vgoyal
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-01 16:17:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Lukáš Doktor 2014-05-20 07:09:45 UTC
Description of problem:
Hello guys, today after a docker daemon crash I had problems to get it working again. The problem was, that metadata (/var/lib/docker/devicemapper/devicemapper/metadata) was corrupted. The older version of docker seems to ignores/workarounds that and works perfectly, but the lates version of docker simply stops.

Version-Release number of selected component (if applicable):
docker-0.11.1-2.el7.x86_64

How reproducible:
On metadata corruption

Steps to Reproduce:
1. corrupt metadata (eg. hexedit /var/lib/docker/devicemapper/devicemapper/metadata and override first few characters)
2. docker -D -d

Actual results:
2014/05/20 08:55:40 docker daemon: 0.11.1-dev abf8689/0.11.1; execdriver: native; graphdriver: 
[47686eaa] +job serveapi(unix:///var/run/docker.sock)
[47686eaa] +job initserver()
[47686eaa.initserver()] Creating server
[debug] deviceset.go:527 Generated prefix: docker-253:3-4210871
[debug] deviceset.go:530 Checking for existence of the pool 'docker-253:3-4210871-pool'
[debug] deviceset.go:549 Pool doesn't exist. Creating it.
[debug] daemon.go:747 Using graph driver devicemapper
[debug] daemon.go:764 Creating images graph
[debug] graph.go:65 Restored 34 elements
[debug] daemon.go:776 Creating volumes graph
[debug] graph.go:65 Restored 3 elements
[debug] daemon.go:781 Creating repository list
[47686eaa] +job init_networkdriver()
[DEBUG] [iptables]: /sbin/iptables, [-C POSTROUTING -t nat -s 172.17.42.1/16 ! -d 172.17.42.1/16 -j MASQUERADE]
2014/05/20 08:55:40 Listening for HTTP on unix (/var/run/docker.sock)
[debug] server.go:1095 Registering GET, /events
[debug] server.go:1095 Registering GET, /images/search
[debug] server.go:1095 Registering GET, /images/{name:.*}/history
[debug] server.go:1095 Registering GET, /containers/{name:.*}/json
[debug] server.go:1095 Registering GET, /containers/{name:.*}/top
[debug] server.go:1095 Registering GET, /_ping
[debug] server.go:1095 Registering GET, /info
[debug] server.go:1095 Registering GET, /version
[debug] server.go:1095 Registering GET, /images/{name:.*}/get
[debug] server.go:1095 Registering GET, /containers/ps
[debug] server.go:1095 Registering GET, /containers/{name:.*}/export
[debug] server.go:1095 Registering GET, /containers/{name:.*}/attach/ws
[debug] server.go:1095 Registering GET, /images/{name:.*}/json
[debug] server.go:1095 Registering GET, /containers/{name:.*}/logs
[debug] server.go:1095 Registering GET, /images/json
[debug] server.go:1095 Registering GET, /images/viz
[debug] server.go:1095 Registering GET, /containers/json
[debug] server.go:1095 Registering GET, /containers/{name:.*}/changes
[debug] server.go:1095 Registering POST, /images/create
[debug] server.go:1095 Registering POST, /containers/{name:.*}/kill
[debug] server.go:1095 Registering POST, /containers/{name:.*}/copy
[debug] server.go:1095 Registering POST, /commit
[debug] server.go:1095 Registering POST, /images/{name:.*}/tag
[debug] server.go:1095 Registering POST, /containers/{name:.*}/restart
[debug] server.go:1095 Registering POST, /containers/{name:.*}/wait
[debug] server.go:1095 Registering POST, /build
[debug] server.go:1095 Registering POST, /images/{name:.*}/push
[debug] server.go:1095 Registering POST, /containers/{name:.*}/start
[debug] server.go:1095 Registering POST, /containers/{name:.*}/resize
[debug] server.go:1095 Registering POST, /auth
[debug] server.go:1095 Registering POST, /images/{name:.*}/insert
[debug] server.go:1095 Registering POST, /images/load
[debug] server.go:1095 Registering POST, /containers/create
[debug] server.go:1095 Registering POST, /containers/{name:.*}/stop
[debug] server.go:1095 Registering POST, /containers/{name:.*}/attach
[debug] server.go:1095 Registering DELETE, /containers/{name:.*}
[debug] server.go:1095 Registering DELETE, /images/{name:.*}
[debug] server.go:1095 Registering OPTIONS, 
[debug] server.go:1185 docker group found. gid: 985
[DEBUG] [iptables]: /sbin/iptables, [-D FORWARD -i docker0 -o docker0 -j DROP]
[DEBUG] [iptables]: /sbin/iptables, [-C FORWARD -i docker0 -o docker0 -j ACCEPT]
[DEBUG] [iptables]: /sbin/iptables, [-C FORWARD -i docker0 ! -o docker0 -j ACCEPT]
[DEBUG] [iptables]: /sbin/iptables, [-C FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D PREROUTING -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D OUTPUT -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -F DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -X DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -N DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -A OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]
[47686eaa] -job init_networkdriver() = OK (0)
[47686eaa.initserver()] Creating pidfile
[47686eaa.initserver()] Setting up signal traps
[47686eaa] -job initserver() = OK (0)
[47686eaa] +job acceptconnections()
[47686eaa] -job acceptconnections() = OK (0)
^C2014/05/20 08:55:53 Received signal 'interrupt', starting shutdown of docker...
[debug] daemon.go:861 starting clean shutdown of all containers...
[debug] deviceset.go:832 [deviceset docker-253:3-4210871] shutdown()
[debug] deviceset.go:833 [devmapper] Shutting down DeviceSet: /home/medic/Work/Projekty/Docker/root/devicemapper
[debug] deviceset.go:719 [devmapper] deactivateDevice()
[debug] deviceset.go:815 Waiting for unmount of : opencount=0
[debug] deviceset.go:741 [devmapper] deactivateDevice END
[debug] deviceset.go:703 [devmapper] deactivatePool()
[debug] devmapper.go:525 [devmapper] removeDevice START
[debug] devmapper.go:539 [devmapper] removeDevice END
[debug] deviceset.go:716 [devmapper] deactivatePool END
[debug] deviceset.go:881 [deviceset docker-253:3-4210871] shutdown END


Expected results (obtained from upstream "Docker version 0.10.0, build dc9c28f"):
[root@t530 ~ ]$ docker-upstream -d -D | tee /tmp/docker.log
[/home/medic/Work/Projekty/Docker/root|262a1674] +job serveapi(unix:///var/run/docker.sock)
[/home/medic/Work/Projekty/Docker/root|262a1674] +job initserver()
[/home/medic/Work/Projekty/Docker/root|262a1674.initserver()] Creating server
2014/05/20 08:59:26 Listening for HTTP on unix (/var/run/docker.sock)
[debug] driver.go:78 Error loading driver aufs: AUFS was not found in /proc/filesystems
[debug] driver.go:78 Error loading driver btrfs: /home/medic/Work/Projekty/Docker/root is not a btrfs filesystem
[debug] deviceset.go:528 Generated prefix: docker-253:3-4210871
[debug] deviceset.go:531 Checking for existence of the pool 'docker-253:3-4210871-pool'
[debug] deviceset.go:545 Pool doesn't exist. Creating it.
[debug] deviceset.go:408 libdevmapper(3): ioctl/libdm-iface.c:1768 (-1) device-mapper: reload ioctl on docker-253:3-4210871-pool failed: Invalid or incomplete multibyte or wide character
[debug] deviceset.go:562 
--->Err: Error running DeviceCreate (createPool)

[debug] driver.go:78 Error loading driver devicemapper: Error running DeviceCreate (createPool)
[debug] runtime.go:690 Using graph driver vfs
[debug] runtime.go:707 Creating images graph
[debug] graph.go:63 Restored 34 elements
[debug] runtime.go:719 Creating volumes graph
[debug] graph.go:63 Restored 3 elements
[debug] runtime.go:724 Creating repository list
[/home/medic/Work/Projekty/Docker/root|262a1674] +job init_networkdriver()
[DEBUG] [iptables]: /sbin/iptables, [-C POSTROUTING -t nat -s 172.17.42.1/16 ! -d 172.17.42.1/16 -j MASQUERADE]
[DEBUG] [iptables]: /sbin/iptables, [-D FORWARD -i docker0 -o docker0 -j DROP]
[DEBUG] [iptables]: /sbin/iptables, [-C FORWARD -i docker0 -o docker0 -j ACCEPT]
[debug] server.go:1048 Registering GET, /events
[debug] server.go:1048 Registering GET, /containers/{name:.*}/top
[debug] server.go:1048 Registering GET, /containers/{name:.*}/attach/ws
[debug] server.go:1048 Registering GET, /images/json
[debug] server.go:1048 Registering GET, /images/{name:.*}/history
[debug] server.go:1048 Registering GET, /containers/ps
[debug] server.go:1048 Registering GET, /containers/json
[debug] server.go:1048 Registering GET, /containers/{name:.*}/json
[debug] server.go:1048 Registering GET, /images/viz
[debug] server.go:1048 Registering GET, /images/{name:.*}/json
[debug] server.go:1048 Registering GET, /info
[debug] server.go:1048 Registering GET, /version
[debug] server.go:1048 Registering GET, /images/search
[debug] server.go:1048 Registering GET, /images/{name:.*}/get
[debug] server.go:1048 Registering GET, /containers/{name:.*}/export
[debug] server.go:1048 Registering GET, /containers/{name:.*}/changes
[debug] server.go:1048 Registering POST, /auth
[debug] server.go:1048 Registering POST, /images/create
[debug] server.go:1048 Registering POST, /images/{name:.*}/tag
[debug] server.go:1048 Registering POST, /containers/{name:.*}/restart
[debug] server.go:1048 Registering POST, /containers/{name:.*}/start
[debug] server.go:1048 Registering POST, /containers/{name:.*}/copy
[debug] server.go:1048 Registering POST, /build
[debug] server.go:1048 Registering POST, /containers/{name:.*}/stop
[debug] server.go:1048 Registering POST, /commit
[debug] server.go:1048 Registering POST, /containers/{name:.*}/kill
[debug] server.go:1048 Registering POST, /containers/{name:.*}/wait
[debug] server.go:1048 Registering POST, /images/{name:.*}/insert
[debug] server.go:1048 Registering POST, /images/load
[debug] server.go:1048 Registering POST, /images/{name:.*}/push
[debug] server.go:1048 Registering POST, /containers/create
[debug] server.go:1048 Registering POST, /containers/{name:.*}/resize
[debug] server.go:1048 Registering POST, /containers/{name:.*}/attach
[debug] server.go:1048 Registering DELETE, /containers/{name:.*}
[debug] server.go:1048 Registering DELETE, /images/{name:.*}
[debug] server.go:1048 Registering OPTIONS, 
[debug] server.go:1138 docker group found. gid: 985
[DEBUG] [iptables]: /sbin/iptables, [-C FORWARD -i docker0 ! -o docker0 -j ACCEPT]
[DEBUG] [iptables]: /sbin/iptables, [-C FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D PREROUTING -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -D OUTPUT -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -F DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -X DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -N DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]
[DEBUG] [iptables]: /sbin/iptables, [-t nat -A OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]
[/home/medic/Work/Projekty/Docker/root|262a1674] -job init_networkdriver() = OK (0)
[/home/medic/Work/Projekty/Docker/root|262a1674.initserver()] Creating pidfile
[/home/medic/Work/Projekty/Docker/root|262a1674.initserver()] Setting up signal traps
[/home/medic/Work/Projekty/Docker/root|262a1674] -job initserver() = OK (0)
[/home/medic/Work/Projekty/Docker/root|262a1674] +job acceptconnections()
[/home/medic/Work/Projekty/Docker/root|262a1674] -job acceptconnections() = OK (0)
# Docker is up and running

Additional info:
Workaround is simple, just rm -rf /var/lib/docker/devicemapper/ ...

Comment 1 Lukáš Doktor 2014-05-20 07:12:35 UTC
The output of /var/log/messages:
May 20 08:47:38 t530 kernel: [  227.203463] loop: module loaded
May 20 08:47:38 t530 kernel: loop: module loaded
May 20 08:47:38 t530 kernel: [  227.235138] device-mapper: space map common: index_check failed: csum 2396624857 != wanted 2396666900
May 20 08:47:38 t530 kernel: [  227.235163] device-mapper: block manager: index validator check failed for block 1054
May 20 08:47:38 t530 kernel: [  227.235165] device-mapper: transaction manager: couldn't open metadata space map
May 20 08:47:38 t530 kernel: [  227.235170] device-mapper: thin metadata: tm_open_with_sm failed
May 20 08:47:38 t530 kernel: device-mapper: space map common: index_check failed: csum 2396624857 != wanted 2396666900
May 20 08:47:38 t530 kernel: device-mapper: block manager: index validator check failed for block 1054
May 20 08:47:38 t530 kernel: device-mapper: transaction manager: couldn't open metadata space map
May 20 08:47:38 t530 kernel: device-mapper: thin metadata: tm_open_with_sm failed
May 20 08:47:38 t530 kernel: [  227.239388] device-mapper: table: 253:4: thin-pool: Error creating metadata object
May 20 08:47:38 t530 kernel: [  227.239399] device-mapper: ioctl: error adding target to table
May 20 08:47:38 t530 kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object
May 20 08:47:38 t530 kernel: device-mapper: ioctl: error adding target to table

Comment 3 Alexander Larsson 2014-06-18 13:14:05 UTC
I don't see what docker could do in this situation. It seems like the error is in the kernel. Dunno how that could have worked before...

Comment 4 Daniel Walsh 2014-06-18 14:00:58 UTC
Should we reassign to the kernel?

Comment 5 Lukáš Doktor 2014-06-30 14:02:27 UTC
I tested this with the same kernel using two docker versions. Older version just put a message about this error and proceed, newer version logged the same error and stopped. So apparently it can be worked around...

Comment 6 Matthias Clasen 2014-09-30 15:00:14 UTC
moving docker bugs off alexl

Comment 7 Vincent Batts 2014-10-21 16:48:05 UTC
On docker-1.3 if you corrupt the metadata file, then it fails with the following

```
vbatts@valse ~ (master) $ sudo docker -d -D -g /home/docker
2014/10/21 12:44:39 docker daemon: 1.3.0 c78088f; execdriver: native; graphdriver: 
[ab4be98f] +job serveapi(unix:///var/run/docker.sock)
[debug] deviceset.go:565 Generated prefix: docker-253:2-4980740
[debug] deviceset.go:568 Checking for existence of the pool 'docker-253:2-4980740-pool'
[debug] deviceset.go:587 Pool doesn't exist. Creating it.
[debug] deviceset.go:455 libdevmapper(3): ioctl/libdm-iface.c:1769 (-1) device-mapper: reload ioctl on docker-253:2-4980740-pool failed: Invalid or incomplete multibyte or wide character
2014/10/21 12:44:39 Error running DeviceCreate (createPool) dm_task_run failed
```


I don't think the solution is to failback to the 'vfs' graph driver, but perhaps have clearer output that the metadata is corrupt.

Comment 9 Vincent Batts 2017-03-13 13:22:15 UTC
After chatting with Vivek, I don't this is any longer an issue. Can you reproduce with a recent version of docker?

Comment 10 Lukáš Doktor 2017-03-14 11:30:33 UTC
Hello Vincent,

not sure how about RHEL, but on F25 it still fails. My steps are:

$ dd of=/var/lib/docker/devicemapper/devicemapper/metadata if=/dev/urandom count=1 bs=5 skip=10
$ systemctl start docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
$ journalctl -xe
...
bře 14 12:27:40 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=docker-stor
bře 14 12:27:40 localhost.localdomain systemd[1]: docker-storage-setup.service: Unit entered failed state.
bře 14 12:27:40 localhost.localdomain systemd[1]: docker-storage-setup.service: Failed with result 'exit-code'.
bře 14 12:27:40 localhost.localdomain systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has begun starting up.
bře 14 12:27:40 localhost.localdomain kernel: device-mapper: space map common: index_check failed: blocknr 0 != wanted 148
bře 14 12:27:40 localhost.localdomain kernel: device-mapper: block manager: index validator check failed for block 148
bře 14 12:27:40 localhost.localdomain kernel: device-mapper: transaction manager: couldn't open metadata space map
bře 14 12:27:40 localhost.localdomain kernel: device-mapper: thin metadata: tm_open_with_sm failed
bře 14 12:27:40 localhost.localdomain kernel: device-mapper: table: 253:8: thin-pool: Error creating metadata object
bře 14 12:27:40 localhost.localdomain kernel: device-mapper: ioctl: error adding target to table
bře 14 12:27:40 localhost.localdomain dockerd-current[30313]: time="2017-03-14T12:27:40.742875167+01:00" level=error msg="[graphdriver] prior storage driver \"
bře 14 12:27:40 localhost.localdomain dockerd-current[30313]: time="2017-03-14T12:27:40.742954033+01:00" level=fatal msg="Error starting daemon: error initiali
bře 14 12:27:40 localhost.localdomain systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
bře 14 12:27:40 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has failed.
-- 
-- The result is failed.
bře 14 12:27:40 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=docker comm
bře 14 12:27:40 localhost.localdomain systemd[1]: docker.service: Unit entered failed state.
bře 14 12:27:40 localhost.localdomain systemd[1]: docker.service: Failed with result 'exit-code'.


I have F25 with docker-1.12.6-6.gitae7d637.fc25.x86_64

Comment 11 Vincent Batts 2018-11-14 14:37:38 UTC
Is this still an issue? Vivek, do you know whether this even applies to containers/storage now?

Comment 14 Red Hat Bugzilla 2023-09-14 23:57:40 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days