Bug 1572266
Summary: | docker-storage-setup fails, creates incomplete lvm pv | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Ilkka Tengvall <ikke>
Component: | docker | Assignee: | Vivek Goyal <vgoyal>
Status: | CLOSED WONTFIX | QA Contact: | atomic-bugs <atomic-bugs>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 7.5 | CC: | agk, amurdaca, fche, ikke, lsm5, lvm-team, myllynen, pasik, smakarov, tsweeney, walters, zkabelac
Target Milestone: | rc | Keywords: | Extras
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-06-09 21:41:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Ilkka Tengvall
2018-04-26 14:48:33 UTC
So what happens when you run docker-storage-setup? Does it fail? Can you give the status of the "docker-storage-setup" service and any messages it has output? What do you see in the /etc/sysconfig/docker-storage file?

I have seen the same problem on RHEL 7.4 when trying to use Marko Myllynen's openshift-automation-tools (https://github.com/myllynen/openshift-automation-tools). It is much more intermittent, with docker-storage-setup on the VMs failing roughly every 1 in 3 or 1 in 4 times I run the playbooks. Below are results from my latest run, which had 4 VMs fail (including docker-storage-setup output); the failing VM I checked is consistent with what Ilkka sees on RHEL 7.5:

```
PLAY RECAP *****************************************************************************
infra01.tofu.yyz.redhat.com  : ok=34 changed=16 unreachable=0 failed=0
infra02.tofu.yyz.redhat.com  : ok=29 changed=11 unreachable=0 failed=1
master01.tofu.yyz.redhat.com : ok=29 changed=11 unreachable=0 failed=1
nfs01.tofu.yyz.redhat.com    : ok=24 changed=7  unreachable=0 failed=0
node01.tofu.yyz.redhat.com   : ok=34 changed=16 unreachable=0 failed=0
node02.tofu.yyz.redhat.com   : ok=29 changed=11 unreachable=0 failed=1
node03.tofu.yyz.redhat.com   : ok=29 changed=11 unreachable=0 failed=1
```

```
/* WORKING ONE */
$ ssh infra01.tofu.yyz.redhat.com

[root@infra01 ~]# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  docker-vg   1   1   0 wz--n- <48.00g 28.80g
  vg00        1   1   0 wz--n- <39.30g      0

[root@infra01 ~]# lvs
  LV          VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool docker-vg twi-a-t---  19.09g             0.00   0.08
  lv_root     vg00      -wi-ao---- <39.30g

[root@infra01 ~]# vgs -a
  VG        #PV #LV #SN Attr   VSize   VFree
  docker-vg   1   1   0 wz--n- <48.00g 28.80g
  vg00        1   1   0 wz--n- <39.30g      0

[root@infra01 ~]# pvs -a
  PV                VG        Fmt  Attr PSize   PFree
  /dev/vda1                        ---       0      0
  /dev/vda2         vg00      lvm2 a--  <39.30g      0
  /dev/vdb1         docker-vg lvm2 a--  <48.00g 28.80g
  /dev/vg00/lv_root                ---       0      0

[root@infra01 ~]# cat /etc/sysconfig/docker-storage-setup
# Ansible managed
# See docker-storage-setup(1)
WIPE_SIGNATURES=true
DEVS=/dev/vdb
VG=docker-vg

[root@infra01 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true "

[root@infra01 ~]# blkid
/dev/mapper/vg00-lv_root: UUID="76a3baf2-8d71-46bf-a34c-c5e2a1050194" TYPE="xfs"
/dev/vda2: UUID="d1ya9l-j0fu-fFQe-TkYH-Bur0-Clqb-75pKjj" TYPE="LVM2_member"
/dev/vda1: UUID="8e3e33d5-5cd5-4cf3-a547-370c3744ff5a" TYPE="xfs"
/dev/vdb1: UUID="qCNFtB-PS8C-TL1F-0AhD-shti-MeXK-eSlBJC" TYPE="LVM2_member"

[root@infra01 ~]# fdisk -l /dev/vdb
Disk /dev/vdb: 51.5 GB, 51539607552 bytes, 100663296 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b117a
   Device Boot      Start         End      Blocks   Id  System
/dev/vdb1            2048   100663295    50330624   8e  Linux LVM
```

```
/* NOW THE NOT-WORKING ONE */
$ ssh infra02.tofu.yyz.redhat.com

[root@infra02 ~]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  vg00   1   1   0 wz--n- <39.30g     0

[root@infra02 ~]# lvs
  LV      VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_root vg00 -wi-ao---- <39.30g

[root@infra02 ~]# vgs -a
  VG   #PV #LV #SN Attr   VSize   VFree
  vg00   1   1   0 wz--n- <39.30g     0

[root@infra02 ~]# pvs -a
  PV                VG   Fmt  Attr PSize   PFree
  /dev/vda1                   ---       0     0
  /dev/vda2         vg00 lvm2 a--  <39.30g     0
  /dev/vdb1                   ---       0     0
  /dev/vg00/lv_root           ---       0     0

[root@infra02 ~]# cat /etc/sysconfig/docker-storage-setup
# Ansible managed
# See docker-storage-setup(1)
WIPE_SIGNATURES=true
DEVS=/dev/vdb
VG=docker-vg

[root@infra02 ~]# cat /etc/sysconfig/docker-storage
# This file may be automatically generated by an installation program.
# Please DO NOT edit this file directly. Instead edit
# /etc/sysconfig/docker-storage-setup and/or refer to
# "man docker-storage-setup".
# By default, Docker uses a loopback-mounted sparse file in
# /var/lib/docker. The loopback makes it slower, and there are some
# restrictive defaults, such as 100GB max storage.
DOCKER_STORAGE_OPTIONS=

[root@infra02 ~]# blkid
/dev/mapper/vg00-lv_root: UUID="1e751a48-3270-473a-a464-7db6bf1e69fd" TYPE="xfs"
/dev/vda2: UUID="wvCIvh-tEN9-NtMI-dfZ5-1TDl-xGcW-XSv67w" TYPE="LVM2_member"
/dev/vda1: UUID="779d3056-af4e-4303-8fea-c851bb3d8cbe" TYPE="xfs"

[root@infra02 ~]# fdisk -l /dev/vdb
Disk /dev/vdb: 51.5 GB, 51539607552 bytes, 100663296 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0006e84c
   Device Boot      Start         End      Blocks   Id  System
/dev/vdb1            2048   100663295    50330624   8e  Linux LVM

[root@infra02 ~]# docker-storage-setup
INFO: Wipe Signatures is set to true. Any signatures on /dev/vdb will be wiped.
wipefs: error: /dev/vdb: probing initialization failed: Device or resource busy
ERROR: Failed to wipe signatures on device /dev/vdb
```

Vivek, the docker-storage and docker-storage-setup contents are in my first post; please re-read. The docker-storage-setup command fails like this in ansible:

```
TASK [ocp_node : Apply Docker storage configuration changes] *************************************************************************************
fatal: [ocp-apps2]: FAILED! => {"changed": true, "cmd": ["docker-storage-setup"], "delta": "0:00:01.411891", "end": "2018-04-26 18:33:05.915931", "msg": "non-zero return code", "rc": 5, "start": "2018-04-26 18:33:04.504040", "stderr": "INFO: Volume group backing root filesystem could not be determined\nINFO: Device node /dev/vdb1 exists.\n Can't open /dev/vdb1 exclusively. Mounted filesystem?", "stderr_lines": ["INFO: Volume group backing root filesystem could not be determined", "INFO: Device node /dev/vdb1 exists.", " Can't open /dev/vdb1 exclusively. Mounted filesystem?"], "stdout": "/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31", "stdout_lines": ["/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31"]}
fatal: [ocp-apps1]: FAILED! => {"changed": true, "cmd": ["docker-storage-setup"], "delta": "0:00:01.741506", "end": "2018-04-26 18:33:06.202183", "msg": "non-zero return code", "rc": 5, "start": "2018-04-26 18:33:04.460677", "stderr": "INFO: Volume group backing root filesystem could not be determined\nINFO: Device node /dev/vdb1 exists.\n Can't open /dev/vdb1 exclusively. Mounted filesystem?", "stderr_lines": ["INFO: Volume group backing root filesystem could not be determined", "INFO: Device node /dev/vdb1 exists.", " Can't open /dev/vdb1 exclusively. Mounted filesystem?"], "stdout": "/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31", "stdout_lines": ["/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31"]}
fatal: [ocp-infra]: FAILED! => {"changed": true, "cmd": ["docker-storage-setup"], "delta": "0:00:01.850122", "end": "2018-04-26 18:33:06.246404", "msg": "non-zero return code", "rc": 5, "start": "2018-04-26 18:33:04.396282", "stderr": "INFO: Volume group backing root filesystem could not be determined\nINFO: Device node /dev/vdb1 exists.\n Can't open /dev/vdb1 exclusively. Mounted filesystem?", "stderr_lines": ["INFO: Volume group backing root filesystem could not be determined", "INFO: Device node /dev/vdb1 exists.", " Can't open /dev/vdb1 exclusively. Mounted filesystem?"], "stdout": "/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31", "stdout_lines": ["/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31"]}
changed: [ocp-master]
```

So you see, it just fails.
This is what the file looks like after a successful run:

```
$ cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver overlay2 "
```

whereas the driver is not set in the faulty one, as you have seen in the first post.

```
$ journalctl -xu docker-storage-setup
-- No entries --
```

Ilkka, can you also provide the output of the following:

- container-storage-setup --version
- rpm -aq | grep container-storage-setup
- cat /usr/share/container-storage-setup/container-storage-setup | grep STORAGE_DRIVER

Can I have access to a node where this problem is being seen/reproduced? Looking at the messages, it looks like some kind of race between container-storage-setup and what lvm is doing.

I have a few more questions. I see the following in the ansible script. What does it mean and what is it supposed to do?

```
- name: Detect Docker storage configuration status
  command: grep -q overlay2 /etc/sysconfig/docker-storage
  register: docker_storage_test
  changed_when: false
  failed_when: false

- name: Wipe signatures from docker disk
  command: "wipefs --all --force /dev/vdb"
  when: docker_storage_test.rc != 0
```

What's the status of storage when this wipefs is called? Have we made sure that, if this disk is part of a volume group, we have removed the volume group first?

CCed the lvm team on the bug. I am pretty sure that it is some kind of strange race/interaction with lvmetad; I just can't pinpoint the race yet. From the container-storage-setup point of view, we look at device /dev/vdb, create a partition, and then try to do "wipefs -a /dev/vdb1". At that point I think wipefs complains that it could not open the device exclusively. I suspect that lvmetad is looking at it by then:

```
INFO: Device node /dev/vdb1 exists
Can't open /dev/vdb1 exclusively. Mounted filesystem?
/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
```

The question is: why is this race happening now in 7.5? What has changed?
The difference I see is that the "working" config is using "devicemapper" for the storage driver in /etc/sysconfig/docker-storage, whereas the non-working default is apparently using overlayfs. That could be coincidental, but is it possible the overlayfs setup is getting ahead of itself and racing with lvm's setup of the underlying device that overlayfs depends on? Whereas there is no race like that with "devicemapper", because the thin-pool is created in terms of lvm2? Just an observation, no idea if this is helpful ;)

Actually, they are using a config file like this:

```
STORAGE_DRIVER=overlay2
DEVS=/dev/vdb
CONTAINER_ROOT_LV_NAME=dockerlv
CONTAINER_ROOT_LV_MOUNT_PATH=/var/lib/docker
VG=ocp
```

This means use the overlay2 driver for docker, but also do the following steps:

- Create a volume group named "ocp" with disk "/dev/vdb"
- Create a logical volume named "dockerlv" from volume group "ocp"
- Mount logical volume "dockerlv" on path /var/lib/docker/

Looking at the lvm messages:

- It looks like the dockerlv logical volume first became active:

```
Apr 26 16:12:43 ocp-apps2 lvm[14076]: 1 logical volume(s) in volume group "ocp" now active
```

- Then for some reason we seem to be scanning /dev/vdb1 again and don't find a PV signature. Maybe the ansible script wiped all signatures on /dev/vdb while /dev/vdb was still part of a volume group, and that led to all the problems?

```
Apr 26 16:12:43 ocp-apps2 systemd[1]: Started LVM2 PV scan on device 253:17.
Apr 26 16:12:43 ocp-apps2 systemd[1]: Started /usr/sbin/lvm pvscan --cache 253:17.
Apr 26 16:12:43 ocp-apps2 systemd[1]: Starting /usr/sbin/lvm pvscan --cache 253:17...
Apr 26 16:12:43 ocp-apps2 lvm[14104]: No PV label found on /dev/vdb1.
Apr 26 16:12:43 ocp-apps2 lvmetad[11672]: vg_lookup vgid 8avUed-Tksy-6Zdv-BhP2-ROUb-Zmx7-6w2RkY name ocp found incomplete mapping uuid none name none
```

It must be some race, as at some point there was this log:

```
-- Unit lvm2-pvscan@253:17.service has finished shutting down.
Apr 26 16:12:41 ocp-master lvmetad[11562]: vg_lookup vgid HBf0Fp-d4Ah-wuUH-iWLi-bu68-AI8z-ugA7B8 name ocp found incomplete mapping uuid none name none
Apr 26 16:12:41 ocp-master kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel
Apr 26 16:12:41 ocp-master kernel: vdb: vdb1
Apr 26 16:12:41 ocp-master systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
```

I set you up with remote access; I'll add the connection details for the faulty machine in a private comment next. It definitely looks exactly like the tags were wiped after the pvcreate. Note, I only added this option to the storage config yesterday; it happened even without it, so it was only Ansible wiping it:

```
WIPE_SIGNATURES=true
```

And a little bit more about the boot stages:

1. The machine is a RHEL 7.5 qcow image dumped to LVM on vda
2. An empty LV is created on the host
3. The VM gets booted, booting from vda with fresh RHEL 7.5
4. Cloud-init sets up the networking
5. Ansible logs in and starts configuring the system
6. Ansible installs packages, among others docker-1.13
7. Ansible has a command to wipe signatures on vdb
8. Ansible sets up the file docker-storage-setup
9. Ansible runs docker-storage-setup -> failure, occasionally

Note that many VMs are set up fine with no problems, so it is some race depending on timing. At least half of the machines do work, and the faulty ones are random, not always the same. All LVs are deleted during machine deletion, and the ones for new VMs are freshly created with lvcreate, so there should never be a partition on the vdc at boot time.

On that system, lsblk output looks as follows:

```
vdb              253:16   0  60G  0 disk
└─vdb1           253:17   0  60G  0 part
  └─ocp-dockerlv 252:0    0  24G  0 lvm
```

So 253:17 is /dev/vdb1, which lvmetad is scanning. The strange thing is that there is no volume group "ocp" listed in the "vgs" output; "lvs" does not list anything either.

Ilkka, what happens if you remove the wipe-signature logic from ansible and only use WIPE_SIGNATURES=true in /etc/sysconfig/docker-storage-setup?

I can do that. But notice, the remote connection for you will be gone once I do it.

(In reply to Ilkka Tengvall from comment #19)
> I can do that. But notice, the remote connection for you will be gone now
> that I do it.

That's fine. Once you are done testing, set up the remote connection for me again.

The system definitely seems to be in an inconsistent state. The only thing now is to figure out how we ended up here. I suspect that the "wipefs" in ansible has something to do with it.

Done. A different host fails this time; change the port in the ssh command to 2210. So removing the wipefs did not help. BTW, take note of this deadlock line again:

```
Apr 27 23:26:09 ocp-master kernel: device-mapper: uevent: version 1.0.3
Apr 27 23:26:09 ocp-master kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel
Apr 27 23:26:09 ocp-master kernel: vdb: vdb1
Apr 27 23:26:09 ocp-master systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
Apr 27 23:26:09 ocp-master lvm[13951]: 1 logical volume(s) in volume group "ocp" now active
Apr 27 23:26:09 ocp-master systemd[1]: Stopped LVM2 PV scan on device 253:17.
-- Subject: Unit lvm2-pvscan@253:17.service has finished shutting down
```

Hm, the above messages from systemd are strange; I am not sure what exactly they mean. One more thing I noticed, though, is that we are calling docker-storage-setup directly from ansible. I think starting the "docker" service should be enough, and that will automatically start the "docker-storage-setup" service. Given we are not starting docker-storage-setup as a service, that's probably why no messages from it are going into the journal. So can you get rid of the direct call to "docker-storage-setup", rely on just starting the docker service, and see if the problem still happens?

No, that's how it's been instructed to be done in OCP installs: you write the storage config, and run the setup.
Yes, docker would run it anyway, but this is just the normal procedure.

That's fine. Just run it for now to see if that fixes the issue; I am trying to isolate the problem a bit. If you really want to set up storage first, can you do "systemctl start docker-storage-setup" instead of invoking docker-storage-setup directly? I see that docker-storage-setup.service runs after cloud-init.service has finished, and I am wondering if that has any effect on this. So start docker-storage-setup as a service and not directly from the command line. This should also send the docker-storage-setup messages to the journal, which will help in debugging problems later.

I set it up to call "systemctl start docker-storage-setup" instead of calling the command directly. The result is the same: again, two random machines are not working. You can use ssh port 2230 to get to the machine to debug. I believe Mike's guess is the best now, to look at what overlay2 does differently. I don't believe changing the way it gets called changes the result; it's something within the internal logic of docker-storage-setup. But it's good to now have the logs in one place:

```
Apr 28 09:04:35 ocp-apps1 container-storage-setup[13899]: INFO: Volume group backing root filesystem could not be determined
Apr 28 09:04:36 ocp-apps1 systemd-udevd[420]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
-- Subject: Unit system-lvm2\x2dpvscan.slice has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit system-lvm2\x2dpvscan.slice has finished starting up.
--
-- The start-up result is done.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
-- Subject: Unit system-lvm2\x2dpvscan.slice has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit system-lvm2\x2dpvscan.slice has begun starting up.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Starting LVM2 PV scan on device 253:17...
-- Subject: Unit lvm2-pvscan@253:17.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit lvm2-pvscan@253:17.service has begun starting up.
Apr 28 09:04:36 ocp-apps1 kernel: device-mapper: uevent: version 1.0.3
Apr 28 09:04:36 ocp-apps1 kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel
Apr 28 09:04:36 ocp-apps1 lvm[13933]: 1 logical volume(s) in volume group "ocp" now active
Apr 28 09:04:36 ocp-apps1 systemd[1]: Started LVM2 PV scan on device 253:17.
-- Subject: Unit lvm2-pvscan@253:17.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit lvm2-pvscan@253:17.service has finished starting up.
--
-- The start-up result is done.
Apr 28 09:04:36 ocp-apps1 container-storage-setup[13899]: INFO: Device node /dev/vdb1 exists.
Apr 28 09:04:36 ocp-apps1 container-storage-setup[13899]: /dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
Apr 28 09:04:36 ocp-apps1 systemd[1]: Started /usr/sbin/lvm pvscan --cache 253:17.
-- Subject: Unit run-13960.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedeskt
-- Unit run-13960.service has finished starting up.
--
-- The start-up result is done.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Starting /usr/sbin/lvm pvscan --cache 253:17...
-- Subject: Unit run-13960.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit run-13960.service has begun starting up.
Apr 28 09:04:37 ocp-apps1 lvm[13962]: No PV label found on /dev/vdb1.
Apr 28 09:04:37 ocp-apps1 container-storage-setup[13899]: Can't open /dev/vdb1 exclusively. Mounted filesystem?
Apr 28 09:04:37 ocp-apps1 systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=5/NOTINSTALLED
Apr 28 09:04:37 ocp-apps1 systemd[1]: Failed to start Docker Storage Setup.
-- Subject: Unit docker-storage-setup.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker-storage-setup.service has failed.
--
-- The result is failed.
Apr 28 09:04:37 ocp-apps1 systemd[1]: Unit docker-storage-setup.service entered failed state.
Apr 28 09:04:37 ocp-apps1 systemd[1]: docker-storage-setup.service failed.
Apr 28 09:05:16 ocp-apps1 sshd[13977]: Connection closed by 192.168.122.1 port 24924 [preauth]
Apr 28 09:05:34 ocp-apps1 sshd[13980]: Connection closed by 192.168.122.1 port 63031 [preauth]
Apr 28 09:05:37 ocp-apps1 sshd[11258]: Received disconnect from 192.168.122.1 port 49482:11: disconnected by user
Apr 28 09:05:37 ocp-apps1 sshd[11258]: Disconnected from 192.168.122.1 port 49482
Apr 28 09:05:37 ocp-apps1 sshd[11255]: pam_unix(sshd:session): session closed for user cloud-user
Apr 28 09:05:37 ocp-apps1 systemd-logind[575]: Removed session 2.
-- Subject: Session 2 has been terminated
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/multiseat
--
```

I think docker-storage-setup and the lvm pvscan logic are racing. docker-storage-setup thinks that the disk/partition is not in use by lvm and that the disk is not part of a volume group, so it tries to create a partition, wipe signatures, etc., and that seems to be racing with the lvm-pvscan logic. So is this a new node which is being provisioned?
Or is it an already-provisioned node, where /dev/vdb is already in the ocp volume group and this node is simply rebooting when this happens?

It is a new node, and a new fresh disk. The linked ansible playbook creates it from scratch.

Looking at container-storage-setup, it seems to fail at "pvcreate", where pvcreate appears to fail because the device is opened by somebody else (maybe lvmetad):

```
if ! wipefs -f -a ${part}; then
    Fatal "Failed to wipe signatures on device ${part}"
fi

pvcreate ${part}
```

So we first wipe signatures on partition /dev/vdb1 and then do "pvcreate /dev/vdb1"; I think pvcreate is failing. There are two odd things:

- If this is a new disk on its first boot, why do the logs say that 1 logical volume in volume group ocp was activated? That means lvm somehow found signatures on /dev/vdb1 already and activated the volume. Is it a vestige of some other VM?
- How do we make sure that udev rules have settled down by the time "pvcreate" is called? Maybe another call to "udevadm settle" after wipefs and before pvcreate.

Ilkka, if I give you a test patch to container-storage-setup, are you in a position to test it somehow?

I have been trying to reproduce it but have not been successful so far. A few observations:

- The problem seems to happen when disk /dev/vdb already had PVs and was part of a volume group. It looks like this disk was used by docker-storage-setup to set up container storage and the same disk has now been reprovisioned in a different VM, so when this disk is added to the VM, lvm scans the disk and auto-activates the VG and logical volume.
- In the initial report, ansible was calling "wipefs -a -f". I think that's part of the problem and will run into issues. If there is already a disk with an lvm volume on it, it needs to be cleaned properly. I will let the lvm folks comment on the best way to clean it up.
- Once the ansible call to "wipefs -a -f" was removed, for some reason docker-storage-setup still does not recognize that this disk is already part of a volume group and continues with wipefs. That seems to be the second problem.

Anyway, I think if we clean the disk before it is added to the VM, it will probably solve the issue.

Ok, after some tweaking, I think I figured out a way to reproduce the issue. The core of the problem is that this disk was being used with some other VM and has an lvm signature on /dev/vdb1. Before the disk was given to docker-storage-setup, /dev/vdb1 was removed and "wipefs -a /dev/vdb" was done. That means the disk will seem clean to docker-storage-setup, and it will take ownership of the disk and start setting things up on it. The first thing it does is create a partition and clean any signatures on the partition. The moment a partition is created, the previous lvm signatures on /dev/vdb1 become visible again, udev rules kick in, and they activate that volume. Now container-storage-setup wipes the signatures on /dev/vdb1, and after that all hell breaks loose.

This does not happen all the time because it is a race. After partition creation we call "udevadm settle"; if "udevadm settle" finishes fairly fast, then we quickly wipe the signatures on the partition and the lvm volume never becomes active. But sometimes "udevadm settle" takes much longer, and that gives enough time for the volume to become active. Later "wipefs -a /dev/vdb1" kicks in and all the bad things happen. I had to add a delay of a few seconds after partition creation to reproduce this problem, and now it reproduces reliably.

Great that you made progress with it. A note about the disk being reused: it can be, but it would be quite a coincidence. The ansibles here have do_guests.yml, which on the host creates empty LVs that are later used as disks for the VMs. So in theory it's always a fresh disk with random data on the blocks. Then there is nuke_guests.yml, which deletes the LVs after a VM is deleted. So with quite some luck, on the next do_guests.yml run LVM might use exactly the same places on disk for an LV, and those might contain the old partition. To test that, I should add a wipefs for the LV device on the host system before the VM gets created, or right after it's nuked.
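The ordering fix floated a few comments up (another "udevadm settle" after wipefs and before pvcreate) can be sketched as follows. This is only an illustration of the idea from this thread, not the actual container-storage-setup patch; the function name `safe_pvcreate` is invented for the example:

```shell
# Sketch of the proposed ordering fix (an assumption, not the merged change):
# wipe the partition, then wait for udev to finish reacting before handing
# the partition to LVM, so a stale-signature autoactivation cannot slip in
# between the wipe and the pvcreate.
safe_pvcreate() {
    part="$1"
    if ! wipefs -f -a "$part"; then
        echo "Failed to wipe signatures on device $part" >&2
        return 1
    fi
    # Let udev process the change event from the wipe; this closes the window
    # in which lvm autoactivation could grab the partition.
    udevadm settle
    pvcreate "$part"
}
```

The point of the extra settle is purely sequencing: pvcreate only runs once udev's reaction to the signature wipe has drained.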
I'll do that when I have the time, hopefully tomorrow.

Wiping the disk *before* plugging it in is clearly a good idea. But I think LVM should have a command to run *before* `wipefs` that removes all PVs on the disk and drops them from the kernel state. (That command should probably also wait for udev to settle; otherwise there can still be races.) Basically there's nothing new in this bug specific to Docker or whatever; running `wipefs` or performing other raw partitioning changes while LVM is active is always asking for bugs. You could probably script this today with `lvm pvs --noheadings | awk '{ print $1 }' ...` etc.

I am thinking of zeroing the first few MB of the disk before creating the partition. That way any LVM signatures at the 1 MB/2 MB offsets will get wiped out, and when I create the partition that should not be a problem. I also talked to the lvm folks, and they now seem to be of the opinion that we can probably get rid of the logic that creates a partition on the disk before adding it to the volume group. That will simplify life; I will track that work in a separate bz.

I have created a PR for this issue; I think it should solve it: https://github.com/projectatomic/container-storage-setup/pull/269

Ilkka, any chance you can test this PR in your setup and see whether the problem you face is resolved?

Hi, I added similar stuff to my ansible to test the logic. You are right. I added this:

```
- name: Wipe ghost PV from docker disk
  command: "pvremove -fy /dev/vdb1"
  ignore_errors: yes
  when: docker_storage_test.rc != 0

- name: Wipe signatures from docker disk
  command: "wipefs --all --force /dev/vdb"
  when: docker_storage_test.rc != 0

- name: Wipe the beginning of the disk from docker disk
  command: "dd if=/dev/zero of=/dev/vdb bs=1M count=4"
  when: docker_storage_test.rc != 0
```

And I ran it twice. Both times there were some disks where "pvremove -fy /dev/vdb1" ran successfully, and the installations went successfully through after that.
So a successful run of that pvremove meant that those PVs actually were pre-existing, and wipefs alone wasn't enough. I agree with the LVM guys; having the partition there has always made me wonder why. I can't see any reason for having a partition table there, it's legacy. But good if you have that in another ticket already. Great that you spotted it!

We have no plans to ship another version of Docker at this time. RHEL 7 is in its final support stages, where only security fixes will be released. Customers should move to Podman, which is available starting in RHEL 7.6.
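For reference, the workaround that emerged in this thread (remove stale PVs, wipe signatures, zero the first few MiB, and let udev settle before the disk is reused) can be collected into one helper. The function name and the availability guards are illustrative assumptions; the individual commands are the ones used in the comments above:

```shell
# Hypothetical helper collecting the cleanup steps from this bug report.
# Run it against the raw docker disk (e.g. /dev/vdb) before docker-storage-setup,
# or on the host-side LV before handing it to a new VM.
clean_container_disk() {
    dev="$1"
    # Remove any stale PV metadata on the old partition; ignore failure when
    # there is nothing to remove (matches "ignore_errors: yes" in the play).
    command -v pvremove >/dev/null 2>&1 && pvremove -fy "${dev}1" 2>/dev/null || true
    # Wipe all known signatures (partition table, LVM2_member, filesystems).
    command -v wipefs >/dev/null 2>&1 && wipefs --all --force "$dev" 2>/dev/null || true
    # Zero the first 4 MiB so LVM labels at the 1-2 MiB offsets of a previous
    # partition are gone too (conv=notrunc keeps the target's size intact).
    dd if=/dev/zero of="$dev" bs=1M count=4 conv=notrunc 2>/dev/null || return 1
    # Give udev time to process the change events before the disk is reused.
    command -v udevadm >/dev/null 2>&1 && udevadm settle || true
}
```

The pvremove step is what made the difference in Ilkka's test: wipefs on the whole disk does not clear signatures hidden inside a not-yet-visible partition, while zeroing the first MiBs does.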