Bug 1572266
Summary: | docker-storage-setup fails, creates incomplete lvm pv | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Ilkka Tengvall <ikke>
Component: | docker | Assignee: | Vivek Goyal <vgoyal>
Status: | CLOSED WONTFIX | QA Contact: | atomic-bugs <atomic-bugs>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 7.5 | CC: | agk, amurdaca, fche, ikke, lsm5, lvm-team, myllynen, pasik, smakarov, tsweeney, walters, zkabelac
Target Milestone: | rc | Keywords: | Extras
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-06-09 21:41:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Ilkka Tengvall
2018-04-26 14:48:33 UTC
So what happens when you run docker-storage-setup? Does it fail? Can you give the status of the "docker-storage-setup" service and any messages it has output? What do you see in the /etc/sysconfig/docker-storage file?

I have seen the same problem on RHEL 7.4 when trying to use Marko Myllynen's openshift-automation-tools (https://github.com/myllynen/openshift-automation-tools). It is much more intermittent, with docker-storage-setup on the VMs failing roughly every 1 in 3 or 1 in 4 times I run the playbooks. Below are results from my latest run, which had 4 VMs fail (including docker-storage-setup output); the failing VM I checked is consistent with what Ilkka sees on RHEL 7.5:

```
PLAY RECAP *****************************************************************************
infra01.tofu.yyz.redhat.com  : ok=34 changed=16 unreachable=0 failed=0
infra02.tofu.yyz.redhat.com  : ok=29 changed=11 unreachable=0 failed=1
master01.tofu.yyz.redhat.com : ok=29 changed=11 unreachable=0 failed=1
nfs01.tofu.yyz.redhat.com    : ok=24 changed=7  unreachable=0 failed=0
node01.tofu.yyz.redhat.com   : ok=34 changed=16 unreachable=0 failed=0
node02.tofu.yyz.redhat.com   : ok=29 changed=11 unreachable=0 failed=1
node03.tofu.yyz.redhat.com   : ok=29 changed=11 unreachable=0 failed=1
```

```
/* WORKING ONE */
$ ssh infra01.tofu.yyz.redhat.com

[root@infra01 ~]# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  docker-vg   1   1   0 wz--n- <48.00g 28.80g
  vg00        1   1   0 wz--n- <39.30g      0

[root@infra01 ~]# lvs
  LV          VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool docker-vg twi-a-t---  19.09g             0.00   0.08
  lv_root     vg00      -wi-ao---- <39.30g

[root@infra01 ~]# vgs -a
  VG        #PV #LV #SN Attr   VSize   VFree
  docker-vg   1   1   0 wz--n- <48.00g 28.80g
  vg00        1   1   0 wz--n- <39.30g      0

[root@infra01 ~]# pvs -a
  PV                VG        Fmt  Attr PSize   PFree
  /dev/vda1                        ---       0      0
  /dev/vda2         vg00      lvm2 a--  <39.30g      0
  /dev/vdb1         docker-vg lvm2 a--  <48.00g 28.80g
  /dev/vg00/lv_root                ---       0      0

[root@infra01 ~]# cat /etc/sysconfig/docker-storage-setup
# Ansible managed
# See docker-storage-setup(1)
WIPE_SIGNATURES=true
DEVS=/dev/vdb
VG=docker-vg

[root@infra01 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true "

[root@infra01 ~]# blkid
/dev/mapper/vg00-lv_root: UUID="76a3baf2-8d71-46bf-a34c-c5e2a1050194" TYPE="xfs"
/dev/vda2: UUID="d1ya9l-j0fu-fFQe-TkYH-Bur0-Clqb-75pKjj" TYPE="LVM2_member"
/dev/vda1: UUID="8e3e33d5-5cd5-4cf3-a547-370c3744ff5a" TYPE="xfs"
/dev/vdb1: UUID="qCNFtB-PS8C-TL1F-0AhD-shti-MeXK-eSlBJC" TYPE="LVM2_member"

[root@infra01 ~]# fdisk -l /dev/vdb
Disk /dev/vdb: 51.5 GB, 51539607552 bytes, 100663296 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b117a
   Device Boot      Start         End      Blocks   Id  System
/dev/vdb1            2048   100663295    50330624   8e  Linux LVM
```

```
/* NOW THE NOT-WORKING ONE */
$ ssh infra02.tofu.yyz.redhat.com

[root@infra02 ~]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  vg00   1   1   0 wz--n- <39.30g     0

[root@infra02 ~]# lvs
  LV      VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_root vg00 -wi-ao---- <39.30g

[root@infra02 ~]# vgs -a
  VG   #PV #LV #SN Attr   VSize   VFree
  vg00   1   1   0 wz--n- <39.30g     0

[root@infra02 ~]# pvs -a
  PV                VG   Fmt  Attr PSize   PFree
  /dev/vda1                   ---       0     0
  /dev/vda2         vg00 lvm2 a--  <39.30g     0
  /dev/vdb1                   ---       0     0
  /dev/vg00/lv_root           ---       0     0

[root@infra02 ~]# cat /etc/sysconfig/docker-storage-setup
# Ansible managed
# See docker-storage-setup(1)
WIPE_SIGNATURES=true
DEVS=/dev/vdb
VG=docker-vg

[root@infra02 ~]# cat /etc/sysconfig/docker-storage
# This file may be automatically generated by an installation program.
# Please DO NOT edit this file directly. Instead edit
# /etc/sysconfig/docker-storage-setup and/or refer to
# "man docker-storage-setup".
# By default, Docker uses a loopback-mounted sparse file in
# /var/lib/docker. The loopback makes it slower, and there are some
# restrictive defaults, such as 100GB max storage.
DOCKER_STORAGE_OPTIONS=

[root@infra02 ~]# blkid
/dev/mapper/vg00-lv_root: UUID="1e751a48-3270-473a-a464-7db6bf1e69fd" TYPE="xfs"
/dev/vda2: UUID="wvCIvh-tEN9-NtMI-dfZ5-1TDl-xGcW-XSv67w" TYPE="LVM2_member"
/dev/vda1: UUID="779d3056-af4e-4303-8fea-c851bb3d8cbe" TYPE="xfs"

[root@infra02 ~]# fdisk -l /dev/vdb
Disk /dev/vdb: 51.5 GB, 51539607552 bytes, 100663296 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0006e84c
   Device Boot      Start         End      Blocks   Id  System
/dev/vdb1            2048   100663295    50330624   8e  Linux LVM

[root@infra02 ~]# docker-storage-setup
INFO: Wipe Signatures is set to true. Any signatures on /dev/vdb will be wiped.
wipefs: error: /dev/vdb: probing initialization failed: Device or resource busy
ERROR: Failed to wipe signatures on device /dev/vdb
```

Vivek, the docker-storage and docker-storage-setup contents are in my first post; please re-read. The docker-storage-setup command fails like this in ansible:

```
TASK [ocp_node : Apply Docker storage configuration changes] *************************************************************************************
fatal: [ocp-apps2]: FAILED! => {"changed": true, "cmd": ["docker-storage-setup"], "delta": "0:00:01.411891", "end": "2018-04-26 18:33:05.915931", "msg": "non-zero return code", "rc": 5, "start": "2018-04-26 18:33:04.504040", "stderr": "INFO: Volume group backing root filesystem could not be determined\nINFO: Device node /dev/vdb1 exists.\n Can't open /dev/vdb1 exclusively. Mounted filesystem?", "stderr_lines": ["INFO: Volume group backing root filesystem could not be determined", "INFO: Device node /dev/vdb1 exists.", " Can't open /dev/vdb1 exclusively. Mounted filesystem?"], "stdout": "/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31", "stdout_lines": ["/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31"]}
fatal: [ocp-apps1]: FAILED! => {"changed": true, "cmd": ["docker-storage-setup"], "delta": "0:00:01.741506", "end": "2018-04-26 18:33:06.202183", "msg": "non-zero return code", "rc": 5, "start": "2018-04-26 18:33:04.460677", "stderr": "INFO: Volume group backing root filesystem could not be determined\nINFO: Device node /dev/vdb1 exists.\n Can't open /dev/vdb1 exclusively. Mounted filesystem?", "stderr_lines": ["INFO: Volume group backing root filesystem could not be determined", "INFO: Device node /dev/vdb1 exists.", " Can't open /dev/vdb1 exclusively. Mounted filesystem?"], "stdout": "/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31", "stdout_lines": ["/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31"]}
fatal: [ocp-infra]: FAILED! => {"changed": true, "cmd": ["docker-storage-setup"], "delta": "0:00:01.850122", "end": "2018-04-26 18:33:06.246404", "msg": "non-zero return code", "rc": 5, "start": "2018-04-26 18:33:04.396282", "stderr": "INFO: Volume group backing root filesystem could not be determined\nINFO: Device node /dev/vdb1 exists.\n Can't open /dev/vdb1 exclusively. Mounted filesystem?", "stderr_lines": ["INFO: Volume group backing root filesystem could not be determined", "INFO: Device node /dev/vdb1 exists.", " Can't open /dev/vdb1 exclusively. Mounted filesystem?"], "stdout": "/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31", "stdout_lines": ["/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31"]}
changed: [ocp-master]
```

So you see, it just fails.
This is what the file looks like after a successful run:

```
$ cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver overlay2 "
```

whereas the driver is not set in the faulty one, as you have seen in the first post.

```
$ journalctl -xu docker-storage-setup
-- No entries --
```

Ilkka, can you also provide the output of the following:

- container-storage-setup --version
- rpm -aq | grep container-storage-setup
- cat /usr/share/container-storage-setup/container-storage-setup | grep STORAGE_DRIVER

Can I have access to a node where this problem is being seen/reproduced? Looking at the messages, it looks like some kind of race between container-storage-setup and what lvm is doing.

I have a few more questions. I see the following in the ansible script. What does it mean and what is it supposed to do?

```
- name: Detect Docker storage configuration status
  command: grep -q overlay2 /etc/sysconfig/docker-storage
  register: docker_storage_test
  changed_when: false
  failed_when: false

- name: Wipe signatures from docker disk
  command: "wipefs --all --force /dev/vdb"
  when: docker_storage_test.rc != 0
```

What's the status of storage when this wipefs is called? Have we made sure that, if this disk is part of a volume group, we have removed the volume group first?

CCed the lvm team on the bug. I am pretty sure that it is some kind of strange race/interaction with lvmetad; I just can't pinpoint the race yet. From the container-storage-setup point of view, we look at device /dev/vdb, create a partition, and then try to do "wipefs -a /dev/vdb1". At that point I think wipefs complains that it could not open the device exclusively. I suspect that lvmetad is looking at it by then:

```
INFO: Device node /dev/vdb1 exists
Can't open /dev/vdb1 exclusively. Mounted filesystem?
/dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
```

The question is: why is this race happening now in 7.5? What has changed?
The difference I see is that the "working" config is using "devicemapper" for the storage driver in /etc/sysconfig/docker-storage, whereas the non-working default is apparently using overlayfs. That could be coincidental, but is it possible the overlayfs setup is getting ahead of itself and racing with lvm's setup of the underlying device that overlayfs depends on? Whereas there is no race like that with "devicemapper", because the thin-pool is created in terms of lvm2? Just an observation, no idea if this is helpful ;)

Actually, they are using a config file like this:

```
STORAGE_DRIVER=overlay2
DEVS=/dev/vdb
CONTAINER_ROOT_LV_NAME=dockerlv
CONTAINER_ROOT_LV_MOUNT_PATH=/var/lib/docker
VG=ocp
```

This means use the overlay2 driver for docker, but also do the following steps:

- Create a volume group named "ocp" with disk "/dev/vdb"
- Create a logical volume named "dockerlv" from volume group "ocp"
- Mount logical volume "dockerlv" on path /var/lib/docker/

Looking at the lvm messages:

- It looks like the dockerlv logical volume first became active:

```
Apr 26 16:12:43 ocp-apps2 lvm[14076]: 1 logical volume(s) in volume group "ocp" now active
```

- Then for some reason we seem to be scanning /dev/vdb1 again and don't find a PV signature. Maybe the ansible script wiped all signatures on /dev/vdb while /dev/vdb was still part of a volume group, and that led to all the problems?

```
Apr 26 16:12:43 ocp-apps2 systemd[1]: Started LVM2 PV scan on device 253:17.
Apr 26 16:12:43 ocp-apps2 systemd[1]: Started /usr/sbin/lvm pvscan --cache 253:17.
Apr 26 16:12:43 ocp-apps2 systemd[1]: Starting /usr/sbin/lvm pvscan --cache 253:17...
Apr 26 16:12:43 ocp-apps2 lvm[14104]: No PV label found on /dev/vdb1.
Apr 26 16:12:43 ocp-apps2 lvmetad[11672]: vg_lookup vgid 8avUed-Tksy-6Zdv-BhP2-ROUb-Zmx7-6w2RkY name ocp found incomplete mapping uuid none name none
```

It must be some race, as at some point there was this log:

```
-- Unit lvm2-pvscan@253:17.service has finished shutting down.
Apr 26 16:12:41 ocp-master lvmetad[11562]: vg_lookup vgid HBf0Fp-d4Ah-wuUH-iWLi-bu68-AI8z-ugA7B8 name ocp found incomplete mapping uuid none name none
Apr 26 16:12:41 ocp-master kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel
Apr 26 16:12:41 ocp-master kernel: vdb: vdb1
Apr 26 16:12:41 ocp-master systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
```

I set you up with remote access; I'll add the connection details for the faulty machine in a private comment next. It definitely looks exactly like the tags were wiped after the pvcreate. Note, I only added this option to the storage config yesterday; it happened even without it, so it was only Ansible wiping it:

```
WIPE_SIGNATURES=true
```

And a little bit more about the boot stages:

1. The machine is a RHEL 7.5 qcow image dumped to LVM on vda
2. An empty LV is created on the host
3. The VM gets booted, booting from vda with fresh RHEL 7.5
4. Cloud-init sets up the networking
5. Ansible logs in and starts configuring the system
6. Ansible installs packages, among others docker-1.13
7. Ansible has a command to wipe signatures on vdb
8. Ansible sets up the file docker-storage-setup
9. Ansible runs docker-storage-setup -> failure, occasionally

Note that many VMs are set up fine with no problems, so it is some race depending on timing. At least half of the machines do work, and the faulty ones are random, not always the same. All LVs are deleted during machine deletion, and the ones for new VMs are freshly created with lvcreate, so there should never be a partition on the vdc at boot time.

On that system, lsblk output looks as follows:

```
vdb              253:16   0  60G  0 disk
└─vdb1           253:17   0  60G  0 part
  └─ocp-dockerlv 252:0    0  24G  0 lvm
```

So 253:17 is /dev/vdb1, which lvmetad is scanning. The strange thing is that there is no volume group "ocp" listed in the "vgs" output; "lvs" does not list anything either.

Ilkka, what happens if you remove the wipe-signature logic from ansible and only use WIPE_SIGNATURES=true in /etc/sysconfig/docker-storage-setup?

I can do that. But notice, the remote connection for you will be gone once I do it.

(In reply to Ilkka Tengvall from comment #19)
> I can do that. But notice, the remote connection for you will be gone now
> that I do it.

That's fine. Once you are done testing, set up the remote connection for me again.

The system definitely seems to be in an inconsistent state. The only thing now is to figure out how we ended up here. I suspect that the "wipefs" in ansible has something to do with it.

Done. A different host fails this time; change the port in the ssh command to 2210. So removing the wipefs did not help. BTW, take note of this deadlock line again:

```
Apr 27 23:26:09 ocp-master kernel: device-mapper: uevent: version 1.0.3
Apr 27 23:26:09 ocp-master kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel
Apr 27 23:26:09 ocp-master kernel: vdb: vdb1
Apr 27 23:26:09 ocp-master systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
Apr 27 23:26:09 ocp-master lvm[13951]: 1 logical volume(s) in volume group "ocp" now active
Apr 27 23:26:09 ocp-master systemd[1]: Stopped LVM2 PV scan on device 253:17.
-- Subject: Unit lvm2-pvscan@253:17.service has finished shutting down
```

Hm, the above messages from systemd are strange; I am not sure what exactly they mean. One more thing I noticed, though, is that we are calling docker-storage-setup directly from ansible. I think starting the "docker" service should be enough, and that will automatically start the "docker-storage-setup" service. Given we are not starting docker-storage-setup as a service, that's probably why no messages from it are going into the journal. So can you get rid of the direct call to "docker-storage-setup", rely on just starting the docker service, and see if the problem still happens?

No, that's how it's been instructed to be done in OCP installs: you write the storage config, and run the setup.
Yes, docker would run it anyway, but this is just the normal procedure.

That's fine. Just run it for now to see if that fixes the issue; I am trying to isolate the problem a bit. If you really want to set up storage first, can you do "systemctl start docker-storage-setup" instead of invoking docker-storage-setup directly? I see that docker-storage-setup.service runs after cloud-init.service has finished, and I am wondering if that has any effect on this. So start docker-storage-setup as a service and not directly from the command line. This should also send the docker-storage-setup messages to the journal, which will help in debugging problems later.

I set it up to call "systemctl start docker-storage-setup" instead of calling the command directly. The result is the same: again, two random machines are not working. You can use ssh port 2230 to get to the machine to debug. I believe Mike's guess is the best now, to look at what overlay2 does differently. I don't believe changing the way it gets called changes the result; it's something within the internal logic of docker-storage-setup. But it's good to now have the logs in one place:

```
Apr 28 09:04:35 ocp-apps1 container-storage-setup[13899]: INFO: Volume group backing root filesystem could not be determined
Apr 28 09:04:36 ocp-apps1 systemd-udevd[420]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
-- Subject: Unit system-lvm2\x2dpvscan.slice has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit system-lvm2\x2dpvscan.slice has finished starting up.
--
-- The start-up result is done.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
-- Subject: Unit system-lvm2\x2dpvscan.slice has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit system-lvm2\x2dpvscan.slice has begun starting up.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Starting LVM2 PV scan on device 253:17...
-- Subject: Unit lvm2-pvscan@253:17.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit lvm2-pvscan@253:17.service has begun starting up.
Apr 28 09:04:36 ocp-apps1 kernel: device-mapper: uevent: version 1.0.3
Apr 28 09:04:36 ocp-apps1 kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel
Apr 28 09:04:36 ocp-apps1 lvm[13933]: 1 logical volume(s) in volume group "ocp" now active
Apr 28 09:04:36 ocp-apps1 systemd[1]: Started LVM2 PV scan on device 253:17.
-- Subject: Unit lvm2-pvscan@253:17.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit lvm2-pvscan@253:17.service has finished starting up.
--
-- The start-up result is done.
Apr 28 09:04:36 ocp-apps1 container-storage-setup[13899]: INFO: Device node /dev/vdb1 exists.
Apr 28 09:04:36 ocp-apps1 container-storage-setup[13899]: /dev/vdb1: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
Apr 28 09:04:36 ocp-apps1 systemd[1]: Started /usr/sbin/lvm pvscan --cache 253:17.
-- Subject: Unit run-13960.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedeskt
-- Unit run-13960.service has finished starting up.
--
-- The start-up result is done.
Apr 28 09:04:36 ocp-apps1 systemd[1]: Starting /usr/sbin/lvm pvscan --cache 253:17...
-- Subject: Unit run-13960.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit run-13960.service has begun starting up.
Apr 28 09:04:37 ocp-apps1 lvm[13962]: No PV label found on /dev/vdb1.
Apr 28 09:04:37 ocp-apps1 container-storage-setup[13899]: Can't open /dev/vdb1 exclusively. Mounted filesystem?
Apr 28 09:04:37 ocp-apps1 systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=5/NOTINSTALLED
Apr 28 09:04:37 ocp-apps1 systemd[1]: Failed to start Docker Storage Setup.
-- Subject: Unit docker-storage-setup.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker-storage-setup.service has failed.
--
-- The result is failed.
Apr 28 09:04:37 ocp-apps1 systemd[1]: Unit docker-storage-setup.service entered failed state.
Apr 28 09:04:37 ocp-apps1 systemd[1]: docker-storage-setup.service failed.
Apr 28 09:05:16 ocp-apps1 sshd[13977]: Connection closed by 192.168.122.1 port 24924 [preauth]
Apr 28 09:05:34 ocp-apps1 sshd[13980]: Connection closed by 192.168.122.1 port 63031 [preauth]
Apr 28 09:05:37 ocp-apps1 sshd[11258]: Received disconnect from 192.168.122.1 port 49482:11: disconnected by user
Apr 28 09:05:37 ocp-apps1 sshd[11258]: Disconnected from 192.168.122.1 port 49482
Apr 28 09:05:37 ocp-apps1 sshd[11255]: pam_unix(sshd:session): session closed for user cloud-user
Apr 28 09:05:37 ocp-apps1 systemd-logind[575]: Removed session 2.
-- Subject: Session 2 has been terminated
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: http://www.freedesktop.org/wiki/Software/systemd/multiseat
--
```

I think docker-storage-setup and the lvm pvscan logic are racing. docker-storage-setup thinks that the disk/partition is not in use by lvm and that the disk is not part of a volume group, so it tries to create a partition, wipe signatures, etc., and that seems to be racing with the lvm-pvscan logic. So is this a new node which is being provisioned?
Or is it an already-provisioned node, where /dev/vdb is already in the ocp volume group and this node is simply rebooting when this happens?

It is a new node, and a new fresh disk. The linked ansible playbook creates it from scratch.

Looking at container-storage-setup, it seems to fail at "pvcreate", where pvcreate appears to fail because the device is opened by somebody else (maybe lvmetad):

```
if ! wipefs -f -a ${part}; then
    Fatal "Failed to wipe signatures on device ${part}"
fi

pvcreate ${part}
```

So we first wipe signatures on partition /dev/vdb1 and then do "pvcreate /dev/vdb1"; I think pvcreate is failing. There are two odd things:

- If this is a new disk on its first boot, why do the logs say that 1 logical volume in volume group ocp was activated? That means lvm somehow found signatures on /dev/vdb1 already and activated the volume. Is it a vestige of some other VM?
- How do we make sure that udev rules have settled down by the time "pvcreate" is called? Maybe another call to "udevadm settle" after wipefs and before pvcreate.

Ilkka, if I give you a test patch to container-storage-setup, are you in a position to test it somehow?

I have been trying to reproduce it but have not been successful so far. A few observations:

- The problem seems to happen when disk /dev/vdb already had PVs and was part of a volume group. It looks like this disk was used by docker-storage-setup to set up container storage and the same disk has now been reprovisioned in a different VM, so when this disk is added to the VM, lvm scans the disk and auto-activates the VG and logical volume.
- In the initial report, ansible was calling "wipefs -a -f". I think that's part of the problem and will run into issues. If there is already a disk with an lvm volume on it, it needs to be cleaned properly. I will let the lvm folks comment on the best way to clean it up.
- Once the ansible call to "wipefs -a -f" was removed, for some reason docker-storage-setup still does not recognize that this disk is already part of a volume group and continues with wipefs. That seems to be the second problem.

Anyway, I think if we clean the disk before it is added to the VM, it will probably solve the issue.

Ok, after some tweaking, I think I figured out a way to reproduce the issue. The core of the problem is that this disk was being used with some other VM and has an lvm signature on /dev/vdb1. Before the disk was given to docker-storage-setup, /dev/vdb1 was removed and "wipefs -a /dev/vdb" was done. That means the disk will seem clean to docker-storage-setup, and it will take ownership of the disk and start setting things up on it. The first thing it does is create a partition and clean any signatures on the partition. The moment a partition is created, the previous lvm signatures on /dev/vdb1 become visible again, udev rules kick in, and they activate that volume. Now container-storage-setup wipes the signatures on /dev/vdb1, and after that all hell breaks loose.

This does not happen all the time because it is a race. After partition creation we call "udevadm settle"; if "udevadm settle" finishes fairly fast, then we quickly wipe the signatures on the partition and the lvm volume never becomes active. But sometimes "udevadm settle" takes much longer, and that gives enough time for the volume to become active. Later "wipefs -a /dev/vdb1" kicks in and all the bad things happen. I had to add a delay of a few seconds after partition creation to reproduce this problem, and now it reproduces reliably.

Great that you made progress with it. A note about the disk being reused: it can be, but it would be quite a coincidence. The ansibles here have do_guests.yml, which on the host creates empty LVs that are later used as disks for the VMs. So in theory it's always a fresh disk with random data on the blocks. Then there is nuke_guests.yml, which deletes the LVs after a VM is deleted. So with quite some luck, on the next do_guests.yml run LVM might use exactly the same places on disk for an LV, and those might contain the old partition. To test that, I should add a wipefs for the LV device on the host system before the VM gets created, or right after it's nuked.
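The ordering fix floated a few comments up (another "udevadm settle" after wipefs and before pvcreate) can be sketched as follows. This is only an illustration of the idea from this thread, not the actual container-storage-setup patch; the function name `safe_pvcreate` is invented for the example:

```shell
# Sketch of the proposed ordering fix (an assumption, not the merged change):
# wipe the partition, then wait for udev to finish reacting before handing
# the partition to LVM, so a stale-signature autoactivation cannot slip in
# between the wipe and the pvcreate.
safe_pvcreate() {
    part="$1"
    if ! wipefs -f -a "$part"; then
        echo "Failed to wipe signatures on device $part" >&2
        return 1
    fi
    # Let udev process the change event from the wipe; this closes the window
    # in which lvm autoactivation could grab the partition.
    udevadm settle
    pvcreate "$part"
}
```

The point of the extra settle is purely sequencing: pvcreate only runs once udev's reaction to the signature wipe has drained.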
I'll do that when I have the time, hopefully tomorrow.

Wiping the disk *before* plugging it in is clearly a good idea. But I think LVM should have a command to run *before* `wipefs` that removes all PVs on the disk and drops them from the kernel state. (That command should probably also wait for udev to settle; otherwise there can still be races.) Basically there's nothing new in this bug specific to Docker or whatever; running `wipefs` or performing other raw partitioning changes while LVM is active is always asking for bugs. You could probably script this today with `lvm pvs --noheadings | awk '{ print $1 }' ...` etc.

I am thinking of zeroing the first few MB of the disk before creating the partition. That way any LVM signatures at the 1 MB/2 MB offsets will get wiped out, and when I create the partition that should not be a problem. I also talked to the lvm folks, and they now seem to be of the opinion that we can probably get rid of the logic that creates a partition on the disk before adding it to the volume group. That will simplify life; I will track that work in a separate bz.

I have created a PR for this issue; I think it should solve it: https://github.com/projectatomic/container-storage-setup/pull/269

Ilkka, any chance you can test this PR in your setup and see whether the problem you face is resolved?

Hi, I added similar stuff to my ansible to test the logic. You are right. I added this:

```
- name: Wipe ghost PV from docker disk
  command: "pvremove -fy /dev/vdb1"
  ignore_errors: yes
  when: docker_storage_test.rc != 0

- name: Wipe signatures from docker disk
  command: "wipefs --all --force /dev/vdb"
  when: docker_storage_test.rc != 0

- name: Wipe the beginning of the disk from docker disk
  command: "dd if=/dev/zero of=/dev/vdb bs=1M count=4"
  when: docker_storage_test.rc != 0
```

And I ran it twice. Both times there were some disks where "pvremove -fy /dev/vdb1" ran successfully, and the installations went successfully through after that.
So a successful run of that pvremove meant that those PVs actually were pre-existing, and wipefs alone wasn't enough. I agree with the LVM guys; having the partition there has always made me wonder why. I can't see any reason for having a partition table there, it's legacy. But good if you have that in another ticket already. Great that you spotted it!

We have no plans to ship another version of Docker at this time. RHEL 7 is in its final support stages, where only security fixes will be released. Customers should move to Podman, which is available starting in RHEL 7.6.
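For reference, the workaround that emerged in this thread (remove stale PVs, wipe signatures, zero the first few MiB, and let udev settle before the disk is reused) can be collected into one helper. The function name and the availability guards are illustrative assumptions; the individual commands are the ones used in the comments above:

```shell
# Hypothetical helper collecting the cleanup steps from this bug report.
# Run it against the raw docker disk (e.g. /dev/vdb) before docker-storage-setup,
# or on the host-side LV before handing it to a new VM.
clean_container_disk() {
    dev="$1"
    # Remove any stale PV metadata on the old partition; ignore failure when
    # there is nothing to remove (matches "ignore_errors: yes" in the play).
    command -v pvremove >/dev/null 2>&1 && pvremove -fy "${dev}1" 2>/dev/null || true
    # Wipe all known signatures (partition table, LVM2_member, filesystems).
    command -v wipefs >/dev/null 2>&1 && wipefs --all --force "$dev" 2>/dev/null || true
    # Zero the first 4 MiB so LVM labels at the 1-2 MiB offsets of a previous
    # partition are gone too (conv=notrunc keeps the target's size intact).
    dd if=/dev/zero of="$dev" bs=1M count=4 conv=notrunc 2>/dev/null || return 1
    # Give udev time to process the change events before the disk is reused.
    command -v udevadm >/dev/null 2>&1 && udevadm settle || true
}
```

The pvremove step is what made the difference in Ilkka's test: wipefs on the whole disk does not clear signatures hidden inside a not-yet-visible partition, while zeroing the first MiBs does.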