Bug 1330290
| Field | Value |
|---|---|
| Summary | docker-storage-setup frequently fails when lvm2 not initially installed in system |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Erik M Jacobs <ejacobs> |
| Component | docker |
| Assignee | Vivek Goyal <vgoyal> |
| Status | CLOSED WONTFIX |
| QA Contact | atomic-bugs <atomic-bugs> |
| Severity | unspecified |
| Priority | unspecified |
| Version | 7.2 |
| CC | agk, danielepasquini, dwalsh, ejacobs, lsm5, prajnoha, teigland, tsweeney, vgoyal, zkabelac |
| Target Milestone | rc |
| Keywords | Extras |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Story Points | --- |
| | 1330706, 1330714 (view as bug list) |
| Last Closed | 2020-06-09 21:11:09 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Bug Blocks | 1330706, 1330714 |
Description (Erik M Jacobs, 2016-04-25 19:29:14 UTC)
openshift-ansible will force installation of docker 1.8.2. The real problem is the following:

```
Apr 25 12:15:56 ip-172-18-0-177.us-west-2.compute.internal docker-storage-setup[10943]: Device /dev/xvdb1 not found (or ignored by filtering).
```

pvcreate decided not to convert the device into a physical volume, and the error message is confusing. Initially we thought that for some reason /dev/xvdb1 was not yet available after creation, so I put in a patch to wait for the device in a loop after partition creation. But even with that, docker-storage-setup sometimes fails. So now we are sure that /dev/xvdb1 is present and that is not the error.

Also, lvm2 is not installed in the image to begin with. docker has a dependency on lvm2 and pulls it in when installed. After that, the user starts the docker service. Are the lvm2 services in proper shape after installation, or is some user intervention required?

If lvmetad is being used, and lvmetad has not been notified about /dev/xvdb1, then that would explain the error. There are different reasons why lvmetad might not be notified about that device. One guess is that doing 'rpm -i lvm2; pvcreate /dev/xvdb1' is missing some lvmetad service start or initialization. You should verify that lvmetad is running and manually run 'pvscan --cache' after the rpm install. Alternatively, you could disable lvmetad (use_lvmetad=0 in lvm.conf), since it may not be a benefit in this environment.

I installed lvm2 after booting a RHEL 7.2 AMI in Amazon, and lvm2-lvmetad started running automatically.
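The wait-loop patch and the pvscan workaround described above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the actual docker-storage-setup patch; the device name is taken from the logs in this bug, and the privileged LVM calls are shown commented out because they need root and a real disk.

```shell
#!/bin/sh
# Poll until a device node appears (the "wait for device in a loop after
# partition creation" patch mentioned above). Returns 1 on timeout.
wait_for_device() {
  dev=$1
  tries=${2:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    [ -e "$dev" ] && return 0
    sleep 0.5
    i=$((i + 1))
  done
  return 1
}

# Intended use after partition creation (needs root; illustrative only):
#   wait_for_device /dev/xvdb1 || exit 1
#   pvscan --cache            # the extra step that made the failures stop
#   pvcreate -f /dev/xvdb1
```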
```
# systemctl status lvm2-lvmetad
● lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2016-04-25 16:12:01 EDT; 18s ago
     Docs: man:lvmetad(8)
 Main PID: 9403 (lvmetad)
   CGroup: /system.slice/lvm2-lvmetad.service
           └─9403 /usr/sbin/lvmetad -f
```

Created attachment 1150633 [details]: docker log when using +x and pvscan

After adding "set +x" and a "pvscan --cache" to docker-storage-setup and using the latest master (https://github.com/projectatomic/docker-storage-setup), here is the log output.

lvm2-libs-2.02.130-5.el7_2.1.x86_64
lvm2-2.02.130-5.el7_2.1.x86_64

Created attachment 1150634 [details]: tweaked docker-storage-setup
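One of the mitigations floated above is disabling lvmetad entirely. A sketch of the relevant change, assuming the stock RHEL 7 layout where the setting lives in the `global` section of /etc/lvm/lvm.conf:

```
# /etc/lvm/lvm.conf
global {
    # Disable the metadata caching daemon; every LVM command then scans
    # devices directly instead of trusting lvmetad's cache.
    use_lvmetad = 0
}
```

When going this route, the lvm2-lvmetad socket/service would normally also be stopped and masked (e.g. `systemctl mask lvm2-lvmetad.socket`) so socket activation does not restart the daemon.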
Removing "-vvv" from pvcreate and removing the pvscan, I was able to get the error again:

```
-- Logs begin at Mon 2016-04-25 18:14:36 EDT, end at Mon 2016-04-25 18:18:36 EDT. --
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal systemd[1]: Starting Docker Storage Setup...
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: INFO: Volume group backing root filesystem could not be determined
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Checking that no-one is using this disk right now ...
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: OK
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: sfdisk: /dev/xvdb: unrecognized partition table type
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: sfdisk: No partitions found
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Disk /dev/xvdb: 3263 cylinders, 255 heads, 63 sectors/track
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Old situation:
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: New situation:
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Units: sectors of 512 bytes, counting from 0
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Device Boot Start End #sectors Id System
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: /dev/xvdb1 2048 52428799 52426752 8e Linux LVM
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: /dev/xvdb2 0 - 0 0 Empty
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: /dev/xvdb3 0 - 0 0 Empty
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: /dev/xvdb4 0 - 0 0 Empty
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Warning: partition 1 does not end at a cylinder boundary
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Warning: no primary partition is marked bootable (active)
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: This does not matter for LILO, but the DOS MBR will not boot this disk.
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Successfully wrote the new partition table
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: Re-reading the partition table ...
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal docker-storage-setup[10904]: (See fdisk(8).)
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=5/NOTINSTALLED
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal systemd[1]: Failed to start Docker Storage Setup.
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal systemd[1]: Unit docker-storage-setup.service entered failed state.
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal systemd[1]: docker-storage-setup.service failed.
Apr 25 18:17:16 ip-172-18-0-149.ec2.internal systemd[1]: Starting Docker Application Container Engine...
```

I will now try to include the pvscan but not the -vvv on pvcreate.

OK, I've done several (4+) tests now with the addition of pvscan and had zero failures. I am thinking that adding the pvscan is a valuable step.

(In reply to Erik M Jacobs from comment #11)
> OK, I've done several (4+) tests now with the addition of pvscan and had
> zero failures.
>
> I am thinking that adding the pvscan is a valuable step.

Hi. Please attach a 'pvcreate -vvvv /dev/xvdb1' trace from the 'failing' case.
'pvscan --cache' is not the correct fix here - lvmetad should be doing this in the background, so adding it as an 'extra' step should not be needed. Also: ls -la /lib/udev/rules

Note: if your formatted device has an 'existing' signature, pvcreate may refuse to use it unless forced. In that case, please open an RFE to handle such a case in a more user-friendly way.

Comment #1 has pvcreate output from a failure with -vvv (not 4 v's). I will try again. These should be new/clean EBS volumes, but who knows with Amazon... I'll try to get the needinfo. I can't predictably reproduce the failures.

Zdenek,

- In the latest patches we are using pvcreate with the -f option, so any signatures on disk should be overwritten. So this is not a case of bailing out because signatures are already there.
- How do we take care of races with lvmetad? If a partition is created, and while a pvscan is going on one tries to run "pvcreate" in the meantime, it can fail. I suspect we are hitting this race window, because if docker-storage-setup is re-run after the failure, it succeeds.

Startup pvscan is meant to be handled by lvmetad - whoever is the first user of lvmetad is supposed to fill it with a complete device scan. That's why it's important to see the full 'pvcreate -vvvv' trace.

Created attachment 1150969 [details]: no pvscan, pvcreate -vvvv -f
Just to make sure we're on the same page:

1) lvm2 is not installed when the OS comes up
2) installation of docker starts the lvm2 services, because docker pulls in the lvm2 package(s)
3) startup of docker involves running docker-storage-setup
4) docker-storage-setup creates a partition and then runs pvcreate
5) the OS thinks that the partition/device exists; pvcreate *does not*

If you add a pvscan --cache before running pvcreate, the problem goes away. So something is "wrong" with lvm in that pvcreate is not detecting a device even though the rest of the system "sees" it. pvscan makes the lvm subsystem aware of the device. Perhaps this bz should be transitioned to the lvm2 package?

What I wrote in comment 4 (about pvscan notifying lvmetad about the device) now seems wrong. lvmetad is only notified about *PVs* and only caches info about PVs. In this case, a PV does not exist on /dev/xvdb1 prior to the pvcreate, so we don't expect xvdb1 to exist in lvmetad before that. If pvscan seemed to help, it was because of some side effect. Sorry for the misdirection.

Created attachment 1150999 [details]: another tweak to docker-storage-setup for more output
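For context on the sequence above: docker-storage-setup reads the disk it should partition from /etc/sysconfig/docker-storage-setup. A typical configuration matching this bug's environment would look like the fragment below (the device name comes from the logs in this bug; the volume group name is illustrative):

```
# /etc/sysconfig/docker-storage-setup
DEVS=/dev/xvdb     # disk to partition and turn into an LVM physical volume
VG=docker-vg       # volume group to create for docker's thin pool
```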
The above is a good log message. It shows clearly that the partition device was present before pvcreate was called:

```
INFO: Device node /dev/xvdb1 exists.
```

And after that, pvcreate failed. This time we did not get the "stat: node does not exist" error; it looks like the device got filtered:

```
filters/filter-sysfs.c:274 /dev/xvdb1: Skipping (sysfs)
```

The link for the device is created by devtmpfs. Unfortunately, sysfs is not created in sync - though this should be explored more closely, since it should not be possible to trigger such a race so easily on an unloaded machine. Anyway, as long as the event is not fully processed, any tool (lvm2 included) is 'free' to not see the device - this is a 'built-in' race in the whole udev logic. The only 'weapon' available to a user to fight such an issue is to wait for the result of 'udevadm settle'.

lvm2 validates devices in a number of ways, and in this case the missing 'sysfs' entry 'kills' such a device from lvm2's usable devices (as a validation filter). The solution is to not pass in devices whose device-node creation is still in progress.

TODO: explore why it's actually so slow; the lvm2 command is even able to spot this race after a shell loop. It's unclear to me how it could be possible that 'sfdisk' finishes with the BLKRRPART ioctl and there is still no matching /sys entry available.

Vivek, do we still have this issue? Do we have this issue with container-storage-setup?

Configure direct-lvm mode manually: https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#configure-direct-lvm-mode-for-production

We have no plans to ship another version of Docker at this time. RHEL 7 is in its final support stages, where only security fixes will be released. Customers should move to Podman, which is available starting in RHEL 7.6.
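The 'udevadm settle' suggestion above can be sketched as follows. This is an illustrative guard, not the fix that shipped; the sfdisk/pvcreate sequence is shown commented out because it needs root and a real disk, and failures of the settle call are deliberately ignored in this sketch.

```shell
#!/bin/sh
# Wait for udev to drain its event queue, so the sysfs entry that lvm2's
# filter-sysfs check relies on exists before pvcreate looks at the device.
wait_udev_settle() {
  if command -v udevadm >/dev/null 2>&1; then
    # --timeout bounds the wait; settle returns once the queue is empty.
    # Errors are ignored here since this is only a sketch.
    udevadm settle --timeout=10
  fi
  return 0
}

# Intended sequence inside docker-storage-setup (illustrative, needs root):
#   sfdisk /dev/xvdb < "$partition_layout"
#   wait_udev_settle
#   pvcreate -f /dev/xvdb1
```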