Bug 1564671
Summary: | Container configuration generation fails if the host file system is xfs that was created with ftype=0
---|---
Product: | Red Hat OpenStack
Reporter: | Alex Schultz <aschultz>
Component: | openstack-tripleo-heat-templates
Assignee: | Emilien Macchi <emacchi>
Status: | CLOSED CANTFIX
QA Contact: | Gurenko Alex <agurenko>
Severity: | high
Priority: | high
Version: | 8.0 (Liberty)
CC: | augol, ccamacho, dwalsh, esandeen, jcoufal, jschluet, mburns, mcornea, morazi, mszeredi, pasik, rhel-osp-director-maint, roxenham, rscarazz, sbaker, vgoyal
Target Milestone: | zstream
Keywords: | Triaged, ZStream
Target Release: | 8.0 (Liberty)
Hardware: | Unspecified
OS: | Unspecified
: | 1575115
Last Closed: | 2018-10-25 20:48:15 UTC
Type: | Bug
Bug Blocks: | 1575115, 1580463, 1580469, 1580476
Description
Alex Schultz
2018-04-06 20:33:21 UTC
CRC was enabled in RHEL in rhbz#1309498.

I hit this 12 months ago, which resulted in the following bugs:

https://bugs.launchpad.net/tripleo/+bug/1693398
https://bugzilla.redhat.com/show_bug.cgi?id=1455713
https://bugzilla.redhat.com/show_bug.cgi?id=1288162

My takeaway at the time was that the next unreleased RHEL kernel might improve the situation for overlay2 on ftype=0 xfs, so it should be retested then. The situation is improved: now we get an early error message instead of weird behaviour on deleted files. But yes, we have a problem now for those who have upgraded all the way from early OSP versions when the default ftype was still 0.

It would be interesting to know which OSP/RHEL version combo was the last one to be deployed with xfs ftype=0, to get an idea of the scope of this upgrade problem.

Based on what I found, it was changed in RHEL 7.3. According to the lifecycle page, we shipped OSP10 on 7.3.

https://access.redhat.com/support/policy/updates/openstack/platform

So <=OSP9 upgrades may be affected.

I did some checks of the overcloud images that we shipped in the past, and below are my results:

OSP10 shipped a RHEL 7.3 overcloud image at GA time, so we should be safe there.

The overcloud image shipped at 9 GA (rhosp-director-image rpm in [1]) has the root fs formatted as ext4. According to [2], xfs is the only supported lower layer fs for OverlayFS, so I believe deployments that used this image for initial deployment cannot be upgraded to containers. The overcloud images in the following 9-director builds are RHEL 7.3.

Regarding the initial XFS issue - I found a RHEL 7.2 overcloud image with an xfs root fs with ftype=0 in OSP8 director Y1 [3]. I can confirm that I reproduced the issue reported by Alex during FFU of the OSP8 environment deployed with that image (8->9->10->FFU->13).

To summarize: OSP7/8/9 deployments are potentially blocked from being upgraded to containerized deployments (depending on whether the initial deployment was on RHEL 7.3 or earlier).
Created attachment 1419558 [details]
fast_forward_upgrade_playbook.yaml output
Attaching the output of fast_forward_upgrade_playbook.yaml playbook where these errors show up.
(In reply to Marius Cornea from comment #5)
> Created attachment 1419558 [details]
> fast_forward_upgrade_playbook.yaml output
>
> Attaching the output of fast_forward_upgrade_playbook.yaml playbook where
> these errors show up.

Small correction - it's actually the deploy_steps_playbook.yaml playbook which fails.

It is true that overlayfs in RHEL7 requires ftype to be enabled on XFS:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.2_release_notes/technology-preview-file_systems

> Note that XFS file systems must be created with the -n ftype=1 option enabled for use as an overlay.

If you are attempting to use overlayfs on a filesystem without the ftype feature enabled, then unfortunately this behavior is expected. Further, there is no in-place upgrade to add the ftype feature; dump, mkfs, & restore is the only path forward if you require ftype.

ftype was indeed made default in RHEL7.3 / xfsprogs-4.5.0-1 in March 2016.

However, I'm digging into this situation a bit more; upstream, lack of d_type causes overlayfs to warn but not fail. cc: miklos as well.

-Eric

(But this bug may be conflating two issues, I'm not sure that
> rsync warning: some files vanished before they could be transferred
has anything to do with ftype support. Doesn't that simply mean that the source files were removed while rsync was running?)
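The ftype requirement described above can be checked from the text that xfs_info prints. The following is a minimal Python sketch of such a check, not code from TripleO or container-storage-setup. The function names parse_ftype and overlay_ready are hypothetical, and the sketch assumes the output contains a "naming" line with an ftype=N field, as in the xfs_info output quoted in this bug. Output with no ftype field at all is treated as not ready.

```python
import re


def parse_ftype(xfs_info_output):
    """Return the ftype value (0 or 1) found in xfs_info text, or None.

    Hypothetical helper: it only looks for an 'ftype=N' token and does
    not validate the rest of the output.
    """
    match = re.search(r"\bftype=(\d)", xfs_info_output)
    return int(match.group(1)) if match else None


def overlay_ready(xfs_info_output):
    """True only when the file system reports ftype=1."""
    return parse_ftype(xfs_info_output) == 1


# Two lines taken from the xfs_info output reported in this bug.
SAMPLE = (
    "meta-data=/dev/vda1 isize=256 agcount=20, agsize=524224 blks\n"
    "naming =version 2 bsize=4096 ascii-ci=0 ftype=0\n"
)
```

With the sample above, overlay_ready(SAMPLE) is false, which matches the affected host. In practice the text would come from running xfs_info against the mount point that backs the Docker root directory.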
(In reply to Alex Schultz from comment #0)
>
> In trying to figure out what was happening, I noticed that in the dmesg
> output there would be these messages:
> [79910.073570] overlayfs: upper fs needs to support d_type. This is an invalid configuration.
> [79910.091994] overlayfs: upper fs needs to support d_type. This is an invalid configuration.
> [79910.110953] overlayfs: upper fs needs to support d_type. This is an invalid configuration.
>

This just means that overlay has underlying xfs with ftype=0 and side effect of this should be that whiteout files will become visible to user/container. It should not lead to missing files during rsync. So something else is wrong.

> From these messages I found,
> https://github.com/moby/moby/issues/10294#issuecomment-267846091
>
> From this comment I found the deprecation notice for v1.13 around this
> message which indicates that xfs doesn't support d_type if it was formated
> with ftype=0
>
> https://github.com/moby/moby/blob/v1.13.0-rc4/docs/deprecated.md#backing-filesystem-without-d_type-support-for-overlayoverlay2
>

BTW, to catch errors during configuration, I had modified container-storage-setup and error out if overlay is being setup with ftype=0 on underlying fs. But looks like in your setup you are somehow bypassing it.

https://github.com/projectatomic/container-storage-setup/commit/7fffea78b4195bdb883c3dada90d11d140a2c60a

> So the system I was using was from a centos guest image that did not have
> crc enabled for the xfs.
>
> $ xfs_info /
> meta-data=/dev/vda1 isize=256 agcount=20, agsize=524224 blks
> = sectsz=512 attr=2, projid32bit=1
> = crc=0 finobt=0 spinodes=0
> data = bsize=4096 blocks=10484164, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
> log =internal bsize=4096 blocks=2560, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> Impact:
> New installs won't be affected as we've have the correct fs setting now.
> Older systems being upgraded from baremetal installations to containers may
> fail.

Was this system using overlay even before upgrade? Or it setup a fresh docker after upgrade?

Can you paste "docker info" output after upgrade and possibly before upgrade as well.

> has anything to do with ftype support. Doesn't that simply mean that the source files were removed while rsync was running?)
So because overlayfs doesn't fail, you end up with weird results. I had some containers not start up and some would start but get the rsync issues. So it "works" but you get some really odd interactions in the containers.
(In reply to Alex Schultz from comment #12)
> > has anything to do with ftype support. Doesn't that simply mean that the source files were removed while rsync was running?)
>
> So because overlayfs doesn't fail, you end up with weird results. I had some
> containers not start up and some would start but get the rsync issues. So
> it "works" but you get some really odd interactions in the containers.

I doubt that this is related to ftype=0. Even if it is, simply don't use overlay with ftype=0. And, to make it easy, we put a check in container-storage-setup. Docker will fail, user will notice it and change your storage driver to say devicemapper.

(In reply to Vivek Goyal from comment #13)
> I doubt that this is related to ftype=0. Even if it is, simply don't use
> overlay with ftype=0. And, to make it easy, we put a check in
> container-storage-setup. Docker will fail, user will notice it and change
> your storage driver to say devicemapper.

So for the openstack deployments we've settled on overlayfs and this bug is around the fact that there is an issue with older xfs and overlayfs. We'll have to evaluate the various issues related to not using it. Currently this is not a configurable thing. The problem is not on new installs where everyone is getting compatible xfs but rather systems customers may be migrating from baremetal installations (done with <=7.2) to containerized installations (done with >=7.4). We're trying to figure out a solution that isn't reformat your system.

NOTE: In my original test the same processes/software versions were used and the only difference was 1 node was a 7.2 node that was yum updated to 7.4. And the other node was a 7.4 node. Once both systems were up to date, the brand new installation proceeded and the 7.2 node exhibited odd docker behavior while the 7.4 worked fine. The only thing different was the xfs version.
(In reply to Vivek Goyal from comment #11)
> This just means that overlay has underlying xfs with ftype=0 and side effect
> of this should be that whiteout files will become visible to user/container.
> It should not lead to missing files during rsync. So something else is wrong.

Yea it just seemed to be the only difference between the two machines when one was successful and the other was not.

> BTW, to catch errors during configuration, I had modified
> container-storage-setup and error out if overlay is being setup with ftype=0
> on underlying fs. But looks like in your setup you are somehow bypassing it.
>
> https://github.com/projectatomic/container-storage-setup/commit/7fffea78b4195bdb883c3dada90d11d140a2c60a

We're not using this in openstack. I think we might need to add a similar check to prevent anything from proceeding.

> Was this system using overlay even before upgrade? Or it setup a fresh
> docker after upgrade?
>
> Can you paste "docker info" output after upgrade and possibly before upgrade
> as well.
Fresh docker install after system updated to 7.4

[centos@undercloud ~]$ sudo docker info
Containers: 11
 Running: 0
 Paused: 0
 Stopped: 11
Images: 21
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: false
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: docker-init
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: N/A (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 4
Total Memory: 7.639 GiB
Name: undercloud.localdomain
ID: SCW7:NFRC:TDQB:DF7A:PIT3:JDZB:RE4W:FL3K:2YEZ:W7LD:YPGO:EFOH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 15
 Goroutines: 23
 System Time: 2018-04-12T17:20:13.340489947Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
 192.168.24.1:8787
 127.0.0.0/8
Live Restore Enabled: true
Registries: docker.io (secure)

(In reply to Alex Schultz from comment #14)
> (In reply to Vivek Goyal from comment #13)
> > I doubt that this is related to ftype=0. Even if it is, simply don't use
> > overlay with ftype=0. And, to make it easy, we put a check in
> > container-storage-setup. Docker will fail, user will notice it and change
> > your storage driver to say devicemapper.
>
> So for the openstack deployments we've settled on overlayfs and this bug is
> around the fact that there is an issue with older xfs and overlayfs. We'll
> have to evaluate the various issues related to not using it. Currently
> this is not a configurable thing. The problem is not on new installs where
> everyone is getting compatible xfs but rather systems customers may be
> migrating from baremetal installations (done with <=7.2) to containerized
> installations (done with >=7.4). We're trying to figure out a solution that
> isn't reformat your system.
>
> NOTE: In my original test the same processes/software versions were used
> and the only difference was 1 node was a 7.2 node that was yum updated to
> 7.4. And the other node was a 7.4 node. Once both systems were up to date,
> the brand new installation proceeded and the 7.2 node exhibited odd docker
> behavior while the 7.4 worked fine. The only thing different was the xfs
> version.

If it was working on 7.2 and stopped working after upgrading to 7.4, this is really strange. Are you able to reproduce this consistently. If yes, let us try to narrow it down. I don't understand puppet and all the operations which are happening. If somebody can bring down the reproducer to docker level, I might be able to help you.

(In reply to Alex Schultz from comment #15)
> > BTW, to catch errors during configuration, I had modified
> > container-storage-setup and error out if overlay is being setup with ftype=0
> > on underlying fs. But looks like in your setup you are somehow bypassing it.
> >
> > https://github.com/projectatomic/container-storage-setup/commit/7fffea78b4195bdb883c3dada90d11d140a2c60a
> >
>
> We're not using this in openstack. I think we might need to add a similar
> check to prevent anything from proceeding.

Why did you decide to bypass container-storage-setup in openstack. I think it is a good idea to keep container-storage-setup in the path.

> Operating System: CentOS Linux 7 (Core)

Hmmm... you are using CentOS. Interesting.

(In reply to Vivek Goyal from comment #16)
> If it was working on 7.2 and stopped working after upgrading to 7.4, this is
> really strange. Are you able to reproduce this consistently. If yes, let us
> try to narrow it down. I don't understand puppet and all the operations
> which are happening. If somebody can bring down the reproducer to docker
> level, I might be able to help you.

Yes it's consistent. Also it's not puppet, we're actually running a shell script to do some file copy operations during the config generation phase. Specifically it's this bit of code:

https://github.com/openstack/tripleo-heat-templates/blob/master/docker/docker-puppet.py#L253-L276

So it should be noted that if you were to manually do this from within the container via a docker run -it bash, it works fine. It only fails when it's occurring so quickly in the throw away container we're using. It seems like a race condition of some sort.

(In reply to Vivek Goyal from comment #17)
> (In reply to Alex Schultz from comment #15)
> > > BTW, to catch errors during configuration, I had modified
> > > container-storage-setup and error out if overlay is being setup with ftype=0
> > > on underlying fs. But looks like in your setup you are somehow bypassing it.
> > >
> > > https://github.com/projectatomic/container-storage-setup/commit/7fffea78b4195bdb883c3dada90d11d140a2c60a
> > >
> >
> > We're not using this in openstack. I think we might need to add a similar
> > check to prevent anything from proceeding.
>
> Why did you decide to bypass container-storage-setup in openstack. I think
> it is a good idea to keep container-storage-setup in the path.
>

We don't use atomic in the OSP project yet.

> > Operating System: CentOS Linux 7 (Core)
>
> Hmmm... you are using CentOS. Interesting.

Yes I could try and reproduce it in RHEL, but it's unlikely to change anything as we're using the same version of docker upstream.

Overlay Storage was not supported in 7.2. So upgrading it to 7.4/7.5 is not supported. We did not support overlay until 7.4 and only with newly created xfs with the correct D-Type. The issue as has been pointed out is container images built using the bad xfs setting will be invalid. Basically they will end up with bogus files in them.

(In reply to Daniel Walsh from comment #21)
> Overlay Storage was not supported in 7.2. So upgrading it to 7.4/7.5 is not
> supported. We did not support overlay until 7.4 and only with newly created
> xfs with the correct D-Type. The issue as has been pointed out is container
> images built using the bad xfs setting will be invalid. Basically they will
> end up with bogus files in them.

Right. There might not be much point in debugging issues on a not-supported configuration. That is, have ftype=1. Otherwise use devicemapper as storage.

Here you have some validations for the steps prior to the upgrade:

https://review.openstack.org/#/c/562282/

There is no fix for OSP8. We have documentation about the issue and have added in some validations.
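As an illustration of the kind of pre-upgrade check discussed in this thread, the sketch below inspects the text printed by docker info and flags the combination reported in this bug: an overlay storage driver with "Supports d_type: false". This is an assumption-laden example and is not the validation code from the review linked above. The function names parse_docker_info and overlay_misconfigured are hypothetical, and the parsing assumes the plain "Key: value" layout shown in the docker info output pasted earlier.

```python
def parse_docker_info(text):
    """Collect 'Key: value' pairs from `docker info` text into a dict.

    Hypothetical helper: lines are split on the first colon only, so
    values that themselves contain colons (URLs, timestamps) stay whole.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def overlay_misconfigured(text):
    """True when an overlay driver sits on a backing fs without d_type."""
    info = parse_docker_info(text)
    driver = info.get("Storage Driver", "")
    return driver.startswith("overlay") and info.get("Supports d_type") == "false"


# Excerpt of the docker info output reported in this bug.
AFFECTED = (
    "Storage Driver: overlay2\n"
    " Backing Filesystem: xfs\n"
    " Supports d_type: false\n"
)
```

For the excerpt above, overlay_misconfigured(AFFECTED) is true. A deployment tool could run such a check before container configuration generation and stop early, instead of hitting the odd container behaviour described in the comments.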