Red Hat Bugzilla – Bug 1365297
atomic host upgrade: Failed to read modified config file lvm/archive/
Last modified: 2018-10-26 17:46:04 EDT
Description of problem:
Minor issue, but could cause (maybe) significant problems if lvm metadata recovery is needed at a later time (or other unexplored possibilities).

Version-Release number of selected component (if applicable):
Atomic Host 7.2.4
atomic-1.9-4.gitff44c6a.el7.x86_64
ostree-2016.1-2.atomic.el7.x86_64

How reproducible:
Trivial, with precise timing :(

Steps to Reproduce:
1. Run 'atomic host upgrade'
2. At the same time, expand a LV or make some other LVM metadata change.

Actual results:
# atomic host upgrade
Updating from: rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
99 metadata, 639 content objects fetched; 190201 KiB transferred in 229 seconds
Copying /etc changes: 39 modified, 4 removed, 157955 added
error: During /etc merge: Failed to read modified config file 'lvm/archive/.lvm_$HOSTNAME_$NUMBER_$OTHERNUMBER': No such file or directory
Exit code 1

Expected results:
Exit-code 0 and contents of /etc/lvm/ tree correctly reflect prior metadata state.

Additional info:
Found this by accident, so an acceptable resolution is: "Don't do that". Opening bug for reporting purposes and in case anyone else hits this or maybe there could be a more sinister problem here.
Hey lvm team, any chance we could support storing this data in `/var/lib/lvm`? For more information, see https://ostree.readthedocs.io/en/latest/manual/adapting-existing/
At some point we'll probably tweak the ostree process such that we only do the config merge just before rebooting. This would obviate this problem as well as the "config changes I make after preparing an upgrade are gone" problem. But there are other advantages to having lvm store state in /var: we don't copy it at all, and it's also not something administrators should edit with `vi` etc.
IIRC the important difference here is that /etc is "guaranteed" to be on the / filesystem whereas /var is not. For low-level facilities (like LVM) I can see an argument for wanting to ease the pain in a (probably) small number of cases where someone's important data is on the line. However, this problem is also likely reproducible using any tool/service that writes/changes/moves/locks files under /etc during an update, which is why I'm fine with a WONTFIX / "Don't do that" (i.e. docs) resolution. I guess it depends on how many low-level and manual-recovery options we want to enable/support on this platform.
Set LVM_SYSTEM_DIR:

    LVM_SYSTEM_DIR
        Directory containing lvm.conf(5) and other LVM system files. Defaults to "/etc/lvm".

man 7 lvm.
Err, man *8* lvm..
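For illustration only, here is a minimal sketch of running an LVM command with `LVM_SYSTEM_DIR` overridden, as suggested above. The helper names `lvm_env` and `run_lvm` are hypothetical, and the target directory `/var/lib/lvm` is taken from the earlier suggestion in this thread. It assumes that directory already contains `lvm.conf`, since the variable names the directory LVM reads its system files from.

```python
import os
import subprocess


def lvm_env(system_dir="/var/lib/lvm", base_env=None):
    """Return a copy of the environment with LVM_SYSTEM_DIR set.

    system_dir is an assumed location; it must already hold lvm.conf.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LVM_SYSTEM_DIR"] = system_dir
    return env


def run_lvm(args, system_dir="/var/lib/lvm"):
    """Run `lvm <args>` with the overridden system directory (needs root)."""
    return subprocess.run(
        ["lvm", *args],
        env=lvm_env(system_dir),
        check=True,
        capture_output=True,
        text=True,
    )
```

Calling `run_lvm(["vgs"])` would then list volume groups using the configuration found under the overridden directory rather than /etc/lvm.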
I think NOTABUG is fine. Maybe we should have a feature to configure LVM_SYSTEM_DIR in the atomic image to point at /var/... ? Otherwise it's probably more practical to simply recommend not touching storage while you're running the os-tree update. I'll see about getting this into the knowledge base in case a customer hits it.
Looks great, thanks Derrick.
I have this issue on 2 centos atomic nodes today. The root partition has been extended lately because it was full:

-bash-4.2# lvs
  LV          VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool cah twi-aot--- 64.60g             85.85  53.13
  root        cah -wi-ao---- 10.00g
  swap        cah -wi-ao----  5.00g

Now we have enough space:

-bash-4.2# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/cah-root   10G  7.0G  3.0G  71% /
devtmpfs              3.9G     0  3.9G   0% /dev
tmpfs                 3.9G     0  3.9G   0% /dev/shm
tmpfs                 3.9G  4.3M  3.9G   1% /run
tmpfs                 3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1             297M  146M  152M  49% /boot
[...]

and I'm NOT in the middle of a lvm expansion (or else I missed a command after `xfs_growfs /`):

-bash-4.2# atomic host upgrade -r
Updating from: centos-atomic-host:centos-atomic-host/7/x86_64/standard
1 metadata, 0 content objects fetched; 313 B transferred in 0 seconds
Copying /etc changes: 26 modified, 4 removed, 179400 added
error: During /etc merge: Failed to read modified config file 'lvm/archive/.lvm_atomic-test-node-2.priv.tech-angels.net_967_1993706434': No such file or directory

Thanks
Oh, now that is interesting! Right, because the thin-pool auto-extends, so if that were to happen at the same time as the upgrade, you'd hit this. The error is a TOCTOU (time-of-check to time-of-use) race on the temp/lock file: it is listed as modified, then disappears before it can be read. Probably the code just needs to catch that error and refresh / retry the copy. So, clearly not as much of a "corner-case" problem as I thought. Okay, re-opening this for further investigation.
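To make the race concrete, here is a minimal Python sketch of a copy step that tolerates files vanishing between listing and reading. It is not ostree's code (ostree is written in C), and the function name `copy_listed_files` is hypothetical. Instead of re-scanning and retrying the whole merge, it takes the simpler route of skipping an entry that no longer exists, on the assumption that a refreshed listing would not contain it anyway.

```python
import os
import shutil
import tempfile


def copy_listed_files(src_root, dst_root, relpaths):
    """Copy previously listed files, tolerating ones that have vanished.

    relpaths was computed earlier (the "check"); the copy below is the
    "use". Returns the entries that disappeared in between.
    """
    vanished = []
    for rel in relpaths:
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        try:
            shutil.copy2(src, dst)
        except FileNotFoundError:
            # Listed earlier but gone now, e.g. a short-lived temp file
            # under lvm/archive/. Record it and continue instead of
            # aborting the whole merge.
            vanished.append(rel)
    return vanished


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "lvm"))
        with open(os.path.join(src, "lvm", "lvm.conf"), "w") as f:
            f.write("# example\n")
        # The second entry simulates a temp file removed after listing.
        listing = ["lvm/lvm.conf", "lvm/archive/.lvm_example_1_2"]
        print(copy_listed_files(src, dst, listing))
```

Whether skipping is safe depends on the file: it is reasonable for a transient temp/lock file, but a real fix would likely want to distinguish those from genuine config files that should never go missing.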
I think the correct component is probably 'ostree' for this.
This will be fixed by https://github.com/ostreedev/ostree/issues/545
This is related to /etc merge before reboot.