Description of problem:
Minor issue, but it could cause significant problems if LVM metadata recovery is needed later (or have other, unexplored consequences).
Version-Release number of selected component (if applicable):
Atomic Host 7.2.4
How reproducible:
Trivial, with precise timing :(
Steps to Reproduce:
1. Run 'atomic host upgrade'
2. At the same time, expand a LV or make some other LVM metadata change.
Actual results:
# atomic host upgrade
Updating from: rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
99 metadata, 639 content objects fetched; 190201 KiB transferred in 229 seconds
Copying /etc changes: 39 modified, 4 removed, 157955 added
error: During /etc merge: Failed to read modified config file 'lvm/archive/.lvm_$HOSTNAME_$NUMBER_$OTHERNUMBER': No such file or directory
Exit code 1
Expected results:
Exit code 0, and the contents of the /etc/lvm/ tree correctly reflect the prior metadata state.
Additional info:
Found this by accident, so an acceptable resolution is: "Don't do that".
Opening this bug for the record, in case anyone else hits this or there turns out to be a more sinister problem here.
Hey lvm team, any chance we could support storing this data in `/var/lib/lvm`? For more information, see https://ostree.readthedocs.io/en/latest/manual/adapting-existing/
At some point we'll probably tweak the ostree process so that we only do the config merge just before rebooting. This would obviate this problem as well as the "config changes I make after preparing an upgrade are gone" problem.
But there are other advantages of having lvm store state in /var - we don't copy it at all, and it's also not something administrators should edit with `vi` etc.
IIRC the important difference here is that /etc is "guaranteed" to be on the / filesystem whereas /var is not. For low-level facilities (like LVM) I can see an argument for wanting to ease the pain in the (probably) small number of cases where someone's important data is on the line.
However, this problem is also likely reproducible using any tool/service that writes/changes/moves/locks files under /etc during an update. Which is why I'm fine with a WONTFIX / "Don't do that" (i.e. docs) resolution. I guess it depends how much low-level and manual-recovery options we want to enable/support on this platform.
LVM_SYSTEM_DIR
Directory containing lvm.conf(5) and other LVM system files.
Defaults to "/etc/lvm".
man 7 lvm.
Err, man *8* lvm..
I think NOTABUG is fine.
Maybe we should have a feature to configure LVM_SYSTEM_DIR in the atomic image to point at /var/... ?
Otherwise it's probably more practical to simply recommend not touching storage while you're running the os-tree update. I'll see about getting this into the knowledge base in case a customer hits it.
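As a rough illustration of the LVM_SYSTEM_DIR idea, the Python sketch below builds a per-process environment with the variable pointed at a directory under /var. The path /var/lib/lvm and the helper name lvm_env are assumptions made for illustration, not something the Atomic image provides. Going by the man page text quoted above, lvm.conf itself would presumably also have to live in that directory.

```python
import os


def lvm_env(system_dir="/var/lib/lvm", base_env=None):
    """Return a copy of the environment with LVM_SYSTEM_DIR set.

    system_dir is a hypothetical location chosen for this sketch;
    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LVM_SYSTEM_DIR"] = system_dir
    return env


if __name__ == "__main__":
    # An LVM command started with this environment, for example
    # subprocess.run(["lvs"], env=lvm_env()), would look for its
    # system files under /var/lib/lvm instead of /etc/lvm.
    print(lvm_env()["LVM_SYSTEM_DIR"])
```

This only affects processes started with that environment; anything that invokes LVM on its own (such as automatic thin-pool extension) would need the variable set as well.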
Looks great, thanks Derrick.
I hit this issue on two CentOS Atomic nodes today.
The root partition was extended recently because it was full:
LV          VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
docker-pool cah twi-aot--- 64.60g             85.85  53.13
root        cah -wi-ao---- 10.00g
swap        cah -wi-ao----  5.00g
Now we have enough space:
-bash-4.2# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/cah-root   10G  7.0G  3.0G  71% /
devtmpfs              3.9G     0  3.9G   0% /dev
tmpfs                 3.9G     0  3.9G   0% /dev/shm
tmpfs                 3.9G  4.3M  3.9G   1% /run
tmpfs                 3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1             297M  146M  152M  49% /boot
and I'm NOT in the middle of an LVM expansion (unless I missed a command after `xfs_growfs /`):
-bash-4.2# atomic host upgrade -r
Updating from: centos-atomic-host:centos-atomic-host/7/x86_64/standard
1 metadata, 0 content objects fetched; 313 B transferred in 0 seconds
Copying /etc changes: 26 modified, 4 removed, 179400 added
error: During /etc merge: Failed to read modified config file 'lvm/archive/.lvm_atomic-test-node-2.priv.tech-angels.net_967_1993706434': No such file or directory
Oh, now that is interesting! Right, because the thin-pool auto-extends, so if that were to happen at the same time as the upgrade, you'd hit this.
The error is a TOCTOU race on the temp/lock file. Probably the code just needs to catch that error and refresh / retry the copy.
So, clearly not as much of a corner case as I thought. Okay, re-opening this for further investigation.
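The "catch that error and continue" idea for the TOCTOU race can be sketched in Python as follows. This is not ostree's merge code, only a minimal model of it under stated assumptions: the file list is taken first (the "check"), the files are copied afterwards (the "use"), and a file that vanished in between is skipped rather than treated as fatal. The function names and the temporary file name `.lvm_host_1_2` are hypothetical.

```python
import os
import shutil
import tempfile


def snapshot_files(root):
    """List relative paths of all files under root (the 'check' step)."""
    rels = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rels.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(rels)


def copy_snapshot(src, dst, rel_paths):
    """Copy each listed file (the 'use' step), skipping vanished ones.

    Returns the relative paths that disappeared after the snapshot.
    """
    vanished = []
    for rel in rel_paths:
        target = os.path.join(dst, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        try:
            shutil.copy2(os.path.join(src, rel), target)
        except FileNotFoundError:
            # Removed between listing and copying, as happens with LVM's
            # temporary archive files. Treat it as if it never existed.
            vanished.append(rel)
    return vanished


def demo():
    """Simulate a temp file disappearing in the middle of a merge."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "etc")
        dst = os.path.join(tmp, "new-etc")
        archive = os.path.join(src, "lvm", "archive")
        os.makedirs(archive)
        with open(os.path.join(src, "lvm", "lvm.conf"), "w") as f:
            f.write("# config\n")
        temp_file = os.path.join(archive, ".lvm_host_1_2")
        with open(temp_file, "w") as f:
            f.write("in-progress metadata\n")

        listing = snapshot_files(src)
        os.unlink(temp_file)  # the concurrent LVM operation finishes here
        vanished = copy_snapshot(src, dst, listing)
        merged = os.path.exists(os.path.join(dst, "lvm", "lvm.conf"))
        return vanished, merged


if __name__ == "__main__":
    print(demo())
```

Skipping is only reasonable for files that are expected to be transient; a real fix might instead re-scan the directory and retry, as suggested above, so that a file renamed into its final place is not missed.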
I think the correct component is probably 'ostree' for this.
This will be fixed by https://github.com/ostreedev/ostree/issues/545
This is related to performing the /etc merge just before reboot.