Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Description of problem:
Minor issue, but it could cause significant problems if LVM metadata recovery is needed at a later time (or in other unexplored scenarios).
Version-Release number of selected component (if applicable):
Atomic Host 7.2.4
atomic-1.9-4.gitff44c6a.el7.x86_64
ostree-2016.1-2.atomic.el7.x86_64
How reproducible:
Trivial, with precise timing :(
Steps to Reproduce:
1. Run 'atomic host upgrade'
2. At the same time, expand a LV or make some other LVM metadata change.
Actual results:
# atomic host upgrade
Updating from: rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
99 metadata, 639 content objects fetched; 190201 KiB transferred in 229 seconds
Copying /etc changes: 39 modified, 4 removed, 157955 added
error: During /etc merge: Failed to read modified config file 'lvm/archive/.lvm_$HOSTNAME_$NUMBER_$OTHERNUMBER': No such file or directory
Exit code 1
Expected results:
Exit code 0, and the contents of the /etc/lvm/ tree correctly reflect the prior metadata state.
Additional info:
Found this by accident, so an acceptable resolution is: "Don't do that".
Opening this bug for reporting purposes, in case anyone else hits it, and in case there is a more sinister problem lurking here.
At some point we'll probably tweak the ostree process so that the config merge only happens just before rebooting. That would obviate this problem, as well as the "config changes I make after preparing an upgrade are gone" issue.
But there are other advantages to having LVM store its state in /var: we wouldn't copy it at all, and it's not something administrators should be editing with `vi` anyway.
IIRC the important difference here is that /etc is "guaranteed" to be on the / filesystem whereas /var is not. For low-level facilities (like LVM) I can see an argument for wanting to ease the pain in the (probably) small number of cases where someone's important data is on the line.
However, this problem is also likely reproducible with any tool/service that writes, changes, moves, or locks files under /etc during an update, which is why I'm fine with a WONTFIX / "Don't do that" (i.e. docs) resolution. I guess it depends on how much low-level, manual-recovery capability we want to enable/support on this platform.
I think NOTABUG is fine.
Maybe we should have a feature to configure LVM_SYSTEM_DIR in the atomic image to point at /var/... ?
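For reference, LVM honors the LVM_SYSTEM_DIR environment variable to relocate its configuration and metadata archive directory away from the default /etc/lvm. A minimal sketch of the idea (the /var/lib/lvm path and the helper name are assumptions for illustration, not an agreed-on location):

```python
import os

def lvm_command_env(system_dir="/var/lib/lvm"):
    """Build an environment for running LVM tools with their state
    directory (config, archive, backup) relocated under /var, so
    metadata archive writes never race with the /etc merge.
    Hypothetical helper; the path is not a settled location."""
    env = dict(os.environ)
    env["LVM_SYSTEM_DIR"] = system_dir
    return env

# Usage sketch: subprocess.run(["lvs"], env=lvm_command_env())
```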
Otherwise it's probably more practical to simply recommend not touching storage while an ostree update is running. I'll see about getting this into the knowledge base in case a customer hits it.
Oh, now that is interesting! Right, because the thin-pool auto-extends, so if that were to happen at the same time as the upgrade, you'd hit this.
The error is a TOCTOU race on the temp/lock file. Probably the code just needs to catch that error and refresh / retry the copy.
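The catch-and-retry idea could look roughly like the following sketch (the function name is hypothetical; the real /etc merge code lives in ostree and is written in C):

```python
import errno
import shutil
import time

def copy_modified_config(src, dst, attempts=3, delay=0.1):
    """Copy one modified config file during an /etc merge, tolerating
    files that vanish between the directory scan and the copy (the
    TOCTOU race described above). Hypothetical helper for illustration."""
    for attempt in range(attempts):
        try:
            shutil.copy2(src, dst)
            return True
        except FileNotFoundError:
            # e.g. an LVM archive temp file was removed after being
            # enumerated: skip it instead of failing the whole upgrade
            return False
        except OSError as exc:
            if attempt == attempts - 1 or exc.errno != errno.EBUSY:
                raise
            time.sleep(delay)  # transient contention: retry the copy
    return False
```

The key point is that a file disappearing mid-merge is treated as "skip this entry" rather than a fatal error, which matches what the merge should do with transient lock/archive files.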
So, clearly not as "corner-case" of a problem as I thought. Okay, re-opening this for further investigation.
Comment 18, RHEL Program Management
2020-12-15 07:44:12 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.