Bug 1365297 - atomic host upgrade: Failed to read modified config file lvm/archive/
Summary: atomic host upgrade: Failed to read modified config file lvm/archive/
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ostree   
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Colin Walters
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Keywords: Extras, Reopened
Depends On:
Blocks: 1420851
 
Reported: 2016-08-08 20:30 UTC by Chris Evich
Modified: 2018-10-26 21:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-11 17:08:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker                            ID       Priority  Status  Summary  Last Updated
Red Hat Bugzilla                   1366584  None      None    None     Never
Red Hat Knowledge Base (Solution)  2519051  None      None    None     2016-08-11 18:12 UTC

Internal Trackers: 1366584

Description Chris Evich 2016-08-08 20:30:50 UTC
Description of problem:
Minor issue, but it could cause significant problems if LVM metadata recovery is needed later (or in other, unexplored scenarios).

Version-Release number of selected component (if applicable):
Atomic Host 7.2.4
atomic-1.9-4.gitff44c6a.el7.x86_64
ostree-2016.1-2.atomic.el7.x86_64

How reproducible:
Trivial, with precise timing :(

Steps to Reproduce:
1. Run 'atomic host upgrade'
2. At the same time, expand an LV or make some other LVM metadata change.
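
The timing dependence can be illustrated without LVM at all. The Python sketch below is an assumption about the mechanism, not ostree's actual code: it lists a directory, lets something else remove a file (standing in for a tool cleaning up its temporary file), and then tries to read every name it listed. The hook parameter and the file name `.lvm_tmp` are hypothetical.

```python
import os
import tempfile


def read_listed_files(directory, between_list_and_read=None):
    """List a directory, then read each listed file.

    `between_list_and_read` is a hypothetical hook that stands in for
    another process changing the directory at the wrong moment.
    """
    names = sorted(os.listdir(directory))  # time of check
    if between_list_and_read is not None:
        between_list_and_read()
    contents = {}
    for name in names:  # time of use
        with open(os.path.join(directory, name), "rb") as f:
            contents[name] = f.read()
    return contents


def demo():
    """Return the path that vanished, or None if no race was hit."""
    with tempfile.TemporaryDirectory() as d:
        tmp = os.path.join(d, ".lvm_tmp")  # hypothetical temp file name
        with open(tmp, "wb") as f:
            f.write(b"metadata")
        try:
            # The file is removed after it was listed but before it is read.
            read_listed_files(d, between_list_and_read=lambda: os.remove(tmp))
        except FileNotFoundError as e:
            return e.filename
    return None
```

Calling `demo()` hits the "No such file or directory" case every time, because the removal is forced into the window between listing and reading. On a real host that window is only hit with unlucky timing.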

Actual results:
# atomic host upgrade
Updating from: rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard

99 metadata, 639 content objects fetched; 190201 KiB transferred in 229 seconds
Copying /etc changes: 39 modified, 4 removed, 157955 added
error: During /etc merge: Failed to read modified config file 'lvm/archive/.lvm_$HOSTNAME_$NUMBER_$OTHERNUMBER': No such file or directory

Exit code 1


Expected results:
Exit code 0, and the contents of the /etc/lvm/ tree correctly reflect the prior metadata state.

Additional info:
Found this by accident, so an acceptable resolution is: "Don't do that".

Opening this bug for reporting purposes, in case anyone else hits it, and in case there is a more sinister problem here.

Comment 1 Colin Walters 2016-08-08 20:35:30 UTC
Hey lvm team, any chance we could support storing this data in `/var/lib/lvm`?  For more information, see https://ostree.readthedocs.io/en/latest/manual/adapting-existing/

Comment 2 Colin Walters 2016-08-08 20:37:25 UTC
At some point we'll probably tweak the ostree process so that we only do the config merge just before rebooting.  This would obviate this problem as well as the "config changes I make after preparing an upgrade are gone" problem.

But there are other advantages of having lvm store state in /var - we don't copy it at all, and it's also not something administrators should edit with `vi` etc.

Comment 3 Chris Evich 2016-08-08 21:15:30 UTC
IIRC the important difference here is that /etc is "guaranteed" to be on the / filesystem, whereas /var is not.  For low-level facilities (like LVM) I can see an argument for easing the pain in the (probably) small number of cases where someone's important data is on the line.

However, this problem is also likely reproducible using any tool/service that writes/changes/moves/locks files under /etc during an update.  Which is why I'm fine with a WONTFIX / "Don't do that" (i.e. docs) resolution.  I guess it depends on how many low-level and manual-recovery options we want to enable/support on this platform.

Comment 5 Bryn M. Reeves 2016-08-09 10:25:55 UTC
Set LVM_SYSTEM_DIR:

       LVM_SYSTEM_DIR
              Directory containing lvm.conf(5) and other LVM system files.  
              Defaults to "/etc/lvm".

man 7 lvm.

Comment 6 Bryn M. Reeves 2016-08-09 10:26:30 UTC
Err, man *8* lvm..
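
As a rough illustration of the suggestion above, the Python sketch below runs an LVM command with `LVM_SYSTEM_DIR` overridden for that one process. It is a sketch under assumptions: the helper names `lvm_env` and `run_lvm` are hypothetical, and `/var/lib/lvm` is only the location floated in Comment 1, not a supported default. Per the man page excerpt, the directory is expected to contain lvm.conf(5) and the other LVM system files.

```python
import os
import subprocess


def lvm_env(system_dir, base_env=None):
    """Return a copy of the environment with LVM_SYSTEM_DIR set.

    The caller's environment (or `base_env`) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LVM_SYSTEM_DIR"] = system_dir
    return env


def run_lvm(args, system_dir):
    """Run an lvm subcommand with the overridden system directory.

    Hypothetical helper; requires the `lvm` binary and usually root.
    """
    return subprocess.run(["lvm", *args], env=lvm_env(system_dir), check=True)
```

For example, `run_lvm(["vgs"], "/var/lib/lvm")` would list volume groups while reading and writing LVM system files under that directory instead of /etc/lvm.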

Comment 8 Chris Evich 2016-08-11 17:26:40 UTC
I think NOTABUG is fine.

Maybe we should have a feature to configure LVM_SYSTEM_DIR in the atomic image to point at /var/... ?

Otherwise it's probably more practical to simply recommend not touching storage while you're running the ostree update.  I'll see about getting this into the knowledge base in case a customer hits it.

Comment 9 Chris Evich 2016-08-11 19:14:45 UTC
Looks great, thanks Derrick.

Comment 11 Philippe Lafoucriere 2016-08-26 14:05:24 UTC
I hit this issue on 2 CentOS Atomic nodes today.
The root partition was extended recently because it was full:

-bash-4.2# lvs
  LV          VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool cah  twi-aot--- 64.60g             85.85  53.13
  root        cah  -wi-ao---- 10.00g
  swap        cah  -wi-ao----  5.00g

Now we have enough space:

-bash-4.2# df -h
Filesystem                                      Size  Used Avail Use% Mounted on
/dev/mapper/cah-root                             10G  7.0G  3.0G  71% /
devtmpfs                                        3.9G     0  3.9G   0% /dev
tmpfs                                           3.9G     0  3.9G   0% /dev/shm
tmpfs                                           3.9G  4.3M  3.9G   1% /run
tmpfs                                           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                                       297M  146M  152M  49% /boot
[...]

and I'm NOT in the middle of an LVM expansion (unless I missed a command after `xfs_growfs /`):

-bash-4.2# atomic host upgrade -r
Updating from: centos-atomic-host:centos-atomic-host/7/x86_64/standard
1 metadata, 0 content objects fetched; 313 B transferred in 0 seconds
Copying /etc changes: 26 modified, 4 removed, 179400 added
error: During /etc merge: Failed to read modified config file 'lvm/archive/.lvm_atomic-test-node-2.priv.tech-angels.net_967_1993706434': No such file or directory

Thanks

Comment 12 Chris Evich 2016-08-26 19:36:30 UTC
Oh, now that is interesting!  Right, because the thin-pool auto-extends, so if that were to happen at the same time as the upgrade, you'd hit this.

The error is a TOCTOU race on the temp/lock file.  Probably the code just needs to catch that error and refresh / retry the copy.
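
A minimal sketch of the "catch that error" idea, in Python for illustration only (ostree itself is written in C, and this is not its merge code): walk the source tree, copy each file, and treat a file that disappeared between the walk and the copy as skippable rather than fatal. The function name `copy_tree_tolerant` is hypothetical, and the sketch skips vanished files instead of re-walking the tree.

```python
import os
import shutil


def copy_tree_tolerant(src, dst):
    """Copy regular files from src to dst, skipping files that vanish.

    Returns the relative paths that could no longer be read when the
    copy was attempted.
    """
    vanished = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        os.makedirs(os.path.normpath(os.path.join(dst, rel_root)), exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            try:
                shutil.copy2(os.path.join(src, rel), os.path.join(dst, rel))
            except FileNotFoundError:
                # The file was listed by the walk but is gone now,
                # e.g. a short-lived temp/lock file. Record and move on.
                vanished.append(rel)
    return vanished
```

Skipping is reasonable for short-lived temp or lock files, since there is nothing left to copy. A stricter variant could re-walk the affected directory when the returned list is non-empty, which is closer to the "refresh / retry" wording.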

So, clearly not as much of a "corner case" as I thought.  Okay, re-opening this for further investigation.

Comment 13 Micah Abbott 2016-09-22 13:34:36 UTC
I think the correct component is probably 'ostree' for this.

Comment 15 Colin Walters 2018-01-15 15:28:36 UTC
This will be fixed by https://github.com/ostreedev/ostree/issues/545

Comment 16 Steve Milner 2018-02-12 15:15:14 UTC
This is related to /etc merge before reboot.

