Created attachment 1264558: Sosreport and all logs in /var/log/ and /tmp from host

Description of problem:
After upgrading RHVH multiple times (4.0 GA -> 4.0.4 -> 4.0.5 -> 4.0.6 Async -> 4.1), file modifications made in /etc in the 4.0.4 and 4.0.5 layers are not carried forward to the 4.1 layer. All modifications made in /etc in any layer should be carried forward to the newer layer during upgrade.

Version-Release number of selected component (if applicable):
Build 1: redhat-virtualization-host-4.0-20160817.0
Build 2: redhat-virtualization-host-4.0-20160919.0
Build 3: redhat-virtualization-host-4.0-20161116.1
Build 4: redhat-virtualization-host-4.0-20170201.0
Build 5: redhat-virtualization-host-4.1-20170314.0

How reproducible:
100%

Steps to Reproduce:
1. Clean install RHVH build1 (redhat-virtualization-host-4.0-20160817.0).
2. Reboot and log in to build1 (rhvh-4.0-20160817.0), then create a new file in /etc. For example, create /etc/huzhao:
-------------------------
# cat /etc/huzhao
huzhao 0817
-------------------------
3. Download redhat-virtualization-host-image-update-4.0-20160919.0.el7_2.noarch.rpm and update RHVH to build2 on the RHVH side:
# yum install redhat-virtualization-host-image-update-4.0-20160919.0.el7_2.noarch.rpm
4. Reboot and log in to build2 (rhvh-4.0-20160919.0), then check /etc/huzhao:
-------------------------
# cat /etc/huzhao
huzhao 0817
-------------------------
Modify it (add one line):
-------------------------
# cat /etc/huzhao
huzhao 0817
huzhao 0919
-------------------------
5. Download redhat-virtualization-host-image-update-4.0-20161116.1.el7_3.noarch.rpm and update RHVH to build3 on the RHVH side:
# yum install redhat-virtualization-host-image-update-4.0-20161116.1.el7_3.noarch.rpm
6. Reboot and log in to build3 (rhvh-4.0-20161116.1), then check /etc/huzhao:
-------------------------
# cat /etc/huzhao
huzhao 0817
huzhao 0919
-------------------------
Modify it (add one line):
-------------------------
# cat /etc/huzhao
huzhao 0817
huzhao 0919
huzhao 1116
-------------------------
7. Download redhat-virtualization-host-image-update-4.0-20170201.0.el7_3.noarch.rpm and update RHVH to build4 on the RHVH side:
# yum install redhat-virtualization-host-image-update-4.0-20170201.0.el7_3.noarch.rpm
8. Reboot and log in to build4 (rhvh-4.0-20170201.0), then check /etc/huzhao and the imgbase layout:
-------------------------
# cat /etc/huzhao
huzhao 0817
huzhao 0919
huzhao 1116
-------------------------
-------------------------
# imgbase layout
rhvh-4.0-0.20160817.0
 +- rhvh-4.0-0.20160817.0+1
rhvh-4.0-0.20160919.0
 +- rhvh-4.0-0.20160919.0+1
rhvh-4.0-0.20161116.0
 +- rhvh-4.0-0.20161116.0+1
rhvh-4.0-0.20170201.0
 +- rhvh-4.0-0.20170201.0+1
-------------------------
9. Set up local repos in build4 (rhvh-4.0-0.20170201.0) and update to build5 (rhvh-4.1-20170314.0):
# yum update
10. Check the imgbase layout before rebooting RHVH:
-------------------------
# imgbase layout
rhvh-4.0-0.20170201.0
 +- rhvh-4.0-0.20170201.0+1
rhvh-4.1-0.20170315.0
 +- rhvh-4.1-0.20170315.0+1
-------------------------
11. Reboot and log in to build5 (rhvh-4.1-20170314.0), then check /etc/huzhao.

Actual results:
In step 11, /etc/huzhao contains:
-------------------------
# cat /etc/huzhao
huzhao 0817
-------------------------
The modifications made in build2 and build3 are not carried over to the latest layer.

Expected results:
In step 11, /etc/huzhao should contain:
-------------------------
# cat /etc/huzhao
huzhao 0817
huzhao 0919
huzhao 1116
-------------------------
The modifications made in build2 and build3 should be carried over to the latest layer.

Additional info:
File modifications made in /var in build2 and build3 are carried over to the latest layer.
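A small check for step 11 makes the pass/fail criterion explicit. This is a hypothetical helper and is not part of imgbased or the RHVH tooling. It only compares the lines of /etc/huzhao against the expected results above.

```python
from pathlib import Path
from typing import List, Sequence

# Lines that /etc/huzhao should contain after step 11 (from "Expected results").
EXPECTED = ("huzhao 0817", "huzhao 0919", "huzhao 1116")


def missing_lines(text: str, expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected lines that do not appear in `text`, in expected order."""
    present = set(text.splitlines())
    return [line for line in expected if line not in present]


def check_file(path: str = "/etc/huzhao") -> List[str]:
    """Hypothetical wrapper: read the file on the host and report missing lines."""
    return missing_lines(Path(path).read_text())
```

Run `check_file()` on the host after step 11. An empty list means all middle-layer modifications survived. With the actual results reported here, it would return `["huzhao 0919", "huzhao 1116"]`.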
The contents of /etc should be preserved for each image because /etc is in the writable layer. However, according to the bug description, contents under /etc are missing after upgrading to 4.1. From the customer's perspective, QE considers this a blocker.
Can you define "the contents under /etc are missing"? This has already been verified as part of rhbz#1417534.

In general, the problem with the middle-layer update is multifaceted. The fix for rhbz#1417534 makes imgbased go back in time and pretend that it always did the "right thing" by only carrying forward configuration files that were actually modified. Here, we can compare the hash of (for example):

0916 /etc/hosts vs. 0916 /usr/share/factory/etc/hosts

These differ, so we conclude that /etc/hosts has been modified, and we copy it forward.

Originally (until 4.0.7/4.1.1), imgbased copied ALL of /etc. To remediate this, imgbased now essentially looks at the difference and copies accordingly. The goal is to get a system upgraded from, for example, 4.0.3 -> 4.0.6 -> 4.0.7 back to a point where unmodified configuration files are actually still unmodified and keep the system value.

Take /etc/vdsm/logger.conf as an example, since it changes frequently and some changes were made and later removed. Imagine the following:

4.0.3 -> logger.conf was not modified
4.0.5 -> logger.conf changed in the image, but imgbased bulk-copied the file from 4.0.3, so it is now considered modified.
4.0.6 -> logger.conf changed in the image, but imgbased bulk-copied the file from 4.0.3, so it is now considered modified.
4.0.7 -> logger.conf changed in the image, but imgbased bulk-copied it again.

As a result, 4.0.7 now has logger.conf and a number of other files in /etc carried over from 4.0.3, which should not be present in their modified form.
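A small in-memory model illustrates the hash comparison and shows why bulk copying defeats it. This is a sketch under stated assumptions, not imgbased's implementation:

- The dictionaries stand in for one layer's /etc and /usr/share/factory/etc.
- SHA-256 is an arbitrary choice of hash.
- The file contents are made up.

```python
import hashlib
from typing import Dict


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_modified(etc: Dict[str, bytes], factory: Dict[str, bytes], path: str) -> bool:
    """True when a layer's /etc copy of `path` no longer matches its factory copy.

    `etc` and `factory` map paths relative to /etc to file contents for one
    layer's /etc and /usr/share/factory/etc (hypothetical in-memory model).
    """
    return sha256(etc[path]) != sha256(factory[path])


# The logger.conf walk-through, with made-up file contents.
factory_403 = {"vdsm/logger.conf": b"4.0.3 defaults"}
etc_403 = dict(factory_403)  # 4.0.3: the user never edited the file
factory_405 = {"vdsm/logger.conf": b"4.0.5 defaults"}  # the image ships a new default
etc_405 = dict(etc_403)  # old behaviour: ALL of /etc bulk-copied forward

unmodified_in_403 = not is_modified(etc_403, factory_403, "vdsm/logger.conf")
flagged_in_405 = is_modified(etc_405, factory_405, "vdsm/logger.conf")
```

Both flags come out `True`. The file is flagged as modified in 4.0.5 even though nobody touched it. This is the false positive the comment describes.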
To fix this appropriately (including in previous layers), imgbased must walk back through the layers and say:

4.0.3 -> logger.conf was not modified
4.0.5 -> logger.conf changed in the image, so keep the new one
4.0.6 -> logger.conf changed in the image, so keep the new one
4.0.7 -> logger.conf changed in the image, and the running image has it

In this case (as an analogue to /etc/hosts), we cannot compare the file in 4.0.5 to /usr/share/factory/etc/hosts in 4.0.5, since it would have a different hash (being a copy from 4.0.3). To resolve this, imgbased replays the changes it *should* have made. The only appropriate resolution here is to keep /etc/hosts from 0919 when all of the following hold:

- /etc/hosts in 0919 differs from /etc/hosts in 0817.
- It has a newer timestamp.
- The hash of /etc/hosts in 0817 is not the same as that of /usr/share/factory/etc/hosts in 0817.
- The hash of /etc/hosts in 0916 is not the same as that of /usr/share/factory/etc/hosts in 0916.

This is probably an acceptable workaround, but it can still fail. For example:

Boot into 0817
Modify /etc/hosts
Upgrade to 0919
Modify /etc/hosts
Upgrade to 1116
Modify /etc/hosts
Boot back to 0919
Upgrade

Which /etc/hosts should be taken? Similar to another bug, imgbased must now assume that layers newer than the NVR being upgraded from somehow contain invalid configuration and should be ignored (there should be a separate bug for this).

In general, the suggestion ("All modifications in /etc in any layer should be updated to newer layer during upgrade.") cannot be resolved. If for no other reason, imgbased cannot be expected to know what is and is not valid syntax for every file in /etc. We must decide what to keep, and the timestamp seems like the best option, while also being aware that
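The replay rule, combined with ignoring layers newer than the build being upgraded from, can be sketched in Python. This is a literal transcription of the four conditions under stated assumptions, not imgbased's code:

- Each layer is a hypothetical `Layer` record holding in-memory copies of /etc, /usr/share/factory/etc, and the /etc timestamps.
- Layers are identified by fixed-width build dates, so string order equals age.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Layer:
    """Hypothetical snapshot of one layer, keyed by its build date."""
    build: str                 # e.g. "20160919"; fixed-width dates sort correctly
    etc: Dict[str, bytes]      # files under /etc
    factory: Dict[str, bytes]  # files under /usr/share/factory/etc
    mtime: Dict[str, float] = field(default_factory=dict)  # /etc timestamps


def user_modified(layer: Layer, path: str) -> bool:
    # Assumption: a file with no factory copy counts as modified.
    return layer.etc.get(path) != layer.factory.get(path)


def keep_newer(newer: Layer, older: Layer, path: str) -> bool:
    """The four conditions, for a file present in both layers' /etc."""
    return (
        newer.etc[path] != older.etc[path]
        and newer.mtime[path] > older.mtime[path]
        and user_modified(older, path)
        and user_modified(newer, path)
    )


def replay(layers: List[Layer], path: str, source: str) -> Optional[str]:
    """Return the build whose /etc copy of `path` should be carried forward,
    or None when no user change is detected (take the new image's default).

    Layers newer than `source`, the build being upgraded from, are ignored.
    """
    usable = sorted(
        (layer for layer in layers if layer.build <= source),
        key=lambda layer: layer.build,
    )
    if not usable:
        return None
    winner = usable[0].build if user_modified(usable[0], path) else None
    for older, newer in zip(usable, usable[1:]):
        if keep_newer(newer, older, path):
            winner = newer.build
    return winner
```

Consider /etc/huzhao from comment 0, which has no factory copy and is edited in every layer. Here `replay` picks the newest usable layer. After booting back to 0919, it stops at 0919. For the bulk-copied logger.conf case, it returns `None`. Read literally, the rule requires the older layer's copy to differ from its factory copy. As a result, an edit made only in a middle layer to a previously unmodified file is not kept by this sketch.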
(In reply to Ryan Barry from comment #2)
> Can you define "the contents under /etc are missing"? This has already been
> verified as part of rhbz#1417534

It means that files under /etc that were _created_ in middle layers are missing after upgrading to the latest 4.1 builds. No data should be lost across multiple upgrades.
(In reply to Ying Cui from comment #3)
> (In reply to Ryan Barry from comment #2)
> > Can you define "the contents under /etc are missing"? This has already been
> > verified as part of rhbz#1417534
>
> It means that the files under /etc which were _created_ into middle layers
> are missing after upgrading to latest 4.1 builds. It should not lose any
> data among several upgrades.

This is very misleading. For example:

Install 0916
touch /etc/test.0916
Upgrade to 1012
touch /etc/test.1012
Upgrade to 1116
touch /etc/test.1116
Upgrade to 0317

All files are actually present in /etc on the final image, as expected. What is missing now are the modifications made in subsequent layers. That is still serious, but it can be handled with timestamp checking as a safeguard. There are still ways in which this can fail, though. I'm hesitant to implement a 3-way merge, since configurations could conflict or otherwise end up broken, which is why I'm defaulting to timestamps.
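Created files and modified existing files are different cases, and a sketch can make that distinction concrete. It assumes each layer can be read as path-to-bytes maps for /etc and /usr/share/factory/etc. The function names are hypothetical and are not imgbased's API. A file without a factory copy cannot be an unmodified default, so it is always carried forward. An existing file only needs carrying forward when it differs from the shipped default.

```python
from typing import Dict, List


def classify(etc: Dict[str, bytes], factory: Dict[str, bytes]) -> Dict[str, str]:
    """Label each file in a layer's /etc as created, modified or unmodified."""
    labels = {}
    for path, data in etc.items():
        if path not in factory:
            labels[path] = "created"     # no shipped default exists
        elif data != factory[path]:
            labels[path] = "modified"    # differs from the shipped default
        else:
            labels[path] = "unmodified"  # identical to the shipped default
    return labels


def files_to_carry(etc: Dict[str, bytes], factory: Dict[str, bytes]) -> List[str]:
    """Created and modified files move forward; unmodified ones take the new default."""
    return sorted(p for p, label in classify(etc, factory).items() if label != "unmodified")
```

Deciding *which* layer's modified copy wins remains the hard part. That is where the timestamp check comes in, rather than a 3-way merge that would need to understand each file's syntax.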
Test version:
Build 1: redhat-virtualization-host-4.0-20160817.0
Build 2: redhat-virtualization-host-4.0-20160919.0
Build 3: redhat-virtualization-host-4.0-20161116.1
Build 4: redhat-virtualization-host-4.0-20170201.0
Build 5: redhat-virtualization-host-4.1-20170403.0
imgbased-0.9.20-0.1.el7ev.noarch

Test steps:
Same as comment 0

Test results:
In step 11, /etc/huzhao contains:
-------------------------
# cat /etc/huzhao
huzhao 0817
huzhao 0919
huzhao 1116
-------------------------
The modifications made in the middle layers are carried over to the latest layer, so this bug is fixed in imgbased-0.9.20-0.1.el7ev.noarch. Changing the status to VERIFIED.
*** Bug 1443957 has been marked as a duplicate of this bug. ***