Bug 2122299

Summary: Add support for container whiteouts
Product: Red Hat Enterprise Linux 8
Reporter: Colin Walters <walters>
Component: rpm-ostree
Assignee: Colin Walters <walters>
Status: CLOSED ERRATA
QA Contact: RHCOS SST QE <rhcos-sst-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.7
CC: hhei, mnguyen, qzhang, rhcos-sst
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: rpm-ostree-2022.10.97.gade6df33-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-05-16 08:24:00 UTC
Type: Enhancement
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Colin Walters 2022-08-29 17:56:12 UTC
Backport https://github.com/ostreedev/ostree-rs-ext/pull/359 which fixes support for container whiteouts.

This code is only used by OCP right now; risk to Edge is effectively nil.

The backport was already queued on our rhel8 branch in https://github.com/coreos/rpm-ostree/pull/3973

Comment 3 HuijingHei 2022-09-22 09:02:08 UTC
Removing the needinfo for the 8.7 exception, since this has moved to 8.8. Please change it back if my understanding is not correct, thanks!

Comment 5 HuijingHei 2022-09-26 07:38:24 UTC
Thanks Colin for the pointer!

I built a container image (with an upgraded kernel) locally with podman, pushed it to the registry (using a skopeo policy workaround), and ran `rpm-ostree rebase`:
- With the old rpm-ostree, the system still runs the old kernel after reboot. This is expected and should be fixed in the new version.
- With the new, fixed rpm-ostree, the reboot fails.

Could you help to check? Thanks!

$ cat p.json 
{
    "default": [{"type": "insecureAcceptAnything"}],
    "transports": {
        "docker": {
	    "registry.access.redhat.com": [{"type": "insecureAcceptAnything"}],
	    "registry.redhat.io": [{"type": "insecureAcceptAnything"}]
	},
        "docker-daemon": {
	    "": [{"type": "insecureAcceptAnything"}]
	}
    }
}
$ oc registry login --to auth.json
$ skopeo copy --authfile auth.json --policy p.json containers-storage:localhost/test:latest docker://registry.ci.openshift.org/coreos/hhei-rhcos-test:4.12

1) With the old rpm-ostree, the system boots the old kernel instead of the new one
[core@cosa-devsh ~]$ rpm -q rpm-ostree
rpm-ostree-2022.10.86.gd8f0c67a-3.el8.x86_64
[core@cosa-devsh ~]$ sudo rpm-ostree rebase --experimental ostree-unverified-registry:registry.ci.openshift.org/coreos/hhei-rhcos-test:4.12
[core@cosa-devsh ~]$ sudo systemctl reboot

[core@cosa-devsh ~]$ ls -al /usr/lib/modules
total 12
drwxr-xr-x.  4 root root  118 Jan  1  1970 .
drwxr-xr-x. 38 root root 4096 Jan  1  1970 ..
----------.  1 root root    0 Sep 26 06:58 .wh.4.18.0-372.26.1.el8_6.x86_64
drwxr-xr-x.  7 root root 4096 Jan  1  1970 4.18.0-372.26.1.el8_6.x86_64
drwxr-xr-x.  7 root root 4096 Jan  1  1970 4.18.0-372.29.1.el8_6.x86_64
[core@cosa-devsh ~]$ uname -r
4.18.0-372.26.1.el8_6.x86_64


2) with new rpm-ostree
[core@cosa-devsh ~]$ rpm -q rpm-ostree
rpm-ostree-2022.10.94.g89f58028-2.el8.x86_64
[core@cosa-devsh ~]$ sudo rpm-ostree rebase --experimental ostree-unverified-registry:registry.ci.openshift.org/coreos/hhei-rhcos-test:4.12
[core@cosa-devsh ~]$ sudo systemctl reboot

Reboot failed with panic:
[    3.165155] traps: init[1] general protection fault ip:7f422d4aaee1 sp:7ffd72091520 error:0 in libc-2.28.so[7f422d489000+1bc000]
Fatal: can't open /dev/urandom: No such file or directory
[    3.170081] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    3.170081] 
[    3.171026] CPU: 1 PID: 1 Comm: init Not tainted 4.18.0-372.29.1.el8_6.x86_64 #1
[    3.171026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[    3.171026] Call Trace:
[    3.171026]  dump_stack+0x41/0x60
[    3.171026]  panic+0xe7/0x2ac
[    3.171026]  do_exit.cold.25+0x43/0x96
[    3.171026]  do_group_exit+0x3a/0xa0
[    3.171026]  get_signal+0x158/0x860
[    3.171026]  ? __switch_to_asm+0x35/0x70
[    3.171026]  ? __switch_to_asm+0x41/0x70
[    3.171026]  ? general_protection+0x8/0x30
[    3.171026]  do_signal+0x36/0x690
[    3.171026]  ? __switch_to_asm+0x41/0x70
[    3.171026]  ? __switch_to_asm+0x35/0x70
[    3.171026]  ? __switch_to_asm+0x41/0x70
[    3.171026]  ? finish_task_switch+0xaf/0x2e0
[    3.171026]  ? general_protection+0x8/0x30
[    3.171026]  exit_to_usermode_loop+0x89/0xf0
[    3.171026]  prepare_exit_to_usermode+0x9b/0xa0
[    3.171026]  retint_user+0x8/0x8
[    3.171026] RIP: 0033:0x7f422d4aaee1
[    3.171026] Code: 03 75 14 c7 05 1c ee 39 00 04 00 00 00 bf 06 00 00 00 e8 c2 ca 02 00 83 3d 0b ee 39 00 04 75 0b c7 05 ff ed 39 00 05 00 00 00 <f4> 83 3df
[    3.171026] RSP: 002b:00007ffd72091520 EFLAGS: 00010246
[    3.171026] RAX: 0000000000000000 RBX: 0000000000000028 RCX: 0000000000000000
[    3.171026] RDX: 0000000000000000 RSI: 00007ffd72091400 RDI: 0000000000000002
[    3.171026] RBP: 00007f422cbe1780 R08: 0000000000000000 R09: 00007ffd72091400
[    3.171026] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffd72091680
[    3.171026] R13: 00007f422d8497c0 R14: 0000000000000000 R15: 0000000000000078
[    3.171026] Kernel Offset: 0x2a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    3.171026] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    3.171026]  ]---

Comment 6 Colin Walters 2022-09-27 13:57:23 UTC
Interesting, the fatal error here is:

Fatal: can't open /dev/urandom

I think something went wrong in the initramfs generation; can you paste your Dockerfile?

Comment 8 Colin Walters 2022-09-27 20:53:37 UTC
OK dug into this and https://github.com/coreos/rpm-ostree/pull/4058 fixes it for me.

I'm not yet entirely sure why this doesn't seem to affect current FCOS.

Comment 9 Colin Walters 2022-09-28 20:00:29 UTC
OK, built https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2185788 and this works for me, can you give it a try too?

Comment 12 HuijingHei 2022-09-30 14:07:39 UTC
Should I change the fix version to 2022.10.97.gade6df33-2.el8 and set Verified:Tested?

Comment 13 Colin Walters 2022-09-30 14:21:35 UTC
> Should I change the fix version to 2022.10.97.gade6df33-2.el8 and set Verified:Tested?

Yep that sounds good to me, thanks!

Comment 14 HuijingHei 2022-10-01 04:09:04 UTC
(In reply to Colin Walters from comment #13)
> > Should I change the fix version to 2022.10.97.gade6df33-2.el8 and set Verified:Tested?
> 
> Yep that sounds good to me, thanks!

Thanks for your confirmation, done!

Comment 17 HuijingHei 2022-10-08 04:26:20 UTC
Verification passed with rpm-ostree-2022.10.97.gade6df33-2.el8.x86_64. After running `sudo rpm-ostree rebase --experimental ostree-unverified-registry:registry.ci.openshift.org/coreos/hhei-rhcos-test:4.12` and rebooting successfully, the kernel is upgraded to the version in the container image:

$ ls /usr/lib/modules -la
total 8
drwxr-xr-x.  3 root root   42 Jan  1  1970 .
drwxr-xr-x. 38 root root 4096 Jan  1  1970 ..
drwxr-xr-x.  7 root root 4096 Jan  1  1970 4.18.0-372.29.1.el8_6.x86_64
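
The check in the listing above can be scripted. The following is a minimal sketch and not part of rpm-ostree: the function names `find_whiteouts` and `check_no_whiteouts` are hypothetical, and it only assumes that whiteout markers use the `.wh.<name>` naming seen in comment 5.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an rpm-ostree command):
# print every whiteout marker (an entry named ".wh.<name>") found
# under the directory given as the first argument.
find_whiteouts() {
    find "$1" -name '.wh.*' -print
}

# Hypothetical check: return 0 only if no whiteout markers remain
# under the given directory; otherwise list them on stderr.
check_no_whiteouts() {
    leftovers=$(find_whiteouts "$1")
    if [ -n "$leftovers" ]; then
        echo "leftover whiteout markers:" >&2
        echo "$leftovers" >&2
        return 1
    fi
}
```

After a rebase and reboot, something like `check_no_whiteouts /usr/lib/modules && uname -r` would confirm that no marker was left behind before reporting the running kernel.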

Comment 18 Colin Walters 2022-10-24 13:17:21 UTC
So a question came up about the relationship of this and

https://bugzilla.redhat.com/show_bug.cgi?id=2134630

It's confusing!  They both talk about whiteouts, but are only tangentially related.  This bug is about handling whiteouts in the filesystem tree we boot into.  Look at the examples above: in them, we're changing the "base" CoreOS container image.

The other bug is about handling *embedding other containers* into an ostree commit.  In that case, the whiteout files are instead interpreted at runtime by e.g. podman/crio.
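
To illustrate the convention this bug deals with: in an OCI image layer, an entry named `.wh.<name>` marks `<name>` from the lower layers as deleted, so neither the marker nor `<name>` should appear in the final tree. The sketch below is an illustration only and is not how ostree-rs-ext implements the fix. The function name `apply_whiteouts` is hypothetical, it works on an already-merged directory tree rather than on layer tarballs, and it deliberately skips opaque-directory markers (`.wh..wh..opq`).

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): resolve whiteout
# markers in an already-merged tree. For each ".wh.<name>" entry, remove
# the sibling "<name>" and then the marker itself.
apply_whiteouts() {
    root=$1
    # sort buffers the whole list, so find has finished walking the tree
    # before anything is deleted.
    find "$root" -name '.wh.*' -print | sort | while IFS= read -r marker; do
        dir=$(dirname "$marker")
        name=$(basename "$marker")
        target=${name#.wh.}
        case "$target" in
            # Skip a bare ".wh." (empty target) and the opaque-directory
            # marker, which this sketch does not handle.
            ''|.wh..opq) continue ;;
        esac
        rm -rf "$dir/$target"
        rm -f "$marker"
    done
}
```

Run against a scratch copy of a tree like the one in comment 5 (never against a live /usr), this would remove both `.wh.4.18.0-372.26.1.el8_6.x86_64` and the `4.18.0-372.26.1.el8_6.x86_64` directory, leaving only the new kernel's modules.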

Comment 20 errata-xmlrpc 2023-05-16 08:24:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rpm-ostree bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2759