Created attachment 1701927 [details]
must-gather log

Description of problem:
Specifying the realtime kernel in a MachineConfig causes the OpenShift cluster installation to fail.

Steps to Reproduce:
1. Create MachineConfigs that set kernelType to realtime:

cat > <installation_directory>/openshift/99-master-kerneltype.yaml <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "master"
  name: 99-master-kerneltype
spec:
  kernelType: realtime
EOF

cat > <installation_directory>/openshift/99-worker-kerneltype.yaml <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kerneltype
spec:
  kernelType: realtime
EOF

2. Install an IPI cluster.

Actual results:
Installation fails with only the bootstrap and master nodes created; no workers.

On the bootstrap node:

[#162] failed to create some manifests: p.c.openshift-qe.internal bootkube.sh[2299]: "99_openshift-cluster-api_worker-machineset-0.yaml": unable to get REST mapping for "99_openshift-cluster-api_worker-machineset-0.yaml": no matches for kind "MachineSet" in version "machine.openshift.io/v1beta1"

On a master node:

config-daemon[2057]: I0721 08:31:15.928575 2057 rpm-ostree.go:252] Pivoting to: 46.82.202007171740-0 (51be2cbe77b26ec90d4f4caa956615>
config-daemon[2057]: I0721 08:31:44.222388 2057 update.go:1462] Initiating switch from kernel realtime to default
config-daemon[2057]: I0721 08:31:44.228724 2057 rpm-ostree.go:360] Running captured: rpm-ostree status --json
config-daemon[2057]: I0721 08:31:44.313764 2057 update.go:866] Updating files
config-daemon[2057]: I0721 08:31:44.314472 2057 update.go:903] Deleting stale data
config-daemon[2057]: error: error rolling back Real time Kernel No kernel-rt package installed on host: exit status 1
1]: machine-config-daemon-firstboot.service: Main process exited, code=exited, status=1/FAILURE
1]: machine-config-daemon-firstboot.service: Failed with result 'exit-code'.
1]: Failed to start Machine Config Daemon Firstboot.
1]: machine-config-daemon-firstboot.service: Consumed 2min 14.861s CPU time

This is the error (from above):

error: error rolling back Real time Kernel No kernel-rt package installed on host: exit status 1

Additional info:
Tried with 4.6 nightly 7.20-093546 (and a few earlier builds).
Since the cluster failed fairly early, must-gather didn't capture journal logs. Because we are switching to the realtime kernel at install time, the detailed log will be in machine-config-daemon-firstboot.service. Can you please also provide the journal logs from one of the worker nodes for rpm-ostreed.service and machine-config-daemon-firstboot.service?
from one of the master machines:

$ journalctl --unit=machine-config-daemon-firstboot.service
-- Logs begin at Tue 2020-07-21 16:50:14 UTC, end at Tue 2020-07-21 18:21:21 UTC. --
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal systemd[1]: Starting Machine Config Daemon Firstboot...
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.148869 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.415015 2046 daemon.go:220] Booted osImageURL: >
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.471123 2046 daemon.go:227] Installed Ignition >
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.472624 2046 update.go:284] Checking Reconcilab>
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.474187 2046 update.go:1462] Starting update fr>
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.478878 2046 update.go:866] Updating files
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.479336 2046 update.go:903] Deleting stale data
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.481999 2046 update.go:1462] Initiating switch >
Jul 21 16:55:34 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:55:34.485091 2046 run.go:18] Running: podman pull -q>
Jul 21 16:56:05 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: 2381dd334aaf1b9c25351bc5a9ac7a48baf063fbfcf3c8b53d94526c7fe45588
Jul 21 16:56:05 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:56:05.639197 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:56:05 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:56:05.781138 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:56:05 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:56:05.899125 2046 update.go:1462] Switching to kerne>
Jul 21 16:58:14 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:14.554239 2046 update.go:1462] Deleted container >
Jul 21 16:58:14 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:14.559825 2046 update.go:1320] Updating OS to qua>
Jul 21 16:58:14 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:14.559855 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:58:14 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:14.598968 2046 rpm-ostree.go:159] Current origin >
Jul 21 16:58:14 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:14.599124 2046 run.go:18] Running: podman pull -q>
Jul 21 16:58:44 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: 2381dd334aaf1b9c25351bc5a9ac7a48baf063fbfcf3c8b53d94526c7fe45588
Jul 21 16:58:44 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:44.227291 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:58:44 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:44.348389 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:58:44 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:44.503162 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:58:44 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:44.622152 2046 rpm-ostree.go:252] Pivoting to: 46>
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:59:13.336235 2046 update.go:1462] Initiating switch >
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:59:13.342546 2046 rpm-ostree.go:360] Running capture>
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:59:13.424375 2046 update.go:866] Updating files
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:59:13.425196 2046 update.go:903] Deleting stale data
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: error: error rolling back Real time Kernel No kernel-rt package >
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=exited, status>
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'exit-code'.
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal systemd[1]: Failed to start Machine Config Daemon Firstboot.
Jul 21 16:59:13 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal systemd[1]: machine-config-daemon-firstboot.service: Consumed 2min 11.916s CPU time
There is no worker node created.
This is probably related to a kernel vs kernel-rt NVR mismatch. From the RHCOS build logs, we are using `kernel-4.18.0-211.el8.x86_64` in RHCOS itself, but we are shipping `kernel-rt-4.18.0-193.13.2.rt13.65.el8_2` in the machine-os-content. @Sinny if you agree, re-assign to RHCOS and we'll handle it.
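For reference, the mismatch is visible from the package NVRs alone: stripping the package name and the `.rtNN` realtime suffix should leave the same base version-release on both sides. A quick sketch (the `base_vr` helper is hypothetical, written only for this comparison):

```shell
# Reduce a kernel package NVR to its base version-release so that
# kernel and kernel-rt builds can be compared directly.
base_vr() {
    echo "$1" | sed -E 's/^[a-z-]+-([0-9])/\1/; s/\.rt[0-9].*$//'
}

# NVRs taken from the build logs quoted above:
base_vr "kernel-4.18.0-211.el8"                   # -> 4.18.0-211.el8
base_vr "kernel-rt-4.18.0-193.13.2.rt13.65.el8_2" # -> 4.18.0-193.13.2
# 211 vs 193: the kernel-rt shipped in machine-os-content lags the RHCOS kernel.
```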
Umm, I think this issue happens when we overlay a package first and then do a rebase in the same deployment. This is most likely a regression from an rpm-ostree or ostree update.

I have seen this problem while working on adding extensions in https://github.com/openshift/machine-config-operator/pull/1941 . To overcome it in that PR I started doing the rebase first and then layering packages, which is actually the right thing to do, since layering a package will then resolve its dependencies against the latest version available in the staged deployment.

From the journal log, the actual error is a bit tricky. The problem started happening a bit earlier:

Jul 21 16:56:05 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:56:05.899125 2046 update.go:1462] Switching to kerne>   <----
Jul 21 16:58:14 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:14.554239 2046 update.go:1462] Deleted container >
Jul 21 16:58:14 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:14.559825 2046 update.go:1320] Updating OS to qua
...
Jul 21 16:58:44 tszegcp72120a-hbtss-master-0.c.openshift-qe.internal machine-config-daemon[2046]: I0721 16:58:44.622152 2046 rpm-ostree.go:252] Pivoting to: 46>   <-----

Since this is an install-time kernel-rt switch, the kernel-rt switch happened first and then we initiated the pivot to the new OSImageURL, which I suspect in this case didn't succeed. As a result the MCO initiated a rollback of the applied MachineConfigs. The rollback failed because we query for installed kernel-rt packages, which of course are not yet installed, because the layered packages are still only in the staged deployment.

@micah yes, it would be good to move this to RHCOS to see what changes could have caused the rebase behavior change. I believe the day-1 rt-kernel switch worked fine during 4.5.

@tsze Some of the important messages are not visible in the journal log because it opens in `less`.
For better clarity, can you please provide the journal logs of machine-config-daemon-firstboot.service and rpm-ostreed.service again? Try something like:

$ journalctl --unit=machine-config-daemon-firstboot.service | cat
$ journalctl --unit=rpm-ostreed.service | cat
A workaround would be to complete the install *without* the MachineConfig that sets `kernelType: realtime` and then switch the kernel as part of a day-2 operation.

Setting as medium priority and targeting 4.6.
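Concretely, the day-2 flow keeps the same MachineConfig as in the reproducer but applies it only after the cluster is up (a sketch; the file name is arbitrary, and the `oc create` step is shown as a comment since it needs a live cluster):

```shell
# Same MachineConfig as in the reproducer, written out as a standalone file
# instead of being dropped into the install manifests directory:
cat > 99-worker-kerneltype.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kerneltype
spec:
  kernelType: realtime
EOF

# After the install completes, apply it; the MCO then rolls the worker
# pool onto the realtime kernel, rebooting nodes one at a time:
# oc create -f 99-worker-kerneltype.yaml
```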
(In reply to Sinny Kumari from comment #5)
> Umm, I think this issue happens when we overlay a package first and then do
> rebase in same deployment. This is most likely a regression from rpm-ostree
> or ostree update
>
> I have seen this problem happening while working on adding extensions
> https://github.com/openshift/machine-config-operator/pull/1941 . To overcome
> this in the PR I started doing rebase first and then layer packages which is
> actually good thing to do since layering package will resolve its dependency
> from the latest version available in staged deployment.

It's not just a workaround, it's the correct behaviour. :) Otherwise we'd essentially be trying to do a depsolve on a pkgset that we might've not tested at all. The extensions are made to match the OSTree in the same oscontainer, not whatever happens to be the customer's bootstrap version.

That said, it's not 100% clear if that's the issue here. It did successfully stage the kernel-rt switchover and then something went wrong during the pivot. We need the rpm-ostreed logs (or even better, the full journal) to better diagnose.

> As a result MCO initiated rollback of applied MachineConfigs. While doing
> rollback it failed because we are trying to query for installed kernel-rt
> packages which is ofcourse not yet installed because layered package is
> still in staged deployment.

Hmm, yeah, I think `installedRTKernelRpmsOnHost()` should probably be querying the staged deployment instead? But actually, if a failure happens before rebooting, the MCD should be able to just wipe the staged deployment (`rpm-ostree cleanup -p`).

> @micah yes, it will be good to move to RHCOS to see what changes could have
> caused rebase behavior change. I believe day1 rt-kernel switch has been
> working fine during 4.5.

Weird, AFAIK nothing in rpm-ostree changed recently wrt package layering handling. Note also that both 4.5 and 4.6 currently ship the same rpm-ostree version.
journalctl --unit=machine-config-daemon-firstboot.service | cat:

-- Logs begin at Wed 2020-07-22 13:02:31 UTC, end at Wed 2020-07-22 15:20:40 UTC. --
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal systemd[1]: Starting Machine Config Daemon Firstboot...
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.107537 2050 rpm-ostree.go:360] Running captured: rpm-ostree status --json
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.367292 2050 daemon.go:220] Booted osImageURL: (46.82.202007051540-0)
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.430824 2050 daemon.go:227] Installed Ignition binary version: 0.35.1
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.432461 2050 update.go:284] Checking Reconcilable for config mco-empty-mc to rendered-master-92a047670efa0f674d3138da376704b4
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.434028 2050 update.go:1462] Starting update from mco-empty-mc to rendered-master-92a047670efa0f674d3138da376704b4: &{osUpdate:true kargs:false fips:false passwd:false files:false units:false kernelType:true}
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.438563 2050 update.go:866] Updating files
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.439163 2050 update.go:903] Deleting stale data
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.442480 2050 update.go:1462] Initiating switch from kernel default to realtime
Jul 22 13:07:43 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:07:43.445845 2050 run.go:18] Running: podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4dcfa26c63a08d65feb11f621df29719b4d46b7999c76315c137a21398830fba
Jul 22 13:08:17 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: 2381dd334aaf1b9c25351bc5a9ac7a48baf063fbfcf3c8b53d94526c7fe45588
Jul 22 13:08:17 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:08:17.291545 2050 rpm-ostree.go:360] Running captured: podman create --net=none --annotation=org.openshift.machineconfigoperator.pivot=true --name mcd-3ad80762-7ed6-4faf-b943-258267af1c25 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4dcfa26c63a08d65feb11f621df29719b4d46b7999c76315c137a21398830fba
Jul 22 13:08:17 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:08:17.449506 2050 rpm-ostree.go:360] Running captured: podman mount 90035ffa353b52be1a3f0656c09c1cb6a127b42ea68db1a90bc10fcf94c67832
Jul 22 13:08:17 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:08:17.567684 2050 update.go:1462] Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kernel-modules" "kernel-modules-extra" "--install" "/var/lib/containers/storage/overlay/6db296ceafe90a6b4461b4fe1c2128c7a31d5da32735d971f4142c61cc291173/merged/kernel-rt-core-4.18.0-193.13.2.rt13.65.el8_2.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/6db296ceafe90a6b4461b4fe1c2128c7a31d5da32735d971f4142c61cc291173/merged/kernel-rt-kvm-4.18.0-193.13.2.rt13.65.el8_2.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/6db296ceafe90a6b4461b4fe1c2128c7a31d5da32735d971f4142c61cc291173/merged/kernel-rt-modules-4.18.0-193.13.2.rt13.65.el8_2.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/6db296ceafe90a6b4461b4fe1c2128c7a31d5da32735d971f4142c61cc291173/merged/kernel-rt-modules-extra-4.18.0-193.13.2.rt13.65.el8_2.x86_64.rpm"]
Jul 22 13:10:26 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:26.867576 2050 update.go:1462] Deleted container and removed OSContainer image
Jul 22 13:10:26 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:26.872656 2050 update.go:1320] Updating OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4dcfa26c63a08d65feb11f621df29719b4d46b7999c76315c137a21398830fba
Jul 22 13:10:26 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:26.872687 2050 rpm-ostree.go:360] Running captured: rpm-ostree status --json
Jul 22 13:10:26 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:26.913459 2050 rpm-ostree.go:159] Current origin is not custom
Jul 22 13:10:26 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:26.913630 2050 run.go:18] Running: podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4dcfa26c63a08d65feb11f621df29719b4d46b7999c76315c137a21398830fba
Jul 22 13:10:57 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: 2381dd334aaf1b9c25351bc5a9ac7a48baf063fbfcf3c8b53d94526c7fe45588
Jul 22 13:10:57 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:57.704513 2050 rpm-ostree.go:360] Running captured: podman inspect --type=image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4dcfa26c63a08d65feb11f621df29719b4d46b7999c76315c137a21398830fba
Jul 22 13:10:57 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:57.821724 2050 rpm-ostree.go:360] Running captured: podman create --net=none --annotation=org.openshift.machineconfigoperator.pivot=true --name ostree-container-pivot-26a62e78-c35d-43ba-8319-52382dea2c84 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4dcfa26c63a08d65feb11f621df29719b4d46b7999c76315c137a21398830fba
Jul 22 13:10:57 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:57.972459 2050 rpm-ostree.go:360] Running captured: podman mount 13ece144809b6b4ba4c82e46eb6fcd8b0b6e9850eccdff22b90031947c9987ce
Jul 22 13:10:58 tsze72220-x7864-master-0.c.openshift-qe.internal machine-config-daemon[2050]: I0722 13:10:58.091518 2050 rpm-ostree.go:252] Pivoting to: 46.82.202007171740-0 (51be2cbe77b26ec90d4f4caa956615c202ede5e122b488ff53bb2eb777362e07)
Created attachment 1702106 [details]
ostreed.service

The rpm-ostreed.service journal log is attached.
Please feel free to ping me for more info or access to the cluster.
(In reply to Jonathan Lebon from comment #7)
> Weird, AFAIK nothing in rpm-ostree changed recently wrt package layering
> handling. Note also that both 4.5 and 4.6 currently ship the same rpm-ostree
> version.

You are right, the latest 4.5 bootimage has rpm-ostree-2020.2-2.el8.x86_64. I haven't run kernel-rt as day-1 with the latest 4.5; it may be worth running a cluster there and checking. It would also be really nice to have a CI test for day-1 MachineConfig changes to catch these issues in advance.
Ahhh, we have a core dump:

Jul 22 13:11:37 tsze72220-x7864-master-1.c.openshift-qe.internal kernel: pool[16411]: segfault at 7fab14769480 ip 00007fab141f1879 sp 00007fab03ffbd08 error 7 in libglib-2.0.so.0.5600.4[7fab1415d000+116000]
...
Jul 22 13:11:37 tsze72220-x7864-master-1.c.openshift-qe.internal systemd[1]: rpm-ostreed.service: Main process exited, code=killed, status=11/SEGV
Jul 22 13:11:37 tsze72220-x7864-master-1.c.openshift-qe.internal systemd[1]: rpm-ostreed.service: Failed with result 'signal'.
Jul 22 13:11:37 tsze72220-x7864-master-1.c.openshift-qe.internal systemd[1]: rpm-ostreed.service: Consumed 2min 17.768s CPU time
Jul 22 13:11:38 tsze72220-x7864-master-1.c.openshift-qe.internal systemd-coredump[16565]: Process 2047 (rpm-ostree) of user 0 dumped core.

Stack trace of thread 16411:
#0  0x00007fab141f1879 g_mutex_lock (libglib-2.0.so.0)
#1  0x00007fab141a7778 g_source_unref_internal (libglib-2.0.so.0)
#2  0x00007fab15c20d3f idle_invoke_async_progress (libostree-1.so.1)
#3  0x00007fab141a6fbb g_idle_dispatch (libglib-2.0.so.0)
#4  0x00007fab141aa67d g_main_context_dispatch (libglib-2.0.so.0)
#5  0x00007fab141aaa48 g_main_context_iterate.isra.21 (libglib-2.0.so.0)
#6  0x00007fab141aaae0 g_main_context_iteration (libglib-2.0.so.0)
#7  0x00005621d183255d relabel_if_necessary (rpm-ostree)
#8  0x00005621d183917e rpmostree_context_assemble (rpm-ostree)
#9  0x00005621d1872246 rpmostree_sysroot_upgrader_deploy (rpm-ostree)
#10 0x00005621d186abd0 deploy_transaction_execute (rpm-ostree)
#11 0x00005621d185d36d transaction_execute_thread (rpm-ostree)
#12 0x00007fab1475d377 g_task_thread_pool_thread (libgio-2.0.so.0)
#13 0x00007fab141d3753 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#14 0x00007fab141d2d4a g_thread_proxy (libglib-2.0.so.0)
#15 0x00007fab131222de start_thread (libpthread.so.0)
#16 0x00007fab12e53e83 __clone (libc.so.6)

Stack trace of thread 16560:
#0  0x00007fab12e4446d __fxstatat64 (libc.so.6)
#1  0x00007fab15c34659 _ostree_repo_load_file_bare (libostree-1.so.1)
#2  0x00007fab15c34f8e ostree_repo_load_file (libostree-1.so.1)
#3  0x00007fab15c39e13 checkout_one_file_at (libostree-1.so.1)
#4  0x00007fab15c3b923 checkout_tree_at_recurse (libostree-1.so.1)
#5  0x00007fab15c3bee0 checkout_tree_at_recurse (libostree-1.so.1)
#6  0x00007fab15c3bee0 checkout_tree_at_recurse (libostree-1.so.1)
#7  0x00007fab15c3bee0 checkout_tree_at_recurse (libostree-1.so.1)
#8  0x00007fab15c3bee0 checkout_tree_at_recurse (libostree-1.so.1)
#9  0x00007fab15c3bee0 checkout_tree_at_recurse (libostree-1.so.1)
#10 0x00007fab15c3bee0 checkout_tree_at_recurse (libostree-1.so.1)
#11 0x00007fab15c3bee0 checkout_tree_at_recurse (libostree-1.so.1)
#12 0x00007fab15c3c646 checkout_tree_at (libostree-1.so.1)
#13 0x00007fab15c3ca87 ostree_repo_checkout_at (libostree-1.so.1)
#14 0x00005621d1831d6b relabel_in_thread_impl.constprop.50 (rpm-ostree)
#15 0x00005621d18321d6 relabel_in_thread (rpm-ostree)
#16 0x00007fab1475d377 g_task_thread_pool_thread (libgio-2.0.so.0)
#17 0x00007fab141d3753 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#18 0x00007fab141d2d4a g_thread_proxy (libglib-2.0.so.0)
#19 0x00007fab131222de start_thread (libpthread.so.0)
#20 0x00007fab12e53e83 __clone (libc.so.6)
...

Stack trace of thread 16563:
#0  0x00007fab12e4472b mkdirat (libc.so.6)
#1  0x00007fab15c3b727 checkout_tree_at_recurse (libostree-1.so.1)
#2  0x00007fab15c3c646 checkout_tree_at (libostree-1.so.1)
#3  0x00007fab15c3ca87 ostree_repo_checkout_at (libostree-1.so.1)
#4  0x00005621d1831d6b relabel_in_thread_impl.constprop.50 (rpm-ostree)
#5  0x00005621d18321d6 relabel_in_thread (rpm-ostree)
#6  0x00007fab1475d377 g_task_thread_pool_thread (libgio-2.0.so.0)
#7  0x00007fab141d3753 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#8  0x00007fab141d2d4a g_thread_proxy (libglib-2.0.so.0)
#9  0x00007fab131222de start_thread (libpthread.so.0)
#10 0x00007fab12e53e83 __clone (libc.so.6)

Stack trace of thread 16561:
#0  0x00007fab12e51c2e fsetxattr (libc.so.6)
#1  0x00007fab15c9bd71 glnx_fd_set_all_xattrs (libostree-1.so.1)
#2  0x00007fab15c3b81f checkout_tree_at_recurse (libostree-1.so.1)
#3  0x00007fab15c3c646 checkout_tree_at (libostree-1.so.1)
#4  0x00007fab15c3ca87 ostree_repo_checkout_at (libostree-1.so.1)
#5  0x00005621d1831d6b relabel_in_thread_impl.constprop.50 (rpm-ostree)
#6  0x00005621d18321d6 relabel_in_thread (rpm-ostree)
#7  0x00007fab1475d377 g_task_thread_pool_thread (libgio-2.0.so.0)
#8  0x00007fab141d3753 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#9  0x00007fab141d2d4a g_thread_proxy (libglib-2.0.so.0)
#10 0x00007fab131222de start_thread (libpthread.so.0)
#11 0x00007fab12e53e83 __clone (libc.so.6)

Stack trace of thread 16562:
#0  0x00007fab12e4472b mkdirat (libc.so.6)
#1  0x00007fab15c3b727 checkout_tree_at_recurse (libostree-1.so.1)
#2  0x00007fab15c3c646 checkout_tree_at (libostree-1.so.1)
#3  0x00007fab15c3ca87 ostree_repo_checkout_at (libostree-1.so.1)
#4  0x00005621d1831d6b relabel_in_thread_impl.constprop.50 (rpm-ostree)
#5  0x00005621d18321d6 relabel_in_thread (rpm-ostree)
#6  0x00007fab1475d377 g_task_thread_pool_thread (libgio-2.0.so.0)
#7  0x00007fab141d3753 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#8  0x00007fab141d2d4a g_thread_proxy (libglib-2.0.so.0)
#9  0x00007fab131222de start_thread (libpthread.so.0)
#10 0x00007fab12e53e83 __clone (libc.so.6)
...

Digging...
Sorry, got mixed up. This is waiting on https://projects.engineering.redhat.com/browse/CLOUDBLD-2104.
In latest RHCOS 4.6 builds.
Verified with 4.6.0-0.nightly-2020-08-12-071533

```
$ ./openshift-install create install-config --dir=cluster
$ ./openshift-install create manifests --dir=cluster
$ cat machineConfigs/worker-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kerneltype
spec:
  kernelType: realtime
$ cat machineConfigs/master-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "master"
  name: 99-master-kerneltype
spec:
  kernelType: realtime
$ cp machineConfigs/*-realtime.yaml cluster/manifests/
$ ./openshift-install --dir=cluster --log-level=debug create cluster
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-12-071533   True        False         31s     Cluster version is 4.6.0-0.nightly-2020-08-12-071533
$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-139-89.us-west-2.compute.internal    Ready    worker   8m11s   v1.19.0-rc.2+edbf229-dirty
ip-10-0-142-22.us-west-2.compute.internal    Ready    master   21m     v1.19.0-rc.2+edbf229-dirty
ip-10-0-165-202.us-west-2.compute.internal   Ready    worker   8m29s   v1.19.0-rc.2+edbf229-dirty
ip-10-0-167-103.us-west-2.compute.internal   Ready    master   22m     v1.19.0-rc.2+edbf229-dirty
ip-10-0-210-82.us-west-2.compute.internal    Ready    master   22m     v1.19.0-rc.2+edbf229-dirty
ip-10-0-212-70.us-west-2.compute.internal    Ready    worker   8m44s   v1.19.0-rc.2+edbf229-dirty
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
00-worker                                          4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
01-master-container-runtime                        4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
01-master-kubelet                                  4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
01-worker-container-runtime                        4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
01-worker-kubelet                                  4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
99-master-generated-registries                     4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
99-master-kerneltype                                                                                           31m
99-master-ssh                                                                                 3.1.0             31m
99-worker-generated-registries                     4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
99-worker-kerneltype                                                                                           31m
99-worker-ssh                                                                                 3.1.0             31m
rendered-master-b71d0fb8c61d50f6d2363fbeaab57165   4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
rendered-worker-8deff4de3b484f57cace925f88f1b214   4865f7831e691d984623c52165a085444a6875ee   3.1.0             20m
$ oc debug node/ip-10-0-139-89.us-west-2.compute.internal
Starting pod/ip-10-0-139-89us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.139.89
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3e519481ef0ad982c860b036907d0e9622c524b59ea691e9ea85d7836d345a6e
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202008111934-0 (2020-08-11T19:38:09Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-211.el8
           LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra

  ostree://a5f61bca284efdd814e71e250a77afcbdc302f6eadee9836013d29fcae89e234
                   Version: 46.82.202008111140-0 (2020-08-11T11:44:32Z)
sh-4.4# rpm -q rpm-ostree
rpm-ostree-2020.4-1.el8.x86_64
sh-4.4# uname -a
Linux ip-10-0-139-89 4.18.0-211.rt5.23.el8.x86_64 #1 SMP PREEMPT_RT Thu Jun 4 13:01:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
$ oc debug node/ip-10-0-142-22.us-west-2.compute.internal
Starting pod/ip-10-0-142-22us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.142.22
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3e519481ef0ad982c860b036907d0e9622c524b59ea691e9ea85d7836d345a6e
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202008111934-0 (2020-08-11T19:38:09Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-211.el8
           LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra

  ostree://a5f61bca284efdd814e71e250a77afcbdc302f6eadee9836013d29fcae89e234
                   Version: 46.82.202008111140-0 (2020-08-11T11:44:32Z)
sh-4.4# rpm -q rpm-ostree
rpm-ostree-2020.4-1.el8.x86_64
sh-4.4# rpm -q kernel-rt-core
kernel-rt-core-4.18.0-211.rt5.23.el8.x86_64
sh-4.4# uname -a
Linux ip-10-0-142-22 4.18.0-211.rt5.23.el8.x86_64 #1 SMP PREEMPT_RT Thu Jun 4 13:01:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196