Bug 2037611 - The service account does not refresh after VM migration
Summary: The service account does not refresh after VM migration
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.9.1
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.15.0
Assignee: Jed Lejosne
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-06 05:34 UTC by Mustafa Aydın
Modified: 2023-10-11 12:44 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: As of OpenShift 4.8, OpenShift links a service account token in use by a pod to that specific pod. Currently KubeVirt implements a ServiceAccountVolume by creating a disk image containing the token. If this VM is migrated, the token contained in the image will become invalid.
Consequence: Migrating a VM, e.g. by evicting a node during upgrade, will render ServiceAccountVolumes invalid.
Workaround (if any): It is possible to use user accounts instead of service accounts, because those tokens are not bound to a specific pod.
Result:
Clone Of:
Environment:
Last Closed: 2023-10-11 12:44:32 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker CNV-15619 (last updated 2022-11-22 16:31:17 UTC)
Red Hat Issue Tracker CNV-18227 (last updated 2022-08-10 12:45:41 UTC)

Description Mustafa Aydın 2022-01-06 05:34:56 UTC
Description of problem:
When we mount a service account to the VM and check the token inside the VM, it contains the name of the virt-launcher pod. However, after the VM is migrated, the token still contains the name of the old virt-launcher pod, so the token becomes useless.

Version-Release number of selected component (if applicable):
4.9.1

How reproducible:
Always

Steps to Reproduce:
1. Mount a service account to the VM (see the example manifest below)
2. Migrate the VM
3. Check the token inside the VM: the pod name field still references the old virt-launcher pod
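
For illustration, a minimal sketch of step 1. The VM name (xtbx02), namespace and service account name (murphy) are taken from this report; the container disk image and resource sizes are placeholders:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: xtbx02
  namespace: s-testbox-02
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          - name: serviceaccount-volume   # exposed to the guest as an extra disk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/containerdisks/fedora:latest   # placeholder image
      - name: serviceaccount-volume
        serviceAccount:
          serviceAccountName: murphy      # the token of this SA is written into the disk image

Step 2 can then be triggered with "virtctl migrate xtbx02" (or by creating a VirtualMachineInstanceMigration object).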



Actual results:
oc get pods
NAME                         READY   STATUS      RESTARTS   AGE
virt-launcher-xtbx02-7pnhq   0/1     Completed   0          18m
virt-launcher-xtbx02-dmbbl   1/1     Running     0          3m18s 



The token after the migration: 


jwt:

{
  "aud": [
    "https://kubernetes.default.svc"
  ],
  "exp": 1672934711,
  "iat": 1641398711,
  "iss": "https://kubernetes.default.svc",
  "kubernetes.io": {
    "namespace": "s-testbox-02",
    "pod": {
      "name": "virt-launcher-xtbx02-7pnhq",     <-----------------------------
      "uid": "58ea2a5f-9794-433c-845b-76c69634752f"
    },
    "serviceaccount": {
      "name": "murphy",
      "uid": "9cbdd2ea-4fc3-4ffc-a966-bb761ed4ba60"
    },
    "warnafter": 1641402318
  },
  "nbf": 1641398711,
  "sub": "system:serviceaccount:s-testbox-02:murphy"
}


Expected results:
The token should reference the new pod, virt-launcher-xtbx02-dmbbl.

Additional info:

As a workaround, a secret with the token can be mounted to the VM.
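
A sketch of that workaround, assuming a long-lived service-account token Secret is acceptable in the environment; the Secret name (murphy-token) is made up for illustration:

# Long-lived token Secret for the "murphy" service account
apiVersion: v1
kind: Secret
metadata:
  name: murphy-token
  namespace: s-testbox-02
  annotations:
    kubernetes.io/service-account.name: murphy
type: kubernetes.io/service-account-token

# In the VirtualMachine template, replace the serviceAccount volume with a secret
# volume (plus a matching disk entry under domain.devices.disks):
      volumes:
      - name: token-volume
        secret:
          secretName: murphy-token

Because this token is not bound to a specific pod, it is not invalidated by a migration.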

Comment 3 Jed Lejosne 2022-01-25 13:59:33 UTC
I took a look at this issue, and tried to figure out why it wasn't a problem before.
It turns out the token didn't include the pod's name until Kubernetes 1.21, so this simply couldn't break before.

Now this is not easy to fix. We unfortunately expose the service account to the guest using an emulated disk drive.
Had we used a cdrom drive, we could have just ejected the outdated token and inserted the new one.
However, modifying active disk drives from the backend is forbidden by libvirt/qemu, for good reasons.
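
To make that concrete, this is roughly what the guest sees; the device name (/dev/vdb) depends on the VM's disk ordering, and the file listing reflects how KubeVirt builds the config disk, so treat both as assumptions:

# Inside the guest
lsblk                       # the serviceaccount volume shows up as a small extra disk, e.g. /dev/vdb
sudo mount /dev/vdb /mnt    # read-only config disk
ls /mnt                     # expected: ca.crt  namespace  token
cut -d. -f2 /mnt/token      # base64url-encoded JWT payload, i.e. the claims shown in the description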

The only way I see around that is to hot-unplug the whole drive before migration and plug a new one after.
That will work, but there's a potential issue: if the disk containing the token is mounted in the (Linux) guest at the time of migration, the new disk will appear under a new name!
For example, if /dev/vda is the main drive of the VM and /dev/vdb is the service account drive, and /dev/vdb is mounted when the migration happens, /dev/vdb will become unusable after the migration and the new token will show up under /dev/vdc.
I haven't tested other OSes like Windows, but a BSOD wouldn't surprise me!

In short, the above solution would be a lot of work and potentially cause instabilities.
I would recommend that users move away from that and switch to using secrets.
I'll make a suggestion upstream to deprecate service-account config drives.

Comment 4 sgott 2022-01-26 21:11:17 UTC
Per Comment #3: With the current implementation, this issue cannot be fixed. Our recommendation is to copy a user or service account token to a secret and use that instead; because that token will not change, it will survive a migration.

In the future, we might re-implement the serviceAccountVolume feature using virtiofs, which would automatically address this issue.

Because this bug is not immediately addressable in its current state, we are deferring it to the next release.

Comment 6 ctomasko 2022-03-15 22:23:23 UTC
Added Release note > known issue

OpenShift Virtualization links a service account token in use by a pod to that specific pod. KubeVirt implements a ServiceAccountVolume by creating a disk image that contains the token. If you migrate a VM, the ServiceAccountVolume becomes invalid.

As a workaround, use user accounts rather than service accounts because user account tokens are not bound to a specific pod. (BZ#2037611)

https://github.com/openshift/openshift-docs/pull/42530
https://deploy-preview-42530--osdocs.netlify.app/openshift-enterprise/latest/virt/virt-4-10-release-notes#virt-4-10-known-issues

Future link: After OpenShift Virtualization 4.10 is released, you can find the release notes here: https://docs.openshift.com/container-platform/4.10/virt/virt-4-10-release-notes.html
or on the portal,
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10

Comment 7 sgott 2022-03-24 14:15:34 UTC
Per comment #4, this BZ is not addressable with the current architecture. Deferring it to the next release.

Comment 12 sgott 2022-09-09 15:44:26 UTC
Per comment #4, this BZ is not addressable with the current architecture (rearchitecting this requires virtiofs). Deferring it to the next release.

Comment 20 Antonio Cardace 2023-06-20 12:23:14 UTC
Deferring this to 4.15 as the groundwork needed for this is still not ready (virtiofs in non-root mode).

Comment 21 Fabian Deutsch 2023-09-27 07:26:51 UTC
Lowering priority, mainly because this is a known issue, a workaround is available, and a feature is on track to address this problem.

Comment 22 Antonio Cardace 2023-10-11 12:44:32 UTC
Closing as 'DEFERRED' as we're now tracking this in Jira https://issues.redhat.com/browse/CNV-33835 for 4.17.

