Bug 2037611

Summary: The service account token does not refresh after VM migration
Product: Container Native Virtualization (CNV)
Reporter: Mustafa Aydın <maydin>
Component: Virtualization
Assignee: Jed Lejosne <jlejosne>
Status: CLOSED DEFERRED
QA Contact: Kedar Bidarkar <kbidarka>
Severity: medium
Priority: high
Version: 4.9.1
CC: acardace, bhanoglu, danken, fdeutsch, gmaglione, gwest, jlejosne, jspanko, lpivarc, pelauter, phoracek, sgott, vgoyal, xiagao
Target Milestone: ---
Target Release: 4.15.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: As of OpenShift 4.8, OpenShift links a service account token in use by a pod to that specific pod. KubeVirt currently implements a ServiceAccountVolume by creating a disk image containing the token. If the VM is migrated, the token contained in the image becomes invalid.
Consequence: Migrating a VM, e.g. by evicting a node during an upgrade, renders ServiceAccountVolumes invalid.
Workaround (if any): Use user accounts instead of service accounts, because those tokens are not bound to a specific pod.
Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-10-11 12:44:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mustafa Aydın 2022-01-06 05:34:56 UTC
Description of problem:
When we mount a service account to the VM and check the token inside the VM, it contains the name of the virt-launcher pod. After the VM is migrated, however, the token still references the old virt-launcher pod, so it becomes useless.

Version-Release number of selected component (if applicable):
4.9.1

How reproducible:
Always

Steps to Reproduce:
1. Mount a service account to the VM
2. Migrate the VM
3. Check the token inside the VM: its pod claim still references the old virt-launcher pod (see the sketch after this list)
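
For reference, a minimal sketch of those three steps using the names from this report (VM "xtbx02", namespace "s-testbox-02", service account "murphy"); the disk/volume name "sa-disk", the device name, and the mount point inside the guest are assumptions:

# Step 1: attach the service account to the VM as a ServiceAccountVolume.
# Assumes the VM already exists and already has disks and volumes lists.
oc patch vm xtbx02 -n s-testbox-02 --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/domain/devices/disks/-",
   "value": {"name": "sa-disk", "disk": {"bus": "virtio"}}},
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "sa-disk", "serviceAccount": {"serviceAccountName": "murphy"}}}
]'

# Step 2: migrate the VM.
virtctl migrate xtbx02 -n s-testbox-02

# Step 3, inside the guest: mount the serviceAccount disk and read the token.
mount /dev/vdb /mnt        # the device name depends on the VM's disk layout
cat /mnt/token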



Actual results:
oc get pods
NAME                         READY   STATUS      RESTARTS   AGE
virt-launcher-xtbx02-7pnhq   0/1     Completed   0          18m
virt-launcher-xtbx02-dmbbl   1/1     Running     0          3m18s 



The token after the migration: 


jwt:

{
  "aud": [
    "https://kubernetes.default.svc"
  ],
  "exp": 1672934711,
  "iat": 1641398711,
  "iss": "https://kubernetes.default.svc",
  "kubernetes.io": {
    "namespace": "s-testbox-02",
    "pod": {
      "name": "virt-launcher-xtbx02-7pnhq",     <-----------------------------
      "uid": "58ea2a5f-9794-433c-845b-76c69634752f"
    },
    "serviceaccount": {
      "name": "murphy",
      "uid": "9cbdd2ea-4fc3-4ffc-a966-bb761ed4ba60"
    },
    "warnafter": 1641402318
  },
  "nbf": 1641398711,
  "sub": "system:serviceaccount:s-testbox-02:murphy"
}
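
For the record, the payload above can be decoded from inside the guest with something like the following sketch, assuming the ServiceAccountVolume is mounted at /mnt:

# Take the JWT payload (second dot-separated field), convert base64url
# to base64, pad to a multiple of 4 characters, and decode.
p=$(cut -d. -f2 /mnt/token | tr '_-' '/+')
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
printf '%s\n' "$p" | base64 -d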


Expected results:
The token should reference virt-launcher-xtbx02-dmbbl, the new virt-launcher pod.

Additional info:

As a workaround, a secret containing the token can be mounted to the VM (see the sketch below).
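
A sketch of that workaround, reusing the names from this report; the secret name "murphy-token" and the volume name "token-disk" are made up, and "oc create token" requires a reasonably recent cluster:

# Request a token for the "murphy" service account and store it in a secret.
# Tokens created this way are time-limited but not bound to a pod.
TOKEN=$(oc create token murphy -n s-testbox-02 --duration=8760h)
oc create secret generic murphy-token -n s-testbox-02 --from-literal=token="$TOKEN"

# Mount the secret into the VM as a disk instead of the serviceAccount volume.
oc patch vm xtbx02 -n s-testbox-02 --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/domain/devices/disks/-",
   "value": {"name": "token-disk", "disk": {"bus": "virtio"}}},
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "token-disk", "secret": {"secretName": "murphy-token"}}}
]'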

Comment 3 Jed Lejosne 2022-01-25 13:59:33 UTC
I took a look at this issue and tried to figure out why it wasn't a problem before. It turns out the token didn't include the pod's name until Kubernetes 1.21.

Now this is not easy to fix. We unfortunately expose the service account to the guest using an emulated disk drive.
Had we used a cdrom drive, we could have just ejected the outdated token and inserted the new one.
However, modifying active disk drives from the backend is forbidden by libvirt/qemu, for good reasons.

The only way I see around that is to hot-unplug the whole drive before migration and plug a new one after.
That will work, but there's a potential issue: if the disk containing the token is currently mounted in the (Linux) guest, the new disk will appear under a new name!
For example, if /dev/vda is the main drive of the VM and /dev/vdb is the service account drive, and /dev/vdb is mounted on migration, /dev/vdb will become unusable after migration and the new token will now be under /dev/vdc.
I haven't tested other OSes like Windows, but a BSOD wouldn't surprise me!

In short, the above solution would be a lot of work and potentially cause instabilities.
I would recommend that users move away from that and switch to using secrets.
I'll make a suggestion upstream to deprecate service-account config drives.

Comment 4 sgott 2022-01-26 21:11:17 UTC
Per Comment #3: With the current implementation, this issue cannot be fixed. Our recommendation is to copy a user or service account token to a secret and use that instead; because that token will not change, it will survive a migration.

In the future, we may re-implement the serviceAccountVolume feature using virtiofs. This would automatically address this issue (see the sketch below).
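
For illustration only, KubeVirt already has a virtiofs "filesystems" device for sharing volumes with the guest; a serviceAccount volume exposed that way might look roughly like this (a sketch, not the eventual design; it assumes the filesystems list already exists in the spec):

oc patch vm xtbx02 -n s-testbox-02 --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/domain/devices/filesystems/-",
   "value": {"name": "sa-fs", "virtiofs": {}}},
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "sa-fs", "serviceAccount": {"serviceAccountName": "murphy"}}}
]'

# Inside the guest, the share is then mounted by tag rather than by
# device name, so migration would not rename it:
mount -t virtiofs sa-fs /mnt/sa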

Because this bug is not immediately addressable in its current state, we are deferring it to the next release.

Comment 6 ctomasko 2022-03-15 22:23:23 UTC
Added Release note > known issue

OpenShift Virtualization links a service account token in use by a pod to that specific pod. KubeVirt implements a ServiceAccountVolume by creating a disk image that contains a token. If you migrate a VM, the ServiceAccountVolume becomes invalid.

As a workaround, use user accounts rather than service accounts because user account tokens are not bound to a specific pod. (BZ#2037611)

https://github.com/openshift/openshift-docs/pull/42530
https://deploy-preview-42530--osdocs.netlify.app/openshift-enterprise/latest/virt/virt-4-10-release-notes#virt-4-10-known-issues

Future link: after OpenShift Virtualization 4.10 is released, you can find the release notes here: https://docs.openshift.com/container-platform/4.10/virt/virt-4-10-release-notes.html
or on the portal,
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10

Comment 7 sgott 2022-03-24 14:15:34 UTC
Per comment #4, this BZ is not addressable with the current architecture. Deferring it to the next release.

Comment 12 sgott 2022-09-09 15:44:26 UTC
Per comment #4, this BZ is not addressable with the current architecture (re-architecting this requires virtiofs). Deferring it to the next release.

Comment 20 Antonio Cardace 2023-06-20 12:23:14 UTC
Deferring this to 4.15 as the groundwork needed for this is still not ready (virtiofs in non-root mode).

Comment 21 Fabian Deutsch 2023-09-27 07:26:51 UTC
Lowering priority, mainly because this is a known issue, a workaround is available, and a feature is on track to address this problem.

Comment 22 Antonio Cardace 2023-10-11 12:44:32 UTC
Closing as 'DEFERRED' as we're now tracking this in Jira https://issues.redhat.com/browse/CNV-33835 for 4.17.