Bug 2119309

Summary: readinessProbe in VM stays on failed
Product: Container Native Virtualization (CNV)
Reporter: Roni Kishner <rkishner>
Component: Infrastructure
Assignee: Javier Cano Cano <jcanocan>
Status: CLOSED ERRATA
QA Contact: Roni Kishner <rkishner>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.11.0
CC: dholler
Target Milestone: ---
Target Release: 4.12.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-24 13:39:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Roni Kishner 2022-08-18 09:38:53 UTC
Description of problem:
After creating the file that the readinessProbe checks for, the readiness probe keeps failing.
This should work the same way as described in the docs: https://github.com/kubevirt/kubevirt/blob/b52ebdc081868bf961d70683b2ec8d071fad6232/docs/probes.md#example

Version-Release number of selected component (if applicable):
4.11

How reproducible:
100%

Steps to Reproduce:
1. Create a VM using readinessProbe param:
spec:
  template:
    spec:
      readinessProbe:
        exec:
          command:
          - cat /tmp/healthy.txt
2. Watch the VM events using "oc get events -n <namespace>"
3. Start the VM (notice the readiness probe fails)
4. Open a console to the VM and create the file /tmp/healthy.txt
5. Verify that the file was created using the "cat /tmp/healthy.txt" command
6. Watch the VM events again

Actual results:
The readiness probe keeps failing (VMI stays on "not ready")

Expected results:
The VMI should change to "ready"

Additional info:
The probe failure message is:
Readiness probe failed: {"component":"virt-probe","level":"fatal","msg":"Failed executing the command","pos":"virt-probe.go:70","reason":"rpc error: code = Unknown desc = virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU agent command 'guest-exec': Guest agent command failed, error was 'Failed to execute child process “cat /tmp/healthy.txt” (No such file or directory)'')","timestamp":"2022-08-18T07:54:53.312885Z"}...

I also tried using:
  command:
    - cat
    - /tmp/healthy.txt

The issue still persists with this form; the only difference is the probe failure message.
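
The "No such file or directory" in the first failure message is consistent with the whole string "cat /tmp/healthy.txt" being treated as the name of the executable, because it was passed as a single list element. This can be illustrated locally, outside KubeVirt. The following is only a sketch in a POSIX shell: the temporary file is a stand-in for /tmp/healthy.txt, and it does not reproduce the guest agent code path.

```shell
#!/bin/sh
# Local illustration only; this does not involve KubeVirt or the QEMU guest agent.
tmpfile=$(mktemp)            # hypothetical stand-in for /tmp/healthy.txt
echo ok > "$tmpfile"

# One element: the whole string, space included, is looked up as the program name.
single_rc=0
"cat $tmpfile" >/dev/null 2>&1 || single_rc=$?

# Two elements: the program name and its argument are separate.
split_rc=0
cat "$tmpfile" >/dev/null 2>&1 || split_rc=$?

echo "single element: exit $single_rc"
echo "split elements: exit $split_rc"
rm -f "$tmpfile"
```

The single-element form fails even though the file exists (typically with exit status 127, "command not found"), while the split form succeeds. This only accounts for the first failure message, not for the different message seen with the split form.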

Comment 3 Dominik Holler 2022-08-18 14:51:55 UTC
To keep the required permissions minimal, I recommend using 'test' instead of 'cat', e.g.:
"""
      readinessProbe:
        exec:
          command: ["test", "-f", "/tmp/healthy.txt"]
"""

In addition, the probe will only succeed if the file /tmp/healthy.txt is created inside the VM. For testing purposes, this could be done in cloud-init, e.g.:
"""

      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            chpasswd:
              expire: false
            password: xxxx
            user: fedora

            ssh_authorized_keys:
             [ssh-rsa AAAAB3Nxxx]
            runcmd: [ ..., 'touch /tmp/healthy.txt']
"""
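
The suggestion above relies on the exit status of 'test -f': non-zero while the file is missing, zero once it exists. A minimal local sketch of that behavior follows, assuming the exec probe treats exit status 0 as success; the temporary path and the probe function are hypothetical stand-ins for /tmp/healthy.txt and the probe command inside the guest.

```shell
#!/bin/sh
# Local illustration only: a temporary path stands in for /tmp/healthy.txt in the guest.
marker="$(mktemp -d)/healthy.txt"

probe() {
    # Same check as the suggested probe command: succeeds only if the regular file exists.
    test -f "$marker"
}

before=0
probe || before=$?           # file not created yet, so the check fails

touch "$marker"              # what creating the file inside the VM would do

after=0
probe || after=$?            # file exists now, so the check succeeds

echo "before touch: exit $before"
echo "after touch:  exit $after"
rm -rf "$(dirname "$marker")"
```

Unlike 'cat', 'test -f' only checks that the file exists and does not need to read its contents.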

@rkishner Do you agree that this bug might be used to track improving the documentation, instead of improving the software?

Comment 5 Roni Kishner 2022-08-18 16:55:11 UTC
Yes, only the docs need to change if the solution you suggested works.

On a side note, I do wonder whether this worked in the past with the method as currently documented.

Comment 9 errata-xmlrpc 2023-01-24 13:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408