Bug 2027092

Summary: Creating KataConfig just after the operator has been installed in a disconnected environment fails sporadically
Product: OpenShift Container Platform
Reporter: Victor Voronkov <vvoronko>
Component: sandboxed-containers
Assignee: Ariel Adam <aadam>
Status: CLOSED WORKSFORME
QA Contact: Cameron Meadors <cmeadors>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.9
CC: aadam, augol, cmeadors, jfreiman, pmores
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-21 08:41:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Victor Voronkov 2021-11-28 09:21:05 UTC
Description of problem:
Creating a KataConfig immediately after the operator has been installed fails sporadically in CI:

07:49:48  TASK [install-kata-operator : Create operator subscription] ********************
07:49:48  task path: /var/lib/jenkins/workspace/ocp-olm-setup/ocp-edge-qe/linchpin-workspace/hooks/ansible/ocp-edge-setup/roles/install-kata-operator/tasks/install-kata.yml:198
07:49:48  changed: [provisionhost-0-0] => {"changed": true, "cmd": "set +e\ncat <<EOF | oc apply -f -\napiVersion: operators.coreos.com/v1alpha1\nkind: Subscription\nmetadata:\n  name: kata-operator\n  namespace: \"openshift-sandboxed-containers-operator\"\nspec:\n  channel: \"preview-1.1\"\n  installPlanApproval: Automatic\n  name: sandboxed-containers-operator\n  source: kata-qe-optional-operators\n  sourceNamespace: openshift-marketplace\n  startingCSV: \"sandboxed-containers-operator.v1.1.0\"\nEOF\n", "delta": "0:00:00.329896", "end": "2021-11-28 00:49:47.589984", "failed_when_result": false, "rc": 0, "start": "2021-11-28 00:49:47.260088", "stderr": "", "stderr_lines": [], "stdout": "subscription.operators.coreos.com/kata-operator created", "stdout_lines": ["subscription.operators.coreos.com/kata-operator created"]}
07:49:48  
07:49:48  TASK [install-kata-operator : Make sure the sandboxed-containers is installed] ***
07:49:48  task path: /var/lib/jenkins/workspace/ocp-olm-setup/ocp-edge-qe/linchpin-workspace/hooks/ansible/ocp-edge-setup/roles/install-kata-operator/tasks/install-kata.yml:220
07:49:48  FAILED - RETRYING: Make sure the sandboxed-containers is installed (20 retries left).
07:50:05  FAILED - RETRYING: Make sure the sandboxed-containers is installed (19 retries left).
07:50:20  changed: [provisionhost-0-0] => {"attempts": 3, "changed": true, "cmd": "oc get pods -n \"openshift-sandboxed-containers-operator\" -o json | jq -r '.items[] | select(.metadata.name | test(\"sandboxed-containers-operator-controller-manager-*\")).status.phase'\n", "delta": "0:00:00.118035", "end": "2021-11-28 00:50:19.786984", "rc": 0, "start": "2021-11-28 00:50:19.668949", "stderr": "", "stderr_lines": [], "stdout": "Running", "stdout_lines": ["Running"]}
07:50:20  
07:50:20  TASK [install-kata-operator : Create kataconfig] *******************************
07:50:20  task path: /var/lib/jenkins/workspace/ocp-olm-setup/ocp-edge-qe/linchpin-workspace/hooks/ansible/ocp-edge-setup/roles/install-kata-operator/tasks/install-kata.yml:230
07:50:21  fatal: [provisionhost-0-0]: FAILED! => {"changed": true, "cmd": "set +e\ncat <<EOF | oc apply -f -\napiVersion: kataconfiguration.openshift.io/v1\nkind: KataConfig\nmetadata:\n  name: example-kataconfig\nEOF\n", "delta": "0:00:00.640764", "end": "2021-11-28 00:50:21.107591", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-11-28 00:50:20.466827", "stderr": "Error from server (InternalError): error when creating \"STDIN\": Internal error occurred: failed calling webhook \"vkataconfig.kb.io\": Post \"https://sandboxed-containers-operator-controller-manager-service.openshift-sandboxed-containers-operator.svc:443/validate-kataconfiguration-openshift-io-v1-kataconfig?timeout=10s\": dial tcp 10.130.0.55:9443: connect: connection refused", "stderr_lines": ["Error from server (InternalError): error when creating \"STDIN\": Internal error occurred: failed calling webhook \"vkataconfig.kb.io\": Post \"https://sandboxed-containers-operator-controller-manager-service.openshift-sandboxed-containers-operator.svc:443/validate-kataconfiguration-openshift-io-v1-kataconfig?timeout=10s\": dial tcp 10.130.0.55:9443: connect: connection refused"], "stdout": "", "stdout_lines": []}


The same issue was seen on the latest 4.9 and 4.10 builds (with kata 1.1.0).


How reproducible:
Deploy the kata operator in a disconnected environment.
Create a KataConfig as soon as the pods in the openshift-sandboxed-containers-operator namespace are Running.
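
The CI task above waits only for the pod's .status.phase to be "Running". In Kubernetes, that phase does not imply that the pod's containers are ready to serve traffic; the separate "Ready" condition does. A minimal sketch of a stricter check over the output of `oc get pods -o json` follows. The function name and the default prefix are illustrative, not part of the CI role, and whether "Ready" actually tracks the webhook listener on port 9443 depends on the operator's readiness probe, which this report does not show.

```python
import json


def controller_manager_ready(
    pods_json: str,
    name_prefix: str = "sandboxed-containers-operator-controller-manager",
) -> bool:
    """Return True only if at least one pod matches and every match is Ready.

    pods_json is the text printed by `oc get pods -n <namespace> -o json`.
    name_prefix is a hypothetical default taken from the pod name in the log.
    """
    pods = json.loads(pods_json).get("items", [])
    matching = [p for p in pods if p["metadata"]["name"].startswith(name_prefix)]
    if not matching:
        # No controller-manager pod yet: not ready.
        return False
    for pod in matching:
        conditions = pod.get("status", {}).get("conditions", [])
        # A pod can be in phase "Running" while its Ready condition is "False".
        is_ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not is_ready:
            return False
    return True
```

A pod whose phase is "Running" but whose Ready condition is "False" makes this check return False, whereas the phase-based check in the task would pass.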

Actual results:
Post "https://sandboxed-containers-operator-controller-manager-service.openshift-sandboxed-containers-operator.svc:443/validate-kataconfiguration-openshift-io-v1-kataconfig?timeout=10s": dial tcp 10.130.0.55:9443: connect: connection refused

Expected results:
The POST to the validating webhook should succeed and the KataConfig should be applied.

Additional info:
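Looks like a timing issue: in the log above, the KataConfig is created about one second after the controller-manager pod first reports Running, and the webhook connection is refused. It fails sporadically in CI.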
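
If this is a timing issue, one possible CI-side workaround is to retry the KataConfig creation while the webhook refuses connections. The sketch below is an assumption about how that could look, not the fix adopted for this bug. The name apply_with_retry and its parameters are hypothetical; the defaults of 20 retries and 15 seconds only mirror the retry pattern visible in the readiness task in the log.

```python
import time
from typing import Callable, Tuple


def apply_with_retry(
    apply: Callable[[], Tuple[int, str]],
    retries: int = 20,
    delay_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, int]:
    """Call apply() until it succeeds, fails non-transiently, or retries run out.

    apply() must return (return_code, stderr). Only failures whose stderr
    contains both "failed calling webhook" and "connection refused" are
    treated as transient and retried. Returns (return_code, stderr, attempts).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        rc, stderr = apply()
        transient = (
            "failed calling webhook" in stderr
            and "connection refused" in stderr
        )
        # Stop on success, on any other kind of error, or on the last attempt.
        if rc == 0 or not transient or attempt >= retries:
            return rc, stderr, attempt
        sleep(delay_seconds)
```

In a real job, apply would wrap the same `oc apply -f -` command used by the failing task and return its exit code and stderr. Errors other than the refused webhook connection are returned immediately, so genuine validation failures are not hidden by the retry loop.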

Comment 3 Jens Freimann 2022-09-21 08:41:57 UTC
I can't reproduce this. Please reopen if it occurs again.