Bug 2004510

Summary: openshift-gitops operator hooks get unauthorized (401) errors during job execution
Product: OpenShift Container Platform Reporter: Juan Manuel Parrilla Madrid <jparrill>
Component: Telco Edge Assignee: Ian Miller <imiller>
Telco Edge sub component: ZTP QA Contact: yliu1
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: aos-bugs, eparis, mcornea
Version: 4.8   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2010529 Environment:
Last Closed: 2022-03-10 16:10:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2010529    

Description Juan Manuel Parrilla Madrid 2021-09-15 13:21:18 UTC
Description of problem:

After installing OCP, ACM and the GitOps operator, we hit an issue when using the pre and post hooks from https://github.com/openshift-kni/cnf-features-deploy/ztp/gitops-subscriptions/argocd.

The behaviour is:

- We load the deployment folder as the README.md says
- Our DU manifests (clusters and policies) are being loaded using Argo
- When the PolicyGen tries to execute the hooks, an Unauthorized error appears

Pre Hook
```
[root@bastion1 gitops-verizon]# oc logs siteconfig-pre-6ncnx
error: failed to create configmap: Unauthorized
ztp-hooks.presync Wed, 15 Sep 2021 13:02:10 +0000 ERROR [pre-sync-entrypoint] Config map of siteconfigs resourceVersion creation failed
```

We then debugged the pod, entered it, and forced the execution by sourcing common.sh, which forces a login to the OCP cluster. After that the clusters are loaded, but the error appears again in the post hook.

Post Hook
```
[root@bastion1 gitops-verizon]# oc logs siteconfig-post-9zksw
ztp-hooks.postsync Wed, 15 Sep 2021 13:04:27 +0000 ERROR [post-sync-entrypoint] Failed to get RAN sites resourceVersion
ztp-hooks.watcher 2021-09-15 13:04:28 UTC [DEBUG]             [watcher:217]: 0, siteconfigs
ztp-hooks.watcher 2021-09-15 13:04:28 UTC [ERROR]             [watcher:53]: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '56919b9c-db7f-4d6f-a9a4-7916f94777d9', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 15 Sep 2021 13:04:28 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

Traceback (most recent call last):
  File "watcher.py", line 51, in watch_resources
    resource_version=rv, timeout_seconds=5)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/custom_objects_api.py", line 2087, in list_cluster_custom_object_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 243, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '56919b9c-db7f-4d6f-a9a4-7916f94777d9', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 15 Sep 2021 13:04:28 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}


ztp-hooks.watcher 2021-09-15 13:04:28 UTC [ERROR]             [watcher:223]: 'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "watcher.py", line 221, in <module>
    ApiResponseParser(resp, resourcename=sys.argv[2], debug=debug)
  File "watcher.py", line 122, in __init__
    if api_response[1] != 200:
TypeError: 'NoneType' object is not subscriptable
```
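
The second traceback looks like a follow-on failure rather than a separate bug: after the 401, the call apparently yields `None`, and `ApiResponseParser` indexes it (`api_response[1]`) without checking. A minimal sketch of a guard, assuming the response is the `(data, status, headers)` tuple that the client's `*_with_http_info` calls return. `WatchError` and `check_api_response` are hypothetical names used for illustration, not code from watcher.py:

```python
class WatchError(Exception):
    """Raised when the API call produced no usable response (hypothetical name)."""


def check_api_response(api_response):
    """Return the payload of a (data, status, headers) tuple, or raise WatchError."""
    # A failed request leaves us with None; indexing it is what raised
    # "'NoneType' object is not subscriptable" in the log above.
    if api_response is None:
        raise WatchError("no API response (the request failed earlier, e.g. 401)")
    data, status, _headers = api_response
    if status != 200:
        raise WatchError(f"unexpected HTTP status {status}")
    return data
```

With a guard like this the log would show only the original 401 and not a misleading `TypeError` on top of it.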

If we repeat the same workaround (debugging the node and forcing the generation), it works.
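
The workaround described here amounts to "log in again, then retry". A minimal sketch of that pattern, assuming the hook can re-run its login step on demand. `UnauthorizedError`, `call_with_relogin` and the `relogin` callback are hypothetical stand-ins, not code from the hooks, and this is not necessarily how the eventual fix works:

```python
class UnauthorizedError(Exception):
    """Stand-in for an HTTP 401 from the API server (hypothetical name)."""


def call_with_relogin(operation, relogin, max_attempts=2):
    """Run operation(); on a 401, call relogin() and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except UnauthorizedError:
            if attempt == max_attempts:
                raise  # still unauthorized after the last attempt
            # Re-authenticate before retrying; in the manual workaround
            # this is what sourcing common.sh does.
            relogin()
```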

Version-Release number of selected component (if applicable):

OCP: 4.8.12
ACM: 2.3.3-7
Openshift-gitops Operator: latest from channel 4.8
Platform: IPv6/Disconnected


How reproducible:

- always

Steps to Reproduce:
1. Create the NS and Secrets for your clusters
2. Have the repo updated with the cluster definitions
3. Clone https://github.com/openshift-kni/cnf-features-deploy and fill in https://github.com/openshift-kni/cnf-features-deploy/tree/master/ztp/gitops-subscriptions/argocd/deployment/{policies-app.yaml,cluster-app.yaml}
4. Execute oc apply -k on the deployment folder

Actual results:

It fails most of the time (90%).

Expected results:

It works most of the time.

Additional info:

Comment 2 Juan Manuel Parrilla Madrid 2021-10-05 13:54:04 UTC
PR https://github.com/openshift-kni/cnf-features-deploy/pull/722 was tested on the customer environment with:

- OCP 4.8.5 as hub
- OCP 4.8.11 as Spoke
- ACM 2.3.3
- GitOps Operator on 4.8 channel

Comment 3 yliu1 2021-10-05 15:24:39 UTC
We did not encounter this issue in our test environment with the same steps. From a regression perspective, however, the fix did not introduce any regression.
Marking this as verified, combined with the results from Juan in comment #2.

Comment 7 errata-xmlrpc 2022-03-10 16:10:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056