Bug 1970015
| Summary: | [master] Missing logs in case cluster installation failed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Eran Cohen <ercohen> |
| Component: | assisted-installer | Assignee: | Fred Rolland <frolland> |
| assisted-installer sub component: | Deployment Operator | QA Contact: | bjacot |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | akrzos, alazar, aos-bugs, frolland, itsoiref, mfilanov, sasha |
| Version: | 4.8 | Keywords: | Triaged |
| Target Milestone: | --- | Flags: | ercohen: needinfo- |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | KNI-EDGE-JUKE-4.8 AI-Team-Hive | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Clones: | 1971288 (view as bug list) |
| Environment: | | | |
| Last Closed: | 2021-06-29 12:21:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1971288 | | |
In this specific case we shouldn't get controller logs, because the controller was started after the cluster had already timed out. However, we are still missing the assisted-installer logs, which should be sent before reboot.

I tried the following scenario with the kube API:
- Start the install of an SNO cluster
- After reboot, suspend the VM
- Wait for the installation to fail
- Download the logs via the DebugInfo URL

The logs were available, without the controller log, as expected. @ercohen Anything I missed?

Do you have the assisted-service log?

The issue reproduces if the service pod is restarted after the installation fails:
{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}
The logs are not persisted after reboot.
@mfilanov WDYT?
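A minimal sketch of the pod-restart reproduction, assuming an operator deployment in the `assisted-installer` namespace with the service pod labeled `app=assisted-service` and the logs link exposed under `status.debugInfo.logsURL`; these names are assumptions, not taken from this bug:

```bash
# Sketch only: namespace, label selector, resource name, and the
# .status.debugInfo.logsURL field are assumptions; adjust to your environment.
LOGS_URL=$(oc -n assisted-installer get agentclusterinstall <name> \
  -o jsonpath='{.status.debugInfo.logsURL}')

# Downloading the logs works while the service pod that received them is alive...
curl -ks "$LOGS_URL" -o logs-before-restart.tar

# ...restart the service pod...
oc -n assisted-installer delete pod -l app=assisted-service

# ...and the same URL returns the 500 "No log files were found" error if the
# tarballs were kept only on the pod's local filesystem.
curl -ks "$LOGS_URL"
```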
I suspect the logs are saved on the pod filesystem and not on a PV. If we are talking about test-infra, then this is probably the case. If it happened with the operator, then it's a bug.

Tested with an environment installed by the operator, with a PV backing the filesystem, and the logs are persisted after reboot. @ercohen WDYT?

Reproduced the issue. Note: once the cluster got deployed successfully, the link actually started to work and I was able to download the log tarball.
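For reference, a hedged sketch of requesting PVC-backed storage when deploying with the operator so that uploaded logs survive a service pod restart; the `AgentServiceConfig` kind and its `filesystemStorage`/`databaseStorage` fields are assumptions about the operator API, and the sizes are illustrative only:

```bash
# Sketch only: the AgentServiceConfig fields below are assumptions about the
# operator API, and the storage sizes are illustrative.
cat <<'EOF' | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  filesystemStorage:   # assumed to back the directory where uploaded log tarballs land
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
  databaseStorage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
EOF
```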
Description of problem:

I tried to get the logs of a failed installation and got this from the link:

```
{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}
```

Version-Release number of selected component (if applicable):

How reproducible:
1/1

Steps to Reproduce:
1. Create the following:
   - BMH
   - pull-secret
   - manifest that will fail the cluster installation (see content below, though it doesn't really matter how you fail the installation)
   - infra-env
   - agent-cluster-install (with the manifestsConfigMapRef)
   - cluster-deployment
2. Describe the AgentClusterInstall once the installation fails.
3. curl the logs link to get the logs (a sketch of steps 2-3 is at the end of this report).

Manifest content:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: single-node-manifests
  namespace: assisted-installer
data:
  99_master_kernel_arg.yaml: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 02-master-workload-partitioning
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMSw1Mi01MyIgfQ==
            mode: 420
            overwrite: true
            path: /etc/crio/crio.conf.d/01-workload-partitioning
            user:
              name: root
          - contents:
              source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTEsNTItNTMiCiAgfQp9
            mode: 420
            overwrite: true
            path: /etc/kubernetes/openshift-workload-pinning
            user:
              name: root
```

Actual results:

```
{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}
```

Expected results:
Get the cluster logs

Additional info:
I can log into the host; the assisted-installer-controller did upload the logs:

```
time="2021-06-08T16:09:49Z" level=info msg="Cluster installation failed."
time="2021-06-08T16:09:49Z" level=info msg="Waiting for all go routines to finish"
time="2021-06-08T16:09:49Z" level=info msg="Finished PostInstallConfigs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UpdateBMHs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UploadLogs"
time="2021-06-08T16:09:49Z" level=info msg="Finished all"
```
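For reference, a hedged sketch of steps 2-3 above: inspecting the AgentClusterInstall after the installation fails and fetching the logs link it exposes. The resource name, namespace, and the `.status.debugInfo.logsURL` field are assumptions, not confirmed by this report:

```bash
# Sketch only: resource name, namespace, and the .status.debugInfo.logsURL
# field are assumptions; adjust to your environment.
oc -n assisted-installer describe agentclusterinstall test-agent-cluster-install

LOGS_URL=$(oc -n assisted-installer get agentclusterinstall test-agent-cluster-install \
  -o jsonpath='{.status.debugInfo.logsURL}')

# On the failed installation described above this returned:
#   {"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}
curl -ks "$LOGS_URL" -o cluster-logs.tar
```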