Description of problem:
I tried to get the logs of a failed installation and got this from the link:
{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

Version-Release number of selected component (if applicable):

How reproducible:
1/1

Steps to Reproduce:
1. Create the following:
   - BMH
   - pull-secret
   - manifest that will fail the cluster installation (see content below, though it doesn't really matter how you fail the installation)
   - infra-env
   - agent-cluster-install (with the manifestsConfigMapRef)
   - cluster-deployment
2. Describe the AgentClusterInstall once the installation fails
3. curl the logs link to get the logs

manifest content:

kind: ConfigMap
apiVersion: v1
metadata:
  name: single-node-manifests
  namespace: assisted-installer
data:
  99_master_kernel_arg.yaml: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 02-master-workload-partitioning
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMSw1Mi01MyIgfQ==
            mode: 420
            overwrite: true
            path: /etc/crio/crio.conf.d/01-workload-partitioning
            user:
              name: root
          - contents:
              source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTEsNTItNTMiCiAgfQp9
            mode: 420
            overwrite: true
            path: /etc/kubernetes/openshift-workload-pinning
            user:
              name: root

Actual results:
{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

Expected results:
Get the cluster logs

Additional info:
I can log into the host; the assisted-installer-controller did upload the logs:

time="2021-06-08T16:09:49Z" level=info msg="Cluster installation failed."
time="2021-06-08T16:09:49Z" level=info msg="Waiting for all go routines to finish"
time="2021-06-08T16:09:49Z" level=info msg="Finished PostInstallConfigs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UpdateBMHs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UploadLogs"
time="2021-06-08T16:09:49Z" level=info msg="Finished all"
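For step 3 above, a minimal sketch of how the logs link can be fetched, assuming the URL is exposed in the AgentClusterInstall status under .status.debugInfo.logsURL and that the resources live in the namespace used in this setup (adjust names to your environment):

# Hedged sketch: the resource name is a placeholder and the jsonpath is an assumption.
LOGS_URL=$(oc -n assisted-installer get agentclusterinstall <name> \
  -o jsonpath='{.status.debugInfo.logsURL}')
# -k in case the service presents a self-signed certificate in this deployment.
curl -k -o cluster_logs.tar.gz "$LOGS_URL"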
In this specific case we shouldn't get the controller logs, because the controller only started after the cluster had already hit the installation timeout. However, we are still missing the assisted-installer logs, which should be sent before reboot.
I tried the following scenario with the kube API:
- Start installation of SNO
- After reboot, suspend the VM
- Wait for the installation to fail
- Download the logs via the DebugInfo URL
The logs were available, without the controller log, as expected. @ercohen Anything I missed? Do you have the assisted-service log?
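(For reference, a hedged way to collect the assisted-service log; the namespace and deployment name are assumptions based on the setup described above:)

# Assumed namespace/deployment for this environment; adjust if yours differs.
oc -n assisted-installer logs deployment/assisted-service > assisted-service.log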
The issue reproduces if the service pod is restarted after the installation fails: {"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"} The logs are not persisted across the restart. @mfilanov WDYT?
I suspect the logs are saved on the pod filesystem and not on a PV.
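A quick way to check that suspicion (a sketch; the namespace and deployment name follow the environment above, and the exact volume layout may differ per deployment):

# List the volumes backing the assisted-service pod; if the storage used for
# uploaded logs is an emptyDir rather than a persistentVolumeClaim, the logs
# will not survive a pod restart.
oc -n assisted-installer get deployment assisted-service \
  -o jsonpath='{.spec.template.spec.volumes}'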
If we are talking about test-infra then this is probably the case. If it happened with the operator then it's a bug.
Tested with an environment installed with the operator, with a PV for the filesystem, and the logs are persisted after reboot. @ercohen WDYT?
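(For the operator case, a hedged way to confirm that the filesystem storage is PV-backed; this assumes the default AgentServiceConfig resource named "agent":)

# filesystemStorage is the PVC spec that backs the service's filesystem storage
# when deployed via the operator; an empty result would suggest no PV backing.
oc get agentserviceconfig agent -o jsonpath='{.spec.filesystemStorage}'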
Reproduced the issue. Note: once the cluster was deployed successfully, the link actually started working and I was able to download the log tarball.