Bug 1970015 - [master] Missing logs in case cluster installation failed
Summary: [master] Missing logs in case cluster installation failed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Fred Rolland
QA Contact: bjacot
URL:
Whiteboard: KNI-EDGE-JUKE-4.8 AI-Team-Hive
Depends On:
Blocks: 1971288
 
Reported: 2021-06-09 15:56 UTC by Eran Cohen
Modified: 2021-07-20 10:44 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1971288 (view as bug list)
Environment:
Last Closed: 2021-06-29 12:21:35 UTC
Target Upstream Version:
Embargoed:
ercohen: needinfo-



Description Eran Cohen 2021-06-09 15:56:26 UTC
Description of problem:
I tried to get the logs of a failed installation and got this from the link:
{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

Version-Release number of selected component (if applicable):


How reproducible:

1/1
Steps to Reproduce:
1. Create the following resources:
   - BMH
   - pull-secret
   - a manifest that will fail the cluster installation (see content below; it doesn't really matter how you fail the installation)
   - infra-env
   - agent-cluster-install (with the manifestsConfigMapRef)
   - cluster-deployment

2. Describe the AgentClusterInstall once the installation fails.
3. Curl the logs link to get the logs.
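Steps 2-3 can be sketched as follows. The logs link is exposed in the AgentClusterInstall's status.debugInfo.logsURL field; the resource name, namespace, and sample status JSON below are illustrative assumptions, not taken from this report:

```shell
# Hypothetical status payload; on a real cluster you would fetch it with e.g.
#   kubectl -n assisted-installer get agentclusterinstall <name> \
#     -o jsonpath='{.status.debugInfo.logsURL}'
STATUS='{"debugInfo":{"eventsURL":"https://example.test/events","logsURL":"https://example.test/logs"}}'

# Extract the logs URL from the sample JSON (sed keeps the sketch dependency-free).
LOGS_URL=$(printf '%s' "$STATUS" | sed 's/.*"logsURL":"\([^"]*\)".*/\1/')
echo "$LOGS_URL"

# Step 3: download the log tarball (commented out here; the URL above is fake).
# curl -ks "$LOGS_URL" -o cluster-logs.tar
```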


manifest content:

kind: ConfigMap
apiVersion: v1
metadata:
  name: single-node-manifests
  namespace: assisted-installer
data:
  99_master_kernel_arg.yaml: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 02-master-workload-partitioning
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMSw1Mi01MyIgfQ==
            mode: 420
            overwrite: true
            path: /etc/crio/crio.conf.d/01-workload-partitioning
            user:
              name: root
          - contents:
              source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTEsNTItNTMiCiAgfQp9
            mode: 420
            overwrite: true
            path: /etc/kubernetes/openshift-workload-pinning
            user:
              name: root
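For reference, the two base64 `source` blobs in the manifest above decode to plain-text workload-partitioning files (a CRI-O drop-in and a kubelet pinning JSON); a quick way to inspect them:

```shell
# Decode the first embedded file (written by the MachineConfig above to
# /etc/crio/crio.conf.d/01-workload-partitioning).
CRIO_CONF=$(printf '%s' 'W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMSw1Mi01MyIgfQ==' | base64 -d)
printf '%s\n' "$CRIO_CONF"

# Decode the second embedded file (written to
# /etc/kubernetes/openshift-workload-pinning).
KUBE_PIN=$(printf '%s' 'ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTEsNTItNTMiCiAgfQp9' | base64 -d)
printf '%s\n' "$KUBE_PIN"
```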



Actual results:

{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

Expected results:
Get the cluster logs

Additional info:

I can log into the host; the assisted-installer-controller did upload the logs:
time="2021-06-08T16:09:49Z" level=info msg="Cluster installation failed."
time="2021-06-08T16:09:49Z" level=info msg="Waiting for all go routines to finish"
time="2021-06-08T16:09:49Z" level=info msg="Finished PostInstallConfigs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UpdateBMHs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UploadLogs"
time="2021-06-08T16:09:49Z" level=info msg="Finished all"

Comment 1 Igal Tsoiref 2021-06-10 07:34:17 UTC
In this specific case we shouldn't get controller logs, because the controller was started after the cluster had already timed out. However, we are still missing the assisted-installer logs, which should be sent before reboot.

Comment 2 Fred Rolland 2021-06-15 10:55:33 UTC
I tried the following scenario with the kube API:
- Start install of SNO
- After reboot, suspend the VM
- Wait for installation to fail
- Download the logs via DebugInfo URL

The logs were available, without the controller log, as expected.

@ercohen Did I miss anything? Do you have the assisted-service log?

Comment 3 Fred Rolland 2021-06-15 11:28:40 UTC
The issue reproduces if the service pod is restarted after the installation fails:

{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

The logs are not persisted after reboot.

@mfilanov WDYT?

Comment 4 Eran Cohen 2021-06-15 12:07:39 UTC
I suspect the logs are saved on the pod filesystem and not on a PV.

Comment 5 Michael Filanov 2021-06-15 13:01:13 UTC
If we are talking about test-infra then this is probably the case.
If it happened with the operator then it's a bug.

Comment 6 Fred Rolland 2021-06-15 14:25:14 UTC
Tested with an environment installed via the operator, with a PV backing the filesystem; the logs are persisted after a restart.

@ercohen WDYT?

Comment 7 Alexander Chuzhoy 2021-06-17 13:28:07 UTC
Reproduced the issue.

Note: Once the cluster got deployed successfully, the link actually started to work and I was able to download the log tarball.

