Bug 1970015 - [master] Missing logs in case cluster installation failed
Summary: [master] Missing logs in case cluster installation failed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Fred Rolland
QA Contact: bjacot
URL:
Whiteboard: KNI-EDGE-JUKE-4.8 AI-Team-Hive
Depends On:
Blocks: 1971288
 
Reported: 2021-06-09 15:56 UTC by Eran Cohen
Modified: 2021-07-20 10:44 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1971288 (view as bug list)
Environment:
Last Closed: 2021-06-29 12:21:35 UTC
Target Upstream Version:
Embargoed:
ercohen: needinfo-



Description Eran Cohen 2021-06-09 15:56:26 UTC
Description of problem:
I tried to get the logs of a failed installation and got this from the link:
{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

Version-Release number of selected component (if applicable):


How reproducible:

1/1
Steps to Reproduce:
1. Create the following resources:
   - BMH
   - pull-secret
   - a manifest that will fail the cluster installation (see content below; it doesn't really matter how you fail the installation)
   - infra-env
   - agent-cluster-install (with the manifestsConfigMapRef)
   - cluster-deployment

2. Describe the AgentClusterInstall once the installation fails.
3. Curl the logs link to get the logs.
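Steps 2-3 can be sketched as follows. The logs link is exposed in the AgentClusterInstall's status.debugInfo.logsURL field; the resource name, namespace, and sample status JSON below are illustrative assumptions, not taken from this report:

```shell
# Hypothetical status payload; on a real cluster you would fetch it with e.g.
#   kubectl -n assisted-installer get agentclusterinstall <name> \
#     -o jsonpath='{.status.debugInfo.logsURL}'
STATUS='{"debugInfo":{"eventsURL":"https://example.test/events","logsURL":"https://example.test/logs"}}'

# Extract the logs URL from the sample JSON (sed keeps the sketch dependency-free).
LOGS_URL=$(printf '%s' "$STATUS" | sed 's/.*"logsURL":"\([^"]*\)".*/\1/')
echo "$LOGS_URL"

# Step 3: download the log tarball (commented out here; the URL above is fake).
# curl -ks "$LOGS_URL" -o cluster-logs.tar
```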


manifest content:

kind: ConfigMap
apiVersion: v1
metadata:
  name: single-node-manifests
  namespace: assisted-installer
data:
  99_master_kernel_arg.yaml: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 02-master-workload-partitioning
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMSw1Mi01MyIgfQ==
            mode: 420
            overwrite: true
            path: /etc/crio/crio.conf.d/01-workload-partitioning
            user:
              name: root
          - contents:
              source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTEsNTItNTMiCiAgfQp9
            mode: 420
            overwrite: true
            path: /etc/kubernetes/openshift-workload-pinning
            user:
              name: root
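For reference, the two base64 `source` blobs in the manifest above decode to plain-text workload-partitioning files (a CRI-O drop-in and a kubelet pinning JSON); a quick way to inspect them:

```shell
# Decode the first embedded file (written by the MachineConfig above to
# /etc/crio/crio.conf.d/01-workload-partitioning).
CRIO_CONF=$(printf '%s' 'W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMSw1Mi01MyIgfQ==' | base64 -d)
printf '%s\n' "$CRIO_CONF"

# Decode the second embedded file (written to
# /etc/kubernetes/openshift-workload-pinning).
KUBE_PIN=$(printf '%s' 'ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTEsNTItNTMiCiAgfQp9' | base64 -d)
printf '%s\n' "$KUBE_PIN"
```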



Actual results:

{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

Expected results:
Get the cluster logs

Additional info:

I can log into the host; the assisted-installer-controller did upload the logs:
time="2021-06-08T16:09:49Z" level=info msg="Cluster installation failed."
time="2021-06-08T16:09:49Z" level=info msg="Waiting for all go routines to finish"
time="2021-06-08T16:09:49Z" level=info msg="Finished PostInstallConfigs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UpdateBMHs"
time="2021-06-08T16:09:49Z" level=info msg="Finished UploadLogs"
time="2021-06-08T16:09:49Z" level=info msg="Finished all"

Comment 1 Igal Tsoiref 2021-06-10 07:34:17 UTC
In this specific case we shouldn't get controller logs, because the controller was started after the cluster had already timed out. However, we are still missing the assisted-installer logs, which should be sent before reboot.

Comment 2 Fred Rolland 2021-06-15 10:55:33 UTC
I tried the following scenario with the kube API:
- Start install of SNO
- After reboot, suspend the VM
- Wait for installation to fail
- Download the logs via DebugInfo URL

The logs were available, without the controller log, as expected.

@ercohen Did I miss anything? Do you have the assisted-service log?

Comment 3 Fred Rolland 2021-06-15 11:28:40 UTC
The issue reproduces if the service pod is restarted after the installation fails:

{"code":"500","href":"","id":500,"kind":"Error","reason":"No log files were found"}

The logs are not persisted after reboot.

@mfilanov WDYT?

Comment 4 Eran Cohen 2021-06-15 12:07:39 UTC
I suspect the logs are saved on the pod filesystem and not on a PV.

Comment 5 Michael Filanov 2021-06-15 13:01:13 UTC
If we are talking about test-infra then this is probably the case.
If it happened with the operator then it's a bug.

Comment 6 Fred Rolland 2021-06-15 14:25:14 UTC
Tested with an environment installed via the operator, with a PV backing the filesystem; the logs are persisted after a restart.

@ercohen WDYT?

Comment 7 Alexander Chuzhoy 2021-06-17 13:28:07 UTC
Reproduced the issue.

Note: Once the cluster got deployed successfully, the link actually started to work and I was able to download the log tarball.

