Bug 1881033
| Summary: | [Assisted-4.6][Staging] Agent unable to start container during discovery "podman[1973]: Error: error creating container storage: size for layer "layer_sha" is unknown, failing getSize() | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | nshidlin <nshidlin> |
| Component: | assisted-installer | Assignee: | Ori Amizur <oamizur> |
| assisted-installer sub component: | discovery-agent | QA Contact: | Yuri Obshansky <yobshans> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | urgent | CC: | lalon, sasha |
| Version: | 4.6 | Keywords: | TestBlocker |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | OCP-Metal-v1.0.9.5 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-28 08:45:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Maybe this is a timeout issue in the systemd service pre-start step.

I have one environment (out of 2) that consistently reproduces this issue (seal32). Hosts go through up to 20 retries and the agent fails to start. When I manually run:
sudo podman pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15
the issue resolves, so I guess it's a timeout issue.

Reproducible on ocp-edge33.lab.eng.tlv2.redhat.com: 1 master node failed; 2 masters and 2 workers are OK.
Sep 21 15:18:24 master-0-1 podman[6154]: Error: error creating container storage: size for layer "ccf04fbd6e1943f648d1c2980e96038edc02b543c597556>
Sep 21 15:18:24 master-0-1 systemd[1]: agent.service: Control process exited, code=exited status=125
Sep 21 15:18:24 master-0-1 systemd[1]: agent.service: Failed with result 'exit-code'.
Sep 21 15:18:24 master-0-1 systemd[1]: Failed to start agent.service.

Issue persists on staging:
{
"release_tag": "v1.0.9.5-ds",
"versions": {
"assisted-ignition-generator": "quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.5",
"assisted-installer": "registry.stage.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19",
"assisted-installer-controller": "registry.stage.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15",
"assisted-installer-service": "quay.io/app-sre/assisted-service:7fd51db",
"discovery-agent": "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15",
"image-builder": "quay.io/app-sre/assisted-iso-create:7fd51db"
}
}
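The comments above mention hosts going through many restart attempts with different failure results. A minimal sketch for tallying failed starts by systemd result string, assuming journal lines in the short format quoted in this report; the function name `count_failures` and the regular expression are illustrative and not part of the agent or the service:

```python
import re
from collections import Counter

# Matches systemd's per-unit failure summary, e.g.
# "systemd[1]: agent.service: Failed with result 'exit-code'."
FAILED_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Failed with result '(?P<result>[^']+)'"
)


def count_failures(journal_lines, unit="agent.service"):
    """Count failed starts of `unit`, grouped by the systemd result string."""
    counts = Counter()
    for line in journal_lines:
        match = FAILED_RE.search(line)
        if match and match.group("unit") == unit:
            counts[match.group("result")] += 1
    return counts


# Lines copied from the comment above.
sample = [
    "Sep 21 15:18:24 master-0-1 systemd[1]: agent.service: Control process exited, code=exited status=125",
    "Sep 21 15:18:24 master-0-1 systemd[1]: agent.service: Failed with result 'exit-code'.",
    "Sep 21 15:18:24 master-0-1 systemd[1]: Failed to start agent.service.",
]

print(dict(count_failures(sample)))
```

Run over a full journal dump, this separates 'timeout' results from 'exit-code' results, which may help tell the pull timeout apart from the storage error.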
Verified on staging:
{
"release_tag": "v1.0.9.5-ds",
"versions": {
"assisted-ignition-generator": "quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.5",
"assisted-installer": "registry.stage.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19",
"assisted-installer-controller": "registry.stage.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15",
"assisted-installer-service": "quay.io/app-sre/assisted-service:27cfe0d",
"discovery-agent": "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15",
"image-builder": "quay.io/app-sre/assisted-iso-create:27cfe0d"
}
}
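To see what differs between the staging report where the issue persisted and the one where it was verified, the two "versions" maps can be compared. A small sketch, with the image references copied from the two reports above; the helper name `changed_components` is hypothetical:

```python
# "versions" map from the report where the issue persisted.
persisting = {
    "assisted-ignition-generator": "quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.5",
    "assisted-installer": "registry.stage.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19",
    "assisted-installer-controller": "registry.stage.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15",
    "assisted-installer-service": "quay.io/app-sre/assisted-service:7fd51db",
    "discovery-agent": "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15",
    "image-builder": "quay.io/app-sre/assisted-iso-create:7fd51db",
}

# "versions" map from the verified report: same entries except two tags.
verified = {
    **persisting,
    "assisted-installer-service": "quay.io/app-sre/assisted-service:27cfe0d",
    "image-builder": "quay.io/app-sre/assisted-iso-create:27cfe0d",
}


def changed_components(before, after):
    """Return {component: (before_ref, after_ref)} for entries that differ."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


for name, (old, new) in changed_components(persisting, verified).items():
    print(f"{name}: {old} -> {new}")
```

Only assisted-installer-service and image-builder differ; the discovery-agent image tag is the same in both reports. The diff shows what changed between the reports, not which change resolved the issue.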
Description of problem:
When hosts are booted into the ISO, some or all of them are not discovered by the service.

Agent logs:
Sep 21 11:41:29 master-0-1 podman[1860]: Trying to pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15...
Sep 21 11:41:34 master-0-1 podman[1860]: Getting image source signatures
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:9d0d09b1ea44d90760e85af2ce11a721c1fd9c646015000e4453b10c661fd21c
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:8e7ef64fd2cf1d5fbafc966ca5339400a2a8d26dcd5025cae0bffeb76155b005
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:c4d668e229cd131e0a8e4f8218dca628d9cf9697572875e355fe4b247b6aa9f0
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:ec1681b6a383e4ecedbeddd5abc596f3de835aed6db39a735f62395c8edbff30
Sep 21 11:42:55 master-0-1 podman[1860]: Copying config sha256:8a15567bae378372803735ca3e4359cd5d91057b30ae54631c3ba82a7e6660fc
Sep 21 11:42:55 master-0-1 podman[1860]: Writing manifest to image destination
Sep 21 11:42:55 master-0-1 podman[1860]: Storing signatures
Sep 21 11:42:59 master-0-1 systemd[1]: agent.service: Start-pre operation timed out. Terminating.
Sep 21 11:42:59 master-0-1 systemd[1]: agent.service: Failed with result 'timeout'.
Sep 21 11:42:59 master-0-1 systemd[1]: Failed to start agent.service.
Sep 21 11:43:03 master-0-1 systemd[1]: agent.service: Service RestartSec=3s expired, scheduling restart.
Sep 21 11:43:03 master-0-1 systemd[1]: agent.service: Scheduled restart job, restart counter is at 6.
Sep 21 11:43:03 master-0-1 systemd[1]: Stopped agent.service.
Sep 21 11:43:03 master-0-1 systemd[1]: Starting agent.service...
Sep 21 11:43:03 master-0-1 podman[1973]: Trying to pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15...
Sep 21 11:43:08 master-0-1 podman[1973]: Getting image source signatures
Sep 21 11:43:08 master-0-1 podman[1973]: Copying blob sha256:ec1681b6a383e4ecedbeddd5abc596f3de835aed6db39a735f62395c8edbff30
Sep 21 11:43:08 master-0-1 podman[1973]: Copying blob sha256:c4d668e229cd131e0a8e4f8218dca628d9cf9697572875e355fe4b247b6aa9f0
Sep 21 11:43:08 master-0-1 podman[1973]: Copying blob sha256:9d0d09b1ea44d90760e85af2ce11a721c1fd9c646015000e4453b10c661fd21c
Sep 21 11:43:09 master-0-1 podman[1973]: Copying blob sha256:8e7ef64fd2cf1d5fbafc966ca5339400a2a8d26dcd5025cae0bffeb76155b005
Sep 21 11:43:36 master-0-1 podman[1973]: Copying config sha256:8a15567bae378372803735ca3e4359cd5d91057b30ae54631c3ba82a7e6660fc
Sep 21 11:43:36 master-0-1 podman[1973]: Writing manifest to image destination
Sep 21 11:43:36 master-0-1 podman[1973]: Storing signatures
Sep 21 11:43:36 master-0-1 podman[1973]: Error: error creating container storage: size for layer "053c169f70b03d18472b4004472910fdb2465d55ae9335c422f2cd4b7479a21e" is unknown, failing getSize()
Sep 21 11:43:36 master-0-1 systemd[1]: agent.service: Control process exited, code=exited status=125
Sep 21 11:43:36 master-0-1 systemd[1]: agent.service: Failed with result 'exit-code'.
Sep 21 11:43:36 master-0-1 systemd[1]: Failed to start agent.service.
Sep 21 11:43:40 master-0-1 systemd[1]: agent.service: Service RestartSec=3s expired, scheduling restart.
Sep 21 11:43:40 master-0-1 systemd[1]: agent.service: Scheduled restart job, restart counter is at 7.
Sep 21 11:43:40 master-0-1 systemd[1]: Stopped agent.service.
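The timestamps in the agent log make the timeout visible. The sketch below computes how long the first pull attempt (podman[1860]) ran before the start-pre step was terminated. It assumes the journal short timestamp format shown in the log and an English locale for month names; the year is a placeholder because the log lines omit it, and the helper name `parse_ts` is hypothetical:

```python
from datetime import datetime


def parse_ts(line, year=2020):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a journal line.

    The year is an assumption; it does not affect differences between
    timestamps taken on the same day.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


# Lines copied from the first pull attempt in the agent log.
pull_start = "Sep 21 11:41:29 master-0-1 podman[1860]: Trying to pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15..."
signatures = "Sep 21 11:42:55 master-0-1 podman[1860]: Storing signatures"
timed_out = "Sep 21 11:42:59 master-0-1 systemd[1]: agent.service: Start-pre operation timed out. Terminating."

pull_seconds = (parse_ts(signatures) - parse_ts(pull_start)).total_seconds()
timeout_seconds = (parse_ts(timed_out) - parse_ts(pull_start)).total_seconds()

print(f"pull reached 'Storing signatures' after {pull_seconds:.0f} s")  # 86 s
print(f"start-pre terminated after {timeout_seconds:.0f} s")            # 90 s
```

The 90-second gap would be consistent with systemd's default start timeout (DefaultTimeoutStartSec=90s), although the unit file is not shown in this report, so that remains an assumption. The pull was still finishing at 86 seconds, which leaves very little margin on a slow link.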
Version-Release number of selected component (if applicable):
Staging:
{
"release_tag": "v1.0.9.4-ds",
"versions": {
"assisted-ignition-generator": "quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.4",
"assisted-installer": "registry.stage.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19",
"assisted-installer-controller": "registry.stage.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15",
"assisted-installer-service": "quay.io/app-sre/assisted-service:b793c52",
"discovery-agent": "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15",
"image-builder": "quay.io/app-sre/assisted-iso-create:b793c52"
}
}

How reproducible:
The issue does not reproduce every time. When it does, it sometimes affects all the nodes and sometimes only one node.

Steps to Reproduce:
1. Create Cluster
2. Generate and download ISO
3. Boot nodes into ISO

Actual results:
Node discovery fails

Expected results:
Nodes to be discovered by service

Additional info: