Bug 1939234
| Summary: | 16.1 Introspection Times Out after node scaled up | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Rosenfeld <drosenfe> |
| Component: | openstack-tripleo-common | Assignee: | Steve Baker <sbaker> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 16.1 (Train) | CC: | mburns, pweeks, sbaker, slinaber |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-15 22:11:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
David Rosenfeld
2021-03-15 19:18:35 UTC
Logs for a failing test (see ir-cloud-config-scale-up.log): https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/rfe/job/DFG-df-rfe-16.1-virsh-3cont_3db_3msg_2net_2comp_3ceph-blacklist-2networker-compute-replacement/32/

The following appears in ir-cloud-config-scale-up.log:

Waiting for introspection to finish...
Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e.
Retrying 1 nodes that failed introspection. Attempt 1 of 3
Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e.
Retrying 1 nodes that failed introspection. Attempt 2 of 3
Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e.
Retrying 1 nodes that failed introspection. Attempt 3 of 3
Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e.
Retry limit reached with 1 nodes still failing introspection
STDERR:
Waiting for messages on queue 'tripleo' with no timeout.
Introspection completed with errors: Retry limit reached with 1 nodes still failing introspection

We're going to need to see the inspector logs for this failure; it's not obvious whether those are available in the jira job.

Can the exact path to the inspector logs be provided? I will get the logs if a path is provided.

It will be /var/log/ironic/deploy/<node uuid>.tar.gz or something similar.

Whoops, the inspector logs are /var/log/ironic-inspector/ramdisk/<file including datestamp>.tar.gz or similar.

The Jenkins instance was replaced since this BZ was written. This is the link to the failing job in the archived Jenkins instance: https://rhos-ci-jenkins-history.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/rfe/job/DFG-df-rfe-16.1-virsh-3cont_3db_3msg_2net_2comp_3ceph-blacklist-2networker-compute-replacement/32/

This is the link to the /var/log directory on the undercloud: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-df-rfe-16.1-virsh-3cont_3db_3msg_2net_2comp_3ceph-blacklist-2networker-compute-replacement/32/undercloud-0/var/log/

There is no ironic-inspector log directory.

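For anyone triaging similar runs, the retry messages quoted in the description can be tallied with a short script. This is only a minimal sketch: the function name `summarize_introspection_log` and the regular expressions are assumptions derived from the message text in this report, not part of tripleo-common, and the sample text is copied from the log excerpt above.

```python
import re
from collections import Counter

# Patterns are assumptions based on the messages quoted in this report.
FAILED_RE = re.compile(r"Introspection of node attempt failed:\s*([0-9a-f-]{36})")
ATTEMPT_RE = re.compile(r"Attempt (\d+) of (\d+)")
LIMIT_MSG = "Retry limit reached"

# Excerpt copied from ir-cloud-config-scale-up.log as quoted in the description.
SAMPLE = (
    "Waiting for introspection to finish... "
    "Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e. "
    "Retrying 1 nodes that failed introspection. Attempt 1 of 3 "
    "Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e. "
    "Retrying 1 nodes that failed introspection. Attempt 2 of 3 "
    "Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e. "
    "Retrying 1 nodes that failed introspection. Attempt 3 of 3 "
    "Introspection of node attempt failed:f4118ceb-d221-41e5-95d3-c1357e13fd6e. "
    "Retry limit reached with 1 nodes still failing introspection"
)


def summarize_introspection_log(text):
    """Count failure messages per node UUID and report the retry state."""
    failures = Counter(FAILED_RE.findall(text))
    attempts = [int(current) for current, _total in ATTEMPT_RE.findall(text)]
    return {
        "failures_per_node": dict(failures),
        "last_attempt": max(attempts, default=0),
        "retry_limit_reached": LIMIT_MSG in text,
    }


if __name__ == "__main__":
    print(summarize_introspection_log(SAMPLE))
```

Run against a full log file, the same function would show whether one node fails repeatedly (as here) or whether failures are spread across several nodes.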
I think we need access to an environment after the inspection times out. We can re-trigger the inspection manually and observe the issue.

The introspection timeout has been seen (and is still being seen) many times in the CI environment. I haven't been able to recreate it on my server that I can give you access to. Will keep trying.

Note: not sure if it's related, but I am also having a problem in the CI environment during the scale-up job, with ipmitool returning:

Set Chassis Power Control to Up/On failed: Unspecified error

I have written a Jira for that: https://projects.engineering.redhat.com/browse/RHOSINFRA-3971

The scale-up regression test is failing with either the ipmitool error or the introspection timeout.

Let's close this for now until we get some better data to diagnose.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.