Description of problem:
Network attach detach script is failing during regression testing with:

"Failed to delete port with name or ID 'edd7e958-9b51-44f2-b6eb-facc985b5d15': HttpException: 504: Server Error for url: https://10.0.0.135:13696/v2.0/ports/edd7e958-9b51-44f2-b6eb-facc985b5d15, The server didn't respond in time.: 504 Gateway Time-out"

Version-Release number of selected component (if applicable):

How reproducible:
Every time, with the script posted below.

Steps to Reproduce:
1. Run the ironic-network-attach-detach job and/or the script below, which includes running 'openstack port delete ...'

Actual results:
port delete failed

Expected results:
port delete succeeds

Additional info:
script:

#!/bin/bash
# There should be only one running nova instance at this point
source /home/stack/overcloudrc
IRONIC_NODE=`openstack baremetal node list -f value|awk '/active/ {print $1}'`
NEWIP=192.168.24.24
# Display list
openstack baremetal node list
IRONIC_NODE_VIF_ID=$(openstack baremetal node vif list $IRONIC_NODE -f value|head -1)
NOVA_INSTANCE_NAME=$(openstack server list -f value |awk '/ACTIVE/ {print $1}')
openstack port delete $IRONIC_NODE_VIF_ID
sleep 30
openstack port create --network baremetal --fixed-ip ip-address=$NEWIP $IRONIC_NODE-extra
sleep 300
openstack server add port $NOVA_INSTANCE_NAME "$IRONIC_NODE-extra"
sleep 120
# reboot instance to get network refresh
openstack server reboot $NOVA_INSTANCE_NAME
sleep 300
openstack server list -c Name -c Status | grep $NOVA_INSTANCE_NAME
ssh -q -o strictHostKeyChecking=no cloud-user@$NEWIP "echo success"
if [ "$?" = "0" ]; then
    echo -e "\tSSH cloud-user@$NEWIP is successful"
else
    echo -e "\tSSH cloud-user@$NEWIP FAILED!"
    sleep 2
    exit 1
fi

Log and job to follow.
This blocks a regression job.
It looks like MariaDB dies at 2022-12-05 23:10:47, but it's not clear to us what is happening. Tagging in Pidone to take a look at these logs and hopefully provide some insight:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-hardware_provisioning-rqci-17.1-3cont_2comp_2ironic-ipv4-geneve-network_attach_detach-IR-OC_Ironic/2/controller-2/var/log/containers/neutron/server.log.gz

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-hardware_provisioning-rqci-17.1-3cont_2comp_2ironic-ipv4-geneve-network_attach_detach-IR-OC_Ironic/2/controller-2/var/log/containers/mysql/mysqld.log.gz

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-hardware_provisioning-rqci-17.1-3cont_2comp_2ironic-ipv4-geneve-network_attach_detach-IR-OC_Ironic/2/controller-2/var/log/containers/stdouts/galera-bundle.log.gz
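
For anyone triaging from a local copy of the logs, a minimal sketch for pulling error-looking lines near the suspected failure time out of a gzipped log. The function name scan_log, the pattern list, and the local file name in the example are assumptions for illustration, not something taken from the job; it also assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.

```shell
#!/bin/bash
# Sketch only: scan_log is a hypothetical helper, not part of the CI job.
# Prints lines from a gzipped log that look like errors AND contain the
# given timestamp prefix (for example '2022-12-05 23:10').
scan_log() {
    local log="$1" stamp="$2"
    # zgrep reads .gz files directly; -i ignores case, -E allows alternation.
    # The pattern list is a guess at what is interesting, adjust as needed.
    # '|| true' keeps the function from failing when nothing matches.
    zgrep -i -E 'error|abort|i/o|disk' "$log" | grep -F -- "$stamp" || true
}

# Example (assumes mysqld.log.gz was downloaded to the current directory):
# scan_log mysqld.log.gz '2022-12-05 23:10'
```

Filtering on the minute rather than the exact second keeps a little context from just before and after the point where the database appears to go away.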
Considering the logs have disk I/O errors, this is likely caused by the thin pool metadata volume filling up, which is being tracked by bug #2149586.

*** This bug has been marked as a duplicate of bug 2149586 ***