Description of problem:

Due to a full filesystem on the undercloud node, the IPMI power status check failed to create its password file:

    Failed to create the password file. [Errno 28] No space left on device: '/tmp/tmp6DSQwt'

Because the filesystem was full there are no further logs for that period, and this appears to have led to:

    maintenance_reason | During sync_power_state, max retries exceeded for node xx, node state None does not match expected state 'power on'. Updating DB state to 'None' Switching node to maintenance mode.

However, ironic node-show is not showing any instance_uuid/info, so just clearing maintenance mode and setting the power status to on/true will not help. What is the best way to recover from this situation?

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
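Since the trigger was ENOSPC on the undercloud, the first step before any ironic-level recovery is to confirm the space condition is actually cleared. A minimal check along these lines (the paths are the usual undercloud defaults, not taken from this report, so adjust as needed):

```shell
# Confirm the filesystem holding /tmp has free space again; the ipmitool
# driver writes its temporary password files there.
df -hP /tmp

# Spot the largest consumers under /var/log -- on an undercloud the root
# filesystem usually fills up with service logs.
du -sh /var/log/* 2>/dev/null | sort -h | tail -n 5
```

Until df shows headroom on that filesystem, retrying power actions will just reproduce the same "Failed to create the password file" error.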
uploaded complete nova / ironic logs to collab-shell.usersys.redhat.com
(In reply to Anil Dhingra from comment #11)
> uploaded complete nova / ironic logs to collab-shell.usersys.redhat.com

Thanks for that. I looked at the logs and it's all very strange. E.g.:

$ grep -Ri deployed ironic-logs-060916.tar.gz/var/log/ironic/ironic-conductor.log | wc -l
0

$ grep -R provision ironic-logs-060916.tar.gz/var/log/ironic/ironic-api.log
2016-09-06 13:17:18.023 16162 INFO ironic.cmd.api [-] conductor.check_provision_state_interval = 60

I can't find a single line showing the request to deploy an instance in Ironic; maybe it happened before the logs start (the earliest entry is 2016-05-25 03:58:32.179).

Also, the nova-compute logs from 2016-07-24 to 2016-07-29 are filled up with errors, more specifically:

2016-07-28 03:31:41.174 19392 ERROR nova.compute.manager [-] [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803] Periodic sync_power_state task had an error while processing an instance.
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803] Traceback (most recent call last):
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6100, in _sync
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     query_driver_power_state_and_sync()
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 445, in inner
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     return f(*args, **kwargs)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6097, in query_driver_power_state_and_sync
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     self._query_driver_power_state_and_sync(context, db_instance)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6127, in _query_driver_power_state_and_sync
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     vm_instance = self.driver.get_info(db_instance)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 557, in get_info
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     node = _validate_instance_and_node(self.ironicclient, instance)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 127, in _validate_instance_and_node
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     return ironicclient.call("node.get_by_instance_uuid", instance.uuid)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/client_wrapper.py", line 126, in call
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     return self._multi_getattr(client, method)(*args, **kwargs)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/ironicclient/v1/node.py", line 147, in get_by_instance_uuid
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     nodes = self._list(self._path(path), 'nodes')
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/ironicclient/common/base.py", line 121, in _list
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     resp, body = self.api.json_request('GET', url)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/ironicclient/common/http.py", line 353, in json_request
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     resp, body_iter = self._http_request(url, method, **kwargs)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/ironicclient/common/http.py", line 162, in wrapper
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     return func(self, url, method, **kwargs)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib/python2.7/site-packages/ironicclient/common/http.py", line 298, in _http_request
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     resp = conn.getresponse()
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib64/python2.7/httplib.py", line 1051, in getresponse
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     response.begin()
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib64/python2.7/httplib.py", line 415, in begin
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     version, status, reason = self._read_status()
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]   File "/usr/lib64/python2.7/httplib.py", line 379, in _read_status
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803]     raise BadStatusLine(line)
2016-07-28 03:31:41.174 19392 TRACE nova.compute.manager [instance: 23b34ae9-81a3-4405-859f-105ce9e6c803] BadStatusLine: ''

$ grep -R "error while processing an instance" nova-compute.log-20160726 nova-compute.log-20160728 nova-compute.log-20160730 | wc -l
7000
...

So again, this environment does not seem reliable at all. I can only guess at this point, and I think that something external caused the failures; I still suspect something happened with the database (so please check the logs as asked in comment #8). That said, I really don't think there's much we can do here; the environment is just not right.
Sorry, we can't really do anything if the underlying database is not reliable.
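For the recovery question in the original description: assuming disk space has been freed and the node itself is healthy, the usual 2016-era ironic CLI sequence to undo what sync_power_state did is to clear the maintenance flag and re-sync the power state. This is only a sketch (the node UUID is a placeholder), and per the comments above it will not help if the underlying database really is unreliable:

```shell
NODE=<node-uuid>   # placeholder: UUID of the affected node

# Clear the maintenance flag that sync_power_state set.
ironic node-set-maintenance "$NODE" false

# Ask the conductor to power the node back on; this also refreshes the
# power_state recorded in the ironic database.
ironic node-set-power-state "$NODE" on

# Confirm power_state and maintenance now look sane.
ironic node-show "$NODE"
```

Since node-show here reports no instance_uuid, there is no nova instance record to repair on top of this; if one did exist, its state would need checking on the nova side as well.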