Description of problem:

Since I have seen this issue twice in a week, here is a report. While re-deploying the overcloud I noticed that all nodes were stuck and had failed connections towards the ironic services. On the undercloud the ironic-api service was stopped:

[root@undercloud ~]# systemctl | grep ironic
● openstack-ironic-api.service                 loaded failed failed  OpenStack Ironic API service
  openstack-ironic-conductor.service           loaded active running OpenStack Ironic Conductor service
  openstack-ironic-inspector-dnsmasq.service   loaded active running PXE boot dnsmasq service for Ironic Inspector
  openstack-ironic-inspector.service           loaded active running Hardware introspection service for OpenStack Ironic

Corresponding logs:

2015-11-13 23:30:19.537 19559 DEBUG keystonemiddleware.auth_token [-] Authenticating user token process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:480
2015-11-13 23:30:19.880 19559 DEBUG keystonemiddleware.auth_token [-] Storing token in cache store /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py:170
2015-11-13 23:30:19.884 19559 DEBUG keystonemiddleware.auth_token [-] Received request from user: user_id 61c4f345f52c40769b99090ea37b2e1f, project_id 5c4817932d9b4ae39f015a5ec046c7a8, roles admin,_member_ process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:759
2015-11-13 23:30:22.048 19559 CRITICAL ironic [-] IOError: [Errno 32] Broken pipe
2015-11-13 23:30:22.048 19559 ERROR ironic Traceback (most recent call last):
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/bin/ironic-api", line 10, in <module>
2015-11-13 23:30:22.048 19559 ERROR ironic     sys.exit(main())
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/ironic/cmd/api.py", line 64, in main
2015-11-13 23:30:22.048 19559 ERROR ironic     wsgi.serve_forever()
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/SocketServer.py", line 236, in serve_forever
2015-11-13 23:30:22.048 19559 ERROR ironic     poll_interval)
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/SocketServer.py", line 155, in _eintr_retry
2015-11-13 23:30:22.048 19559 ERROR ironic     return func(*args)
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 83, in select
2015-11-13 23:30:22.048 19559 ERROR ironic     return hub.switch()
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2015-11-13 23:30:22.048 19559 ERROR ironic     return self.greenlet.switch()
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 336, in run
2015-11-13 23:30:22.048 19559 ERROR ironic     self.fire_timers(self.clock())
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
2015-11-13 23:30:22.048 19559 ERROR ironic     self.squelch_timer_exception(timer, sys.exc_info())
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 385, in squelch_timer_exception
2015-11-13 23:30:22.048 19559 ERROR ironic     traceback.print_exception(*exc_info)
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/traceback.py", line 124, in print_exception
2015-11-13 23:30:22.048 19559 ERROR ironic     _print(file, 'Traceback (most recent call last):')
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
2015-11-13 23:30:22.048 19559 ERROR ironic     file.write(str+terminator)
2015-11-13 23:30:22.048 19559 ERROR ironic IOError: [Errno 32] Broken pipe
2015-11-13 23:30:22.048 19559 ERROR ironic

Version-Release number of selected component (if applicable):
openstack-ironic-conductor-4.2.0-2.1.el7ost.noarch
openstack-ironic-api-4.2.0-2.1.el7ost.noarch
openstack-ironic-common-4.2.0-2.1.el7ost.noarch

How reproducible:
Not very often; not sure what triggers it yet. I have only seen it after a bunch of redeploys on a virtual system. I have now set debug and verbose to True. Hopefully some more hints will pop up.
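For reference, the debug/verbose flags mentioned above are the standard oslo.config options in the [DEFAULT] section of the undercloud's /etc/ironic/ironic.conf (sketch only; only these two lines are relevant here):

```ini
[DEFAULT]
# Emit DEBUG-level log records (this is what produced the
# keystonemiddleware DEBUG lines in the log excerpt above).
debug = True
# Emit INFO-level log records (deprecated in later releases,
# still honored on Liberty-era ironic).
verbose = True
```

A service restart (systemctl restart openstack-ironic-api) is needed for the change to take effect.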
Known bug: https://bugs.launchpad.net/ironic/+bug/1296833
Fixed upstream by moving to oslo.service instead of SocketServer: https://bugs.launchpad.net/ironic/+bug/1484044
Reproduced also on my deployment.

openstack-ironic-api-4.2.2-3.el7ost.noarch
openstack-ironic-conductor-4.2.2-3.el7ost.noarch
openstack-ironic-inspector-2.2.2-1.el7ost.noarch
openstack-ironic-common-4.2.2-3.el7ost.noarch
Reproduced.

Environment:
openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.4-2.el7ost.noarch
Yeah, we've seen this bug upstream. The cause is a bug in the Python SocketServer module: when the client disconnects suddenly, it raises that exception (see http://bugs.python.org/issue14574). Newer versions of Ironic switched to using oslo.service (as pointed out in dmitry's comment #1, https://review.openstack.org/#/c/219298/). I've backported 219298 since Ironic in OSP8 already depends on oslo.service; you can find the link to the backport in the external links section (upstream and downstream).
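To make the failure mode concrete: this is not Ironic-specific code, just a minimal sketch of the underlying socket behavior behind http://bugs.python.org/issue14574. Writing to a socket whose peer has already disconnected raises EPIPE ("Broken pipe"); Python 2's SocketServer did not catch it, so the exception could propagate out of serve_forever() and kill the process, which is exactly what the ironic-api traceback above shows.

```python
import errno
import socket


def send_after_peer_close():
    """Write to a socket after the peer has closed; return the errno raised."""
    # socketpair() gives two connected AF_UNIX stream sockets.
    server_side, client_side = socket.socketpair()
    client_side.close()  # the client disconnects suddenly

    try:
        # Writing to the dead connection fails with "Broken pipe"
        # (Python ignores SIGPIPE, so we get an exception instead).
        server_side.sendall(b"HTTP/1.1 200 OK\r\n")
        server_side.sendall(b"\r\nbody")  # a second write fails even if the first was absorbed
    except OSError as exc:  # BrokenPipeError on Python 3, socket.error/IOError on Python 2
        return exc.errno
    finally:
        server_side.close()
    return None


print(send_after_peer_close() == errno.EPIPE)
```

If a server's accept/dispatch loop lets this exception escape instead of catching it per-connection (as SocketServer did), one misbehaving client is enough to take the whole service down, matching the "failed" state seen in systemctl.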
Hi Michele, talking to upstream Ironic, apparently that bug shouldn't cause the API service to stop working. Can I have access to the full ironic-api logs to investigate the problem further? Thank you.
Hi Lucas, I don't have more recent logs; they were from a machine with Liberty that I was developing on and to which I no longer have access. I moved on to Mitaka and this issue has never shown up again. Maybe someone else who is currently experiencing this issue can provide more logs? Cheers, Michele
I am running into this using OSP8 beta9. I can provide logs and potentially access.

[stack@iaas-inst ironic]$ sudo yum list installed | grep ironic
This system is not registered with RHN Classic or Red Hat Satellite. You can use rhn_register to register. Red Hat Satellite or RHN Classic support will be disabled.
openstack-ironic-api.noarch            1:4.2.2-4.el7ost  @RH7-RHOS-8.0
openstack-ironic-common.noarch         1:4.2.2-4.el7ost  @RH7-RHOS-8.0
openstack-ironic-conductor.noarch      1:4.2.2-4.el7ost  @RH7-RHOS-8.0
openstack-ironic-inspector.noarch      2.2.4-3.el7ost    @RH7-RHOS-8.0-director
python-ironic-inspector-client.noarch
python-ironicclient.noarch             0.8.1-1.el7ost    @RH7-RHOS-8.0
[stack@iaas-inst ironic]$
RDO has the same issue (https://bugzilla.redhat.com/show_bug.cgi?id=1370942).
I'm also able to reproduce this here.
Hi David, Michele, Steve, this problem has been fixed upstream by switching to oslo.service [0]. The upstream bug is this one [1] (see comments); it was marked as invalid because of the parallel work to move to oslo.service. Unfortunately I won't be able to backport that upstream change because it would introduce a new dependency to OSP8. A possible workaround would be to set up the API service behind Apache mod_wsgi.

[0] https://review.openstack.org/#/c/219298/
[1] https://bugs.launchpad.net/ironic/+bug/1296833
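For anyone trying the mod_wsgi workaround, a vhost along these lines would run the API inside Apache instead of the SocketServer loop. This is a hypothetical sketch only: the WSGI script path, user, port, and process/thread counts are illustrative and not taken from an OSP8 package, so adjust them to your installation.

```apache
# Hypothetical example vhost for running ironic-api under mod_wsgi.
# Paths and tunables below are placeholders, not shipped defaults.
Listen 6385
<VirtualHost *:6385>
    WSGIDaemonProcess ironic-api processes=2 threads=4 user=ironic display-name=%{GROUP}
    WSGIProcessGroup ironic-api
    # Point this at a WSGI script that exposes Ironic's WSGI application.
    WSGIScriptAlias / /var/www/cgi-bin/ironic/app.wsgi

    <Directory /var/www/cgi-bin/ironic>
        Require all granted
    </Directory>

    ErrorLog /var/log/httpd/ironic_api_error.log
    CustomLog /var/log/httpd/ironic_api_access.log combined
</VirtualHost>
```

With this in place, openstack-ironic-api.service would be disabled and httpd would own the port; a crashed worker is then respawned by mod_wsgi rather than taking the API down.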
*** Bug 1328981 has been marked as a duplicate of this bug. ***