Bug 1285470 - openstack-ironic-api stopped due to broken pipe
openstack-ironic-api stopped due to broken pipe
Status: CLOSED CANTFIX
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 8.0 (Liberty)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assigned To: Lucas Alvares Gomes
QA Contact: Toure Dunnon
Keywords: ZStream
Duplicates: 1328981
Depends On:
Blocks:
Reported: 2015-11-25 11:37 EST by Michele Baldessari
Modified: 2016-11-10 06:51 EST
CC: 15 users

Doc Type: Bug Fix
Last Closed: 2016-10-03 12:01:27 EDT
Type: Bug


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1484044 None None None Never
OpenStack gerrit 287134 None None None 2016-03-02 07:05 EST

Description Michele Baldessari 2015-11-25 11:37:24 EST
Description of problem:

Since I have seen this issue twice in a week, here is a report. While
re-deploying the overcloud I noticed that all nodes were stuck and had
failed connections to the Ironic services. On the undercloud the
ironic-api service was stopped:

[root@undercloud ~]# systemctl |grep ironic
● openstack-ironic-api.service loaded failed failed    OpenStack Ironic API service
openstack-ironic-conductor.service loaded active running   OpenStack Ironic Conductor service
openstack-ironic-inspector-dnsmasq.service loaded active running   PXE boot dnsmasq service for Ironic Inspector
openstack-ironic-inspector.service loaded active running   Hardware introspection service for OpenStack Ironic

Corresponding logs:
2015-11-13 23:30:19.537 19559 DEBUG keystonemiddleware.auth_token [-] Authenticating user token process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:480
2015-11-13 23:30:19.880 19559 DEBUG keystonemiddleware.auth_token [-] Storing token in cache store /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py:170
2015-11-13 23:30:19.884 19559 DEBUG keystonemiddleware.auth_token [-] Received request from user: user_id 61c4f345f52c40769b99090ea37b2e1f, project_id 5c4817932d9b4ae39f015a5ec046c7a8, roles admin,_member_ process_request /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:759
2015-11-13 23:30:22.048 19559 CRITICAL ironic [-] IOError: [Errno 32] Broken pipe
2015-11-13 23:30:22.048 19559 ERROR ironic Traceback (most recent call last):
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/bin/ironic-api", line 10, in <module>
2015-11-13 23:30:22.048 19559 ERROR ironic     sys.exit(main())
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/ironic/cmd/api.py", line 64, in main
2015-11-13 23:30:22.048 19559 ERROR ironic     wsgi.serve_forever()
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/SocketServer.py", line 236, in serve_forever
2015-11-13 23:30:22.048 19559 ERROR ironic     poll_interval)
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/SocketServer.py", line 155, in _eintr_retry
2015-11-13 23:30:22.048 19559 ERROR ironic     return func(*args)
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 83, in select
2015-11-13 23:30:22.048 19559 ERROR ironic     return hub.switch()
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2015-11-13 23:30:22.048 19559 ERROR ironic     return self.greenlet.switch()
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 336, in run
2015-11-13 23:30:22.048 19559 ERROR ironic     self.fire_timers(self.clock())
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
2015-11-13 23:30:22.048 19559 ERROR ironic     self.squelch_timer_exception(timer, sys.exc_info())
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 385, in squelch_timer_exception
2015-11-13 23:30:22.048 19559 ERROR ironic     traceback.print_exception(*exc_info)
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/traceback.py", line 124, in print_exception
2015-11-13 23:30:22.048 19559 ERROR ironic     _print(file, 'Traceback (most recent call last):')
2015-11-13 23:30:22.048 19559 ERROR ironic   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
2015-11-13 23:30:22.048 19559 ERROR ironic     file.write(str+terminator)
2015-11-13 23:30:22.048 19559 ERROR ironic IOError: [Errno 32] Broken pipe
2015-11-13 23:30:22.048 19559 ERROR ironic

Version-Release number of selected component (if applicable):
openstack-ironic-conductor-4.2.0-2.1.el7ost.noarch
openstack-ironic-api-4.2.0-2.1.el7ost.noarch
openstack-ironic-common-4.2.0-2.1.el7ost.noarch

How reproducible:
Not very often; not sure what triggers it yet. I have only seen it after
a bunch of redeploys on a virtual system.

I have now set debug and verbose to True. Hopefully some more hints will pop up.
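For reference, this is what enabling those options looks like in the Ironic configuration (standard file location assumed; openstack-ironic-api needs a restart to pick the change up):

```ini
# /etc/ironic/ironic.conf — turn on verbose debug logging
[DEFAULT]
debug = True
verbose = True
```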
Comment 1 Dmitry Tantsur 2015-11-26 03:55:05 EST
Known bug: https://bugs.launchpad.net/ironic/+bug/1296833
Fixed upstream by moving to oslo.service instead of SocketServer: https://bugs.launchpad.net/ironic/+bug/1484044
Comment 5 Leonid Natapov 2016-01-05 06:38:34 EST
Reproduced on my deployment as well.

openstack-ironic-api-4.2.2-3.el7ost.noarch
openstack-ironic-conductor-4.2.2-3.el7ost.noarch
openstack-ironic-inspector-2.2.2-1.el7ost.noarch
openstack-ironic-common-4.2.2-3.el7ost.noarch
Comment 6 Alexander Chuzhoy 2016-03-01 13:57:33 EST
Reproduced:
Environment:
openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.4-2.el7ost.noarch
Comment 7 Lucas Alvares Gomes 2016-03-02 07:05:23 EST
Yeah, we've seen this bug upstream. The cause is a bug in Python's SocketServer: when the client disconnects abruptly, it raises that exception (see http://bugs.python.org/issue14574).
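The underlying failure mode can be reproduced in isolation, independent of Ironic. A minimal sketch (the helper name is made up for illustration): writing to a pipe whose read end has gone away — like a client that disconnected abruptly — raises EPIPE, because CPython ignores SIGPIPE at startup and surfaces the error as an exception instead.

```python
import errno
import os


def write_to_closed_reader():
    """Return the errno raised when writing after the reader is gone."""
    r, w = os.pipe()
    os.close(r)  # the "client" side disconnects before we respond
    try:
        os.write(w, b"response data")
    except OSError as e:
        # CPython sets SIGPIPE to SIG_IGN, so instead of the process being
        # killed by the signal, the write raises [Errno 32] Broken pipe.
        return e.errno
    finally:
        os.close(w)


assert write_to_closed_reader() == errno.EPIPE  # errno 32, as in the log
```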

New versions of Ironic switched to using oslo.service (as pointed out in Dmitry's comment #1; https://review.openstack.org/#/c/219298/).

I've backported 219298, since Ironic OSP8 already depends on oslo.service; you can find the link to the backport in the external links section (upstream and downstream).
Comment 8 Lucas Alvares Gomes 2016-03-02 10:39:20 EST
Hi @Michele,

Talking with upstream Ironic, apparently that bug shouldn't cause the API service to stop working. Can I have access to the full ironic-api logs to investigate the problem further?

Thank you
Comment 9 Michele Baldessari 2016-03-02 13:26:30 EST
Hi Lucas,

so I don't have more recent logs, as they were from a Liberty machine
that I was developing on and to which I no longer have access.
I moved on to Mitaka and this issue has never shown up again. Maybe someone else
who is currently experiencing this issue can provide more logs?

cheers,
Michele
Comment 10 Steve Reichard 2016-03-30 17:18:50 EDT
I am running into this using OSP8 beta9. I can provide logs and potentially access.

[stack@iaas-inst ironic]$ sudo yum list installed | grep ironic
This system is not registered with RHN Classic or Red Hat Satellite.
You can use rhn_register to register.
Red Hat Satellite or RHN Classic support will be disabled.
openstack-ironic-api.noarch       1:4.2.2-4.el7ost        @RH7-RHOS-8.0         
openstack-ironic-common.noarch    1:4.2.2-4.el7ost        @RH7-RHOS-8.0         
openstack-ironic-conductor.noarch 1:4.2.2-4.el7ost        @RH7-RHOS-8.0         
openstack-ironic-inspector.noarch 2.2.4-3.el7ost          @RH7-RHOS-8.0-director
python-ironic-inspector-client.noarch
python-ironicclient.noarch        0.8.1-1.el7ost          @RH7-RHOS-8.0         
[stack@iaas-inst ironic]$
Comment 11 David Hill 2016-08-28 14:23:31 EDT
RDO has same issue (https://bugzilla.redhat.com/show_bug.cgi?id=1370942)
Comment 12 David Hill 2016-08-31 15:10:02 EDT
I'm also able to reproduce this here.
Comment 13 Lucas Alvares Gomes 2016-10-03 12:01:27 EDT
Hi David, Michele, Steve,

This problem has been fixed upstream by switching to oslo.service [0]. The upstream bug is [1] (see comments); it was marked as invalid because of the parallel work to move to oslo.service.

Unfortunately I won't be able to backport that upstream change, because it would introduce a new dependency into OSP8.

A possible workaround would be to set up the API service behind Apache mod_wsgi.
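A minimal sketch of that workaround, assuming Apache httpd with mod_wsgi installed (the config file path, port, and WSGI script location are assumptions and vary by release and packaging):

```apache
# Hypothetical /etc/httpd/conf.d/ironic-api.conf — ironic-api under mod_wsgi
Listen 6385
<VirtualHost *:6385>
    WSGIDaemonProcess ironic-api user=ironic group=ironic processes=2 threads=4
    WSGIProcessGroup ironic-api
    # Location of Ironic's WSGI entry point; adjust to the installed package
    WSGIScriptAlias / /var/www/cgi-bin/ironic/app.wsgi
    WSGIApplicationGroup %{GLOBAL}
    <Directory /var/www/cgi-bin/ironic>
        Require all granted
    </Directory>
</VirtualHost>
```

With this in place, httpd owns the listening socket and the worker processes, so a client dropping the connection is handled by Apache rather than by the Python SocketServer loop that crashes here.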

[0] https://review.openstack.org/#/c/219298/
[1] https://bugs.launchpad.net/ironic/+bug/1296833
Comment 14 Dmitry Tantsur 2016-11-10 06:51:18 EST
*** Bug 1328981 has been marked as a duplicate of this bug. ***
