Bug 1319795 - Swift services fail to start on director install
Summary: Swift services fail to start on director install
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Angus Thomas
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks: 1273561
 
Reported: 2016-03-21 14:28 UTC by Jason Montleon
Modified: 2016-10-13 22:15 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-13 22:15:24 UTC
Target Upstream Version:
Embargoed:



Description Jason Montleon 2016-03-21 14:28:17 UTC
Description of problem:
Sometimes, after running the undercloud install, the swift services are left in a failed state, complaining that the ring.gz files don't exist. This prevents node introspection from working properly.

Version-Release number of selected component (if applicable):
openstack-ceilometer-alarm-2015.1.1-1.el7ost.noarch
openstack-ceilometer-api-2015.1.1-1.el7ost.noarch
openstack-ceilometer-central-2015.1.1-1.el7ost.noarch
openstack-ceilometer-collector-2015.1.1-1.el7ost.noarch
openstack-ceilometer-common-2015.1.1-1.el7ost.noarch
openstack-ceilometer-notification-2015.1.1-1.el7ost.noarch
openstack-dashboard-2015.1.1-4.el7ost.noarch
openstack-dashboard-theme-2015.1.1-4.el7ost.noarch
openstack-glance-2015.1.1-3.el7ost.noarch
openstack-heat-api-2015.1.1-6.el7ost.noarch
openstack-heat-api-cfn-2015.1.1-6.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.1-6.el7ost.noarch
openstack-heat-common-2015.1.1-6.el7ost.noarch
openstack-heat-engine-2015.1.1-6.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-ironic-api-2015.1.1-5.el7ost.noarch
openstack-ironic-common-2015.1.1-5.el7ost.noarch
openstack-ironic-conductor-2015.1.1-5.el7ost.noarch
openstack-ironic-discoverd-1.1.0-7.el7ost.noarch
openstack-keystone-2015.1.1-1.el7ost.noarch
openstack-neutron-2015.1.1-13.el7ost.noarch
openstack-neutron-common-2015.1.1-13.el7ost.noarch
openstack-neutron-ml2-2015.1.1-13.el7ost.noarch
openstack-neutron-openvswitch-2015.1.1-13.el7ost.noarch
openstack-nova-api-2015.1.1-3.el7ost.noarch
openstack-nova-cert-2015.1.1-3.el7ost.noarch
openstack-nova-common-2015.1.1-3.el7ost.noarch
openstack-nova-compute-2015.1.1-3.el7ost.noarch
openstack-nova-conductor-2015.1.1-3.el7ost.noarch
openstack-nova-console-2015.1.1-3.el7ost.noarch
openstack-nova-novncproxy-2015.1.1-3.el7ost.noarch
openstack-nova-scheduler-2015.1.1-3.el7ost.noarch
openstack-puppet-modules-2015.1.8-23.el7ost.noarch
openstack-selinux-0.6.37-1.el7ost.noarch
openstack-swift-2.3.0-2.el7ost.noarch
openstack-swift-account-2.3.0-2.el7ost.noarch
openstack-swift-container-2.3.0-2.el7ost.noarch
openstack-swift-object-2.3.0-2.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-swift-proxy-2.3.0-2.el7ost.noarch
openstack-tempest-kilo-20150708.2.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-3.git49b57eb.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-10.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-5.el7ost.noarch
openstack-tuskar-0.4.18-5.el7ost.noarch
openstack-tuskar-ui-0.4.0-5.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-2.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
python-django-openstack-auth-1.2.0-5.el7ost.noarch
python-openstackclient-1.0.3-3.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch


How reproducible:
Intermittent. We're testing with VMs that are pretty much identical; however, on some hosts it is 100% or nearly 100% reproducible and on others 0%, leading me to suspect it may be performance related.

Steps to Reproduce:
1. Install undercloud
2. systemctl -t service

Actual results:
swift services are failed

Expected results:
swift services are started

Additional info:
At the end of the install we are sometimes left in this state. Running "openstack-service restart swift" brings them all back up.

[root@und ~]# systemctl status openstack-swift-proxy  -l
openstack-swift-proxy.service - OpenStack Object Storage (swift) - Proxy Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-swift-proxy.service; enabled)
   Active: failed (Result: exit-code) since Wed 2016-03-16 12:14:07 EDT; 1h 12min ago
 Main PID: 17653 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/openstack-swift-proxy.service

Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: self._reload(force=True)
Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: File "/usr/lib/python2.7/site-packages/swift/common/ring/ring.py", line 157, in _reload
Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: ring_data = RingData.load(self.serialized_path)
Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: File "/usr/lib/python2.7/site-packages/swift/common/ring/ring.py", line 65, in load
Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: gz_file = GzipFile(filename, 'rb')
Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: File "/usr/lib64/python2.7/gzip.py", line 94, in __init__
Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
Mar 16 12:14:07 und.example.com swift-proxy-server[17653]: IOError: [Errno 2] No such file or directory: '/etc/swift/container.ring.gz'
Mar 16 12:14:07 und.example.com systemd[1]: openstack-swift-proxy.service: main process exited, code=exited, status=1/FAILURE
Mar 16 12:14:07 und.example.com systemd[1]: Unit openstack-swift-proxy.service entered failed state.
     
[root@und ~]# systemctl status openstack-swift-container-replicator.service  -l
openstack-swift-container-replicator.service - OpenStack Object Storage (swift) - Container Replicator
   Loaded: loaded (/usr/lib/systemd/system/openstack-swift-container-replicator.service; enabled)
   Active: failed (Result: exit-code) since Wed 2016-03-16 12:14:03 EDT; 1h 12min ago
 Main PID: 17371 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/openstack-swift-container-replicator.service

Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: self._reload(force=True)
Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: File "/usr/lib/python2.7/site-packages/swift/common/ring/ring.py", line 157, in _reload
Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: ring_data = RingData.load(self.serialized_path)
Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: File "/usr/lib/python2.7/site-packages/swift/common/ring/ring.py", line 65, in load
Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: gz_file = GzipFile(filename, 'rb')
Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: File "/usr/lib64/python2.7/gzip.py", line 94, in __init__
Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
Mar 16 12:14:03 und.example.com swift-container-replicator[17371]: IOError: [Errno 2] No such file or directory: '/etc/swift/container.ring.gz'
Mar 16 12:14:03 und.example.com systemd[1]: openstack-swift-container-replicator.service: main process exited, code=exited, status=1/FAILURE
Mar 16 12:14:03 und.example.com systemd[1]: Unit openstack-swift-container-replicator.service entered failed state.

[root@und ~]# systemctl status openstack-swift-object-updater.service  -l
openstack-swift-object-updater.service - OpenStack Object Storage (swift) - Object Updater
   Loaded: loaded (/usr/lib/systemd/system/openstack-swift-object-updater.service; enabled)
   Active: failed (Result: exit-code) since Wed 2016-03-16 12:15:44 EDT; 1h 12min ago
 Main PID: 17224 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/openstack-swift-object-updater.service

Mar 16 12:13:59 und.example.com systemd[1]: Started OpenStack Object Storage (swift) - Object Updater.
Mar 16 12:15:43 und.example.com object-updater[17224]: Begin object update sweep
Mar 16 12:15:43 und.example.com object-updater[17224]: UNCAUGHT EXCEPTION#012Traceback (most recent call last):#012  File "/usr/bin/swift-object-updater", line 23, in <module>#012    run_daemon(ObjectUpdater, conf_file, **options)#012  File "/usr/lib/python2.7/site-packages/swift/common/daemon.py", line 110, in run_daemon#012    klass(conf).run(once=once, **kwargs)#012  File "/usr/lib/python2.7/site-packages/swift/common/daemon.py", line 57, in run#012    self.run_forever(**kwargs)#012  File "/usr/lib/python2.7/site-packages/swift/obj/updater.py", line 82, in run_forever#012    self.get_container_ring().get_nodes('')#012  File "/usr/lib/python2.7/site-packages/swift/obj/updater.py", line 71, in get_container_ring#012    self.container_ring = Ring(self.swift_dir, ring_name='container')#012  File "/usr/lib/python2.7/site-packages/swift/common/ring/ring.py", line 152, in __init__#012    self._reload(force=True)#012  File "/usr/lib/python2.7/site-packages/swift/common/ring/ring.py", line 157, in _reload#012    ring_data = RingData.load(self.serialized_path)#012  File "/usr/lib/python2.7/site-packages/swift/common/ring/ring.py", line 65, in load#012    gz_file = GzipFile(filename, 'rb')#012  File "/usr/lib64/python2.7/gzip.py", line 94, in __init__#012    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')#012IOError: [Errno 2] No such file or directory: '/etc/swift/container.ring.gz'
Mar 16 12:15:44 und.example.com systemd[1]: openstack-swift-object-updater.service: main process exited, code=exited, status=1/FAILURE
Mar 16 12:15:44 und.example.com systemd[1]: Unit openstack-swift-object-updater.service entered failed state.
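
The object-updater traceback above is hard to read because each newline has been logged as #012 (the octal code for a line feed). Below is a minimal Python sketch for unfolding such a line, assuming the "#NNN" octal escaping that rsyslog applies to control characters. The function name is illustrative only, and the sample is an abbreviated excerpt of the log line above. A literal "#012" in a message would also be converted, so treat the output as a reading aid rather than an exact reconstruction.

```python
import re


def unescape_syslog(line):
    """Replace '#NNN' octal escapes (e.g. '#012' for newline) with the real character."""
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), line)


# Abbreviated excerpt of the escaped object-updater log line.
sample = (
    "UNCAUGHT EXCEPTION#012Traceback (most recent call last):#012"
    "IOError: [Errno 2] No such file or directory: '/etc/swift/container.ring.gz'"
)

print(unescape_syslog(sample))
```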

[root@und ~]# ll /etc/swift
total 6200
-rw-r--r--. 1 root  swift 2099928 Mar 16 12:13 account.builder
-rw-r--r--. 1 root  swift    1729 Mar 16 12:13 account.ring.gz
drwxr-xr-x. 2 swift swift       6 Oct  7 19:21 account-server
-rw-r-----. 1 swift swift     652 Mar 16 12:14 account-server.conf
drwxr-sr-x. 2 root  swift    4096 Mar 16 12:16 backups
-rw-r--r--. 1 root  swift 2099928 Mar 16 12:16 container.builder
-rw-r-----. 1 root  swift    1415 Oct  7 19:19 container-reconciler.conf
-rw-r--r--. 1 root  swift    1731 Mar 16 12:16 container.ring.gz
drwxr-xr-x. 2 swift swift       6 Oct  7 19:21 container-server
-rw-r-----. 1 swift swift     739 Mar 16 12:14 container-server.conf
-rw-r--r--. 1 root  swift 2099928 Mar 16 12:12 object.builder
-rw-r-----. 1 root  swift     291 Oct  7 19:19 object-expirer.conf
-rw-r--r--. 1 root  swift    1727 Mar 16 12:12 object.ring.gz
drwxr-xr-x. 2 swift swift       6 Oct  7 19:21 object-server
-rw-r-----. 1 swift swift     643 Mar 16 12:14 object-server.conf
drwxr-xr-x. 2 root  root        6 Oct  7 19:21 proxy-server
-rw-rw----. 1 swift swift    1968 Mar 16 12:14 proxy-server.conf
-rw-rw----. 1 swift swift      79 Mar 16 12:11 swift.conf
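
In this listing container.ring.gz is timestamped 12:16, while the services above failed between 12:14 and 12:15. That is consistent with the services having been started before the container ring was written, although the listing alone does not prove it. As a workaround sketch (not the upstream fix), one could confirm that all three ring files exist before running the restart mentioned above. The helper names and the default directory argument are assumptions made for illustration.

```python
import os
import subprocess

RING_NAMES = ("account", "container", "object")


def missing_rings(swift_dir="/etc/swift"):
    """Return the paths of ring files that are not yet present in swift_dir."""
    paths = [os.path.join(swift_dir, name + ".ring.gz") for name in RING_NAMES]
    return [p for p in paths if not os.path.isfile(p)]


def restart_swift_if_ready(swift_dir="/etc/swift"):
    """Restart swift only when every ring file exists; return True if restarted."""
    missing = missing_rings(swift_dir)
    if missing:
        print("Rings not built yet: " + ", ".join(missing))
        return False
    # Same command as the manual workaround reported in this bug.
    subprocess.check_call(["openstack-service", "restart", "swift"])
    return True
```

Calling restart_swift_if_ready() on the undercloud would then either restart the services or report which rings are still missing.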

Comment 2 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 3 Lars Kellogg-Stedman 2016-05-17 03:52:54 UTC
There is an upstream fix for this here:

https://review.openstack.org/#/c/317206/

