Bug 1441393 - os-collect-config does not start after node hard reset due to xfs
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-collect-config
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 11.0 (Ocata)
Assigned To: Ben Nemec
QA Contact: Shai Revivo
Keywords: Triaged, ZStream
Depends On: 1438096 1442801
Blocks:
Reported: 2017-04-11 16:55 EDT by Alex Schultz
Modified: 2017-06-08 16:18 EDT

See Also:
Fixed In Version: os-collect-config-6.0.0-2.el7ost
Doc Type: Known Issue
Doc Text:
Invalid cache files may cause os-collect-config to report 'ValueError: No JSON object could be decoded', and the service will then fail to start. The cache files located in '/var/lib/os-collect-config/' should be valid JSON files. If any of them are of size 0 or contain invalid JSON, remove those files from '/var/lib/os-collect-config/'; otherwise they may prevent os-collect-config from starting.
Story Points: ---
Clone Of: 1438096
Environment:
Last Closed: 2017-06-08 16:18:00 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
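
The manual workaround described in the Doc Text can be sketched in Python as follows. This is only an illustration, not part of os-collect-config: the helper names `find_invalid_json_files` and `remove_invalid_json_files` are hypothetical, and the sketch assumes that every regular file directly inside the cache directory is expected to hold JSON (subdirectories are skipped). It defaults to a dry run so that nothing is deleted until the list has been reviewed.

```python
import json
import os


def find_invalid_json_files(cache_dir):
    """Return paths of regular files in cache_dir that are empty or not valid JSON."""
    invalid = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue  # assumption: only top-level regular files are cache files
        try:
            with open(path) as f:
                json.load(f)  # a zero-size file also raises ValueError here
        except ValueError:
            invalid.append(path)
    return invalid


def remove_invalid_json_files(cache_dir, dry_run=True):
    """List invalid cache files; delete them only when dry_run is False."""
    invalid = find_invalid_json_files(cache_dir)
    if not dry_run:
        for path in invalid:
            os.remove(path)
    return invalid


if __name__ == "__main__":
    # Directory taken from the Doc Text; run as root on the affected node.
    for path in remove_invalid_json_files("/var/lib/os-collect-config", dry_run=True):
        print(path)
```

After reviewing the printed paths, the same call with `dry_run=False` removes them; the service can then be restarted with `sudo systemctl restart os-collect-config`.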



External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1678328 None None None 2017-04-11 16:55 EDT
OpenStack gerrit 455829 None None None 2017-04-11 16:55 EDT

Description Alex Schultz 2017-04-11 16:55:22 EDT
+++ This bug was initially created as a clone of Bug #1438096 +++

Description of problem:
I noticed an issue with os-collect-config: when I snapshot my nested VM hypervisor, the nodes have to be restarted. The root cause is most likely XFS dropping the last transaction and reverting the filesystem to a point at which os-collect-config was copying files around, leaving them truncated to 0 bytes. Or it is a pure XFS bug (we might want to investigate). Either way, os-collect-config fails to start afterwards.

Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: Traceback (most recent call last):
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/bin/os-collect-config", line 10, in <module>
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: sys.exit(__main__())
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/lib/python2.7/site-packages/os_collect_config/collect.py", line 262, in __main__
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: collector_kwargs_map=collector_kwargs_map)
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/lib/python2.7/site-packages/os_collect_config/collect.py", line 166, in collect_all
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: content = module.Collector(**collector_kwargs).collect()
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/lib/python2.7/site-packages/os_collect_config/ec2.py", line 71, in collect
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: metadata = json.load(f)
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/lib64/python2.7/json/__init__.py", line 290, in load
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: **kw)
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: return _default_decoder.decode(s)
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: raise ValueError("No JSON object could be decoded")
Mar 28 13:28:28 overcloud-controller-0.localdomain os-collect-config[5224]: ValueError: No JSON object could be decoded


Steps to Reproduce:
1. openstack baremetal node list | grep -v UUID | awk '{print $2}' | grep -v '^$'| while read i; do openstack baremetal node power off $i ; done
2. openstack baremetal node list | grep -v UUID | awk '{print $2}' | grep -v '^$'| while read i; do openstack baremetal node power on $i ; done
3. for i in `nova list|awk '/ACTIVE/ {print $(NF-1)}' |awk -F"=" '{print $NF}'`; do echo $i; ssh -o StrictHostKeyChecking=no heat-admin@$i "sudo systemctl status os-collect-config "; done

Actual results:
os-collect-config fails to start

Expected results:
os-collect-config handles zero-size cache files and still starts
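
One way to get the expected behaviour is to treat an unreadable cache file as a cache miss instead of letting the exception escape from json.load(). The sketch below only illustrates that idea: `load_cached_metadata` is a hypothetical helper, and this is not claimed to be the change that shipped in os-collect-config-6.0.0-2.el7ost.

```python
import json


def load_cached_metadata(path):
    """Return the parsed JSON from path, or None if it is missing, empty, or invalid."""
    try:
        with open(path) as f:
            return json.load(f)
    except (IOError, OSError, ValueError):
        # A zero-size or truncated file raises ValueError; a missing file
        # raises IOError/OSError. Both are reported as "no cached data".
        return None
```

A caller that receives None would then fetch the metadata again rather than abort at startup.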

--- Additional comment from Jaromir Coufal on 2017-04-07 00:01:16 EDT ---

Adding Doc_text in case the fix misses OSP 11.
Comment 2 Alex Schultz 2017-06-08 16:18:00 EDT
This went out with 11 GA
