Bug 1743834 - ceph-volume fails on restorecon /var/lib/ceph/osd/ceph-1
Summary: ceph-volume fails on restorecon /var/lib/ceph/osd/ceph-1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 4.0
Assignee: Christina Meno
QA Contact: Vasishta
URL:
Whiteboard:
Duplicates: 1749465
Depends On:
Blocks: 1594251
 
Reported: 2019-08-20 18:50 UTC by Eliad Cohen
Modified: 2020-01-31 12:47 UTC
CC List: 12 users

Fixed In Version: ceph-14.2.3-3.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-31 12:47:06 UTC
Embargoed:


Attachments
Undercloud folders (13.07 MB, application/x-xz), attached 2019-08-20 18:51 UTC by Eliad Cohen


Links
GitHub ceph/ceph pull 30274 (closed): ceph-volume: fix stderr failure to decode/encode when redirected (last updated 2020-12-27 17:40:56 UTC)
GitHub ceph/ceph pull 30299 (closed): luminous: ceph-volume: fix stderr failure to decode/encode when redirected (last updated 2020-12-27 17:41:00 UTC)
Red Hat Product Errata RHBA-2020:0312 (last updated 2020-01-31 12:47:27 UTC)

Description Eliad Cohen 2019-08-20 18:50:11 UTC
Description of problem:
ceph osd containers are not coming up. During deployment, the ceph-ansible task referenced in the summary retries 60 times before eventually failing [1].
When examining the nodes, the osd containers do not stay up.
A look at journalctl [2] shows the same entries repeating as the containers fail to restart.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190819.n.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy rhos15 with ceph

Actual results:
Deployment fails on the ceph-ansible task.

Expected results:
Deployment should succeed and the osd containers should stay up.

Additional info:
[1] http://pastebin.test.redhat.com/790224
[2] http://pastebin.test.redhat.com/790222

Comment 1 Eliad Cohen 2019-08-20 18:51:43 UTC
Created attachment 1606233 [details]
Undercloud folders

Comment 3 John Fulton 2019-08-20 19:08:55 UTC
[2019-08-20 18:22:36,358][ceph_volume.process][INFO  ] stdout ceph.block_device=/dev/ceph-d58085a4-d014-4dd8-b3ed-e73ef994d7c8/osd-data-b3ce8c3d-742b-4d88-88b1-7c951e7a6bc4,ceph.block_uuid=1V1DKt-WQi5-ndXO-Yc3e-K9Hp-gcUV-ib8XiU,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=b24ad492-c351-11e9-9e81-525400c13f4a,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=b5cb48bd-e897-44bc-b646-c493d590eebb,ceph.osd_id=11,ceph.type=block,ceph.vdo=0";"/dev/ceph-d58085a4-d014-4dd8-b3ed-e73ef994d7c8/osd-data-b3ce8c3d-742b-4d88-88b1-7c951e7a6bc4";"osd-data-b3ce8c3d-742b-4d88-88b1-7c951e7a6bc4";"ceph-d58085a4-d014-4dd8-b3ed-e73ef994d7c8";"1V1DKt-WQi5-ndXO-Yc3e-K9Hp-gcUV-ib8XiU";"10.00g
[2019-08-20 18:22:36,360][ceph_volume.process][INFO  ] Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
[2019-08-20 18:22:36,363][ceph_volume.process][INFO  ] Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-1
[2019-08-20 18:22:36,364][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 205, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 205, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 341, in main
    self.activate(args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 265, in activate
    activate_bluestore(lvs, no_systemd=args.no_systemd)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 139, in activate_bluestore
    prepare_utils.create_osd_path(osd_id, tmpfs=True)
  File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 223, in create_osd_path
    mount_tmpfs(path)
  File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 216, in mount_tmpfs
    system.set_context(path)
  File "/usr/lib/python3.6/site-packages/ceph_volume/util/system.py", line 284, in set_context
    process.run(['restorecon', path])
  File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 121, in run
    terminal.write(command_msg)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 134, in write
    return _Write().raw(msg)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 117, in raw
    self.write(string)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 120, in write
    self._writer.write(self.prefix + line + self.suffix)
ValueError: I/O operation on closed file.
[root@ceph-2 ceph]#
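
The last frames of the traceback point at the root cause: ceph-volume's terminal writer echoes the restorecon command to stderr, and the write raises ValueError because that stream has been closed or redirected by the calling environment. Below is a minimal Python sketch of the failure and of the kind of defensive guard the upstream fix applies; the helper name is illustrative, not the actual ceph-volume code.

import sys

def echo_to_stderr(message):
    # Mimics ceph_volume.terminal writing informational lines to stderr
    # (illustrative helper, not the real implementation).
    try:
        sys.stderr.write(message + "\n")
        sys.stderr.flush()
    except ValueError:
        # stderr was closed or redirected by the caller (e.g. the container
        # runtime); tolerate it instead of aborting the activate run.
        pass

# Reproduce the raw error from the traceback above:
sys.stderr.close()
try:
    sys.stderr.write("Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-1\n")
except ValueError as exc:
    print(exc)  # prints: I/O operation on closed file.

# With the guard in place the same condition is survivable:
echo_to_stderr("Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-1")

The linked upstream pull requests (30274 and 30299, "fix stderr failure to decode/encode when redirected") address this failure mode in ceph-volume's terminal handling.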

Comment 4 RHEL Program Management 2019-08-20 19:09:03 UTC
Please specify the severity of this bug. If severity is not set, "medium" will be assumed.

Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 5 John Fulton 2019-08-20 19:40:36 UTC
We had this problem with 14.2.2-431.gb28c939.el8cp but not with 14.2.2-339.g1fd0f60.el8cp

Comment 14 John Fulton 2019-09-11 14:36:52 UTC
*** Bug 1749465 has been marked as a duplicate of this bug. ***

Comment 15 John Fulton 2019-09-11 15:06:12 UTC
ceph-volume team, there are a lot of comments on this bug about using different containers as workarounds, but I hope that doesn't distract you. This is a ceph-volume bug, and comment #3 shows the problem we encountered while trying to use ceph-volume. It has been 3 weeks; would you please triage the bug?

Comment 16 Alfredo Deza 2019-09-11 19:08:52 UTC
I think this is not

Comment 29 errata-xmlrpc 2020-01-31 12:47:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312

