Bug 1619472 - non-ceph deployments execute ceph-disk /etc/cron.hourly/subman and exhaust system resources
Keywords:
Status: CLOSED DUPLICATE of bug 1405881
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director-images
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mike Burns
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-21 00:35 UTC by Robin Cernin
Modified: 2021-12-10 17:14 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-17 17:17:02 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Issue Tracker OSP-11570 (Last Updated: 2021-12-10 17:14:19 UTC)

Description Robin Cernin 2018-08-21 00:35:42 UTC
Description of problem:

This subman cron job executes ceph-disk:

/etc/cron.hourly/subman:
...
disks = json.loads(subprocess.check_output("ceph-disk list --format json", shell=True))
...
facts_file = os.environ.get("CEPH_FACTS_FILE", "/etc/rhsm/facts/ceph_usage.facts")

ceph-disk executes blkid, which has no timeout; see https://bugzilla.redhat.com/show_bug.cgi?id=1591603

It tends to hang forever if the customer uses multipath and any of the paths fails.

Over time this exhausts system resources, ending up with around 600 blkid processes running and a load average in the hundreds.

This slows down the system; the facts file should not be required for non-Ceph environments.
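
For illustration only, a minimal Python sketch (not the shipped script) of how the snippet above could be guarded so that non-Ceph nodes skip the ceph-disk call entirely and the subprocess is bounded by a timeout. The collect_ceph_disks helper, the 60-second timeout, and the facts-file cleanup are assumptions for the sketch, which also assumes Python 3:

import json
import os
import shutil
import subprocess

FACTS_FILE = os.environ.get("CEPH_FACTS_FILE", "/etc/rhsm/facts/ceph_usage.facts")

def collect_ceph_disks(timeout=60):
    """Return ceph-disk output as parsed JSON, or None on non-Ceph nodes or failure."""
    # Hypothetical guard: skip entirely when ceph-disk is not installed.
    if shutil.which("ceph-disk") is None:
        return None
    try:
        out = subprocess.check_output(
            ["ceph-disk", "list", "--format", "json"],
            timeout=timeout,  # bound the hang caused by blkid on failed multipath paths
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return json.loads(out)

disks = collect_ceph_disks()
if disks is None and os.path.exists(FACTS_FILE):
    os.remove(FACTS_FILE)  # hypothetical: drop stale Ceph facts on non-Ceph nodes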

Version-Release number of selected component (if applicable):

OSP10

Expected:

This file should not be part of the overcloud.qcow image unless Ceph is enabled

Currently:

This file is part of the overcloud images and is used even when Ceph is disabled

Comment 2 Alan Bishop 2018-09-17 17:17:02 UTC
The cron job is included in the ceph-osd package, which we know is included in
the default overcloud image in releases prior to OSP-13. I manually tested a
fresh OSP-10 installation and confirmed the cron job is removed when the
ceph-osd package is removed.

For releases prior to OSP-13, the ceph-osd package is automatically removed
from an overcloud node when the node does not require the package (see bug
#1405881). This is done by upgrading the undercloud, followed by performing a
minor update of the overcloud nodes. The update procedure will remove the
ceph-osd package from all nodes that do not require it.

I looked at an sosreport in the customer case, and it's apparent that a few
specific packages have been manually updated, but the ceph-osd package is
still installed. From this I conclude a full update has not been performed on
the overcloud nodes.

I'm marking this BZ as a duplicate of the one that handles removal of the
ceph-osd package. If the customer does not wish to update the overcloud nodes,
then an alternative would be to manually remove the ceph-osd package on at
least the compute nodes. This will remove the ceph-disk cron job.
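
For reference, a small illustrative Python check, not part of the update procedure itself, that an operator could run on a compute node to confirm the state described above: whether ceph-osd is still installed and whether the hourly subman cron script still invokes ceph-disk. The script and its output format are hypothetical:

import os
import subprocess

CRON_FILE = "/etc/cron.hourly/subman"

def ceph_osd_installed():
    # rpm -q exits 0 only when the package is installed
    return subprocess.call(
        ["rpm", "-q", "ceph-osd"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ) == 0

def cron_calls_ceph_disk():
    # True if the hourly subman script still invokes ceph-disk
    if not os.path.exists(CRON_FILE):
        return False
    with open(CRON_FILE) as f:
        return "ceph-disk" in f.read()

print("ceph-osd installed:", ceph_osd_installed())
print("subman cron calls ceph-disk:", cron_calls_ceph_disk())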

*** This bug has been marked as a duplicate of bug 1405881 ***

