Description of problem:

The subman cron job executes ceph-disk:

/etc/cron.hourly/subman:
    ...
    disks = json.loads(subprocess.check_output("ceph-disk list --format json", shell=True))
    ...
    facts_file = os.environ.get("CEPH_FACTS_FILE", "/etc/rhsm/facts/ceph_usage.facts")

ceph-disk executes blkid, which has no timeout (see https://bugzilla.redhat.com/show_bug.cgi?id=1591603), and blkid tends to hang forever if the customer uses multipath and any of the paths fails. Over time this exhausts system resources, ending up with around 600 blkid processes running and a load average in the hundreds, which slows the system down. The facts file should not be required in non-ceph environments.

Version-Release number of selected component (if applicable): OSP10

Expected: This file should not be part of the overcloud.qcow image unless ceph is enabled.

Currently: This file is part of the overcloud images and is used even when ceph is disabled.
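For illustration, a minimal defensive variant of the cron job's call. This is a sketch, not the shipped script: the original invokes ceph-disk unconditionally and with no timeout, while this version skips non-ceph hosts entirely and bounds the call so a hung blkid cannot accumulate. The `CEPH_DISK_TIMEOUT` knob is hypothetical, not an upstream setting.

```python
import json
import shutil
import subprocess

# Hypothetical bound on how long we let ceph-disk (and the blkid calls
# underneath it) run before giving up.
CEPH_DISK_TIMEOUT = 60  # seconds


def collect_ceph_disks():
    """Return parsed `ceph-disk list` output, or None when this is not a
    ceph node or the command exceeds the timeout."""
    # Non-ceph hosts have no ceph-disk binary: write no facts at all.
    if shutil.which("ceph-disk") is None:
        return None
    try:
        out = subprocess.check_output(
            ["ceph-disk", "list", "--format", "json"],
            timeout=CEPH_DISK_TIMEOUT,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        # A hung or failing ceph-disk should not pile up processes forever.
        return None
    return json.loads(out)
```

On a compute node without ceph installed, `collect_ceph_disks()` returns None immediately instead of spawning blkid processes.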
The cron job is included in the ceph-osd package, which we know is included in the default overcloud image in releases prior to OSP-13. I manually tested a fresh OSP-10 installation and confirmed the cron job is removed when the ceph-osd package is removed.

For releases prior to OSP-13, the ceph-osd package is automatically removed from an overcloud node when the node does not require the package (see bug #1405881). This is done by upgrading the undercloud and then performing a minor update of the overcloud nodes; the update procedure removes the ceph-osd package from all nodes that do not require it.

I looked at an sosreport in the customer case, and it's apparent that a few specific packages have been manually updated, but the ceph-osd package is still installed. From this I conclude a full update has not been performed on the overcloud nodes. I'm marking this BZ a duplicate of the one that handles removal of the ceph-osd package.

If the customer does not wish to update the overcloud nodes, an alternative is to manually remove the ceph-osd package on at least the compute nodes. This will remove the ceph-disk cron job.

*** This bug has been marked as a duplicate of bug 1405881 ***
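The manual workaround above can be sketched as the following shell steps, assuming a RHEL 7 / OSP-10 node where yum and rpm are the package tools; run as root on each node that does not host OSDs:

```shell
# Remove ceph-osd only where it is actually installed.
if rpm -q ceph-osd >/dev/null 2>&1; then
    yum remove -y ceph-osd
fi

# The hourly cron job ships with the package, so it should now be gone.
if [ ! -e /etc/cron.hourly/subman ]; then
    echo "ceph-disk cron job removed"
fi
```

Note this only suppresses the symptom on the node where it is run; updating the overcloud as described above removes the package fleet-wide.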