The subscription-management utility wants to report on the raw disk capacity available per OSD. We need a flag for ceph-disk which will produce a machine-readable format (JSON to start) for parsing.
Based on subsequent conversations, the proposal is for the Ceph OSD to create the fact file for rhsm to parse, so it can report on total aggregate usage across all OSDs on any cluster the user deploys. The file should probably be created each time ceph-disk is invoked, and would be in the JSON format described above. I think the minimum facts we need are:

1. Total raw disk space available for Ceph on the host.
2. Raw disk space used by Ceph on the host.

For completeness, while not used by rhsm in the first instance, we might want to add a total and used figure for each disk in the host to be explicit.
From Barnaby: Create a file "/etc/rhsm/facts/ceph_disk.facts". That file needs to contain JSON of the form:

{ "band.storage.usage": <integer number of terabytes used on this node, such as 55> }

for example:

{ "band.storage.usage": 55 }

As long as that file exists (and it can be named anything that ends in ".facts"), its contents will be read in and added to the system facts. This can be verified by running 'subscription-manager facts' to ensure that your value is showing up properly.
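For illustration, a minimal sketch of producing such a fact file in Python (the file name matches Barnaby's suggestion; the 55 TB value is just the example figure above, not a measured number):

    import json

    # Illustrative only: write the fact described above with the example value.
    facts = {"band.storage.usage": 55}

    with open("/etc/rhsm/facts/ceph_disk.facts", "w") as f:
        json.dump(facts, f)

Running 'subscription-manager facts' afterwards should then show band.storage.usage among the reported facts.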
ceph-disk list --format json returns the information in JSON.

$ ceph-disk list --format json | jq .
[
  { "path": "/dev/dm-0", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop0", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop1", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop2", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop3", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop4", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop5", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop6", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/loop7", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd0", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd1", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd10", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd11", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd12", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd13", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd14", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd15", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd2", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd3", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd4", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd5", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd6", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd7", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd8", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/nbd9", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram0", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram1", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram10", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram11", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram12", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram13", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram14", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram15", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram2", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram3", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram4", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram5", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram6", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram7", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram8", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/ram9", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false },
  { "path": "/dev/sda", "partitions": [
      { "dmcrypt": {}, "uuid": null, "mount": "/", "ptype": null, "is_partition": true, "path": "/dev/sda1", "type": "other" },
      { "dmcrypt": {}, "uuid": null, "ptype": null, "is_partition": true, "path": "/dev/sda2", "type": "swap" },
      { "dmcrypt": {}, "uuid": null, "mount": "/home", "ptype": null, "is_partition": true, "path": "/dev/sda3", "type": "other" }
    ] },
  { "path": "/dev/sr0", "type": "other", "dmcrypt": {}, "ptype": "unknown", "is_partition": false }
]
The output doesn't seem to show any of the usage data for the disks or partitions, though? To generate a fact file for subscription-manager, do we need to combine ceph-disk and /bin/df?
ceph-disk mostly limits itself to information that can't be conveniently obtained by other means. Once it is known that a given partition has a given role in Ceph, usage statistics can be obtained either from df or from an existing plugin already equipped to present file-system-related information to the caller.
OK. Changing the subject of this RFE. We need to write a script which runs in cron every 4 hours and:

1. Determines which devices are being used by Ceph OSDs, not including journals, as listed by ceph-disk.
2. Determines the used space on those devices.
3. Creates a fact file in the format detailed above (see comment 4 and the URL in comment 2).
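For illustration, a minimal sketch of such a script in Python, assuming OSD data partitions appear in the ceph-disk --format json output with type "data" and a "mount" key, and that the fact is reported in whole terabytes as in the band.storage.usage example above (the actual shipped script may differ):

    #!/usr/bin/env python
    # Sketch only, not the shipped /etc/cron.hourly/subman script: sum the used
    # bytes of mounted OSD data partitions reported by ceph-disk and write the
    # rhsm fact file.
    import json
    import os
    import subprocess

    out = subprocess.check_output(["ceph-disk", "list", "--format", "json"])
    devices = json.loads(out)

    used_bytes = 0
    for dev in devices:
        for part in dev.get("partitions", []):
            # Assumption: OSD data partitions carry type "data" and a "mount"
            # point; journal partitions have a different type and are skipped.
            if part.get("type") == "data" and part.get("mount"):
                st = os.statvfs(part["mount"])
                used_bytes += (st.f_blocks - st.f_bfree) * st.f_frsize

    facts = {"band.storage.usage": used_bytes // (1024 ** 4)}  # whole terabytes
    with open("/etc/rhsm/facts/ceph_usage.facts", "w") as f:
        json.dump(facts, f)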
@Neil: I believe ceph-disk already provides (1). Please let me know if you think it needs to be adapted.
Side note: I don't know enough of the context in which the script in Comment 8 should run to usefully comment on its implementation.
I've posted an overview to ceph-devel: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/29924
Although I've implemented the script that reports the Ceph-related fact, I think you should ask for a review from someone who is familiar with subscription-manager (which I'm not).
Created attachment 1188728 [details] subman-facts output
Created attachment 1188738 [details] osd-subman-facts-output
What is the output of ceph-disk list on the machine on which subscription-manager facts is run? I suggest you verify whether /etc/cron.hourly/subman exists on the machine (it should if the osd package is installed) and whether /etc/rhsm/facts/ceph_usage.facts exists (it should, one hour after the installation of the osd package).
Here are the details.

[root@cephqe3 ~]# rpm -qa | grep ceph
python-cephfs-10.2.2-33.el7cp.x86_64
ceph-base-10.2.2-33.el7cp.x86_64
ceph-selinux-10.2.2-33.el7cp.x86_64
ceph-osd-10.2.2-33.el7cp.x86_64
libcephfs1-10.2.2-33.el7cp.x86_64
ceph-common-10.2.2-33.el7cp.x86_64
[root@cephqe3 ~]# ll /etc/cron.hourly/subman
-rw-r--r--. 1 root root 550 Aug 2 20:36 /etc/cron.hourly/subman
[root@cephqe3 ~]# ll /etc/rhsm/facts/ceph_usage.facts
ls: cannot access /etc/rhsm/facts/ceph_usage.facts: No such file or directory
[root@cephqe3 ~]# ll /etc/rhsm/facts/
total 0
[root@cephqe3 ~]# ll /etc/rhsm/
total 8
drwxr-xr-x. 2 root root   27 Jul 26 15:27 ca
drwxr-xr-x. 2 root root    6 Oct 13  2015 facts
-rw-r--r--. 1 root root 1492 Oct 13  2015 logging.conf
drwxr-xr-x. 2 root root    6 Oct 13  2015 pluginconf.d
-rw-r--r--. 1 root root 1659 Oct 13  2015 rhsm.conf
[root@cephqe3 ~]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 swap, swap
/dev/dm-2 other, xfs, mounted on /home
/dev/sda :
 /dev/sda1 other, xfs, mounted on /var/lib/ceph/osd/master-0
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, unknown
/dev/sde other, unknown
/dev/sdf other, unknown
/dev/sdg other, unknown
/dev/sdh other, unknown
/dev/sdi :
 /dev/sdi2 other, LVM2_member
 /dev/sdi1 other, xfs, mounted on /boot
[root@cephqe3 ~]#
Could you please run /etc/cron.hourly/subman manually and show the output? I suspect it fails for some reason. Also note that although /dev/sda is mounted on /var/lib/ceph/osd, it is not recognized as an OSD partition, but that's probably a different problem. Even if there are no OSDs active, the /etc/rhsm/facts/ceph_usage.facts file should exist. If subman runs OK and creates /etc/rhsm/facts/ceph_usage.facts, it suggests cron is not running for some reason.
Looks like the script is failing.

[root@cephqe3 ~]# python /etc/cron.hourly/subman
Traceback (most recent call last):
  File "/etc/cron.hourly/subman", line 20, in <module>
    """.format(used=used/(1024*1024*1024)))
KeyError: '\n"band'
[root@cephqe3 ~]#
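For context, this KeyError matches what happens when a triple-quoted str.format() template contains literal JSON braces: Python treats the text after the opening "{" as a replacement field name. A hypothetical reconstruction of the pattern (the real script may differ) and the usual fix of doubling the literal braces:

    # Hypothetical reconstruction; the actual subman script may differ.
    used = 123456789012345  # bytes used on this node, illustrative value

    # This fails with KeyError('\n"band') because the opening JSON brace starts
    # a replacement field whose name is parsed as '\n"band':
    #   """{
    #   "band.storage.usage": {used}
    #   }""".format(used=used / (1024 * 1024 * 1024))

    # Doubling the literal braces makes .format() emit them as plain text:
    facts = """{{
    "band.storage.usage": {used}
    }}""".format(used=used / (1024 * 1024 * 1024))
    print(facts)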
I opened http://tracker.ceph.com/issues/16961 and will fix this immediately.
The jewel backport that fixes this problem is at https://github.com/ceph/ceph/pull/10625/commits
@Ken, I suspect this needs to go into the release about to be published?
That's right, although this bug has now been re-targeted to RHCS 2.1, so we'll handle this after 2.0 GAs.
@Neil, is it acceptable for the next Ceph release to be published without the ability to figure out how much space is used by a given OSD? I'm under the impression that it is key to implementing a business model where clients are billed on the actual space they use in a Ceph cluster, which is why I'm double-checking.
I've already moved this BZ out of the 2.0 target as it is too late to land. We should try to get this in for 2.1 or 2.2
This was fixed upstream in 10.2.3.
Discussed at the program meeting; Ian would like more time to review.
Thomas, this was thought to have been fixed. Alfredo, please review.
This is not being run by cron because the file is installed with non-executable permissions:

# ls -alh /etc/cron.hourly/
-rw-r--r--. 1 root root 633 May 16 17:50 subman

When cron runs, it runs scripts like ./subman, so the above would not work. I think that because the script is not executable, cron will not even try (nothing shows up in the logs). The script needs to have executable permissions at install time. I am not proficient in packaging, but it seems like the spec file would need to mark the file as executable, unlike how it currently exists:

%{_sysconfdir}/cron.hourly/subman

Maybe with 0755? That is how the default script for cron.hourly exists:

-rwxr-xr-x. 1 root root 392 Feb 23  2016 0anacron

I see a couple of ways the spec file could do this; again, I might need someone else to help here making sure that permissions are set correctly. Ken, could you point me in the right direction here maybe?
Some additional info, after changing the permissions on subman to 0755:

-rwxr-xr-x. 1 root root 633 May 16 17:50 /etc/cron.hourly/subman

Logs show execution:

May 24 13:01:01 magna086 run-parts(/etc/cron.hourly)[18679]: starting subman
May 24 13:01:02 magna086 run-parts(/etc/cron.hourly)[18776]: finished subman

And facts are populated:

# ls -l /etc/rhsm/facts
total 4
-rw-r--r--. 1 root root 28 May 24 13:01 ceph_usage.facts
I think we could do something like this in the spec file, in the "%files osd" section:

%attr(0755,-,-) %{_sysconfdir}/cron.hourly/subman

Does 0750 also work? Maybe safer?
I think 0755 would be fine.
Discussed at the program meeting; Thomas to do the PR today and move this back to ON_QA today, worst case tomorrow (25-May).
Upstream PR: https://github.com/ceph/ceph/pull/15270
Verified in build: ceph version 10.2.7-21.el7cp

ll /etc/cron.hourly/
total 8
-rwxr-xr-x. 1 root root 392 Feb 23  2016 0anacron
-rwxr-xr-x. 1 root root 633 May 25 21:31 subman

ll /etc/rhsm/facts/
total 4
-rw-r--r--. 1 root root 28 May 26 07:01 ceph_usage.facts
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1497