Bug 1265435 - [RFE] Generate fact file for subman from ceph-disk and df
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.3.0
Hardware: Unspecified OS: Unspecified
Priority: urgent Severity: unspecified
Target Milestone: rc
Target Release: 2.3
Assigned To: Alfredo Deza
QA Contact: Vasishta
Docs Contact: Erin Donnelly
Keywords: FutureFeature
Depends On:
Blocks: 1258382 1437916
 
Reported: 2015-09-22 19:18 EDT by Neil Levine
Modified: 2017-07-30 11:09 EDT (History)

See Also:
Fixed In Version: RHEL: ceph-10.2.7-21.el7cp Ubuntu: ceph_10.2.7-23redhat1
Doc Type: Enhancement
Doc Text:
.Subscription Manager now reports on the raw disk capacity available per OSD
With this release, Red Hat Subscription Manager can report on the raw disk capacity available per OSD. To do so:
----
# subscription-manager facts
----
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-19 09:25:10 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
subman-facts output (4.65 KB, text/plain)
2016-08-08 08:41 EDT, Harish NV Rao
osd-subman-facts-output (4.71 KB, text/plain)
2016-08-08 08:48 EDT, Harish NV Rao


External Trackers
Tracker ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 14972 None None None 2016-03-03 10:43 EST
Ceph Project Bug Tracker 16961 None None None 2017-04-03 16:29 EDT
Ceph Project Bug Tracker 20074 None None None 2017-05-24 13:38 EDT

Description Neil Levine 2015-09-22 19:18:46 EDT
The subscription-management utility wants to report on the raw disk capacity available per OSD. We need a flag to ceph-disk which will produce a machine-readable format (JSON to start) for parsing.
Comment 3 Neil Levine 2015-12-09 18:45:30 EST
Based on subsequent conversations, the proposal is for the Ceph OSD to create the fact file for rhsm to parse so it can report on total aggregate usage across all OSDs on any cluster the user deploys. This should probably be created each time ceph-disk is invoked.

This file would be in JSON format as described above.

I think the minimum number of facts we need are:

1. Total raw disk space available for Ceph on the host.
2. Used raw disk space for Ceph on the host.

For completeness, while not used by rhsm in the first instance, we might want to add a total and used figure for each disk in the host to be explicit.
Comment 4 Neil Levine 2015-12-10 13:21:56 EST
From Barnaby:

Create a file "/etc/rhsm/facts/ceph_disk.facts". That file needs to contain json of
{
"band.storage.usage": <integer number of terabytes used on this node, such as 55>
}

for example:
{
"band.storage.usage": 55
}

As long as that file exists (it can be named anything that ends in ".facts"), the contents will be read in and added to the system facts. This can be verified by running 'subscription-manager facts' to ensure that your value is showing up properly.
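A minimal sketch of writing such a fact file, assuming Python is available on the node. The helper name `write_fact_file` is hypothetical; only the key `band.storage.usage`, the integer value, and the ".facts" suffix come from this comment.

```python
import json


def write_fact_file(path, terabytes_used):
    """Write a custom fact file as JSON (hypothetical helper, not part of any package)."""
    # Per this comment, the file name must end in ".facts" to be picked up.
    if not path.endswith(".facts"):
        raise ValueError("fact file names must end in .facts")
    facts = {"band.storage.usage": int(terabytes_used)}
    with open(path, "w") as handle:
        json.dump(facts, handle)
        handle.write("\n")
    return facts
```

Calling `write_fact_file("/etc/rhsm/facts/ceph_disk.facts", 55)` as root would produce the example file above; the result can then be checked with 'subscription-manager facts'.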
Comment 5 Loic Dachary 2016-02-08 11:49:05 EST
ceph-disk list --format json

returns the information in JSON. 

$ ceph-disk list --format json | jq .
[
  {
    "path": "/dev/dm-0",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop0",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop1",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop2",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop3",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop4",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop5",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop6",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/loop7",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd0",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd1",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd10",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd11",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd12",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd13",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd14",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd15",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd2",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd3",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd4",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd5",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd6",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd7",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd8",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/nbd9",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram0",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram1",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram10",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram11",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram12",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram13",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram14",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram15",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram2",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram3",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram4",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram5",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram6",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram7",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram8",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/ram9",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  },
  {
    "path": "/dev/sda",
    "partitions": [
      {
        "dmcrypt": {},
        "uuid": null,
        "mount": "/",
        "ptype": null,
        "is_partition": true,
        "path": "/dev/sda1",
        "type": "other"
      },
      {
        "dmcrypt": {},
        "uuid": null,
        "ptype": null,
        "is_partition": true,
        "path": "/dev/sda2",
        "type": "swap"
      },
      {
        "dmcrypt": {},
        "uuid": null,
        "mount": "/home",
        "ptype": null,
        "is_partition": true,
        "path": "/dev/sda3",
        "type": "other"
      }
    ]
  },
  {
    "path": "/dev/sr0",
    "type": "other",
    "dmcrypt": {},
    "ptype": "unknown",
    "is_partition": false
  }
]
Comment 6 Neil Levine 2016-02-08 14:39:33 EST
The output doesn't seem to show any of the usage data for the disks or partitions though ?

To generate a fact file for subscription-manager, do we need to combine the output of ceph-disk and /bin/df ?
Comment 7 Loic Dachary 2016-02-09 10:01:54 EST
ceph-disk mostly limits itself to information that can't be conveniently obtained by other means. Once it is known that a given partition has a given role in Ceph, usage statistics can be obtained either from df or from an existing plugin already equipped to present file system related information to the caller.
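As an illustration of obtaining usage without ceph-disk, here is a small sketch that uses Python's `os.statvfs` in place of df. This is a swapped-in technique, not something proposed in this thread, and the function name `used_bytes` is hypothetical.

```python
import os


def used_bytes(mount_point):
    """Return the number of bytes in use on the file system mounted at mount_point."""
    stats = os.statvfs(mount_point)
    # Used blocks are total blocks minus free blocks; f_frsize is the block unit.
    return (stats.f_blocks - stats.f_bfree) * stats.f_frsize
```

The mount point to pass in would be whatever ceph-disk reports for a partition once its role is known.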
Comment 8 Neil Levine 2016-02-09 15:39:39 EST
OK. Changing the subject of this RFE.

We need to write a script which runs in cron every 4 hours and:

1. Determines which devices are being used by Ceph OSDs, not including journals, as listed by ceph-disk
2. Determines the used space on those devices.
3. Creates a fact file in the format detailed above (see comment 4 and URL in comment 2)
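Step 1 can be sketched as a filter over the parsed `ceph-disk list --format json` output. This is only a sketch under assumptions: it assumes OSD data partitions are reported with `"type": "data"` and journals with a different type, which the sample output in comment 5 does not show (it contains only "other" and "swap"). The function name and the sample data are hypothetical.

```python
import json


def osd_data_partitions(disks):
    """Return partitions assumed to hold OSD data (type == "data"), skipping journals."""
    found = []
    for disk in disks:
        # Whole-disk entries without partitions have no "partitions" key.
        for partition in disk.get("partitions", []):
            if partition.get("type") == "data":
                found.append(partition)
    return found


# Hypothetical sample in the same shape as the ceph-disk output in comment 5.
SAMPLE = json.loads("""
[
  {"path": "/dev/sda", "partitions": [
    {"path": "/dev/sda1", "type": "other", "mount": "/"}
  ]},
  {"path": "/dev/sdb", "partitions": [
    {"path": "/dev/sdb1", "type": "data", "mount": "/var/lib/ceph/osd/ceph-0"},
    {"path": "/dev/sdb2", "type": "journal"}
  ]},
  {"path": "/dev/sr0", "type": "other"}
]
""")

if __name__ == "__main__":
    print([partition["path"] for partition in osd_data_partitions(SAMPLE)])
```

Steps 2 and 3 would then measure usage for each returned partition and write the total to the fact file.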
Comment 9 Loic Dachary 2016-02-27 04:17:45 EST
@Neil : I believe ceph-disk already provides 1. Please let me know if you think it needs to be adapted.
Comment 11 Loic Dachary 2016-03-01 21:59:44 EST
Side note: I don't know enough of the context in which the script in Comment 8 should run to usefully comment on its implementation.
Comment 12 Ken Dreyer (Red Hat) 2016-03-02 17:51:54 EST
I've posted an overview to ceph-devel: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/29924
Comment 16 Loic Dachary 2016-06-08 05:53:55 EDT
Although I've implemented the script that reports the facts related to Ceph, I think you should ask for a review from someone who is familiar with subscription-manager (which I'm not).
Comment 19 Harish NV Rao 2016-08-08 08:41 EDT
Created attachment 1188728 [details]
subman-facts output
Comment 20 Harish NV Rao 2016-08-08 08:48 EDT
Created attachment 1188738 [details]
osd-subman-facts-output
Comment 22 Loic Dachary 2016-08-08 11:53:17 EDT
What is the output of ceph-disk list on the machine on which subscription-manager facts is run ? I suggest you verify that /etc/cron.hourly/subman exists on the machine (it should if the osd package is installed) and that /etc/rhsm/facts/ceph_usage.facts exists (it should, one hour after the installation of the osd package).
Comment 24 Harish NV Rao 2016-08-08 12:29:10 EDT
Here are the details.

[root@cephqe3 ~]# rpm -qa | grep ceph
python-cephfs-10.2.2-33.el7cp.x86_64
ceph-base-10.2.2-33.el7cp.x86_64
ceph-selinux-10.2.2-33.el7cp.x86_64
ceph-osd-10.2.2-33.el7cp.x86_64
libcephfs1-10.2.2-33.el7cp.x86_64
ceph-common-10.2.2-33.el7cp.x86_64

[root@cephqe3 ~]# ll  /etc/cron.hourly/subman
-rw-r--r--. 1 root root 550 Aug  2 20:36 /etc/cron.hourly/subman

[root@cephqe3 ~]# ll /etc/rhsm/facts/ceph_usage.facts
ls: cannot access /etc/rhsm/facts/ceph_usage.facts: No such file or directory

[root@cephqe3 ~]# ll /etc/rhsm/facts/
total 0
[root@cephqe3 ~]# ll /etc/rhsm/
total 8
drwxr-xr-x. 2 root root   27 Jul 26 15:27 ca
drwxr-xr-x. 2 root root    6 Oct 13  2015 facts
-rw-r--r--. 1 root root 1492 Oct 13  2015 logging.conf
drwxr-xr-x. 2 root root    6 Oct 13  2015 pluginconf.d
-rw-r--r--. 1 root root 1659 Oct 13  2015 rhsm.conf

[root@cephqe3 ~]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 swap, swap
/dev/dm-2 other, xfs, mounted on /home
/dev/sda :
 /dev/sda1 other, xfs, mounted on /var/lib/ceph/osd/master-0
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, unknown
/dev/sde other, unknown
/dev/sdf other, unknown
/dev/sdg other, unknown
/dev/sdh other, unknown
/dev/sdi :
 /dev/sdi2 other, LVM2_member
 /dev/sdi1 other, xfs, mounted on /boot
[root@cephqe3 ~]#
Comment 25 Loic Dachary 2016-08-08 12:40:54 EDT
Could you please run /etc/cron.hourly/subman manually and show the output ? I suspect it fails for some reason. Also note that although /dev/sda1 is mounted under /var/lib/ceph/osd, it is not recognized as an osd partition, but that's probably a different problem. Even if there are no active OSDs, the /etc/rhsm/facts/ceph_usage.facts file should exist. If subman runs ok and creates /etc/rhsm/facts/ceph_usage.facts, it suggests cron is not running for some reason.
Comment 26 Harish NV Rao 2016-08-08 12:49:19 EDT
Looks like the script is failing.

[root@cephqe3 ~]# python  /etc/cron.hourly/subman 
Traceback (most recent call last):
  File "/etc/cron.hourly/subman", line 20, in <module>
    """.format(used=used/(1024*1024*1024)))
KeyError: '\n"band'
[root@cephqe3 ~]#
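The traceback is consistent with literal JSON braces inside a `str.format` template: `format` treats the opening `{` as the start of a replacement field and looks up a name beginning with `\n"band` instead of `used`. The script source is not included in this report, so the template below is a reconstruction from the traceback and from the fact format in comment 4, not the shipped code. A sketch that reproduces the error and shows two ways to avoid it:

```python
import json

# Reconstructed template (assumption): the literal braces are not escaped.
BROKEN = """
{
"band.storage.usage": {used}
}
"""

# Doubling the literal braces makes str.format emit them verbatim.
ESCAPED = """
{{
"band.storage.usage": {used}
}}
"""


def render_broken(used):
    """Raises KeyError because the outer brace opens a replacement field."""
    return BROKEN.format(used=used)


def render_escaped(used):
    """Same template with the literal braces doubled."""
    return ESCAPED.format(used=used)


def render_json(used):
    """Let the json module build the text, so no brace escaping is needed."""
    return json.dumps({"band.storage.usage": used})
```

Either `render_escaped` or `render_json` yields text that parses back to the expected fact; the second form also avoids hand-written JSON.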
Comment 27 Loic Dachary 2016-08-08 15:49:23 EDT
I opened http://tracker.ceph.com/issues/16961 and will fix this immediately.
Comment 28 Loic Dachary 2016-08-09 07:43:42 EDT
The jewel backport that fixes this problem is at https://github.com/ceph/ceph/pull/10625/commits
Comment 29 Loic Dachary 2016-08-09 07:45:22 EDT
@Ken, I suspect this needs to go in the release about to be published ?
Comment 30 Ken Dreyer (Red Hat) 2016-08-09 09:43:02 EDT
That's right, although this bug has been re-targeted to RHCS 2.1, now, so we'll handle this after 2.0 GAs.
Comment 34 Loic Dachary 2016-08-10 02:46:08 EDT
@Neil is it acceptable for the next Ceph release to be published without the ability to figure out how much space is used by a given OSD ? I'm under the impression that it is key to implementing a business model where clients are billed on the actual space they use in a Ceph cluster, which is why I'm double-checking.
Comment 35 Neil Levine 2016-08-10 13:36:31 EDT
I've already moved this BZ out of the 2.0 target as it is too late to land. We should try to get this in for 2.1 or 2.2
Comment 40 Josh Durgin 2017-03-31 21:57:13 EDT
This was fixed upstream in 10.2.3.
Comment 43 John Poelstra 2017-05-17 11:08:13 EDT
discussed at program meeting, Ian would like more time to review.
Comment 47 Ian Colle 2017-05-24 01:32:25 EDT
Thomas, this was thought to have been fixed. Alfredo, please review.
Comment 48 Alfredo Deza 2017-05-24 08:18:56 EDT
This is not running by cron because the file is installed with non-execute permissions:

    # ls -alh /etc/cron.hourly/
    -rw-r--r--.   1 root root  633 May 16 17:50 subman


When cron runs, it runs scripts like:

    ./subman

So the above would not work. I think that because the script is not executable, cron will not even try (nothing shows up in the logs).

The script needs to have executable permissions at install time. I am not proficient in packaging, but it seems the spec file would need to mark the file as executable, unlike the current entry:

    %{_sysconfdir}/cron.hourly/subman

Maybe with 0755 ? That is how the default script for cron.hourly exists:

    -rwxr-xr-x.   1 root root  392 Feb 23  2016 0anacron

I see a couple of ways the spec file does this, again, might need someone else to help here making sure that permissions are set correctly. Ken, could you point me in the right direction here maybe?
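For checking a node by hand, a small sketch that tests the owner execute bit and, if needed, sets mode 0755 (the mode 0anacron has in the listing above). The function names are hypothetical, and this is a manual workaround rather than the packaging fix discussed here.

```python
import os
import stat


def is_executable_by_owner(path):
    """Return True if the owner execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def make_executable(path):
    """Set mode 0755 (rwxr-xr-x) on path."""
    os.chmod(path, 0o755)
```

On an affected node, `is_executable_by_owner("/etc/cron.hourly/subman")` would be expected to return False for the `-rw-r--r--` file shown above.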
Comment 49 Alfredo Deza 2017-05-24 09:59:48 EDT
Some additional info, after changing the permissions on subman to 0755:

    -rwxr-xr-x. 1 root root 633 May 16 17:50 /etc/cron.hourly/subman

Logs show execution:

    May 24 13:01:01 magna086 run-parts(/etc/cron.hourly)[18679]: starting subman
    May 24 13:01:02 magna086 run-parts(/etc/cron.hourly)[18776]: finished subman


And facts are populated:


    # ls -l /etc/rhsm/facts
    total 4
    -rw-r--r--. 1 root root 28 May 24 13:01 ceph_usage.facts
Comment 50 tserlin 2017-05-24 10:51:43 EDT
I think we could do something like this in the spec file, in the "%files osd" section:

%attr(0755,-,-) %{_sysconfdir}/cron.hourly/subman


Does 0750 also work? Maybe safer?
Comment 51 Ken Dreyer (Red Hat) 2017-05-24 10:54:01 EDT
I think 0755 would be fine.
Comment 52 John Poelstra 2017-05-24 11:11:33 EDT
discussed at program meeting, Thomas to do PR today and back to ON_QA today, worst case tomorrow (25-May)
Comment 53 tserlin 2017-05-24 13:26:11 EDT
Upstream PR: https://github.com/ceph/ceph/pull/15270
Comment 58 Tejas 2017-05-26 03:45:32 EDT
Verified in build:
ceph version 10.2.7-21.el7cp

ll /etc/cron.hourly/
total 8
-rwxr-xr-x. 1 root root 392 Feb 23  2016 0anacron
-rwxr-xr-x. 1 root root 633 May 25 21:31 subman

ll /etc/rhsm/facts/
total 4
-rw-r--r--. 1 root root 28 May 26 07:01 ceph_usage.facts
Comment 60 errata-xmlrpc 2017-06-19 09:25:10 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497
