Description of problem:

I have manually deployed a Ceph cluster with ansible. In this cluster, journals are collocated and 1GB in size. The journal size information varies from one window to another and is very often incorrect. The cluster configuration window shows journal size = 5GB. The OSD configuration window shows journal size = 5KB. All of this while the ceph.conf file does have osd_journal_size = 1024.

Version-Release number of selected component (if applicable):

2.0 (0.0.39)

How reproducible:

100%

Steps to Reproduce:
1. Import an existing cluster
2. Go to the cluster tab view
3. Go to the OSD view and click journal for osd.x

Actual results:

Cluster view shows journal size = 5GB. OSD journal detail shows 5KB.

Expected results:

Cluster view should show 1GB or nothing. OSD view should show the actual correct size.

Additional info:
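For reference, `osd_journal_size` in ceph.conf is expressed in megabytes, so `osd_journal_size = 1024` means a 1GB journal. The sketch below (a hypothetical `human_size` helper, not USM code) shows the expected conversion, plus one plausible way both wrong values could arise from the hardcoded 5120 MB default: rendered correctly as megabytes it reads 5GB, and the same number misread as bytes reads 5KB. This is an assumption about the failure mode, not something confirmed from the USM source.

```python
def human_size(size_bytes):
    """Render a byte count as a human-readable string (1024-based units)."""
    size = float(size_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024:
            return "%g%s" % (size, unit)
        size /= 1024
    return "%gPB" % size

MB = 1024 * 1024

# Correct: osd_journal_size = 1024 (megabytes) from ceph.conf -> "1GB".
assert human_size(1024 * MB) == "1GB"

# Plausible bug 1: the hardcoded 5120 MB default shown instead -> "5GB".
assert human_size(5120 * MB) == "5GB"

# Plausible bug 2: the same 5120 default misread as bytes -> "5KB".
assert human_size(5120) == "5KB"
```

Both reported values (5GB in the cluster view, 5KB in the OSD view) are consistent with the same default being fed through two different unit interpretations, which would explain why neither matches the configured 1GB.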
See these 2 related bug reports:

* BZ 1342969: OSD journal details provides incorrect journal size
* BZ 1346020: wrong filepaths reported on "Ceph Cluster Configuration"

As BZ 1342969 shows, the current implementation of ceph journal size reporting is done in a way which returns wrong values in most cases and as such doesn't make much sense.

And for the cluster configuration - I filed a related BZ 1346020 some time ago - and the resolution was that the tab is not expected to show the actual status, but rather:

> the intent was to show what the defaults are in the ceph.conf file (on a Mon)

and:

> It should display the default paths here (and basically it would be what's in
> the ceph.conf on the Mon). Hardcoding would work if users are not changing
> the defaults. If you import a Cluster, this hardcoded paths could
> potentially not be the same.

which means that in this case, the BZ is still valid.

This may be related to the fact that, as another BZ 1363999 notes, the collocated journal setup was not originally considered at all, which may be the reason why the reporting is broken in such a setup.
Could you add a full list of rhscon*/ceph* package versions used[1], one for each role in your cluster?

[1]: `rpm -qa rhscon*; rpm -qa ceph-*`
The problem is that when someone imports a cluster, it may have collocated journals like most clusters out there, at least for a significant portion of the OSDs used for S3 and Swift. RBD storage often has separate journals on SSDs, but that is just one of the imported-cluster use cases.

(In reply to Martin Bukatovic from comment #4)
> See these 2 related bug reports:
>
> * BZ 1342969: OSD journal details provides incorrect journal size
> * BZ 1346020: wrong filepaths reported on "Ceph Cluster Configuration"
>
> As BZ 1342969 shows, the current implementation of ceph journal
> size reporting is done in a way which returns wrong values in
> most cases and as such doesn't make much sense.
>
> And for the cluster configuration - I filed a related BZ 1346020 some time
> ago - and the resolution was that the tab is not expected to show the actual
> status, but rather:
>
> > the intent was to show what the defaults are in the ceph.conf file (on a Mon)
>
> and:
>
> > It should display the default paths here (and basically it would be what's in
> > the ceph.conf on the Mon). Hardcoding would work if users are not changing
> > the defaults. If you import a Cluster, this hardcoded paths could
> > potentially not be the same.
>
> which means that in this case, the BZ is still valid.
>
> This may be related to the fact that as another BZ 1363999 notes, the
> collocated journal setup was not originally considered at all which may be
> the reason why the reporting is broken in such setup.
Attaching the two as asked.

(In reply to Martin Bukatovic from comment #5)
> Could you add a full list of rhscon*/ceph* package versions used[1], one for
> each role in your cluster?
>
> [1]: `rpm -qa rhscon*; rpm -qa ceph-*`
Created attachment 1189787 [details] Cluster Node packages
Created attachment 1189788 [details] RHSCON packages
The cluster configuration window doesn't represent the actual size of the journal. This tab is meant for all the default configurations maintained by USM; the journal size shown there is the default journal size to be used if no value is provided by the user.

The actual size is displayed on the OSD tab. It is misbehaving here due to the collocated setup you have. This was not a requirement for USM, as we were always assuming journals on separate disks, which is the best practice for performance. If the collocated case is a widely used configuration, we need to mark this bug for the async release.
(In reply to Nishanth Thomas from comment #12)
> The cluster configuration window doesn't represent the actual size of the
> journal. This tab is meant for all the default configurations maintained by
> USM, the journal size is meant to be the default journal size to be taken if
> no values provided by the user.

I'm very confused here. When you deploy a cluster, the end user is prompted for the journal size he wants to use, so at no point is a hidden default value used for deployment.

The second problem is that even if you specify a value different from 5GB when deploying your cluster through the UI, let's say 1GB, this tab will still display 5GB.

The third problem is that if you import a cluster that has a journal size set outside of the UI, during a ceph-ansible deployment or a manual deployment, this tab will still show 5GB.

> The actual size is displayed on the OSD tab. Here it is misbehaving due to
> collocated setup you have. This was not a requirement for USM as we always
> looking at journal on a separate disks which is the best practice for the
> performance cases.

Importing an existing cluster is a supported use case, and the OSD configuration needs to be reported accurately in any case.

> If the collocated case is a widely used configuration, we need to mark this
> bug for the async release.

We have already explained that this collocated journal setup is a heavily used case.
(In reply to Jean-Charles Lopez from comment #13)
> (In reply to Nishanth Thomas from comment #12)
> > The cluster configuration window doesn't represent the actual size of the
> > journal. This tab is meant for all the default configurations maintained by
> > USM, the journal size is meant to be the default journal size to be taken if
> > no values provided by the user.
>
> I'm very confused here. When you deploy a cluster, the end user is being
> prompted for the journal size he wants to use so at no point there is a
> hidden default value used for deployment.
>
> The second problem is that even if you specify a value different than 5GB to
> deploy your cluster through the UI, let's say 1GB, this tab will still
> display 5GB.
>
> The third problem is that if you import a cluster that has a journal size
> set outside of the UI during the ceph-ansible deploment or a manual
> deploytment, this tab will still show 5GB.

The cluster configuration tab has all the USM defaults, and that is the way it is meant to be. When prompted during cluster creation or on other screens (for example the threshold configuration), the user has the option to override them. The user also has the option to update these defaults (currently not allowed from the UI, in the future yes, but some of them can be altered through config files).

> > The actual size is displayed on the OSD tab. Here it is misbehaving due to
> > collocated setup you have. This was not a requirement for USM as we always
> > looking at journal on a separate disks which is the best practice for the
> > performance cases.
>
> Importing existing cluster is a supported use case and OSD configuration
> needs to be reported accurately in any case.
>
> > If the collocated case is a widely used configuration, we need to mark this
> > bug for the async release.
>
> We have already explained that this collocated journal is a heavily used
> case.

As Jeff mentioned, this is a sure candidate for the async release.
Ack
Post discussion with Ju, and acked from here, below are the changes which would be done:

1. Remove the default journal size from the cluster -> config list
2. From the backend, the correct value for the journal path (target disk partition name) and the actual journal size would be populated to the UI as below:

~~~~~~~~~~~~~~~~~~~~~~~
Device Path: /dev/vdb1
Capacity: 3.0 GB
Storage Profile: <name>
~~~~~~~~~~~~~~~~~~~~~~~
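A minimal sketch of how the backend could derive the actual capacity for such a journal partition (the helper names and the direct sysfs read are my assumptions, not necessarily how the skyring backend implements it). Linux exposes `/sys/class/block/<name>/size` in 512-byte sectors regardless of the device's logical block size, so the actual size is always recoverable from sysfs:

```python
import os

SECTOR = 512  # /sys/class/block/<name>/size is always in 512-byte units

def capacity_gb(sectors):
    """Convert a sysfs sector count to GB, rounded to one decimal place."""
    return round(sectors * SECTOR / 1024 ** 3, 1)

def journal_capacity_gb(device_path):
    """Look up the capacity of a journal partition such as /dev/vdb1."""
    name = os.path.basename(device_path)  # e.g. "vdb1"
    with open("/sys/class/block/%s/size" % name) as f:
        return capacity_gb(int(f.read()))

# A 3 GB partition is 6291456 sectors of 512 bytes.
assert capacity_gb(6291456) == 3.0
```

Reading the size from the partition itself, rather than from ceph.conf or a hardcoded default, would report the correct value for both collocated and dedicated journal setups.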
+1 to comment 19 (Shubendu Tripathi). This is also mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1342969.
UI changes done as per comment 19
Created attachment 1203678 [details] Cluster configuration view
Created attachment 1203680 [details] osd detail tab
Tested on:

rhscon-core-selinux-0.0.43-1.el7scon.noarch
rhscon-core-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.57-1.el7scon.noarch
rhscon-ceph-0.0.42-1.el7scon.x86_64

It does not seem to have changed; see the attached screenshots:

attachment 1203678 [details] Cluster configuration view
attachment 1203680 [details] osd detail tab
Created attachment 1203690 [details] Cluster Config Dev setup
Created attachment 1203691 [details] OSD details from dev setup
The problem on my side was that 'yum update' doesn't actually restart the skyring service. It has to be restarted manually before the import is done.

Tested on:

rhscon-core-selinux-0.0.43-1.el7scon.noarch
rhscon-core-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.57-1.el7scon.noarch
rhscon-ceph-0.0.42-1.el7scon.x86_64

Seems fine:

* journal size is not present on the cluster configuration page
* there is a correct device path and size in the journal information on the OSD details page
Created attachment 1204841 [details] osd detail tab - after volume creation

When I created a volume, all the journal paths there were broken.
The new patch takes care of syncing the journal details properly during pool creation now.
Tested with:

rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.59-1.el7scon.noarch

The device path is correct even after a pool and/or RBD is created.
doc-text looks good
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:2082