Bug 1365998

Summary: Incoherent OSD journal size display in the UI
Product: [Red Hat Storage] Red Hat Storage Console
Reporter: Jean-Charles Lopez <jelopez>
Component: UI
Assignee: Shubhendu Tripathi <shtripat>
Status: CLOSED ERRATA
QA Contact: Lubos Trilety <ltrilety>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2
CC: japplewh, jelopez, julim, kchidamb, ltrilety, mbukatov, mkudlej, nthomas, rghatvis, sankarshan, shtripat, vsarmila
Target Milestone: ---
Target Release: 2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: rhscon-core-0.0.44-1.el7scon.x86_64, rhscon-ceph-0.0.43-1.el7scon.x86_64, rhscon-ui-0.0.58-1.el7scon.noarch
Doc Type: Bug Fix
Doc Text:
Previously, while importing a cluster with collocated journals through the console import cluster mechanism, the journal size was populated incorrectly in the MongoDB database. Consequently, an incorrect journal size and journal path were displayed in the OSD summary of the Host OSDs tab. With this update, the journal size and the journal path are displayed correctly in the OSD summary of the Host OSDs tab.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-19 15:21:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1346350, 1353450, 1357777    
Attachments (Description / Flags):
  Cluster Node packages (none)
  RHSCON packages (none)
  Cluster configuration view (none)
  osd detail tab (none)
  Cluster Config Dev setup (none)
  OSD details from dev setup (none)
  osd detail tab - after volume creation (none)

Description Jean-Charles Lopez 2016-08-10 17:29:39 UTC
Description of problem:
I have manually deployed a Ceph cluster with Ansible. In this cluster, journals are collocated and 1GB in size. The journal size information varies from one window to another and is very often incorrect.
The cluster configuration window shows journal size = 5GB
The osd configuration window shows journal size = 5KB

All of this while the ceph.conf file does have osd_journal_size = 1024
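
For the record, the setting can be verified directly on a MON node, e.g. (assuming the default /etc/ceph/ceph.conf location):

# Check the configured journal size in ceph.conf
grep osd_journal_size /etc/ceph/ceph.conf
# expected for this cluster:
# osd_journal_size = 1024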


Version-Release number of selected component (if applicable): 2.0 (0.0.39)


How reproducible:
100%

Steps to Reproduce:
1. Import an existing cluster
2. Go to the cluster tab view
3. Go to the OSD view and click journal for osd.x

Actual results:
Cluster view shows journal size = 5GB
OSD journal detail shows 5KB


Expected results:
Cluster view should show 1GB or nothing
OSD view should show the actual correct size


Additional info:
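For comparison, a couple of commands can cross-check the journal size the cluster itself reports; a minimal sketch, run on an OSD node (the OSD id and journal partition below are placeholders, not taken from this setup):

# Ask the running OSD daemon what journal size it is using
# ("osd.0" is a placeholder OSD id).
ceph daemon osd.0 config get osd_journal_size

# With a collocated journal the journal sits on a partition of the same
# disk; its real size in bytes can be read with lsblk
# ("/dev/vdb2" is a placeholder for the journal partition).
lsblk -b /dev/vdb2 -o NAME,SIZE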

Comment 4 Martin Bukatovic 2016-08-10 18:19:51 UTC
See these 2 related bug reports:

* BZ 1342969: OSD journal details provides incorrect journal size
* BZ 1346020: wrong filepaths reported on "Ceph Cluster Configuration"

As BZ 1342969 shows, the current implementation of ceph journal
size reporting is done in a way which returns wrong values in
most cases and as such doesn't make much sense.

And for the cluster configuration - I filed a related BZ 1346020 some time
ago - and the resolution was that the tab is not expected to show the actual
status, but rather:

> the intent was to show what the defaults are in the ceph.conf file (on a Mon)

and:

> It should display the default paths here (and basically it would be what's in
> the ceph.conf on the Mon).  Hardcoding would work if users are not changing
> the defaults. If you import a Cluster, this hardcoded paths could
> potentially not be the same.

which means that in this case, the BZ is still valid.

This may be related to the fact that as another BZ 1363999 notes, the
collocated journal setup was not originally considered at all which may be the
reason why the reporting is broken in such setup.

Comment 5 Martin Bukatovic 2016-08-10 18:22:42 UTC
Could you add a full list of rhscon*/ceph* package versions used[1], one for each
role in your cluster?

[1]: `rpm -qa rhscon*; rpm -qa ceph-*`
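
For example, one way to collect this from every node in one pass (the hostnames below are only placeholders; adjust to the nodes in your cluster):

for host in mon1 osd1 osd2 usm-server; do
    echo "== $host =="
    ssh "$host" 'rpm -qa "rhscon*" "ceph-*"'
done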

Comment 6 Jean-Charles Lopez 2016-08-10 19:31:24 UTC
The problem is that when someone imports a cluster, it may well have collocated journals, like most clusters out there, at least for a significant portion of the OSDs used for S3 and Swift.

RBD storage often has separate journals on SSDs, but this is just one of the imported-cluster use cases.

(In reply to Martin Bukatovic from comment #4)
> See these 2 related bug reports:
> 
> * BZ 1342969: OSD journal details provides incorrect journal size
> * BZ 1346020: wrong filepaths reported on "Ceph Cluster Configuration"
> 
> As BZ 1342969 shows, the current implementation of ceph journal
> size reporting is done in a way which returns wrong values in
> most cases and as such doesn't make much sense.
> 
> And for the cluster configuration - I filed a related BZ 1346020 some time
> ago - and the resolution was that the tab is not expected to show the actual
> status, but rather:
> 
> > the intent was to show what the defaults are in the ceph.conf file (on a Mon)
> 
> and:
> 
> > It should display the default paths here (and basically it would be what's in
> > the ceph.conf on the Mon).  Hardcoding would work if users are not changing
> > the defaults. If you import a Cluster, this hardcoded paths could
> > potentially not be the same.
> 
> which means that in this case, the BZ is still valid.
> 
> This may be related to the fact that as another BZ 1363999 notes, the
> collocated journal setup was not originally considered at all which may be
> the
> reason why the reporting is broken in such setup.

Comment 7 Jean-Charles Lopez 2016-08-10 19:37:54 UTC
Attaching the two as asked.

(In reply to Martin Bukatovic from comment #5)
> Could you add a full list of rhscon*/ceph* package versions used[1], one for
> each
> role in your cluster?
> 
> [1]: `rpm -qa rhscon*; rpm -qa ceph-*`

Comment 8 Jean-Charles Lopez 2016-08-10 19:38:26 UTC
Created attachment 1189787 [details]
Cluster Node packages

Comment 9 Jean-Charles Lopez 2016-08-10 19:38:50 UTC
Created attachment 1189788 [details]
RHSCON packages

Comment 12 Nishanth Thomas 2016-08-11 05:29:57 UTC
The cluster configuration window doesn't represent the actual size of the journal. This tab is meant for all the default configurations maintained by USM; the journal size shown is the default journal size to be used if no value is provided by the user.

The actual size is displayed on the OSD tab. Here it is misbehaving due to the collocated setup you have. This was not a requirement for USM, as we were always looking at journals on separate disks, which is the best practice for performance.

If the collocated case is a widely used configuration, we need to mark this bug for the async release.

Comment 13 Jean-Charles Lopez 2016-08-11 15:04:40 UTC
(In reply to Nishanth Thomas from comment #12)
> The cluster configuration window doesn't represent the actual size of the
> journal. This tab is meant for all the default configurations maintained by
> USM, the journal size is meant to be the default journal size to be taken if
> no values provided by the user.

I'm very confused here. When you deploy a cluster, the end user is prompted for the journal size they want to use, so at no point is a hidden default value used for deployment.

The second problem is that even if you specify a value different from 5GB when deploying your cluster through the UI, let's say 1GB, this tab will still display 5GB.

The third problem is that if you import a cluster that has a journal size set outside of the UI, during a ceph-ansible deployment or a manual deployment, this tab will still show 5GB.

> 
> The actual size is displayed on the OSD tab. Here it is misbehaving due to
> collocated setup you have. This was not a requirement for USM as we always
> looking at journal on a separate disks which is the best practice for the
> performance cases.

Importing an existing cluster is a supported use case, and OSD configuration needs to be reported accurately in any case.

> 
> If the collocated case is a widely used configuration, we need to mark this
> bug for the async release.

We have already explained that collocated journals are a heavily used case.

Comment 15 Nishanth Thomas 2016-08-12 05:41:58 UTC
(In reply to Jean-Charles Lopez from comment #13)
> (In reply to Nishanth Thomas from comment #12)
> > The cluster configuration window doesn't represent the actual size of the
> > journal. This tab is meant for all the default configurations maintained by
> > USM, the journal size is meant to be the default journal size to be taken if
> > no values provided by the user.
> 
> I'm very confused here. When you deploy a cluster, the end user is being
> prompted for the journal size he wants to use so at no point there is a
> hidden default value used for deployment.
> 
> The second problem is that even if you specify a value different than 5GB to
> deploy your cluster through the UI, let's say 1GB, this tab will still
> display 5GB.
> 
> The third problem is that if you import a cluster that has a journal size
> set outside of the UI during the ceph-ansible deploment or a manual
> deploytment, this tab will still show 5GB.
> 

The cluster configuration tab has all the USM defaults, and that is the way it is meant to be. When prompted during cluster creation or on other screens (for example, the threshold configuration), the user has the option to override them. The user also has the option to update these defaults (currently not allowed from the UI, though it will be in the future; some of them can be altered through config files).

> > 
> > The actual size is displayed on the OSD tab. Here it is misbehaving due to
> > collocated setup you have. This was not a requirement for USM as we always
> > looking at journal on a separate disks which is the best practice for the
> > performance cases.
> 
> Importing existing cluster is a supported use case and OSD configuration
> needs to be reported accurately in any case.
> 
> > 
> > If the collocated case is a widely used configuration, we need to mark this
> > bug for the async release.
> 
> We have already explained that this collocated journal is a heavily used
> case.

As Jeff mentioned, this is a sure candidate for async

Comment 18 Nishanth Thomas 2016-08-17 09:20:52 UTC
Ack

Comment 19 Shubhendu Tripathi 2016-09-08 08:27:00 UTC
Post discussion with Ju and the ack received here, below are the changes that would be done:
1. Remove default journal size from cluster -> config list
2. From the backend, the correct journal path (target disk partition name) and the actual journal size would be populated to the UI as below

~~~~~~~~~~~~~~~~~~~~~~~
Device Path: /dev/vdb1
Capacity: 3.0 GB
Storage Profile: <name>
~~~~~~~~~~~~~~~~~~~~~~~
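
For reference, the values shown there can be cross-checked on the OSD node itself; a minimal sketch (the OSD id and device path are illustrative, matching the example above):

# The journal symlink under the OSD data directory points at the journal
# partition ("ceph-0" is a placeholder OSD id).
ls -l /var/lib/ceph/osd/ceph-0/journal

# The partition size in bytes should match the capacity shown in the UI
# ("/dev/vdb1" is the device path used in the example above).
lsblk -b /dev/vdb1 -o NAME,SIZE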

Comment 20 Ju Lim 2016-09-08 15:05:18 UTC
+1 Comment 19 (Shubhendu Tripathi).  This is also mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1342969.

Comment 21 Karnan 2016-09-13 13:17:08 UTC
UI changes done as per comment 19

Comment 23 Lubos Trilety 2016-09-22 10:27:00 UTC
Created attachment 1203678 [details]
Cluster configuration view

Comment 24 Lubos Trilety 2016-09-22 10:27:38 UTC
Created attachment 1203680 [details]
osd detail tab

Comment 25 Lubos Trilety 2016-09-22 10:29:56 UTC
Tested on:
rhscon-core-selinux-0.0.43-1.el7scon.noarch
rhscon-core-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.57-1.el7scon.noarch
rhscon-ceph-0.0.42-1.el7scon.x86_64

It seems to be unchanged; see the attached screenshots:
attachment 1203678 [details]
Cluster configuration view

attachment 1203680 [details]
osd detail tab

Comment 27 Shubhendu Tripathi 2016-09-22 11:40:33 UTC
Created attachment 1203690 [details]
Cluster Config Dev setup

Comment 28 Shubhendu Tripathi 2016-09-22 11:41:15 UTC
Created attachment 1203691 [details]
OSD details from dev setup

Comment 30 Lubos Trilety 2016-09-22 13:50:15 UTC
The problem on my side was that 'yum update' doesn't really update the running skyring service. It has to be restarted manually before the import is done.
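
For anyone hitting the same thing, restarting the service manually would look like this (assuming the systemd unit is named skyring, as referenced above):

# Restart the console backend after updating the packages so the new
# code is actually running before the cluster import.
systemctl restart skyring
systemctl status skyring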

Tested on:
rhscon-core-selinux-0.0.43-1.el7scon.noarch
rhscon-core-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.57-1.el7scon.noarch
rhscon-ceph-0.0.42-1.el7scon.x86_64

seems fine:
* journal size is not present on cluster configuration page
* there's correct device path and size on OSD details page in journal information

Comment 31 Lubos Trilety 2016-09-26 13:47:16 UTC
Created attachment 1204841 [details]
osd detail tab - after volume creation

When I created a volume there, all journal paths were broken.

Comment 33 Shubhendu Tripathi 2016-09-30 06:08:04 UTC
The new patch takes care of syncing journal details properly during pool creation now.

Comment 34 Lubos Trilety 2016-10-04 08:09:48 UTC
Tested with:
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.59-1.el7scon.noarch

Device path is correct even after a pool and/or RBD is created.

Comment 36 Shubhendu Tripathi 2016-10-17 11:30:39 UTC
doc-text looks good

Comment 37 errata-xmlrpc 2016-10-19 15:21:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082