Bug 1342969 - OSD journal details provides incorrect journal size
Summary: OSD journal details provides incorrect journal size
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: UI
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 2
Assignee: Karnan
QA Contact: Filip Balák
URL:
Whiteboard:
Duplicates: 1366006
Depends On:
Blocks: Console-2-Async
 
Reported: 2016-06-06 08:53 UTC by Martin Kudlej
Modified: 2016-10-19 15:20 UTC
12 users

Fixed In Version: rhscon-ceph-0.0.43-1.el7scon.x86_64, rhscon-ui-0.0.58-1.el7scon.noarch
Doc Type: Bug Fix
Doc Text:
The journal device details were not read and synchronized properly during the pool creation and cluster import workflows. Additionally, the actual target partition details for the journal were not displayed; only the link data returned from Ceph was shown as is. Consequently, the journal size for the OSDs defaulted to 5GB and the journal path displayed the link data returned from Ceph. This fix fetches the actual device details for the OSD journal during the pool creation and cluster import workflows. As a result, the journal details display as expected in the UI, with the correct journal size and the correct target partition path as the journal path.
Clone Of:
Environment:
Last Closed: 2016-10-19 15:20:11 UTC
Embargoed:


Attachments (Terms of Use)
Cluster object details - osd tab (1.01 MB, image/png)
2016-06-16 15:25 UTC, Ju Lim
screenshot 3: journal details (failed qe screenshot) (57.26 KB, image/png)
2016-07-20 08:12 UTC, Martin Bukatovic
Screenshot: Incorrect device path (153.24 KB, image/png)
2016-09-22 13:34 UTC, Filip Balák


Links
System ID Private Priority Status Summary Last Updated
Gerrithub.io 294928 0 None None None 2016-09-20 09:58:19 UTC
Gerrithub.io 295095 0 None None None 2016-09-20 09:58:58 UTC
Gerrithub.io 295992 0 None None None 2016-09-30 06:10:40 UTC
Red Hat Product Errata RHSA-2016:2082 0 normal SHIPPED_LIVE Moderate: Red Hat Storage Console 2 security and bug fix update 2017-04-18 19:29:02 UTC

Description Martin Kudlej 2016-06-06 08:53:57 UTC
Description of problem:
As you can see in the screenshot, the journal size is 5GB, but the UI section "Journal on osd.0" shows information about the disk holding the journal, which has a size of 10GB.
I think this is confusing.

I think there are two options for how to change it:
1) change the UI to show info about the partition with the journal, in this case "Device Path: /dev/vdb1; Capacity: 5GB", or
2) change the name of this section from "Journal on osd.0" to "Device dedicated for partitions with journal for osd.0". I know that this is a long title, but the current title is confusing.


Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-15.el7scon.noarch
ceph-installer-1.0.11-1.el7scon.noarch
perl-Scalar-List-Utils-1.27-248.el7.x86_64
rhscon-ceph-0.0.20-1.el7scon.x86_64
rhscon-core-0.0.21-1.el7scon.x86_64
rhscon-ui-0.0.35-1.el7scon.noarch

How reproducible:
100%

Steps to Reproduce:
1. create cluster
2. check osd section of cluster

Actual results:
Section "Journal on osd.x" contains confusing info.

Expected results:
The title or the section content is changed so that the user is clear about the journal size and its location.

Comment 2 :Deb 2016-06-14 15:41:38 UTC
Need clarification from UXD team about the way to go.

Comment 3 Ju Lim 2016-06-16 15:25:27 UTC
Created attachment 1168764 [details]
Cluster object details - osd tab

Comment 4 Ju Lim 2016-06-16 15:26:48 UTC
I agree that the label "Journal on osd.x" is confusing, as this is related to the journal for osd.x. Instead, I would suggest a different title/label, i.e. "Journal for osd.x".

Comment 5 :Deb 2016-06-24 12:10:32 UTC
So this is basically a rephrasing... Or am I missing any technical change that is required?

Comment 6 Ju Lim 2016-06-25 04:59:50 UTC
@Deb - you are correct in that this is just a rephrasing of the label. No other technical change is needed beyond changing the text.

Comment 7 Martin Bukatovic 2016-07-20 07:50:43 UTC
Checking with
=============

Comment 8 Martin Bukatovic 2016-07-20 08:03:57 UTC
I think that the issue here was not understood properly by the dev team
and labeling it as a mere design issue is incorrect.

There is no way I could declare this as resolved.

See expanded explanation below.

Checking with
=============

On RHSC 2.0 server machine:

rhscon-ui-0.0.48-1.el7scon.noarch
rhscon-core-selinux-0.0.34-1.el7scon.noarch
rhscon-ceph-0.0.33-1.el7scon.x86_64
rhscon-core-0.0.34-1.el7scon.x86_64
ceph-installer-1.0.14-1.el7scon.noarch
ceph-ansible-1.0.5-28.el7scon.noarch

On Ceph OSD machine:

rhscon-core-selinux-0.0.34-1.el7scon.noarch
rhscon-agent-0.0.15-1.el7scon.noarch
ceph-selinux-10.2.2-22.el7cp.x86_64
ceph-common-10.2.2-22.el7cp.x86_64
ceph-base-10.2.2-22.el7cp.x86_64
ceph-osd-10.2.2-22.el7cp.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Make sure future storage machines (OSD role) have a dedicated
   disk of at least 10 GB for the ceph journal.
3. Accept a few nodes for the ceph cluster.
4. Create a new ceph cluster named 'alpha', selecting the default journal
   size of 5GB (the crucial detail here is that this 5GB default is
   smaller than the size of the dedicated journal devices on the
   OSD machines).
5. When the cluster is created, go to "OSDs" tab of the cluster
   page and select one osd from the list there.

Actual results
==============

Selecting osd.0 from the list, I see the osd.0 details in the right
sidebar. When I click the 'Journal for osd.0' link there, the details
expand further and I see the following info:

~~~
Device Path: /var/lib/ceph/osd/alpha-0/journal
Capacity: 10.0 GB
Storage Profile: default
~~~

Which is not correct (checking on the machine hosting the osd):

~~~
# ceph-disk list
/dev/vda :
 /dev/vda1 other, swap
 /dev/vda2 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb1 ceph journal, for /dev/vdc1
/dev/vdc :
 /dev/vdc1 ceph data, active, cluster alpha, osd.0, journal /dev/vdb1
# lsblk /dev/vdb
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdb    253:16   0  10G  0 disk 
└─vdb1 253:17   0   5G  0 part 
~~~

As we can see here, the journal for osd.0 is hosted on a 5GB partition.

The confusion seems to be caused by the fact that the 5GB journal
partition is hosted on a 10GB device. But the size of the device on which
the journal partition is stored should not be confused with the size of
the journal itself.
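The distinction above can be checked from the shell: resolving the OSD journal symlink yields the actual journal partition, whose size lsblk reports directly. A minimal sketch, using the paths from this report; the runnable demo below substitutes a temporary stand-in symlink so it works without real block devices:

```shell
# On a live OSD host (paths from this report), the journal partition and
# its real size can be found by resolving the symlink:
#   readlink -f /var/lib/ceph/osd/alpha-0/journal   # -> /dev/vdb1
#   lsblk -bno SIZE /dev/vdb1                       # -> partition size in bytes
# Stand-in demo so the commands run without real devices:
mkdir -p /tmp/osd-demo
touch /tmp/osd-demo/vdb1                          # stand-in for /dev/vdb1
ln -sf /tmp/osd-demo/vdb1 /tmp/osd-demo/journal   # stand-in for the journal symlink
readlink /tmp/osd-demo/journal                    # prints /tmp/osd-demo/vdb1
```

On a real host `readlink -f` is preferable, since it canonicalizes nested links; it is the partition reported here, not the whole disk, whose size belongs in the UI.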

Expected results
================

The details of 'Journal for osd.0' are reported as follows:

~~~
Device Path: /var/lib/ceph/osd/alpha-0/journal
Capacity: 5.0 GB
Storage Profile: default
~~~

So that the size of the journal of osd.0 reported here matches the size of
the journal partition.

Comment 9 Martin Bukatovic 2016-07-20 08:07:57 UTC
(In reply to Martin Kudlej from comment #0)
> 2) change name of this section from "Journal on osd.0" to "Device dedicated
> for partitions with journal for osd.0". I know that this is long title, but
> current title is confusing.

This would only make sense if the actual size of the journal partition were
provided as well.

Showing just the size of the disk on which RHSC 2.0 has created the journal
partition is absolutely pointless from the ceph admin's point of view.

Comment 10 Martin Bukatovic 2016-07-20 08:10:10 UTC
(In reply to :Deb from comment #5)
> So this is basically a rephrasing... Or am I missing out on any technical
> change that is required?

See the description in comment 8.

Comment 11 Martin Bukatovic 2016-07-20 08:12:50 UTC
Created attachment 1181959 [details]
screenshot 3: journal details (failed qe screenshot)

Comment 13 Nishanth Thomas 2016-08-11 05:30:56 UTC
*** Bug 1366006 has been marked as a duplicate of this bug. ***

Comment 14 Ju Lim 2016-08-24 18:38:10 UTC
JC Lopez: 1342969 Journal information must be accurate and journal related information must be amended (Cluster Configuration Window Default Journal Size element should be removed)

Comment 15 Ju Lim 2016-08-24 18:48:49 UTC
From reading this bug and the issues mentioned, there are a couple of things called out. The journal partition and the disk the journal partition sits on are not reporting their sizes correctly in the OSD tabbed view. JC is asking for the default journal size not to be shown in the Cluster object details view > Configuration tab, which shows the default cluster journal size. The first is incorrect reporting; the latter is probably text that does not clearly explain what it is, hence why JC Lopez is asking for its removal.

Comment 16 Jeff Applewhite 2016-08-25 17:46:57 UTC
1) the size of the symlink pointing to the journal is not relevant; we must display the actual journal size even in cases where we have colocation. In this case we show the size of the partition where the journal resides, in GB with 1 decimal point, e.g. 2.5
2) we should remove the cluster config display of the default journal size - it's misleading
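The display rule in point 1 is just a bytes-to-GB conversion with one-decimal rounding. A small sketch; the byte count here is an assumed sample for a 2.5 GB journal partition (on a real host it would come from something like `lsblk -bno SIZE /dev/vdb1`):

```shell
# Format a raw byte count as GB with one decimal place, per point 1 above.
# 2684354560 is an assumed sample value (exactly 2.5 GiB).
bytes=2684354560
awk -v b="$bytes" 'BEGIN { printf "%.1f GB\n", b / (1024 * 1024 * 1024) }'
# prints: 2.5 GB
```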

Comment 17 Jeff Applewhite 2016-08-25 18:33:17 UTC
we should also disable auto-expansion of the cluster with new OSDs when we detect colocation

Comment 18 Shubhendu Tripathi 2016-08-26 05:51:52 UTC
@Jeff, regarding the journal size: say the symlink points to the /dev/vdb1 partition, carved out of a 10GB disk (/dev/vdb), and the journal size is 3.0 GB. Can we show the details something like

Device Path: /dev/vdb1
Capacity: 3.0 GB
Storage Profile: <name>

I have a patch https://review.gerrithub.io/#/c/287136/ to get the actual partition name of the journal and size it can show as above.

Is this as expected?
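The proposed details pane boils down to two lookups plus GB formatting. A sketch of assembling that output, hard-coded to the /dev/vdb1 path and 3.0 GB size from this comment so it runs without the actual partition present:

```shell
# Assemble the proposed "Journal for osd.x" details. On a real host:
#   dev=$(readlink -f <journal symlink>)
#   size_bytes=$(lsblk -bno SIZE "$dev")
# Hard-coded to this comment's example values so the sketch runs anywhere:
dev=/dev/vdb1
size_bytes=3221225472        # 3.0 GB journal partition (assumed sample)
cap=$(awk -v b="$size_bytes" 'BEGIN { printf "%.1f", b / (1024 * 1024 * 1024) }')
printf 'Device Path: %s\nCapacity: %s GB\nStorage Profile: %s\n' "$dev" "$cap" default
# prints:
#   Device Path: /dev/vdb1
#   Capacity: 3.0 GB
#   Storage Profile: default
```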

Comment 19 Jeff Applewhite 2016-08-26 20:04:49 UTC
Yes that is correct Shubhendu - perfect

Comment 20 Shubhendu Tripathi 2016-09-08 08:35:29 UTC
The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1365998 would take care of the correct display of journal details like path and size.

Also, the default journal size would be removed from the cluster -> config list.

Regarding comment#17 from Jeff, below are the points finalized after discussion with Ju and acked:

1. Update the documentation to state that auto expansion is disabled for a cluster imported into RHS Console with co-located journals
2. During the import cluster flow, a step saying "Disabling auto expand for the cluster" will be added if co-located journals are discovered on the cluster being imported
3. A highlighted text saying "auto expansion disabled" will be added next to the Cluster Name in the cluster object page, as below

"Clusters >> Ceph (*auto expansion disabled)"

Comment 21 Ju Lim 2016-09-08 15:03:33 UTC
+1 to Comment 20 (Shubhendu Tripathi has written above).

Comment 22 Ju Lim 2016-09-08 15:04:27 UTC
+1 to Comment 18 (Shubhendu Tripathi has written above).

Comment 23 Shubhendu Tripathi 2016-09-20 09:59:55 UTC
@Rakesh, kindly add the required documentation for point 1 in comment#20.

Comment 25 Filip Balák 2016-09-22 13:34:43 UTC
Created attachment 1203763 [details]
Screenshot: Incorrect device path

Comment 26 Filip Balák 2016-09-22 13:36:57 UTC
Tested with
Server:
ceph-ansible-1.0.5-33.el7scon.noarch
ceph-installer-1.0.15-2.el7scon.noarch
graphite-web-0.9.12-8.1.el7.noarch
rhscon-ceph-0.0.42-1.el7scon.x86_64
rhscon-core-selinux-0.0.43-1.el7scon.noarch
rhscon-core-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.57-1.el7scon.noarch

Node:
calamari-server-1.4.8-1.el7cp.x86_64
ceph-base-10.2.2-41.el7cp.x86_64
ceph-common-10.2.2-41.el7cp.x86_64
ceph-mon-10.2.2-41.el7cp.x86_64
ceph-osd-10.2.2-41.el7cp.x86_64
ceph-selinux-10.2.2-41.el7cp.x86_64
libcephfs1-10.2.2-41.el7cp.x86_64
python-cephfs-10.2.2-41.el7cp.x86_64
rhscon-agent-0.0.19-1.el7scon.noarch
rhscon-core-selinux-0.0.43-1.el7scon.noarch

The Capacity of Journal for osd.0 is shown as expected, but the Device Path is incorrect (/var/lib/ceph/osd/ceph-0/journal instead of /dev/vdb1, as seen in the screenshot). --> Assigned

Comment 28 Shubhendu Tripathi 2016-09-22 13:52:00 UTC
The BZ https://bugzilla.redhat.com/show_bug.cgi?id=1365998 has been verified by Lubos and has the same details. Please collaborate with Lubos and see if this is actually still an issue.

Comment 30 Shubhendu Tripathi 2016-09-26 04:54:37 UTC
I am not able to reproduce this; it works fine for me.
Also, as https://bugzilla.redhat.com/show_bug.cgi?id=1365998 is verified, I don't see an issue. Can you please check against Lubos's setup and verify this BZ?

Comment 31 Shubhendu Tripathi 2016-09-30 06:10:41 UTC
The issue was happening during pool creation, and the latest patch now takes care of syncing the journal details properly.

Comment 32 Filip Balák 2016-09-30 15:59:44 UTC
Tested with
Server:
ceph-ansible-1.0.5-34.el7scon.noarch
ceph-installer-1.0.15-2.el7scon.noarch
graphite-web-0.9.15-1.el7.noarch
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.59-1.el7scon.noarch


Node:
calamari-server-1.4.8-1.el7cp.x86_64
ceph-base-10.2.2-41.el7cp.x86_64
ceph-common-10.2.2-41.el7cp.x86_64
ceph-mon-10.2.2-41.el7cp.x86_64
ceph-osd-10.2.2-41.el7cp.x86_64
ceph-selinux-10.2.2-41.el7cp.x86_64
libcephfs1-10.2.2-41.el7cp.x86_64
python-cephfs-10.2.2-41.el7cp.x86_64
rhscon-agent-0.0.19-1.el7scon.noarch
rhscon-core-selinux-0.0.45-1.el7scon.noarch


and it works as expected. --> Verified

Comment 34 Shubhendu Tripathi 2016-10-17 10:26:00 UTC
doc-text looks good.

Comment 35 errata-xmlrpc 2016-10-19 15:20:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082

