Bug 1289118 - Database getting bloated: hot bricks keep persistent database entries even for files which fail to promote due to lack of hot tier disk space
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: unspecified
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: hari gowtham
QA Contact: nchilaka
Whiteboard: tier-migration
Keywords: ZStream
Depends On:
Blocks:
 
Reported: 2015-12-07 08:31 EST by nchilaka
Modified: 2018-02-06 12:42 EST (History)
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-06 12:42:57 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description nchilaka 2015-12-07 08:31:12 EST
Description of problem:
======================
When we create files after the hot tier is full, the files go to the cold tier as expected. But during the next promote cycle, the tier tries to promote these files, and the promotion fails because the hot tier is full, which is still expected, as below:
[2015-12-07 13:10:00.810603] E [MSGID: 109037] [tier.c:488:tier_migrate_using_query_file] 0-testvol-tier-dht: Failed to migrate badfile1 [No space left on device]


But each of these failed promotion attempts creates entries in the database unnecessarily, as below:

coldbrick:
faeec535-d984-461b-8ba5-e06efb48d457|1449493741|486421|0|0|0|0|0|0|0|
faeec535-d984-461b-8ba5-e06efb48d457|00000000-0000-0000-0000-000000000001|badfile1|/badfile1|0|0


Hot brick:
faeec535-d984-461b-8ba5-e06efb48d457|1449493800|808914|0|0|0|0|0|0|1|1
faeec535-d984-461b-8ba5-e06efb48d457|00000000-0000-0000-0000-000000000001|badfile1|/badfile1|0|0


NOTE: This entry never gets cleared. It persists even after freeing space on the hot tier. The db entry is removed only once a promote and a demote actually happen.
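
For reference, a way to inspect these rows directly on a brick (a sketch, assuming the CTR/libgfdb sqlite database sits at <brickpath>/.glusterfs/<volname>.db with tables gf_file_tb and gf_flink_tb; verify the path and schema on your build):

# Sketch: query the hot brick's CTR database for the stale entry.
# The db path and the table/column names (gf_file_tb, gf_flink_tb, GF_ID)
# are assumptions based on the libgfdb sqlite schema; adjust as needed.
DB=/dummy/brick105/testvol_hot/.glusterfs/testvol.db
sqlite3 "$DB" "SELECT * FROM gf_file_tb  WHERE GF_ID='faeec535-d984-461b-8ba5-e06efb48d457';"
sqlite3 "$DB" "SELECT * FROM gf_flink_tb WHERE GF_ID='faeec535-d984-461b-8ba5-e06efb48d457';"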



Version-Release number of selected component (if applicable):
===================================
[root@zod ~]# rpm -qa|grep gluster
glusterfs-fuse-3.7.5-9.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-9.el7rhgs.x86_64
glusterfs-3.7.5-9.el7rhgs.x86_64
glusterfs-server-3.7.5-9.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-9.el7rhgs.x86_64
glusterfs-cli-3.7.5-9.el7rhgs.x86_64
glusterfs-libs-3.7.5-9.el7rhgs.x86_64
glusterfs-api-3.7.5-9.el7rhgs.x86_64



How reproducible:
=============
Consistently

Steps to Reproduce:
==================
1. Create a tiered volume.
2. Fill the hot tier so that new file creates go to the cold tier.
3. Create a set of files, say 5; they will go to the cold tier.
4. In the next promote cycle, the tier will try to promote them to the hot tier, but the promotion fails because the hot tier is full.
5. Check the back-end db: an entry for each file is present in the hot brick's db too. (A shell sketch of these steps follows.)
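
A minimal shell sketch of the steps above (hosts and brick paths taken from this report; the mount point and dd size are illustrative, and the attach-tier syntax is the glusterfs-3.7 CLI -- verify for your version):

# 1. Create and start a 2x2 distributed-replicate volume, then attach a
#    small replicated hot tier.
gluster volume create testvol replica 2 \
    zod:/dummy/brick102/testvol yarrow:/dummy/brick102/testvol \
    zod:/dummy/brick101/testvol yarrow:/dummy/brick101/testvol
gluster volume start testvol
gluster volume attach-tier testvol replica 2 \
    yarrow:/dummy/brick105/testvol_hot zod:/dummy/brick105/testvol_hot

# 2. Mount the volume and fill the hot tier (size the dd to the hot
#    tier's capacity so subsequent creates land on the cold tier).
mount -t glusterfs zod:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/filler bs=1M count=10240   # illustrative size

# 3. Create a handful of new files; they go to the cold tier.
for i in 1 2 3 4 5; do echo data > /mnt/testvol/badfile$i; done

# 4. Wait one promote cycle (cluster.tier-promote-frequency) and watch
#    the hot brick log for ENOSPC on CREATE.
# 5. Query the hot brick's CTR db for badfile* entries (see the sqlite3
#    sketch in the description above).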


brick log:
[2015-12-07 13:10:00.809294] E [MSGID: 113069] [posix.c:2399:posix_create] 0-testvol-posix: open on /dummy/brick105/testvol_hot/badfile1 failed [No space left on device]
[2015-12-07 13:10:00.809338] I [MSGID: 115071] [server-rpc-fops.c:1608:server_create_cbk] 0-testvol-server: 75475: CREATE /badfile1 (00000000-0000-0000-0000-000000000001/badfile1) ==> (No space left on device) [No space left on device]





[root@zod ~]# gluster v info testvol
 
Volume Name: testvol
Type: Tier
Volume ID: 3909b526-0ce5-44b3-b5a8-d4d2868e9db9
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/dummy/brick105/testvol_hot
Brick2: zod:/dummy/brick105/testvol_hot
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick3: zod:/dummy/brick102/testvol
Brick4: yarrow:/dummy/brick102/testvol
Brick5: zod:/dummy/brick101/testvol
Brick6: yarrow:/dummy/brick101/testvol
Options Reconfigured:
cluster.tier-demote-frequency: 180
cluster.tier-mode: test
features.ctr-enabled: on
performance.readdir-ahead: on
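
For reference, the non-default options above are applied with the standard volume-set CLI, e.g. (features.ctr-enabled is typically turned on automatically when the tier is attached):

gluster volume set testvol cluster.tier-demote-frequency 180
gluster volume set testvol cluster.tier-mode test
gluster volume set testvol features.ctr-enabled on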
Comment 5 Shyamsundar 2018-02-06 12:42:57 EST
Thank you for your bug report.

We are no longer working on any improvements for Tier. This bug will be set to CLOSED WONTFIX to reflect this. Please reopen if the RFE is deemed critical.
