Bug 1314660 - File gets stuck in hot tier after volume restart
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Bug Updates Notification Mailing List
QA Contact: nchilaka
Keywords: ZStream
Reported: 2016-03-04 03:09 EST by hari gowtham
Modified: 2017-03-25 12:26 EDT

Doc Type: Bug Fix
Type: Bug

Attachments
sos report attached. (5.41 MB, application/x-xz)
2016-03-04 03:09 EST, hari gowtham

Description hari gowtham 2016-03-04 03:09:12 EST
Created attachment 1133090
sos report attached.

Description of problem:
On a newly created tiered volume where demotion is taking place, if the volume is stopped while the migration is running and then started again (so that migration resumes), one file might fail to get demoted. This file stays on the hot brick forever.
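
The trigger, in gluster CLI terms (a sketch; the volume name v1 matches the reproducer script further below):

gluster v stop v1     # stop the volume while the tier daemon is mid-demotion
gluster v start v1    # demotion resumes on restart, but one file can be left behind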

Version-Release number of selected component (if applicable):
3.1.2 
(has been found to be fixed in master, i.e. 3.7.8 and above)

How reproducible:
At least once in every three attempts.

Steps to Reproduce:
1. Create a tiered volume and mount it.
2. Create a lot of files, so that demoting them will take a while.
3. While demotion is in progress, stop the volume.
4. Start the volume (demotion resumes).
5. Once the demotion is done, check the files.
6. One file alone might end up stuck on the hot tier.

Actual results:
One file ends up stuck on the hot tier.

Expected results:
The file shouldn't remain on the hot tier; it should get demoted.

Additional info:

Set the tier mode to test and the demote frequency to a low value:

cluster.tier-demote-frequency: 10
cluster.tier-mode: test
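
For context, a rough sketch of what these settings do, applied to the volume v1 from the script below (my reading of the options; exact behavior may vary by build):

gluster v set v1 cluster.tier-mode test             # test mode migrates purely on the timers, ignoring watermarks
gluster v set v1 cluster.tier-demote-frequency 10   # run a demotion cycle every 10 seconds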

I have pasted below the script I used to reproduce this bug:

#!/bin/bash
# Reproducer: stop and restart a tiered volume while demotion is in progress.
rm -f /home/md5sum/all_md5sum.txt
glusterd
# Create the cold tier volume, start it, and attach a hot tier.
gluster v create v1 10.70.42.193:/home/bricks/b1 10.70.42.193:/home/bricks/b2 force
gluster v start v1
gluster v tier v1 attach 10.70.42.193:/home/bricks/hb1 10.70.42.193:/home/bricks/hb2 force
# Test mode with a low demote frequency so demotion kicks in quickly.
gluster v set v1 cluster.tier-mode test
gluster v set v1 cluster.tier-demote-frequency 10
mkdir -p /data/gluster/mount
mount -t glusterfs 10.70.42.193:v1 /data/gluster/mount/
# Generate files with crefi and record checksums for a later integrity check.
crefi --single --random --min=1k --max=10k /data/gluster/mount/
md5sum /data/gluster/mount/* > /home/md5sum/all_md5sum.txt
gluster v rebal v1 status
ls /home/bricks/*
sleep 12
ls /home/bricks/*
ps aux | ag tier    # 'ag' (the_silver_searcher); plain 'grep tier' works too
# Stop the volume while demotion is running, then restart it.
gluster v stop v1
ls /home/bricks/*
sleep 5
gluster v start v1
sleep 12
ps aux | ag tier
ls /home/bricks/*
gluster v rebal v1 status
# Verify the files are intact.
md5sum -c /home/md5sum/all_md5sum.txt

The configuration might vary as per one's setup. The md5sum check is to verify that the file stayed intact (yes, it did).
I used crefi to generate files of various sizes randomly (if this script is going to be used, crefi has to be installed).

Once these steps are done, check the hot brick after a while (10 seconds or more). It will still have the file.
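
To spot the leftover file, a quick check (a sketch, assuming the brick and mount paths from the script above):

ls /home/bricks/hb1 /home/bricks/hb2    # hot tier bricks: should hold no data files after demotion, yet one remains
ls /home/bricks/b1 /home/bricks/b2      # cold tier bricks: the demoted files should all land here
md5sum -c /home/md5sum/all_md5sum.txt   # the stuck file is still intact and readable from the mount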

===============================================================================

Volume Name: v1
Type: Tier
Volume ID: 18bd9d73-1449-445f-98f4-5a9fcfd5150a
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: 10.70.42.193:/home/bricks/hb2
Brick2: 10.70.42.193:/home/bricks/hb1
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 2
Brick3: 10.70.42.193:/home/bricks/b1
Brick4: 10.70.42.193:/home/bricks/b2
Options Reconfigured:
cluster.tier-demote-frequency: 10
cluster.tier-mode: test
features.ctr-enabled: on
performance.readdir-ahead: on
