Bug 1334262 - [Tiering]: Handling of inconsistent state in case of timeout during tier attach
Summary: [Tiering]: Handling of inconsistent state in case of timeout during tier attach
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: hari gowtham
QA Contact: nchilaka
URL:
Whiteboard:
Depends On:
Blocks: 1311843
 
Reported: 2016-05-09 09:48 UTC by Sweta Anandpara
Modified: 2018-11-08 19:03 UTC (History)
CC: 6 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
If the "gluster volume tier attach" command times out, one of two situations can result: either the volume does not become a tiered volume, or the tier daemon is not started. Workaround: When the timeout is observed, follow these steps: 1. Check whether the volume has become a tiered volume. If it has not, rerun attach tier; if it has, proceed to the next step. 2. Check whether the tier daemons were created on each server. If the tier daemons were not created, execute the following command: gluster volume tier <volname> start
Clone Of:
Environment:
Last Closed: 2018-11-08 19:03:56 UTC
Target Upstream Version:



Description Sweta Anandpara 2016-05-09 09:48:45 UTC
Description of problem:
========================

Had a 4-node cluster and created a 2 x (4+2) volume. Ran load from two NFS-mounted clients. While that was in progress, tried to attach a 4 x 1 distribute volume as the hot tier, and the attach timed out. This left the volume in an inconsistent state: tier attach had completed successfully, but tier start had not taken place. In other words, the tier process had not started, nor was a <volname>-tier.log file created under /var/log/glusterfs/.
'gluster v info' showed the tier volume in a healthy state, with the cold and hot bricks running.

As discussed with Rafi: whenever we do a tier attach, stage 1 sends a request to add the hot-tier brick on every node concerned, and on successful completion of that, stage 2 'starts' the tier process. However, if a request times out in the middle of stage 1, between stages 1 and 2, or in the middle of stage 2, there is no rewind/rollback. This leaves us with a volume for which we have no definite indication of how functional it is.
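
For reference, a quick way to tell which stage completed. This is only a hedged sketch: the volume name, volfile-id and log path are taken from the transcript further below, and the grep patterns are just one way to filter the output.

# Stage 1 complete => the volume type has changed to Tier
gluster v info nash | grep '^Type'

# Stage 2 complete => the tier daemon (volfile-id rebalance/nash) is running
# and /var/log/glusterfs/nash-tier.log has been created
ps -ef | grep 'volfile-id rebalance/nash'
ls -l /var/log/glusterfs/nash-tier.log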

In my case, where stage 1 appeared to have completed successfully, the workaround was fairly simple: run 'gluster v tier start force'. On doing that, I saw the errors pasted below popping up in the terminal, which still gives the feeling that all is not well with my tier volume.

We need a documented process/set of steps to recover if we land in such a state (or a worse one).
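
As a minimal recovery sketch for this specific state, assuming stage 1 did complete as described above (the 'Additional info' section below shows the equivalent command being run on this volume):

gluster v tier nash start force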

[root@dhcp47-64 ~]# 
Broadcast message from systemd-journald@dhcp47-64.lab.eng.blr.redhat.com (Mon 2016-05-09 14:47:26 IST):

bricks-brick4-nash_tier[1155]: [2016-05-09 09:17:26.799349] M [MSGID: 113075] [posix-helpers.c:1845:posix_health_check_thread_proc] 0-nash-posix: health-check failed, going down

Message from syslogd@dhcp47-64 at May  9 14:47:26 ...
 bricks-brick4-nash_tier[1155]:[2016-05-09 09:17:26.799349] M [MSGID: 113075] [posix-helpers.c:1845:posix_health_check_thread_proc] 0-nash-posix: health-check failed, going down

Broadcast message from systemd-journald@dhcp47-64.lab.eng.blr.redhat.com (Mon 2016-05-09 14:47:56 IST):

bricks-brick4-nash_tier[1155]: [2016-05-09 09:17:56.800744] M [MSGID: 113075] [posix-helpers.c:1851:posix_health_check_thread_proc] 0-nash-posix: still alive! -> SIGTERM

Message from syslogd@dhcp47-64 at May  9 14:47:56 ...
 bricks-brick4-nash_tier[1155]:[2016-05-09 09:17:56.800744] M [MSGID: 113075] [posix-helpers.c:1851:posix_health_check_thread_proc] 0-nash-posix: still alive! -> SIGTERM

[root@dhcp47-64 ~]# 


Version-Release number of selected component (if applicable):
============================================================
3.7.9-3


How reproducible:
=================
Hit it once


Additional info:
===============

[root@dhcp47-64 ~]# gluster v tier nash attach 10.70.47.64:/bricks/brick4/nash_tier 10.70.46.33:/bricks/brick4/nash_tier 10.70.46.121:/bricks/brick4/nash_tier 10.70.47.190:/bricks/brick4/nash_tier
Error : Request timed out

Tier command failed
[root@dhcp47-64 ~]#
[root@dhcp47-64 ~]#
[root@dhcp47-64 ~]# gluster v tier nash attach 10.70.47.64:/bricks/brick4/nash_tier 10.70.46.33:/bricks/brick4/nash_tier 10.70.46.121:/bricks/brick4/nash_tier 10.70.47.190:/bricks/brick4/nash_tier
volume attach-tier: failed: Volume nash is already a tier.

Tier command failed

[root@dhcp47-64 ~]# 
[root@dhcp47-64 ~]# gluster  v info 
 
Volume Name: nash
Type: Tier
Volume ID: 16f0b5a8-913b-42d1-b3a7-e3e9344f5535
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 4
Brick1: 10.70.47.190:/bricks/brick4/nash_tier
Brick2: 10.70.46.121:/bricks/brick4/nash_tier
Brick3: 10.70.46.33:/bricks/brick4/nash_tier
Brick4: 10.70.47.64:/bricks/brick4/nash_tier
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.47.64:/bricks/brick1/nash
Brick6: 10.70.46.121:/bricks/brick1/nash
Brick7: 10.70.46.33:/bricks/brick1/nash
Brick8: 10.70.47.190:/bricks/brick1/nash
Brick9: 10.70.47.64:/bricks/brick2/nash
Brick10: 10.70.46.121:/bricks/brick2/nash
Brick11: 10.70.46.33:/bricks/brick2/nash
Brick12: 10.70.47.190:/bricks/brick2/nash
Brick13: 10.70.47.64:/bricks/brick3/nash
Brick14: 10.70.46.121:/bricks/brick3/nash
Brick15: 10.70.46.33:/bricks/brick3/nash
Brick16: 10.70.47.190:/bricks/brick3/nash
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
[root@dhcp47-64 ~]# 
[root@dhcp47-64 ~]# 
[root@dhcp47-64 ~]# gluster v tier nash start 
Tiering Migration Functionality: nash: success: Attach tier is successful on nash. use tier status to check the status.
ID: 8950b59b-b423-4d25-911d-0a0eb7c65dce

[root@dhcp47-64 ~]# ps -ef | grep tier
root      1155     1  0 10:31 ?        00:00:58 /usr/sbin/glusterfsd -s 10.70.47.64 --volfile-id nash.10.70.47.64.bricks-brick4-nash_tier -p /var/lib/glusterd/vols/nash/run/10.70.47.64-bricks-brick4-nash_tier.pid -S /var/run/gluster/b93a3815e235a7bab53b4d2d1e796a83.socket --brick-name /bricks/brick4/nash_tier -l /var/log/glusterfs/bricks/bricks-brick4-nash_tier.log --xlator-option *-posix.glusterd-uuid=a34abfd0-300d-4d57-a047-8550c10acec8 --brick-port 49155 --xlator-option nash-server.listen-port=49155
root      5710     1 12 14:41 ?        00:00:02 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/nash --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier.tier-dht --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=a34abfd0-300d-4d57-a047-8550c10acec8 --xlator-option *dht.commit-hash=3112346234 --socket-file /var/run/gluster/gluster-tier-16f0b5a8-913b-42d1-b3a7-e3e9344f5535.sock --pid-file /var/lib/glusterd/vols/nash/tier/a34abfd0-300d-4d57-a047-8550c10acec8.pid -l /var/log/glusterfs/nash-tier.log
root      5733  5605  0 14:42 pts/0    00:00:00 grep --color=auto tier
[root@dhcp47-64 ~]# 
[root@dhcp47-64 ~]# 
[root@dhcp47-64 ~]# 
Broadcast message from systemd-journald@dhcp47-64.lab.eng.blr.redhat.com (Mon 2016-05-09 14:47:26 IST):

bricks-brick4-nash_tier[1155]: [2016-05-09 09:17:26.799349] M [MSGID: 113075] [posix-helpers.c:1845:posix_health_check_thread_proc] 0-nash-posix: health-check failed, going down


Message from syslogd@dhcp47-64 at May  9 14:47:26 ...
 bricks-brick4-nash_tier[1155]:[2016-05-09 09:17:26.799349] M [MSGID: 113075] [posix-helpers.c:1845:posix_health_check_thread_proc] 0-nash-posix: health-check failed, going down

Broadcast message from systemd-journald@dhcp47-64.lab.eng.blr.redhat.com (Mon 2016-05-09 14:47:56 IST):

bricks-brick4-nash_tier[1155]: [2016-05-09 09:17:56.800744] M [MSGID: 113075] [posix-helpers.c:1851:posix_health_check_thread_proc] 0-nash-posix: still alive! -> SIGTERM


Message from syslogd@dhcp47-64 at May  9 14:47:56 ...
 bricks-brick4-nash_tier[1155]:[2016-05-09 09:17:56.800744] M [MSGID: 113075] [posix-helpers.c:1851:posix_health_check_thread_proc] 0-nash-posix: still alive! -> SIGTERM

[root@dhcp47-64 ~]#
[root@dhcp47-64 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.9-3.el7rhgs.x86_64
glusterfs-server-3.7.9-3.el7rhgs.x86_64
gluster-nagios-addons-0.2.6-1.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-3.el7rhgs.x86_64
glusterfs-api-3.7.9-3.el7rhgs.x86_64
glusterfs-cli-3.7.9-3.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-3.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-libs-3.7.9-3.el7rhgs.x86_64
glusterfs-fuse-3.7.9-3.el7rhgs.x86_64
glusterfs-rdma-3.7.9-3.el7rhgs.x86_64
[root@dhcp47-64 ~]# 
[root@dhcp47-64 ~]#

Comment 1 Sweta Anandpara 2016-05-09 09:51:47 UTC
Please ignore the SIGTERM messages. They were caused by one of the bricks getting deleted by mistake.

Comment 6 Dan Lambright 2016-05-16 16:43:12 UTC
Recovery steps: discussed and reviewed with glusterd engineering after recreating the problem. The hot tier would have been attached on either all nodes or none of them. 

On seeing a timeout (a command-level sketch of these checks follows the steps):

1.  Check if the volume has become a tiered volume.
1a. If it has not, rerun attach tier.
1b. If it has, go to step 2.

2.  Check if the tier (rebalance) daemons were created on each server.
2a. If the daemons were not created, run: gluster volume tier <vol> start
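
A hedged command-level sketch of the above (syntax as used elsewhere in this report; substitute the real volume name for <volname> and the hot-tier bricks for <hot-bricks>):

# Step 1: a tiered volume reports "Type: Tier"
gluster volume info <volname> | grep '^Type'
#   1a. If it is not tiered, rerun the attach:
#       gluster volume tier <volname> attach <hot-bricks>

# Step 2: check for the tier (rebalance) daemon on each server
ps -ef | grep 'volfile-id rebalance/<volname>'
#   2a. If the daemon is missing, start it:
gluster volume tier <volname> start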

Comment 11 hari gowtham 2018-11-08 19:03:56 UTC
As tier is not being actively developed, I'm closing this bug. Feel free to reopen it if necessary.

