Bug 1288051 - [Tier]: Following volume restart, tierd shows failure at status on some nodes
Status: POST
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
3.1
x86_64 Linux
Priority: unspecified  Severity: urgent
: ---
: ---
Assigned To: hari gowtham
nchilaka
tier-glusterd
: ZStream
Depends On:
Blocks: 1276245 1315659 1318498
Reported: 2015-12-03 06:44 EST by Rahul Hinduja
Modified: 2017-03-25 12:26 EDT (History)
4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1315659
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rahul Hinduja 2015-12-03 06:44:56 EST
Description of problem:
=======================

Tierd is running, but after stopping and starting the volume, its status on most of the nodes in the cluster is shown as failed.

For example, the tier-related processes on localhost:

[root@dhcp37-165 glusterfs]# ps aux | grep tier
root     12829 74.6 71.2 4987944 2765140 ?     Ssl  Dec01 2046:18 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/tiervolume --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier.tier-dht --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=fb12984d-c631-4364-bf4e-aa91f9ea76fb --xlator-option *dht.commit-hash=3001417369 --socket-file /var/run/gluster/gluster-tier-bdb6ee8c-4410-4f0f-8714-8d4a3ff5812c.sock --pid-file /var/lib/glusterd/vols/tiervolume/tier/fb12984d-c631-4364-bf4e-aa91f9ea76fb.pid -l /var/log/glusterfs/tiervolume-tier.log
root     16484  7.8  1.1 1442396 46492 ?       Ssl  00:35   0:35 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick3-tiervolume_hot -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick3-tiervolume_hot.pid -S /var/run/gluster/4f46770e383fab1ee7789ff7a656a342.socket --brick-name /rhs/brick3/tiervolume_hot -l /var/log/glusterfs/bricks/rhs-brick3-tiervolume_hot.log --xlator-option *-posix.glusterd-uuid=fb12984d-c631-4364-bf4e-aa91f9ea76fb --brick-port 49153 --xlator-option tiervolume-server.listen-port=49153
root     16503 47.9  2.3 1577596 89472 ?       Ssl  00:35   3:35 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick1-tiervolume_ct-disp1 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick1-tiervolume_ct-disp1.pid -S /var/run/gluster/4cdd38c5ea86fe823baaa5dcde1b4b57.socket --brick-name /rhs/brick1/tiervolume_ct-disp1 -l /var/log/glusterfs/bricks/rhs-brick1-tiervolume_ct-disp1.log --xlator-option *-posix.glusterd-uuid=fb12984d-c631-4364-bf4e-aa91f9ea76fb --brick-port 49152 --xlator-option tiervolume-server.listen-port=49152
root     16655  0.0  0.0 112648   956 pts/0    S+   00:42   0:00 grep --color=auto tier
[root@dhcp37-165 glusterfs]# 

Two brick processes (one glusterfsd per brick) and one glusterfs process for tierd.

Tier status is shown as follows:
================================

[root@dhcp37-165 glusterfs]# gluster volume rebal tiervolume status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost              780        0Bytes       1161977             0             0               failed          164607.00
                            10.70.37.133             9725        0Bytes         11176           307             0               failed          164423.00
                            10.70.37.160                0        0Bytes             0             0             0          in progress             482.00
                            10.70.37.158             9791        0Bytes         11031             1             0               failed          164370.00
                            10.70.37.110              606        0Bytes       1194893             0             0               failed          164242.00
                            10.70.37.155                0        0Bytes             0             0             0          in progress             482.00
                             10.70.37.99              833        0Bytes       1673312             0             0               failed          164607.00
                             10.70.37.88             9790        0Bytes         11524             1             0               failed          164291.00
                            10.70.37.112                0        0Bytes             0             0             0          in progress             482.00
                            10.70.37.199             9839        0Bytes         11836           172             0               failed          164285.00
                            10.70.37.162                0        0Bytes             0             0             0          in progress             482.00
                             10.70.37.87             9885        0Bytes         12501           127             0               failed          164225.00
volume rebalance: tiervolume: success
[root@dhcp37-165 glusterfs]# 


Logs report following:
======================

[2015-12-02 19:04:26.687972] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-tiervolume-client-13: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-12-02 19:04:26.688162] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-tiervolume-client-13: disconnected from tiervolume-client-13. Client process will keep trying to connect to glusterd until brick's port is available
[2015-12-02 19:04:26.690100] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-tiervolume-client-15: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-12-02 19:04:26.690220] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-tiervolume-client-15: disconnected from tiervolume-client-15. Client process will keep trying to connect to glusterd until brick's port is available
[2015-12-02 19:04:26.693118] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-tiervolume-client-19: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-12-02 19:04:26.693223] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-tiervolume-client-19: disconnected from tiervolume-client-19. Client process will keep trying to connect to glusterd until brick's port is available
[2015-12-02 19:04:26.695809] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-tiervolume-client-7: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-12-02 19:04:26.695914] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-tiervolume-client-7: disconnected from tiervolume-client-7. Client process will keep trying to connect to glusterd until brick's port is available
[2015-12-02 19:04:26.698311] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-tiervolume-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-12-02 19:04:26.698413] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-tiervolume-client-1: disconnected from tiervolume-client-1. Client process will keep trying to connect to glusterd until brick's port is available



Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-8.el7rhgs.x86_64

How reproducible:
=================

1/1

Steps carried out:
==================
1. Created a 12-node cluster
2. Hot tier {6x2}, cold tier {2x(4+2)}
3. Mounted the volume on RHEL 7.2, 7.1 and 6.7 clients
4. Created a huge data set ({148GB}) on the volume
5. Stopped the volume (no data creation or I/O was in progress at this time)
6. Started the volume
Comment 4 Mohammed Rafi KC 2015-12-04 08:39:41 EST
Partial RCA:

Tier starts only when all of DHT's children are up. So in this case the tier process started running after the children came up, and then spawned promote/demote threads in the first cycle. During this time the network got disconnected and a child went down. When one child is down, the tier/rebalance process dies after completing its current thread, but it updates the status as failed immediately. That is why the status still shows failed even though the process is running: only after the spawned thread completes does the process die.

What we have not RCA'ed yet is why the network connection was interrupted. It might be because of the memory leak issue.
Comment 5 hari gowtham 2016-02-26 00:32:42 EST
The readv() fails, which disconnects the client from glusterd. readv() can fail when the connection is down; this brings the child down and fails the rebalance. Since the client is no longer able to communicate with glusterd, the status is marked as failed.
Comment 6 Mohammed Rafi KC 2016-02-26 05:15:20 EST
RCA:

When we do a volume stop, we get a child-down event and we mark the process as failed. But if a migration is under way, we wait until it finishes.

So in the case of the tier daemon, if there are a large number of files to migrate, the migration thread can take some time to finish even after it is marked for killing. Once it finishes its list, the process is killed.

If the volume is started again before the process dies, glusterd tries to start the tier daemon again; but since the daemon is still running in this case, it simply skips starting it.

The process then dies after the migration thread returns.
Comment 7 hari gowtham 2016-03-01 04:00:34 EST
The tiered volume was brought down while it was in the middle of a migration,
and brought back up before all the files that were marked for migration had been migrated.

Once the volume was back up, the migration continued and the status remained correct (in progress).

I couldn't reproduce the issue. However, there was another problem: the file that was being migrated during the volume stop remained in the hot tier. It did not move to the cold tier.

The md5sum of the file is fine; there are no issues with it other than it not getting demoted.

The bug is not reproducible.
Comment 8 Mohammed Rafi KC 2016-03-04 01:17:23 EST
What we need to make sure of here is that there are a large number of files to migrate in one cycle:

1) Create a large number of files.
2) Make all of them eligible for promotion or demotion in a single cycle.
3) Just as the cycle starts with that huge list, stop the volume and then start it again.

If that doesn't work, I think we can reproduce it using gdb with a small change in the code.
Comment 11 hari gowtham 2016-08-09 11:23:10 EDT
The fix on master: http://review.gluster.org/#/c/13646/

It is in 3.8 through a rebase.
