Bug 1229233
Summary: | Data Tiering:3.7.0:data loss:detach-tier not flushing data to cold-tier | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka>
Component: | tier | Assignee: | Dan Lambright <dlambrig>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nag Pavan Chilakam <nchilaka>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | rhgs-3.1 | CC: | asrivast, bugs, dlambrig, gluster-bugs, josferna, ndevos, nsathyan, rhs-bugs, storage-qa-internal, trao
Target Milestone: | --- | Keywords: | Triaged, ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | 1205540 | Environment: |
Last Closed: | 2015-10-30 12:39:57 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1205540 | |
Bug Blocks: | 1186580, 1202842, 1219513, 1220047 | |
Attachments: | | |
Description
Nag Pavan Chilakam
2015-06-08 10:15:15 UTC
Created attachment 1044337 [details]
server#1 logs sosreports failed_qa
Created attachment 1044339 [details]
server#2 logs sosreports failed_qa
Created attachment 1044340 [details]
server#3 logs sosreports failed_qa
Created attachment 1044341 [details]
problem 1 console logs failed qa
Moving the bug to Failed QA due to the problems below.
=====Problem 1=== (refer attachment 1044341 [details])
1) Had a setup with 3 nodes: A (tettnang), B (zod) and C (yarrow).
2) Created a 2x2 distributed-replicate volume with bricks belonging only to nodes B and C (a command sketch follows the status output below):
[root@tettnang ~]# gluster v info v1
Volume Name: v1
Type: Distributed-Replicate
Volume ID: acd70756-8a8c-4cd9-a4c4-b5cc4bfad8ee
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: zod:/rhs/brick1/v1
Brick2: yarrow:/rhs/brick1/v1
Brick3: zod:/rhs/brick2/v1
Brick4: yarrow:/rhs/brick2/v1
Options Reconfigured:
performance.readdir-ahead: on
[root@tettnang ~]# gluster v status v1
Status of volume: v1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick zod:/rhs/brick1/v1 49160 0 Y 26484
Brick yarrow:/rhs/brick1/v1 49159 0 Y 11082
Brick zod:/rhs/brick2/v1 49161 0 Y 26504
Brick yarrow:/rhs/brick2/v1 49160 0 Y 11100
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 21765
NFS Server on yarrow N/A N/A N N/A
Self-heal Daemon on yarrow N/A N/A Y 11130
NFS Server on zod N/A N/A N N/A
Self-heal Daemon on zod N/A N/A Y 26548
Task Status of Volume v1
------------------------------------------------------------------------------
There are no active volume tasks
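Not taken from the reporter's shell history; a minimal sketch, assuming the brick paths shown in the gluster v info output above, of the commands that would produce this 2x2 layout:
gluster volume create v1 replica 2 \
    zod:/rhs/brick1/v1 yarrow:/rhs/brick1/v1 \
    zod:/rhs/brick2/v1 yarrow:/rhs/brick2/v1
gluster volume start v1
# gluster v info v1 should then report Type: Distributed-Replicate, Number of Bricks: 2 x 2 = 4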
3) Attached a hot tier (pure distribute), again with bricks belonging only to nodes B and C (see the command sketch after this list).
4) Created some files on the mount point (FUSE mount).
6) Issued a detach-tier start. After the start I checked the backend bricks and found that link files were created on the cold tier, while the hot tier still held the cached file contents.
7) Then issued a detach-tier commit, and the commit passed.
However, although the files still exist on the mount (the same filenames appear on the cold tier because of the T link files), their contents are missing.
Checking the backend bricks: reading the cold-brick files (T files) returns no content, while the hot bricks still hold the contents, which means the file data is not being flushed to the cold tier.
Note: for another file that I read/accessed after detach start but before commit, the contents were moved to the cold brick.
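A minimal sketch of the attach/detach workflow and the check described above; the hot-brick paths (zod:/rhs/hotbrick1/v1, yarrow:/rhs/hotbrick1/v1) are hypothetical placeholders, not the reporter's actual paths:
gluster volume attach-tier v1 zod:/rhs/hotbrick1/v1 yarrow:/rhs/hotbrick1/v1
# write some files through the FUSE mount so they land on the hot tier, then:
gluster volume detach-tier v1 start
gluster volume detach-tier v1 status     # wait until the migration reports completed
gluster volume detach-tier v1 commit
# on a cold brick, a DHT link file is a zero-byte entry with mode ---------T carrying the
# trusted.glusterfs.dht.linkto xattr; after a clean detach the real file data should be present
ls -l /rhs/brick1/v1
getfattr -d -m . -e hex /rhs/brick1/v1/<some-file>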
=====Problem 2===
On the same setup I created a tiered volume (distribute hot tier over a dist-rep cold tier), but this time used all three nodes.
On detach I got the following error (the corresponding glusterd log and a status-check sketch follow):
[root@tettnang ~]# gluster v detach-tier v2 start
volume detach-tier start: failed: Bricks not from same subvol for distribute
[2015-06-29 11:21:40.950707] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:41.487010] I [MSGID: 106484] [glusterd-brick-ops.c:819:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-06-29 11:21:41.494410] E [MSGID: 106265] [glusterd-brick-ops.c:1063:__glusterd_handle_remove_brick] 0-management: Bricks not from same subvol for distribute
[2015-06-29 11:21:43.951084] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:46.951408] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:49.951739] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:52.952016] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:55.952298] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:58.444632] E [MSGID: 106301] [glusterd-op-sm.c:4043:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : Detach-tier not started.
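For completeness, a minimal sketch (assuming the volume name v2 from above) of how the detach state can be queried after such a rejected start; the staging error at the end of the log suggests a query of this kind was attempted:
gluster v detach-tier v2 status
# expected to fail with "Detach-tier not started", since the detach-tier start above was
# rejected with "Bricks not from same subvol for distribute"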
I have tried the validation on a tiered volume with the hot tier as distribute and the cold tier as disperse. The data gets flushed, but I see the following errors in the rebalance log of the volume:
[2015-06-29 09:42:56.872045] E [MSGID: 109023] [dht-rebalance.c:553:__dht_rebalance_create_dst_file] 0-vol1-tier-dht: ftruncate failed for /coldir/hotf on vol1-cold-dht (Input/output error)
[2015-06-29 09:42:56.872082] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 0-vol1-cold-replicate-0: Failing FSETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-06-29 09:42:56.872342] E [MSGID: 109023] [dht-rebalance.c:562:__dht_rebalance_create_dst_file] 0-vol1-tier-dht: chown failed for /coldir/hotf on vol1-cold-dht (Input/output error)
[2015-06-29 09:42:56.875000] E [MSGID: 109039] [dht-helper.c:1162:dht_rebalance_inprogress_task] 0-vol1-hot-dht: /coldir/hotf: failed to get the 'linkto' xattr [No data available]
[2015-06-29 09:42:56.875321] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-vol1-tier-dht: failed to set xattr on /coldir/hotf in vol1-hot-dht (Invalid argument)
[2015-06-29 09:42:56.875335] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-vol1-tier-dht: Migrate file failed: failed to open /coldir/hotf on vol1-hot-dht
[2015-06-29 09:42:56.875794] I [MSGID: 109028] [dht-rebalance.c:3029:gf_defrag_status_get] 0-vol1-tier-dht: Rebalance is completed. Time taken is 0.00 secs
[2015-06-29 09:42:56.875816] I [MSGID: 109028] [dht-rebalance.c:3033:gf_defrag_status_get] 0-vol1-tier-dht: Files migrated: 0, size: 0, lookups: 9, failures: 0, skipped: 3
Failing to set an xattr seems valid, as the EC volume doesn't have hash ranges, I suppose (a sketch of inspecting this xattr follows at the end of this report). From a sanity perspective, data flushing on the EC volume seems to work.
*** Bug 1227485 has been marked as a duplicate of this bug. ***
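As noted above, a minimal sketch of inspecting the 'linkto' xattr that the rebalance log reports as missing; the hot-brick path /rhs/hotbrick1/vol1 is a hypothetical placeholder, while /coldir/hotf is the file named in the log:
getfattr -n trusted.glusterfs.dht.linkto -e text /rhs/hotbrick1/vol1/coldir/hotf
# "No data available" from getfattr corresponds to the "failed to get the 'linkto' xattr" error above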