Bug 1760399 - WORMed files couldn't be migrated during rebalancing
Summary: WORMed files couldn't be migrated during rebalancing
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-10 12:59 UTC by david.spisla
Modified: 2019-10-21 15:07 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-12 03:03:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
Worm_specific_patch (4.30 KB, patch)
2019-10-11 02:18 UTC, Mohit Agrawal


Links
System ID Priority Status Summary Last Updated
Gluster.org Gerrit 23540 | Abandoned | core(WIP): Restrict internal client based on pid. Problem: To know about the source of fop request belongs to in... | 2019-10-12 03:04:55 UTC

Description david.spisla 2019-10-10 12:59:19 UTC
Description of problem:
WORMed files couldn't be migrated during rebalancing 

Version-Release number of selected component (if applicable):
5.5

How reproducible:
Steps to Reproduce:
1. Create Replica 2 test volume and write some files to it
2. Set these files read-only (RO) so they become WORMed
3. Add another Replica pair to the volume
4. Trigger rebalancing with data migration

Actual results:
The WORMed files are not migrated to the new replica pair.

Expected results:
The WORMed files are migrated to the new replica pair.

Additional info:
The WORM Xlator could check whether the current client is an internal gluster process.
This already happens in some code paths (see https://review.gluster.org/#/c/glusterfs/+/16661/ ).

According to Mohammed Rafi KC it is probably not enough to check the PID in a simple way, since there could be a malicious client (a FUSE process from an external server) which chooses a PID on its own.

Comment 1 Mohit Agrawal 2019-10-11 02:18:42 UTC
Created attachment 1624571 [details]
Worm_specific_patch

Comment 2 Mohit Agrawal 2019-10-11 02:21:38 UTC
Hi,

   Yes, he is right. To identify the client at the server xlators we check the pid value: if the
   pid is negative, the fop request has come from an internal client; otherwise it has come from
   an external client.

   Below are all the defined pids that we use as internal pids:

    GF_CLIENT_PID_MAX = 0,
    GF_CLIENT_PID_GSYNCD = -1,
    GF_CLIENT_PID_HADOOP = -2,
    GF_CLIENT_PID_DEFRAG = -3,
    GF_CLIENT_PID_NO_ROOT_SQUASH = -4,
    GF_CLIENT_PID_QUOTA_MOUNT = -5,
    GF_CLIENT_PID_SELF_HEALD = -6,
    GF_CLIENT_PID_GLFS_HEAL = -7,
    GF_CLIENT_PID_BITD = -8,
    GF_CLIENT_PID_SCRUB = -9,
    GF_CLIENT_PID_TIER_DEFRAG = -10,
    GF_SERVER_PID_TRASH = -11,
    GF_CLIENT_PID_ADD_REPLICA_MOUNT = -12

    To prevent a client from assigning itself an internal pid, we have to put some conditional
    checks in set_fuse_mount_options so that the user is not able to assign any defined internal
    pid to the fuse process, and on the server side we need to validate that the pid value is
    less than 0 and no less than -12 (i.e., within the defined range above).

    I have attached a raw patch (specific to the worm xlator only), but the same change is needed
    in the other xlators as well. I will upload the complete patch.
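The two checks described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the helper names pid_is_reserved_internal and server_pid_is_valid_internal are made up for this sketch, and only the boundary enum values from the list above are reproduced.

```c
#include <assert.h>

/* Boundary values of the reserved internal client pids listed above.
 * Internal clients use pids in the range [-12, -1]. */
enum {
    GF_CLIENT_PID_MAX = 0,
    GF_CLIENT_PID_GSYNCD = -1,
    GF_CLIENT_PID_ADD_REPLICA_MOUNT = -12,
};

/* Mount-side check (hypothetical): refuse a user-supplied client-pid
 * that collides with the reserved internal range. */
static int pid_is_reserved_internal(int pid)
{
    return pid >= GF_CLIENT_PID_ADD_REPLICA_MOUNT &&
           pid <= GF_CLIENT_PID_GSYNCD;
}

/* Server-side check (hypothetical): a negative pid is only trusted as
 * "internal" when it falls inside the defined range. */
static int server_pid_is_valid_internal(int pid)
{
    return pid < GF_CLIENT_PID_MAX &&
           pid >= GF_CLIENT_PID_ADD_REPLICA_MOUNT;
}
```

Under these assumptions, a fuse mount requesting client-pid -3 (the DEFRAG pid) would be rejected, while an ordinary positive pid passes through unchanged.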

Thanks,
Mohit Agrawal

Comment 3 Mohit Agrawal 2019-10-11 06:04:35 UTC
The patch is posted to resolve the same
https://review.gluster.org/#/c/glusterfs/+/23540/

Comment 4 Mohit Agrawal 2019-10-12 03:02:44 UTC
Hi David,

  Accessing a volume through a negative client-pid is not a legitimate use.
  By default the mount.glusterfs script does not pass any client-pid argument to the fuse process.
  The fuse process will not pass a negative pid to the server unless the user mounts the volume
  without the mount command and passes a negative client-pid directly to the glusterfs process,
  so we think this is not a bug and there is no harm in this option. We ourselves often use
  negative PIDs for script-based file migration and for testing purposes.

  To restrict malicious client access, the user can configure auth.allow without hurting any performance.


Regards,
Mohit Agrawal

Comment 5 Mohit Agrawal 2019-10-12 03:06:30 UTC
For more context you can follow the same discussion here:
https://lists.gluster.org/pipermail/gluster-devel/2019-October/056616.html

Comment 6 david.spisla 2019-10-16 06:28:41 UTC
Hello Mohit,

I understand why there is no need to implement more protection against a potential malicious client.
But the main issue here is to ensure migration of WORMed files during a full rebalance. There is
still no solution for this. What do you think?

Regards
David Spisla

Comment 7 Mohit Agrawal 2019-10-16 08:37:58 UTC
Hi,

Yes, WORMed files should be moved to a newly added brick.
I have tried to reproduce the issue on the version below:

glusterfs-libs-5.5-1.el7.x86_64
glusterfs-fuse-5.5-1.el7.x86_64
glusterfs-devel-5.5-1.el7.x86_64
glusterfs-rdma-5.5-1.el7.x86_64
glusterfs-5.5-1.el7.x86_64
glusterfs-cli-5.5-1.el7.x86_64
glusterfs-api-devel-5.5-1.el7.x86_64
glusterfs-cloudsync-plugins-5.5-1.el7.x86_64
glusterfs-client-xlators-5.5-1.el7.x86_64
glusterfs-server-5.5-1.el7.x86_64
glusterfs-events-5.5-1.el7.x86_64
glusterfs-debuginfo-5.5-1.el7.x86_64
glusterfs-api-5.5-1.el7.x86_64
glusterfs-extra-xlators-5.5-1.el7.x86_64
glusterfs-geo-replication-5.5-1.el7.x86_64

Reproducer Steps:
 1) gluster v create test1 replica 3 10.74.251.224:/dist1/b{0..2} force
 2) gluster v set test1 features.worm-file-level on
 3) Mount the volume /mnt 
 4) Write the data 
    time for (( i=0 ; i<=10 ; i++ )); do dd if=/dev/urandom of=/mnt/file$i bs=1M count=100; mkdir -p /mnt/dir$i/dir1/dir2/dir3/dir4/dir5/; done
 5) Run add-brick
    gluster v add-brick test1 10.74.251.224:/dist2/b{0..2}
 6) Start rebalance 
    gluster v rebalance test1 start
    5 files are successfully transferred to dist2/b{0..2}
  
I am not able to reproduce the issue; please correct me if I have missed any reproducer steps.
Can you please share the rebalance logs and confirm the reproducer steps? In most of the fops
in the worm xlator we check whether the fop request has come from an internal client and, if so,
wind the fop to the next xlator. For some of the fops this check is missing; to confirm which
case you are hitting, I need the rebalance logs along with the reproducer steps.
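The gate described here can be sketched as follows. This is a simplified stand-in for the real worm xlator code: worm_writev_sketch and its return convention are illustrative, not the actual GlusterFS translator API, and the internal-pid test mirrors the range discussed in comment 2.

```c
#include <assert.h>
#include <errno.h>

/* Internal gluster clients use pids in [-12, -1] (see comment 2). */
static int is_internal_client(int client_pid)
{
    return client_pid < 0 && client_pid >= -12;
}

/* Simplified stand-in for a worm xlator write fop handler: an internal
 * client (e.g. the rebalance/DEFRAG process, pid -3) bypasses the WORM
 * check and the fop is wound to the next xlator unchanged; an external
 * client writing to a WORMed (read-only) file is denied with EROFS. */
static int worm_writev_sketch(int client_pid, int file_is_wormed)
{
    if (is_internal_client(client_pid))
        return 0;          /* wind to next xlator without WORM checks */
    if (file_is_wormed)
        return -EROFS;     /* deny modification of a WORMed file */
    return 0;              /* normal write path for non-WORMed files */
}
```

A fop that lacks this internal-client bypass would return -EROFS even for the rebalance process, which is the failure mode being discussed.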

Regards,
Mohit Agrawal

Comment 8 david.spisla 2019-10-16 09:19:23 UTC
Hello Mohit,

after creating the files, they should be set read-only:

Reproducer Steps:
 1) gluster v create test1 replica 3 10.74.251.224:/dist1/b{0..2} force
 2) gluster v set test1 features.worm-file-level on
 3) Mount the volume /mnt 
 4) Write the data 
    time for (( i=0 ; i<=10 ; i++ )); do dd if=/dev/urandom of=/mnt/file$i bs=1M count=100; mkdir -p /mnt/dir$i/dir1/dir2/dir3/dir4/dir5/; done
 5) for i in {0..10}; do chmod 444 file$i; done # Set files RO
 6) Run add-brick
    gluster v add-brick test1 10.74.251.224:/dist2/b{0..2}
 7) Start rebalance 
    gluster v rebalance test1 start

Regards
David Spisla

Comment 9 Mohit Agrawal 2019-10-16 10:52:40 UTC
Hi David,

  Are you sure you are using the same gluster bits (glusterfs-5.5-1.el7.x86_64)?
  I am still not able to reproduce the issue.

  Can you share the rebalance logs from a run where rebalance is not working correctly?

Thanks,
Mohit Agrawal

Comment 10 david.spisla 2019-10-16 10:57:49 UTC
Hello Mohit,

I will do so when I have some free minutes.

Comment 11 Mohit Agrawal 2019-10-16 11:44:35 UTC
Hi,

 Kindly share the volume options along with the volume topology so we can debug it further.

Thanks,
Mohit Agrawal

Comment 12 Mohit Agrawal 2019-10-21 14:43:20 UTC
Please share if you have any updates.

Thanks,
Mohit Agrawal

Comment 13 david.spisla 2019-10-21 14:52:25 UTC
Hello Mohit,

just now I did some observations and found out that the reason for the bug lies in some custom changes that we made to the WORM Xlator, so it is our own fault.

Regards
David Spisla

Comment 14 Mohit Agrawal 2019-10-21 14:56:13 UTC
Hi,

 Thanks for your update.

Regards,
Mohit Agrawal

Comment 15 david.spisla 2019-10-21 15:07:44 UTC
No problem. Sorry for the inconvenience!!!

Regards
David Spisla

