Description
===========

This is a proposal for an installer enhancement for GA.

Currently the installer sets up a directory tree in the hadoop glusterfs volume: it creates directories and sets up their permissions during the installation process.

I propose that this code be moved out into a new script, which the installer will use. This would enable the user to rerun this step to restore correct access rights in case of problems.

We may also consider adding a checking feature, where instead of performing the setup, directories which have incorrect access rights or are missing would be reported (I'm not sure how useful this check would actually be, though). A minimal sketch of such a check is shown after the listing below.

Additional information
======================

Just for reference, rhs-hadoop-install 0_62-1 sets up the directories in the following way:

~~~
chmod 0775 /mnt/glusterfs
chown yarn:hadoop /mnt/glusterfs
chmod 0770 /mnt/glusterfs/mapred
chown mapred:hadoop /mnt/glusterfs/mapred
chmod 0755 /mnt/glusterfs/mapred/system
chown mapred:hadoop /mnt/glusterfs/mapred/system
chmod 1777 /mnt/glusterfs/tmp
chown yarn:hadoop /mnt/glusterfs/tmp
chmod 0775 /mnt/glusterfs/user
chown yarn:hadoop /mnt/glusterfs/user
chmod 0755 /mnt/glusterfs/mr-history
chown yarn:hadoop /mnt/glusterfs/mr-history
chmod 1777 /mnt/glusterfs/tmp/logs
chown yarn:hadoop /mnt/glusterfs/tmp/logs
chmod 1777 /mnt/glusterfs/mr-history/tmp
chown yarn:hadoop /mnt/glusterfs/mr-history/tmp
chmod 0750 /mnt/glusterfs/mr-history/done
chown yarn:hadoop /mnt/glusterfs/mr-history/done
chmod 0770 /mnt/glusterfs/job-staging-yarn
chown yarn:hadoop /mnt/glusterfs/job-staging-yarn
chmod 1777 /mnt/glusterfs/app-logs
chown yarn:hadoop /mnt/glusterfs/app-logs
~~~
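To make the proposed checking feature more concrete, here is a minimal sketch of how missing directories or incorrect access rights could be reported. It is illustrative only and not part of rhs-hadoop-install; the mountpoint, directory list and owners simply mirror the listing above.

~~~
#!/usr/bin/env python
# Sketch of the proposed "check" mode: report directories that are missing
# or have an unexpected mode/owner. The table mirrors the chmod/chown
# listing above; illustrative only, not shipped code.
import os
import pwd
import grp
import stat
import sys

MOUNT = "/mnt/glusterfs"

# (relative path, octal mode, owner, group)
EXPECTED = [
    ("", 0o775, "yarn", "hadoop"),
    ("mapred", 0o770, "mapred", "hadoop"),
    ("mapred/system", 0o755, "mapred", "hadoop"),
    ("tmp", 0o1777, "yarn", "hadoop"),
    ("user", 0o775, "yarn", "hadoop"),
    ("mr-history", 0o755, "yarn", "hadoop"),
    ("tmp/logs", 0o1777, "yarn", "hadoop"),
    ("mr-history/tmp", 0o1777, "yarn", "hadoop"),
    ("mr-history/done", 0o750, "yarn", "hadoop"),
    ("job-staging-yarn", 0o770, "yarn", "hadoop"),
    ("app-logs", 0o1777, "yarn", "hadoop"),
]

def check(mount=MOUNT):
    problems = 0
    for rel, mode, owner, group in EXPECTED:
        path = os.path.join(mount, rel) if rel else mount
        try:
            st = os.stat(path)
        except OSError:
            print("missing: %s" % path)
            problems += 1
            continue
        actual_mode = stat.S_IMODE(st.st_mode)
        actual_owner = pwd.getpwuid(st.st_uid).pw_name
        actual_group = grp.getgrgid(st.st_gid).gr_name
        if (actual_mode, actual_owner, actual_group) != (mode, owner, group):
            print("wrong perms on %s: %o %s:%s (expected %o %s:%s)"
                  % (path, actual_mode, actual_owner, actual_group,
                     mode, owner, group))
            problems += 1
    return problems

if __name__ == "__main__":
    sys.exit(1 if check() else 0)
~~~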
There is a new --dirs option (a brick-dev is still a required arg to install) which executes only the code related to creating the hadoop-related directories, e.g.:

  ./install --dirs <brick-dev>

Fixed in version 0.70, https://brewweb.devel.redhat.com//buildinfo?buildID=337259
--dirs was changed to --mkdirs in version 0.72.
The BigTop project has a similar tool, written in groovy, for this kind of task (directory initialization). The script provision.groovy[1] creates a directory structure in hcfs based on the file init-hcfs.json[2], which makes me think about reusing it.

But since it's written in groovy (which uses the hadoop hcfs api), the tool itself is not a good fit for our environment (we are not going to ship groovy just because of this tool; moreover we don't have to create directories via the hcfs api because we have a glusterfs mountpoint).

That said, we can still reuse the data format of this tool. It would have the benefit of making it easier to reuse upstream's default configuration. In addition to that, it would be easier to check how our setup differs from the setup of upstream (or of a particular hadoop distribution): it would boil down to diffing 2 simple json files. Currently the inspection of our defaults is almost impossible, as it's buried deep inside the shell installer. It's also worth pointing out that the author of the upstream script is jvyas, so there should be no issues with using it for glusterfs.

With this in mind, I propose rewriting the directory initialization, which is currently part of rhs-hadoop-install, as a separate script:

* written in python (that would make json processing straightforward; parsing json in a shell script is not a good idea)
* accessing the gluster hadoop volume via the mountpoint (posix api) or libgfapi (libgfapi would be more complicated, so I guess there is no reason to go this way unless there is a requirement for the script to work without the mountpoint)

So in the end, we would provide:

* a python directory setup script
* a json file with our defaults

A rough sketch of what such a script could look like is included after the references below.

[1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/bigtop-utils/provision.groovy;h=df4cc1cfb5dfdba9a5b66d937a17296b7ed5a86a;hb=HEAD
[2] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/hadoop/init-hcfs.json;h=d8825aa33839b88d09ee928e0be83c46e88a2992;hb=HEAD
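Here is a rough sketch of what the proposed python script could look like. The JSON layout it assumes (a flat list of path/perm/owner/group entries) and the script and file names are my own simplification for illustration, not a copy of bigtop's init-hcfs.json and not an actual deliverable.

~~~
#!/usr/bin/env python
# Sketch of the proposed directory setup script: read a JSON file with
# directory specifications and apply them on the glusterfs mountpoint via
# the posix api. The JSON layout is an assumed, simplified format.
import json
import os
import pwd
import grp
import sys

# assumed format:
# [{"path": "tmp", "perm": "1777", "owner": "yarn", "group": "hadoop"}, ...]

def setup_dirs(mountpoint, json_file):
    with open(json_file) as f:
        dirs = json.load(f)
    for entry in dirs:
        path = os.path.join(mountpoint, entry["path"])
        if not os.path.isdir(path):
            os.makedirs(path)
        os.chmod(path, int(entry["perm"], 8))
        uid = pwd.getpwnam(entry["owner"]).pw_uid
        gid = grp.getgrnam(entry["group"]).gr_gid
        os.chown(path, uid, gid)
        print("set %s to %s %s:%s"
              % (path, entry["perm"], entry["owner"], entry["group"]))

if __name__ == "__main__":
    # e.g.: ./setup_dirs.py /mnt/glusterfs hadoop-dirs.json
    setup_dirs(sys.argv[1], sys.argv[2])
~~~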
Martin, does the (undocumented) --_hadoop-dirs option meet your needs here?
No, I don't think that an undocumented --_hadoop-dirs option is a good idea (see my points above). Moreover, I think there is value in following the json file format used by bigtop's provision.groovy script, as described in comment 5.
The GA version of the installer provides separate scripts (in bin/*) and addresses this request (specifically, use bin/add_dirs.sh).
I have to move this BZ back to ASSIGNED for the following reasons:

(In reply to Martin Bukatovic from comment #0)
> This will enable user to rerun this step to restore correct access
> rights in case of problems.

* if bin/add_dirs.sh should be available to customers (see Martin's description above):
  - bin/add_dirs.sh should be documented in the customer documentation, with a description of how to use it and when it could help (for example in a brownfield setup on an existing gluster volume)
  - bin/add_dirs.sh should have a proper --help message (if it is designed to be used directly)
* the help message in the comment at the beginning of the file says that the first argument should be a path (to a volume or brick mount) and the second argument is -d or -l:

> # Syntax:
> # $1=distributed gluster mount (single) or brick mount(s) (per-node) path
> #    (required).
> # -d=output only the distributed dirs, skip local dirs.
> # -l=output only the local dirs, skip distributed dirs.

But it doesn't work this way:

# bin/add_dirs.sh /mnt/glusterfs/HadoopVol1/ -d
Syntax error: -d or -l options are required

You have to specify the -l or -d parameter first and then the path:

# bin/add_dirs.sh -d /mnt/glusterfs/HadoopVol1/
/mnt/glusterfs/HadoopVol1//mapred created/updated with perms 0770
/mnt/glusterfs/HadoopVol1//mapred/system created/updated with perms 0755
/mnt/glusterfs/HadoopVol1//tmp created/updated with perms 1777
/mnt/glusterfs/HadoopVol1//user created/updated with perms 0755
/mnt/glusterfs/HadoopVol1//mr-history created/updated with perms 0755
/mnt/glusterfs/HadoopVol1//tmp/logs created/updated with perms 1777
/mnt/glusterfs/HadoopVol1//mr-history/tmp created/updated with perms 1777
/mnt/glusterfs/HadoopVol1//mr-history/done created/updated with perms 0770
/mnt/glusterfs/HadoopVol1//job-staging-yarn created/updated with perms 0770
/mnt/glusterfs/HadoopVol1//app-logs created/updated with perms 1777
/mnt/glusterfs/HadoopVol1//apps created/updated with perms 0775
/mnt/glusterfs/HadoopVol1//apps/webhcat created/updated with perms 0775
12 new Hadoop directories added/updated
All that is needed to add the Hadoop distributed directories is:

  ssh storage-node1 "bin/add_dirs.sh -d <volmnt-prefix>/<volName>"

E.g.:

  ssh node1.vm "bin/add_dirs.sh -d /mnt/glusterfs/HadoopVol; bin/add_dirs.sh -p /mnt/glusterfs/HadoopVol"

Note: add_dirs calls gen_dirs to produce the list of hadoop dirs. gen_dirs accepts multiple options (-d, -p, -l), whereas add_dirs does not. If this is considered a bug then please create a new BZ against add_dirs.

The above steps are simple and I do not see a reason to spend time promoting add_dirs to a main script, especially since it is a side step that is not needed in the main workflow.
I think this script is required for a brownfield installation, because the user needs to create all the required directories with proper permissions. This means that this script is for users, and it should have proper help messages and should be documented - I've filed bug 1145625 for the documentation.

--> ASSIGNED
The anticipated brownfield behavior is for the customer to run the "main" setup_cluster.sh script against their existing storage cluster. setup_cluster detects that the pool already exists and that they are not adding new nodes to the pool. It will validate their cluster to be suitable for Hadoop workloads, making changes where needed.

It is true that the distributed dirs are not created by setup_cluster.sh. The distributed dirs are created by create_vol.sh and by enable_vol.sh (after the ambari install steps have been followed). enable_vol will detect that the required distributed dirs are missing and report an error.

Perhaps it would be more user-friendly to have enable_vol create the missing distributed dirs?
Yes, I think it is a possible solution to run this script from enable_vol in case of an error caused by missing directories.
As of version 2.35 the syntax of bin/add_dirs.sh changed. The -d|-l|-p option has been removed, and instead, add_dirs accepts the directory-prefix (as it always has) followed by a list of directory-tuples to add. A dir-tuple is of the form "<dirname>:<octal-perms>:<owner>". E.g.:

  bin/add_dirs /mnt/glusterfs dir1:0755:foo dir1/dir2:0775:bar ...

or

  bin/add_dirs /mnt/brick1/hadoop $(bin/gen_dirs.sh -l)
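Just to illustrate the dir-tuple format, here is a small hypothetical example of how a "<dirname>:<octal-perms>:<owner>" tuple could be interpreted; this is not the actual add_dirs.sh code.

~~~
# Illustration of the dir-tuple format only; not the add_dirs.sh implementation.
def parse_dir_tuple(tup):
    # split "<dirname>:<octal-perms>:<owner>" into its three fields
    dirname, perms, owner = tup.split(":")
    return dirname, int(perms, 8), owner

print(parse_dir_tuple("dir1:0755:foo"))
# -> ('dir1', 493, 'foo')   # 493 == 0o755
~~~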
Just FYI: this has been fixed for a long time (way before the GA version of 2.29). I also thought that it had been verified by QE quite a while ago...
Note: The brownfield use case of bin/add_dirs.sh is now properly described in the RHS Install Guide (see BZ 1145625).
The code which creates directories in the hadoop volume has been moved into a separate script since version 2.29-1 (FiV of this BZ), but it was not designed as a user-facing tool as required in this BZ. Moreover, additional suggestions such as using the json file format described in comment 5 have not been implemented.

Even though the directory-creating script is still considered to be an internal helper script only, a rhs-hadoop admin deploying a brownfield cluster needs to use it, and so it's covered in our current documentation. See:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Installation_Guide/sect-Installing_the_Hadoop_FileSystem_Plugin_for_Red_Hat_Storage1.html#Enabling_Existing_Volumes_for_use_with_Hadoop1

> # bin/add_dirs.sh /mnt/glusterfs/HadoopVol $(bin/gen_dirs.sh -d)

Based on these facts, I'm moving this BZ into the VERIFIED state.
This is documented in 8.2.5. Enabling Existing Volumes for use with Hadoop, section 3. See: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Installation_Guide/sect-Installing_the_Hadoop_FileSystem_Plugin_for_Red_Hat_Storage1.html#Configuring_the_Trusted_Storage_Pool_for_use_with_Hadoop
Hi,

Please review and sign off on the edited doc text.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-1517.html