Bug 1316233 - openshift3/node unable to format EBS volumes with error "mkfs.ext4 executable file not found in $PATH"
Summary: openshift3/node unable to format EBS volumes with error "mkfs.ext4 executable file not found in $PATH"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Scott Dodson
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks: 1267746
 
Reported: 2016-03-09 18:08 UTC by Eric Rich
Modified: 2019-10-10 11:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 16:32:09 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1064 0 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update 2016-05-12 20:19:17 UTC

Description Eric Rich 2016-03-09 18:08:17 UTC
Description of problem:

Per our documentation, when creating a PV it should be formatted with a defined file system:

https://docs.openshift.com/enterprise/3.1/install_config/persistent_storage/persistent_storage_aws.html#volume-format-aws

When running the node in a container (on Atomic Host), you cannot format the volume because the node image does not contain the ext4 utilities (mkfs.ext4).

Version-Release number of selected component (if applicable): 3.1

How reproducible: 100%

Steps to Reproduce:


### EXT4 ###
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "pvebs"
spec:
  capacity:
    storage: "5Gi"
  accessModes:
    - "ReadWriteOnce"
  awsElasticBlockStore:
    fsType: "ext4"
    volumeID: "vol-10e8f9c9"
###########

### CLAIM ###
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
  name: "claimebs"
spec:
  accessModes:
    - "ReadWriteOnce"
  resources:
    requests:
      storage: "5Gi"
###########

Our test template: https://paste.netwiki.fr/?3c5b35dcbc0b1b29#cD0N7Fz/tV3WhezMmOR+BJGANJ0UQA/eYmo0wKIFD9c=
Based on https://github.com/openshift/training/pull/373

Actual results:

The formatting of the volume fails with:

Unable to mount volumes for pod "frontend-1-s3y3z_flo": Timeout waiting for volume state
  1m            1m              1       {kubelet ip-10-135-241-229.eu-central-1.compute.internal}                       FailedSync              Error syncing pod, skipping: Timeout waiting for volume state
  1m            2s              12      {kubelet ip-10-135-241-229.eu-central-1.compute.internal}                       FailedMount             Unable to mount volumes for pod "frontend-1-s3y3z_flo": exec: "mkfs.ext4": executable file not found in $PATH
  1m            2s              12      {kubelet ip-10-135-241-229.eu-central-1.compute.internal}                       FailedSync              Error syncing pod, skipping: exec: "mkfs.ext4": executable file not found in $PATH

Expected results:

The volume is formatted and mounted correctly.

Additional info:

Comment 3 hchen 2016-03-15 19:28:49 UTC
message "Error syncing pod, skipping: exec: "mkfs.ext4": executable file not found in $PATH" indicates the format binary is either not installed or not found in $PATH.

Comment 4 hchen 2016-03-16 13:17:34 UTC
Colin

Comment 5 Scott Dodson 2016-03-16 19:25:21 UTC
PR to add e2fsprogs to origin-base image. 
https://github.com/openshift/origin/pull/8057

Do we need xfsprogs too for any reason?

Comment 6 hchen 2016-03-17 00:29:40 UTC
great!
yes, we need xfsprogs so we can make xfs filesystem.

Comment 7 Jan Safranek 2016-03-17 17:18:39 UTC
I'm not sure; should we also add iscsiadm for iSCSI, /usr/bin/rbd (from ceph-common.rpm) for Ceph, and maybe git as well? There might be other tools we need inside the containerized openshift node.

Comment 8 Paul Morie 2016-03-17 21:31:24 UTC
Huamin, Jan-

Can someone do an audit of the volume plugins, looking at calls to exec, and try to determine a list of the userspace packages that we need?

Comment 9 Jan Safranek 2016-03-18 08:44:48 UTC
Quick grep in volume sources shows that these are needed:

rbd
modprobe - to load rbd modules. The modules must be on the host! I checked last year they were available on Atomic.
/usr/bin/udevadm (for both Cinder and GCE)
iscsiadm
git

And the plugins also exec some miscellaneous utilities (I don't know why this isn't done in code):
cat
nice
du


Most of them need to be in $PATH; only udevadm needs to be specifically in /usr/bin. Also, I am not sure if "udevadm trigger" does anything in a container; we need the *host* udev to be triggered. We might encounter https://github.com/kubernetes/kubernetes/issues/7972 in a corner case.

Comment 10 hchen 2016-03-18 14:54:13 UTC
udev has to be on the host, I believe; it is part of systemd and hard to manage inside a container.

Comment 11 Scott Dodson 2016-03-18 15:17:12 UTC
So far, in non-containerized environments, we've had the installer (openshift-ansible) install them optionally based on what the user tells us they want, nfs-utils being the only package required by the node RPM.

I can either add all these packages to the node container's Dockerfile or I can simply add them to the RPM dependencies and they'd get installed everywhere the node RPM is installed regardless of whether they're used or not. Which is the right thing to do?

Comment 12 Bradley Childs 2016-03-18 17:15:52 UTC
I believe they need to be in both the playbook and the image. If you run the kubelet from the host, the packages need to be on the host; if you run it in a container, they need to be in the container.

Comment 13 Eric Paris 2016-03-21 15:00:18 UTC
I think Brad is wrong here. They need to be on the host. They serve no (direct) useful purpose being in the container.

There may be/have been some checks in the kubelet that looked for the binaries and failed when they were not present. But these checks do not make sense if the kubelet is containerized; those should be fixed rather than bloating the docker image.

Comment 14 Eric Rich 2016-03-21 18:29:50 UTC
(In reply to Eric Paris from comment #13)
> I think Brad is wrong here. They need to be on the host. They serve no
> (direct) useful purpose being in the container.
> 
> There may be/have been some checks in the kubelet that looked for the
> binaries and failed when they were not present. But these checks do not
> make sense if the kubelet is containerized; those should be fixed rather
> than bloating the docker image.

The host is RHEL Atomic Host, which is kept slim; utilities like these are not provided unless needed.

Comment 15 Eric Paris 2016-03-21 19:54:58 UTC
I'm pretty sure I should stand by my comment. Utilities need to be on the host. The only reason they provide value inside the container is that they trick the kubelet into running the utilities on the host. If the host is missing storage utilities, the only option today is to install them on the Atomic Host.

Comment 16 Scott Dodson 2016-03-21 20:10:53 UTC
Also, with the exception of 'git', all of those utilities are in fact in Atomic Host 7.2.0. I believe they were deliberately added to enable kubernetes/openshift.

Comment 17 Eric Paris 2016-03-21 20:22:30 UTC
@sdodson that is correct. git was excluded and it was deemed that a git volume was not supported in AOS

Comment 18 Eric Paris 2016-03-22 17:04:05 UTC
I was at least partially wrong. The mount/attach utilities MUST be installed on the host; whether containerized or an RPM install, this is a requirement today.

Some volume types require formatting/mkfs before they can be mounted. The openshift/kubernetes code does NOT use nsenter to get to the host before it calls format/mkfs. 

So we have 2 choices:
1) Change openshift so that we can nsenter the host before calling the format/mkfs.
2) a) Install the utilities inside the container.
   b) Make sure the utilities access the right block device.

I believe we think it would be less risky to try option #2, and sdodson yesterday updated the 'node' image to include the format/mkfs utilities.

@docker people: in order to easily meet 2b we would need to do `docker run -v /dev:/dev`. In general, is that safe? Is there some reason NOT to mount the host /dev onto the container /dev?

Comment 19 Scott Dodson 2016-03-22 17:15:05 UTC
This other bug already requests that -v /dev:/dev be added to the node: https://bugzilla.redhat.com/show_bug.cgi?id=1313210

If that's required to get this working that should happen in the same PR, ie: https://github.com/openshift/origin/pull/8182

Comment 20 Jan Safranek 2016-03-23 09:53:17 UTC
(In reply to Scott Dodson from comment #19)
> If that's required to get this working that should happen in the same PR,
> ie: https://github.com/openshift/origin/pull/8182

This PR misses the iscsiadm and rbd utilities (see comment #9), so I created an additional one:
https://github.com/openshift/origin/pull/8206

Moving back to ASSIGNED until it's merged.

Comment 21 Eric Paris 2016-03-24 03:08:28 UTC
And back to MODIFIED.

Comment 23 Jianwei Hou 2016-03-25 08:19:34 UTC
Verified on 
openshift v3.2.0.7
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5


On the node *host*
[root@openshift-146 ~]# rpm -qa|grep -e iscsi -e nfs -e gluster -e ceph
glusterfs-3.7.1-16.el7.x86_64
iscsi-initiator-utils-6.2.0.873-32.el7.x86_64
nfs-utils-1.3.0-0.21.el7_2.x86_64
glusterfs-libs-3.7.1-16.el7.x86_64
glusterfs-client-xlators-3.7.1-16.el7.x86_64
ceph-common-0.80.7-3.el7.x86_64
iscsi-initiator-utils-iscsiuio-6.2.0.873-32.el7.x86_64
libnfsidmap-0.25-12.el7.x86_64
glusterfs-fuse-3.7.1-16.el7.x86_64

In openshift3/node:v3.2.0.7 *container*:
[root@openshift-156 origin]# ls /usr/sbin/|grep mkfs
mkfs
mkfs.cramfs
mkfs.ext2
mkfs.ext3
mkfs.ext4
mkfs.minix
mkfs.xfs

[root@openshift-156 origin]# rpm -q e2fsprogs xfsprogs
e2fsprogs-1.42.9-7.el7.x86_64
xfsprogs-3.2.2-2.el7.x86_64

Comment 26 errata-xmlrpc 2016-05-12 16:32:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064

