Bug 1260110 - trouble using persistentvolume from NFS after upgrading.
Status: CLOSED CURRENTRELEASE
Product: OpenShift Origin
Classification: Red Hat
Component: Storage
Version: 3.x
Hardware: x86_64 Linux
Priority: unspecified  Severity: medium
Assigned To: Mark Turansky
QA Contact: Liang Xia
Reported: 2015-09-04 08:28 EDT by Henning F.
Modified: 2015-11-23 16:14 EST
Last Closed: 2015-11-23 16:14:14 EST
Doc Type: Bug Fix
Type: Bug
Attachments: None
Description Henning F. 2015-09-04 08:28:17 EDT
Description of problem: Cannot start pod with persistent volume; persistent volume recycling fails


Version-Release number of selected component (if applicable): 1.0.5


How reproducible: Tried two upgrades from 1.0.4 to 1.0.5; same error each time


Steps to Reproduce:
1. Upgrade from version 1.0.4 to version 1.0.5
2. Set up a persistent volume with NFS
3. Try to start a pod that uses the persistent volume
4. set persistentVolumeReclaimPolicy to Recycle
5. Delete project

Actual results:
Pod never starts
Volume does not get recycled


Expected results:
Pod starts
Volume gets recycled


Additional info:
Recycling error: Unexpected error creating a pod to scrub volume :  Pod "pv-scrubber-nfs-kgp5y" is invalid: [spec.volumes[0].name: required value, spec.containers[0].volumeMounts[0].name: not found 'vol']
  phase: Failed
Comment 1 Mark Turansky 2015-09-08 14:29:23 EDT
This is fixed in Origin HEAD several days ago.  It is not in 1.0.5.

https://github.com/openshift/origin/pull/4384
Comment 2 Henning F. 2015-09-09 07:50:40 EDT
(In reply to Mark Turansky from comment #1)
> This is fixed in Origin HEAD several days ago.  It is not in 1.0.5.
> 
> https://github.com/openshift/origin/pull/4384

I tried to build a version of HEAD to test this; either I'm doing something wrong, or it's not quite fixed.

I tried this with the wordpress example (for the test I deliberately skipped doing anything other than creating the mysql pod). The empty "claim-wp" became Available with no problems, but "claim-mysql", which contained data, failed.

error message from cleanup:
status:
  message: 'Recycling error: Pod failed, pod.Status.Message unknown.'
  phase: Failed
Comment 3 Liang Xia 2015-09-11 01:39:37 EDT
Moving back based on comment 2.
Comment 4 Mark Turansky 2015-09-11 13:30:05 EDT
Please look at the container logs to see the error.

If the error says that the volume isn't empty, it's because dotfiles are not scrubbed in the recycler.   This PR fixes this issue:  https://github.com/openshift/origin/pull/3657
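For context, the dotfile gap that PR addresses can be sketched in shell: a plain `rm -rf /scrub/*` leaves hidden entries behind because the `*` glob skips dotfiles. The snippet below is an illustrative sketch, not the actual recycler script; it uses a temp directory standing in for /scrub.

```shell
#!/bin/sh
# Illustrative sketch of a scrub step that also removes dotfiles.
# A temp directory stands in for the recycler's /scrub mount.
SCRUB_DIR="$(mktemp -d)"
touch "$SCRUB_DIR/ibdata1" "$SCRUB_DIR/.hidden-dotfile"

# The shell glob '*' does not match hidden entries, so
# 'rm -rf "$SCRUB_DIR"/*' would leave .hidden-dotfile behind.
# 'find -mindepth 1 -delete' removes hidden files and dirs too.
find "$SCRUB_DIR" -mindepth 1 -delete

# Report success only if the directory is now truly empty.
if [ -z "$(ls -A "$SCRUB_DIR")" ]; then
    echo "Scrub OK"
else
    echo "scrub directory $SCRUB_DIR is not empty" >&2
fi
```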
Comment 5 Henning F. 2015-09-14 04:58:14 EDT
(In reply to Mark Turansky from comment #4)
> Please look at the container logs to see the error.
> 
> If the error says that the volume isn't empty, it's because dotfiles are not
> scrubbed in the recycler.   This PR fixes this issue: 
> https://github.com/openshift/origin/pull/3657

No, it looks like it's a permission error.

docker logs da650e86b71c
removed '/scrub/ib_logfile0'
removed '/scrub/ib_logfile1'
removed '/scrub/ibdata1'
rm: cannot remove '/scrub/mysql': Permission denied
scrub directory /scrub is not empty
rm: cannot remove '/scrub/performance_schema': Permission denied
rm: cannot remove '/scrub/replication': Permission denied
rm: cannot remove '/scrub/wp_db': Permission denied

file rights:
pv0002]$ ls -lah
total 24K
drwxrwxrwx. 6 nfsnobody  nfsnobody 4.0K Sep 14 08:28 .
drwxrwxrwx. 4 nfsnobody  nfsnobody 4.0K Sep 14 08:23 ..
drwx------. 2 1000030000 root      4.0K Sep 14 08:27 mysql
drwx------. 2 1000030000 root      4.0K Sep 14 08:27 performance_schema
drwx------. 2 1000030000 root      4.0K Sep 14 08:27 replication
drwx------. 2 1000030000 root      4.0K Sep 14 08:27 wp_db

I tried again, this time running "chmod -R 777 *" on the entire directory before setting the PV to recycle; with that change, the recycler ran successfully.


docker logs 8d140bb2efd8
removed directory: '/scrub/replication'
removed '/scrub/wp_db/db.opt'
removed directory: '/scrub/wp_db'
scrub directory /scrub is empty
Scrub OK
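The workaround above can be scripted roughly as follows. This is an illustrative sketch, with a temp directory standing in for the real NFS export and directory names borrowed from the listing above:

```shell
#!/bin/sh
# Sketch of the manual workaround: open permissions recursively on the
# directory backing the PV before setting it to Recycle, so the scrub
# pod (running under a different UID) can delete directories created
# by the mysql pod. A temp directory stands in for the NFS export.
EXPORT_DIR="$(mktemp -d)"

# Mimic the drwx------ directories left behind by the mysql pod.
mkdir -p "$EXPORT_DIR/mysql" "$EXPORT_DIR/wp_db"
chmod 700 "$EXPORT_DIR/mysql" "$EXPORT_DIR/wp_db"

# The workaround: make everything world-writable so any UID can
# remove the contents during the scrub.
chmod -R 777 "$EXPORT_DIR"

ls -ld "$EXPORT_DIR/mysql"
```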
Comment 6 DeShuai Ma 2015-09-14 06:56:28 EDT
When persistentVolumeReclaimPolicy is set to Recycle, the PVC can't bind to the PV; the PVC stays Pending.

$ openshift version
openshift v1.0.6-2-g1e58d08
kubernetes v1.1.0-alpha.0-1605-g44c91b1

[fedora@ip-172-18-6-78 db-templates]$ cat pv.json 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 172.18.6.78
    path: /myshare


logs:http://fpaste.org/266883/27918144/
Comment 7 Mark Turansky 2015-09-14 11:08:35 EDT
Setting a reclamation policy should have no effect on its binding.  If the PV/C you are using bound before, it should bind again.  

Open permissions, recursively throughout the export, are currently required.

Is this still a bug if you've run "chmod -R 777 *" and the recycler works as expected?
Comment 8 Mark Turansky 2015-09-14 11:11:55 EDT
Re: the binding issue, I see this in the logs:

NAME      LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
pv0001    <none>    1073741824   RWX           Available                       3s


The PV's access modes should be accurate (i.e., RWO, ROX, and RWX).

The PVC can use just "RWX" for binding when this PR is rebased into OS:  https://github.com/kubernetes/kubernetes/pull/10833
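For reference, a claim intended to bind that PV would list matching access modes. This is a minimal hypothetical sketch (the claim name and requested size are illustrative), assuming the pre-PR matcher compares the claim's mode set against the volume's:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim0001            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
    - ReadWriteMany          # list modes matching the PV until the PR above lands
  resources:
    requests:
      storage: 1Gi
```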
Comment 9 Henning F. 2015-09-14 12:34:37 EDT
(In reply to Mark Turansky from comment #7)
> Setting a reclamation policy should have no effect on its binding.  If the
> PV/C you are using bound before, it should bind again.  
> 
> Open permissions recursively through the export is, at this time, required.  
> 
> Is this still a bug if you've run "chmod -R 777 *" and the recycler works as
> expected?

I'm not sure; shouldn't the recycler be able to recycle the volume regardless of the permissions set by the claim that used it? Maybe this is something that should be solved outside the recycler? Either that, or grant the recycler more rights?
Comment 10 Henning F. 2015-09-15 02:55:10 EDT
Just to clarify: if open permissions, recursively across the export, are required for the recycler to work, then I don't consider it a bug that the recycler runs fine after I first run "chmod -R 777 *".
Comment 11 DeShuai Ma 2015-09-15 04:21:09 EDT
The bug is fixed.

Version:
openshift v1.0.6-12-gb0c065c
kubernetes v1.1.0-alpha.0-1605-g44c91b1

Steps to verify this bug:

1. Create a PV with "persistentVolumeReclaimPolicy" set to "Recycle"
[fedora@ip-172-18-5-61 db-templates]$ cat /etc/exports
/myshare *(rw,all_squash)
[fedora@ip-172-18-5-61 db-templates]$ oc get pv
NAME      LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
nfs       <none>    1073741824   RWX           Available                       12m
2. Create mysql using this PV
[fedora@ip-172-18-5-61 db-templates]$ oc new-app mysql-persistent-template.json -n dma1
services/mysql
persistentvolumeclaims/mysql
deploymentconfigs/mysql
Service "mysql" created at 172.30.14.182 with port mappings 3306.
Run 'oc status' to view your app.
3. Check the PVC and the mysql pod
[fedora@ip-172-18-5-61 db-templates]$ oc get pvc -n dma1
NAME      LABELS                                    STATUS    VOLUME    AGE
mysql     map[template:mysql-persistent-template]   Bound     nfs       8s
[fedora@ip-172-18-5-61 db-templates]$ oc get pod -n dma1
NAME            READY     STATUS    RESTARTS   AGE
mysql-1-4xrx9   1/1       Running   0          11s
4. Check the data in the NFS shared directory
[fedora@ip-172-18-5-61 db-templates]$ ls /myshare/
ibdata1  ib_logfile0  ib_logfile1  mysql  mysql-1-4xrx9.pid  performance_schema  replication  sampledb

5. Delete the project
[fedora@ip-172-18-5-61 db-templates]$ oc delete project dma1
project "dma1" deleted

6. Check that the PV is Available and the shared directory is empty
[fedora@ip-172-18-5-61 db-templates]$ oc get pv
NAME      LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
nfs       <none>    1073741824   RWX           Available                       14m
[fedora@ip-172-18-5-61 db-templates]$ ls /myshare/

Actual results:
6. PV recycled.

Expected results:
6. PV recycled.
