Bug 1022444

Summary: "Tidy app" fail when application's quota is filled up
Product: OpenShift Online Reporter: Qiushui Zhang <qiuzhang>
Component: ocAssignee: Clayton Coleman <ccoleman>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.xCC: agrimm, dmace, dmcphers, wsun, xtian
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-30 00:49:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs from instance none

Description Qiushui Zhang 2013-10-23 09:56:21 UTC
Description of problem:
Fill up an application's 100% quota. Some app operation fail.
E.g rhc app tidy appname

Version-Release number of selected component (if applicable):
devenv_3932

How reproducible:
always

Steps to Reproduce:
1. Fill up an application quota
e.g. for a small gear app,  dd if=/dev/zero of=testfile bs=1M count=3000
100% quota will be filled.
2. Try to tidy the app
rhc app tidy rb

Actual results:
openshift@openshift-ubuntu:~/tmp$ rhc app tidy rb
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 223a7b09f2651977d0516f885cc1e194

Expected results:
Operations like tidy should be executed successfully.

Additional info:
Some other operations also fail. But I'm not sure whether they are bugs:
openshift@openshift-ubuntu:~/tmp$ rhc snapshot save rb
Pulling down a snapshot to rb.tar.gz...
Error writing to temporary file
Error in trying to save snapshot. You can try to save manually by running:
ssh 52673db26d6c465ab4000007.rhcloud.com 'snapshot' > rb.tar.gz

openshift@openshift-ubuntu:~/tmp$ rhc app restart rb
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 04b86acff0e882c824d20d5711a66543

Comment 1 Andy Grimm 2013-10-23 13:06:15 UTC
There are some cases here where tidy cannot succeed, because "git gc" requires disk space to perform the repack operation.  I think what we really need here is better feedback (and a recommended corrective action) to the user to explain what failed.  Currently they get the generic "Failed to correctly execute all parallel operations", and the only way for a user to be sure of where the failure occurs is for them to ssh into there gear and trace the processes that the "tidy" command spawns.

Comment 2 Dan Mace 2013-10-23 19:35:11 UTC
The only consistent behavior the platform attempts to enforce in an over-quota scenario is that when MCollective-based operations is performed, and the usage is over a threshold (default 90%), a generic warning is appended to the output of the reply regardless of the action. Here's an example of the tidy command I performed using QE's steps:

  $ rhc app tidy -pp
  Warning gear 5267e07b29934d01e2000026 is using 100.0% of disk quota

  RESULT:
  zendapp cleaned up

The only special-cased operations outside of MCollective which will generate such a message are snapshot and restore:

  $ rhc snapshot save -a zendapp -pp                                                                                           
  Pulling down a snapshot to zendapp.tar.gz...
  WARNING: The application's disk usage is very close to the quota limit. The snapshot may fail unexpectedly
  depending on the amount of data present and the snapshot procedure used by your application's cartridges.
  Creating and sending tar.gz
  /bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/tmp/monitor_commands.sock: socket ignored
  /bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/tmp/monitor_events.sock: socket ignored
  /bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/gui/lighttpd/tmp/php-fastcgi.socket: socket ignored
  
  RESULT:
  Success

If the message isn't rendered to the client for the `tidy` operation, then more information is required from the node to isolate the subsystem responsible: the runtime code generates the message in the MCollective reply; the broker interprets the response and feeds the message back to the REST API caller; rhc itself interprets the response from the broker and renders it to the client.

If this bug can be reproduced, please provide the broker and MCollective logs related to the scenario so we can assign the bug to the correct team. At this moment, neither myself nor Fabiano were able to reproduce the dropped message.

Comment 3 Qiushui Zhang 2013-10-24 08:46:20 UTC
I can reproduce it here in my environments.

openshift@openshift-ubuntu:~/tmp$ rhc app create rb ruby-1.9
Application Options
-------------------
  Domain:     walter
  Cartridges: ruby-1.9
  Gear Size:  default
  Scaling:    no

Creating application 'rb' ... done


Waiting for your DNS name to be available ... done

Cloning into 'rb'...
Warning: Permanently added 'rb-walter.dev.rhcloud.com,174.129.79.60' (RSA) to the list of known hosts.

Your application 'rb' is now available.

  URL:        http://rb-walter.dev.rhcloud.com/
  SSH to:     5268dccdcf5400d92d000007.rhcloud.com
  Git remote: ssh://5268dccdcf5400d92d000007.rhcloud.com/~/git/rb.git/
  Cloned to:  /home/openshift/tmp/rb

Run 'rhc show-app rb' for more details about your app.
openshift@openshift-ubuntu:~/tmp$ rhc ssh rb
Connecting to 5268dccdcf5400d92d000007.rhcloud.com ...

    *********************************************************************

    You are accessing a service that is for use only by authorized users.  
    If you do not have authorization, discontinue use at once. 
    Any use of the services is subject to the applicable terms of the 
    agreement which can be found at: 
    https://www.openshift.com/legal

    *********************************************************************

    Welcome to OpenShift shell

    This shell will assist you in managing OpenShift applications.

    !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!
    Shell access is quite powerful and it is possible for you to
    accidentally damage your application.  Proceed with care!
    If worse comes to worst, destroy your application with "rhc app delete"
    and recreate it
    !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!

    Type "help" for more info.


[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> dd if=/dev/zero of=~/app-root/data/testfile bs=1M count=3000
xvde2: write failed, user block limit reached.
dd: writing `/var/lib/openshift/5268dccdcf5400d92d000007//app-root/data/testfile': Disk quota exceeded
1024+0 records in
1023+0 records out
1072693248 bytes (1.1 GB) copied, 13.3805 s, 80.2 MB/s
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde1            7.6G  5.7G  1.8G  77% /
none                  5.0M     0  5.0M   0% /dev/shm
/dev/xvde2            7.6G  3.5G  4.0G  47% /var
df: `/proc/sys/fs/binfmt_misc': Permission denied
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> quota -s
Disk quotas for user 5268dccdcf5400d92d000007 (uid 1000): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvde2   1024M*      0   1024M             313       0   40000        
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> ls -l ~/app-root/data/testfile 
-rw-r--r--. 1 5268dccdcf5400d92d000007 5268dccdcf5400d92d000007 1072693248 Oct 24 04:41 /var/lib/openshift/5268dccdcf5400d92d000007//app-root/data/testfile
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> ls -lh ~/app-root/data/testfile 
-rw-r--r--. 1 5268dccdcf5400d92d000007 5268dccdcf5400d9


Please refer to the attached log for reference.

Comment 4 Qiushui Zhang 2013-10-24 08:46:46 UTC
Created attachment 815680 [details]
logs from instance

Comment 5 Qiushui Zhang 2013-10-24 08:58:25 UTC
---- Continue with comment 3

openshift@openshift-ubuntu:~/tmp$ rhc app tidy  rb
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 55d486bfde2cc54dafe4f25d08b5e31c

Comment 6 Dan Mace 2013-10-24 13:27:17 UTC
The broker development.log shows that the node returned the expected response containing CLIENT_MESSAGE entries, etc. It also shows a stack trace dumped by the broker while processing the response, and Qiushui's output indicates that rhc never received the output produced by the node.

Reassigning this to the broker team for further investigation since the current evidence suggests no bug on the node side.

Comment 7 Qiushui Zhang 2013-10-25 03:33:15 UTC
When all the quota of an application is filled up, user will see a prompt like "Your application is out of disk space; please run "rhc app-tidy myapp"" when SSH to the app.
We'd better do something - either cleaning the space or an useful/friendly hint message.

Comment 8 Abhishek Gupta 2013-10-25 22:57:15 UTC
With this fix (https://github.com/openshift/origin-server/pull/4005), the following messages are returned by the REST API. The CLI/UI now need to display them - the CLI currently does not display the debug/warning messages.

--------------------------

  <messages>
    <message>
      <severity>debug</severity>
      <text>Failed to execute: 'control stop' for /var/lib/openshift/526af118d4fc2bd249000001/ruby
Stopping Ruby cartridge

Error writing to temporary file

</text>
      <exit-code>1</exit-code>
      <field nil="true"></field>
      <index nil="true"></index>
    </message>
    <message>
      <severity>warning</severity>
      <text>Warning gear 526af118d4fc2bd249000001 is using 100.0% of disk quota
</text>
      <exit-code>1</exit-code>
      <field nil="true"></field>
      <index nil="true"></index>
    </message>
    <message>
      <severity>error</severity>
      <text>Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 646a3bb8ccb7ffbe4ca0ea69af3314f2</text>
      <exit-code>1</exit-code>
      <field nil="true"></field>
      <index nil="true"></index>
    </message>
  </messages>

Comment 10 Clayton Coleman 2013-10-28 20:35:32 UTC
Fixed in https://github.com/openshift/rhc/pull/492

Comment 11 openshift-github-bot 2013-10-29 01:15:05 UTC
Commit pushed to master at https://github.com/openshift/rhc

https://github.com/openshift/rhc/commit/c67a8c95544f4180b70fec2e8db7671c6493475d
Bug 1022444 - Should also print warnings when an error occurs

Currently warnings are only being printed on success. Since warnings may be related to the failure of the user action we should also report them on errors.

Comment 12 Qiushui Zhang 2013-10-29 05:34:32 UTC
Tested on devenv_3959.

openshift@openshift-ubuntu:~/tmp$ rhc app tidy rb
Warning gear 526f4693eda29ec29e000007 is using 100.0% of disk quota

RESULT:
rb cleaned up

So does "rhc app restart/stop/.." and "rhc snapshot save"
Mark the bug as verified.