Bug 1022444 - "Tidy app" fail when application's quota is filled up
"Tidy app" fail when application's quota is filled up
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Command Line Interface (Show other bugs)
2.x
Unspecified Unspecified
medium Severity medium
: ---
: ---
Assigned To: Clayton Coleman
libra bugs
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-10-23 05:56 EDT by Qiushui Zhang
Modified: 2015-05-14 22:25 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-29 19:49:33 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
logs from instance (34.31 KB, application/zip)
2013-10-24 04:46 EDT, Qiushui Zhang
no flags Details

  None (edit)
Description Qiushui Zhang 2013-10-23 05:56:21 EDT
Description of problem:
Fill up an application's 100% quota. Some app operation fail.
E.g rhc app tidy appname

Version-Release number of selected component (if applicable):
devenv_3932

How reproducible:
always

Steps to Reproduce:
1. Fill up an application quota
e.g. for a small gear app,  dd if=/dev/zero of=testfile bs=1M count=3000
100% quota will be filled.
2. Try to tidy the app
rhc app tidy rb

Actual results:
openshift@openshift-ubuntu:~/tmp$ rhc app tidy rb
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 223a7b09f2651977d0516f885cc1e194

Expected results:
Operations like tidy should be executed successfully.

Additional info:
Some other operations also fail. But I'm not sure whether they are bugs:
openshift@openshift-ubuntu:~/tmp$ rhc snapshot save rb
Pulling down a snapshot to rb.tar.gz...
Error writing to temporary file
Error in trying to save snapshot. You can try to save manually by running:
ssh 52673db26d6c465ab4000007@rb-qiuzhang.dev.rhcloud.com 'snapshot' > rb.tar.gz

openshift@openshift-ubuntu:~/tmp$ rhc app restart rb
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 04b86acff0e882c824d20d5711a66543
Comment 1 Andy Grimm 2013-10-23 09:06:15 EDT
There are some cases here where tidy cannot succeed, because "git gc" requires disk space to perform the repack operation.  I think what we really need here is better feedback (and a recommended corrective action) to the user to explain what failed.  Currently they get the generic "Failed to correctly execute all parallel operations", and the only way for a user to be sure of where the failure occurs is for them to ssh into there gear and trace the processes that the "tidy" command spawns.
Comment 2 Dan Mace 2013-10-23 15:35:11 EDT
The only consistent behavior the platform attempts to enforce in an over-quota scenario is that when MCollective-based operations is performed, and the usage is over a threshold (default 90%), a generic warning is appended to the output of the reply regardless of the action. Here's an example of the tidy command I performed using QE's steps:

  $ rhc app tidy -pp
  Warning gear 5267e07b29934d01e2000026 is using 100.0% of disk quota

  RESULT:
  zendapp cleaned up

The only special-cased operations outside of MCollective which will generate such a message are snapshot and restore:

  $ rhc snapshot save -a zendapp -pp                                                                                           
  Pulling down a snapshot to zendapp.tar.gz...
  WARNING: The application's disk usage is very close to the quota limit. The snapshot may fail unexpectedly
  depending on the amount of data present and the snapshot procedure used by your application's cartridges.
  Creating and sending tar.gz
  /bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/tmp/monitor_commands.sock: socket ignored
  /bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/tmp/monitor_events.sock: socket ignored
  /bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/gui/lighttpd/tmp/php-fastcgi.socket: socket ignored
  
  RESULT:
  Success

If the message isn't rendered to the client for the `tidy` operation, then more information is required from the node to isolate the subsystem responsible: the runtime code generates the message in the MCollective reply; the broker interprets the response and feeds the message back to the REST API caller; rhc itself interprets the response from the broker and renders it to the client.

If this bug can be reproduced, please provide the broker and MCollective logs related to the scenario so we can assign the bug to the correct team. At this moment, neither myself nor Fabiano were able to reproduce the dropped message.
Comment 3 Qiushui Zhang 2013-10-24 04:46:20 EDT
I can reproduce it here in my environments.

openshift@openshift-ubuntu:~/tmp$ rhc app create rb ruby-1.9
Application Options
-------------------
  Domain:     walter
  Cartridges: ruby-1.9
  Gear Size:  default
  Scaling:    no

Creating application 'rb' ... done


Waiting for your DNS name to be available ... done

Cloning into 'rb'...
Warning: Permanently added 'rb-walter.dev.rhcloud.com,174.129.79.60' (RSA) to the list of known hosts.

Your application 'rb' is now available.

  URL:        http://rb-walter.dev.rhcloud.com/
  SSH to:     5268dccdcf5400d92d000007@rb-walter.dev.rhcloud.com
  Git remote: ssh://5268dccdcf5400d92d000007@rb-walter.dev.rhcloud.com/~/git/rb.git/
  Cloned to:  /home/openshift/tmp/rb

Run 'rhc show-app rb' for more details about your app.
openshift@openshift-ubuntu:~/tmp$ rhc ssh rb
Connecting to 5268dccdcf5400d92d000007@rb-walter.dev.rhcloud.com ...

    *********************************************************************

    You are accessing a service that is for use only by authorized users.  
    If you do not have authorization, discontinue use at once. 
    Any use of the services is subject to the applicable terms of the 
    agreement which can be found at: 
    https://www.openshift.com/legal

    *********************************************************************

    Welcome to OpenShift shell

    This shell will assist you in managing OpenShift applications.

    !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!
    Shell access is quite powerful and it is possible for you to
    accidentally damage your application.  Proceed with care!
    If worse comes to worst, destroy your application with "rhc app delete"
    and recreate it
    !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!

    Type "help" for more info.


[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> dd if=/dev/zero of=~/app-root/data/testfile bs=1M count=3000
xvde2: write failed, user block limit reached.
dd: writing `/var/lib/openshift/5268dccdcf5400d92d000007//app-root/data/testfile': Disk quota exceeded
1024+0 records in
1023+0 records out
1072693248 bytes (1.1 GB) copied, 13.3805 s, 80.2 MB/s
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde1            7.6G  5.7G  1.8G  77% /
none                  5.0M     0  5.0M   0% /dev/shm
/dev/xvde2            7.6G  3.5G  4.0G  47% /var
df: `/proc/sys/fs/binfmt_misc': Permission denied
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> quota -s
Disk quotas for user 5268dccdcf5400d92d000007 (uid 1000): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvde2   1024M*      0   1024M             313       0   40000        
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> ls -l ~/app-root/data/testfile 
-rw-r--r--. 1 5268dccdcf5400d92d000007 5268dccdcf5400d92d000007 1072693248 Oct 24 04:41 /var/lib/openshift/5268dccdcf5400d92d000007//app-root/data/testfile
[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> ls -lh ~/app-root/data/testfile 
-rw-r--r--. 1 5268dccdcf5400d92d000007 5268dccdcf5400d9


Please refer to the attached log for reference.
Comment 4 Qiushui Zhang 2013-10-24 04:46:46 EDT
Created attachment 815680 [details]
logs from instance
Comment 5 Qiushui Zhang 2013-10-24 04:58:25 EDT
---- Continue with comment 3

openshift@openshift-ubuntu:~/tmp$ rhc app tidy  rb
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 55d486bfde2cc54dafe4f25d08b5e31c
Comment 6 Dan Mace 2013-10-24 09:27:17 EDT
The broker development.log shows that the node returned the expected response containing CLIENT_MESSAGE entries, etc. It also shows a stack trace dumped by the broker while processing the response, and Qiushui's output indicates that rhc never received the output produced by the node.

Reassigning this to the broker team for further investigation since the current evidence suggests no bug on the node side.
Comment 7 Qiushui Zhang 2013-10-24 23:33:15 EDT
When all the quota of an application is filled up, user will see a prompt like "Your application is out of disk space; please run "rhc app-tidy myapp"" when SSH to the app.
We'd better do something - either cleaning the space or an useful/friendly hint message.
Comment 8 Abhishek Gupta 2013-10-25 18:57:15 EDT
With this fix (https://github.com/openshift/origin-server/pull/4005), the following messages are returned by the REST API. The CLI/UI now need to display them - the CLI currently does not display the debug/warning messages.

--------------------------

  <messages>
    <message>
      <severity>debug</severity>
      <text>Failed to execute: 'control stop' for /var/lib/openshift/526af118d4fc2bd249000001/ruby
Stopping Ruby cartridge

Error writing to temporary file

</text>
      <exit-code>1</exit-code>
      <field nil="true"></field>
      <index nil="true"></index>
    </message>
    <message>
      <severity>warning</severity>
      <text>Warning gear 526af118d4fc2bd249000001 is using 100.0% of disk quota
</text>
      <exit-code>1</exit-code>
      <field nil="true"></field>
      <index nil="true"></index>
    </message>
    <message>
      <severity>error</severity>
      <text>Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 646a3bb8ccb7ffbe4ca0ea69af3314f2</text>
      <exit-code>1</exit-code>
      <field nil="true"></field>
      <index nil="true"></index>
    </message>
  </messages>
Comment 10 Clayton Coleman 2013-10-28 16:35:32 EDT
Fixed in https://github.com/openshift/rhc/pull/492
Comment 11 openshift-github-bot 2013-10-28 21:15:05 EDT
Commit pushed to master at https://github.com/openshift/rhc

https://github.com/openshift/rhc/commit/c67a8c95544f4180b70fec2e8db7671c6493475d
Bug 1022444 - Should also print warnings when an error occurs

Currently warnings are only being printed on success. Since warnings may be related to the failure of the user action we should also report them on errors.
Comment 12 Qiushui Zhang 2013-10-29 01:34:32 EDT
Tested on devenv_3959.

openshift@openshift-ubuntu:~/tmp$ rhc app tidy rb
Warning gear 526f4693eda29ec29e000007 is using 100.0% of disk quota

RESULT:
rb cleaned up

So does "rhc app restart/stop/.." and "rhc snapshot save"
Mark the bug as verified.

Note You need to log in before you can comment on or make changes to this bug.