Bug 1022444
Summary: | "Tidy app" fails when application's quota is filled up
---|---
Product: | OpenShift Online
Component: | oc
Version: | 2.x
Status: | CLOSED CURRENTRELEASE
Severity: | medium
Priority: | medium
Reporter: | Qiushui Zhang <qiuzhang>
Assignee: | Clayton Coleman <ccoleman>
QA Contact: | libra bugs <libra-bugs>
CC: | agrimm, dmace, dmcphers, wsun, xtian
Target Milestone: | ---
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Type: | Bug
Doc Type: | Bug Fix
Story Points: | ---
Regression: | ---
Last Closed: | 2014-01-30 00:49:33 UTC
Description (Qiushui Zhang, 2013-10-23 09:56:21 UTC)
There are some cases here where tidy cannot succeed, because "git gc" requires disk space to perform the repack operation. I think what we really need here is better feedback (and a recommended corrective action) so the user understands what failed. Currently they get the generic "Failed to correctly execute all parallel operations", and the only way for a user to be sure of where the failure occurred is to ssh into their gear and trace the processes that the "tidy" command spawns.

The only consistent behavior the platform attempts to enforce in an over-quota scenario is that when MCollective-based operations are performed and the usage is over a threshold (default 90%), a generic warning is appended to the output of the reply regardless of the action. Here's an example of the tidy command I performed using QE's steps:

```
$ rhc app tidy -pp
Warning gear 5267e07b29934d01e2000026 is using 100.0% of disk quota
RESULT:
zendapp cleaned up
```

The only special-cased operations outside of MCollective which will generate such a message are snapshot and restore:

```
$ rhc snapshot save -a zendapp -pp
Pulling down a snapshot to zendapp.tar.gz...
WARNING: The application's disk usage is very close to the quota limit.
The snapshot may fail unexpectedly depending on the amount of data present
and the snapshot procedure used by your application's cartridges.
Creating and sending tar.gz
/bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/tmp/monitor_commands.sock: socket ignored
/bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/tmp/monitor_events.sock: socket ignored
/bin/tar: ./5267e07b29934d01e2000026/zend/usr/local/zend/gui/lighttpd/tmp/php-fastcgi.socket: socket ignored
RESULT:
Success
```

If the message isn't rendered to the client for the `tidy` operation, then more information is required from the node to isolate the subsystem responsible: the runtime code generates the message in the MCollective reply; the broker interprets the response and feeds the message back to the REST API caller; and rhc itself interprets the response from the broker and renders it to the client. If this bug can be reproduced, please provide the broker and MCollective logs related to the scenario so we can assign the bug to the correct team. At the moment, neither I nor Fabiano have been able to reproduce the dropped message.

----

I can reproduce it here in my environment:

```
openshift@openshift-ubuntu:~/tmp$ rhc app create rb ruby-1.9
Application Options
-------------------
Domain:     walter
Cartridges: ruby-1.9
Gear Size:  default
Scaling:    no

Creating application 'rb' ... done
Waiting for your DNS name to be available ... done
Cloning into 'rb'...
Warning: Permanently added 'rb-walter.dev.rhcloud.com,174.129.79.60' (RSA) to the list of known hosts.

Your application 'rb' is now available.

  URL:        http://rb-walter.dev.rhcloud.com/
  SSH to:     5268dccdcf5400d92d000007.rhcloud.com
  Git remote: ssh://5268dccdcf5400d92d000007.rhcloud.com/~/git/rb.git/
  Cloned to:  /home/openshift/tmp/rb

Run 'rhc show-app rb' for more details about your app.

openshift@openshift-ubuntu:~/tmp$ rhc ssh rb
Connecting to 5268dccdcf5400d92d000007.rhcloud.com ...

*********************************************************************

    You are accessing a service that is for use only by authorized
    users. If you do not have authorization, discontinue use at once.
    Any use of the services is subject to the applicable terms of the
    agreement which can be found at:
    https://www.openshift.com/legal

*********************************************************************

    Welcome to OpenShift shell

    This shell will assist you in managing OpenShift applications.

    !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!
    Shell access is quite powerful and it is possible for you to
    accidentally damage your application. Proceed with care!
    If worse comes to worst, destroy your application with
    "rhc app delete" and recreate it
    !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!

    Type "help" for more info.

[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> dd if=/dev/zero of=~/app-root/data/testfile bs=1M count=3000
xvde2: write failed, user block limit reached.
dd: writing `/var/lib/openshift/5268dccdcf5400d92d000007//app-root/data/testfile': Disk quota exceeded
1024+0 records in
1023+0 records out
1072693248 bytes (1.1 GB) copied, 13.3805 s, 80.2 MB/s

[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde1            7.6G  5.7G  1.8G  77% /
none                  5.0M     0  5.0M   0% /dev/shm
/dev/xvde2            7.6G  3.5G  4.0G  47% /var
df: `/proc/sys/fs/binfmt_misc': Permission denied

[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> quota -s
Disk quotas for user 5268dccdcf5400d92d000007 (uid 1000):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvde2   1024M*      0   1024M             313       0   40000

[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> ls -l ~/app-root/data/testfile
-rw-r--r--. 1 5268dccdcf5400d92d000007 5268dccdcf5400d92d000007 1072693248 Oct 24 04:41 /var/lib/openshift/5268dccdcf5400d92d000007//app-root/data/testfile

[rb-walter.dev.rhcloud.com 5268dccdcf5400d92d000007]\> ls -lh ~/app-root/data/testfile
-rw-r--r--. 1 5268dccdcf5400d92d000007 5268dccdcf5400d9
```

(The last `ls -lh` line is truncated in the original report.)

Please refer to the attached log for reference.

Created attachment 815680 [details]: logs from instance
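The threshold check described above (a warning appended to the MCollective reply whenever usage crosses a default 90% threshold, regardless of the action) can be sketched roughly as follows. This is an illustrative reconstruction, not the actual origin-server code; the method name, signature, and constant are assumptions:

```ruby
# Illustrative sketch (not actual origin-server code) of the over-quota
# warning described above: when a gear's disk usage is at or over the
# threshold (default 90%), a generic warning is appended to the reply
# regardless of which action was performed.
QUOTA_WARNING_THRESHOLD = 0.90

# Returns the warning string when usage is at or over the threshold, else nil.
def quota_warning(gear_uuid, blocks_used, blocks_limit)
  return nil if blocks_limit.zero?
  usage = blocks_used.to_f / blocks_limit
  return nil if usage < QUOTA_WARNING_THRESHOLD
  format("Warning gear %s is using %.1f%% of disk quota", gear_uuid, usage * 100)
end

puts quota_warning("5267e07b29934d01e2000026", 1024, 1024)
# prints: Warning gear 5267e07b29934d01e2000026 is using 100.0% of disk quota
```

This matches the message format seen in the transcripts above; in the reproduction, `quota -s` reports 1024M used against a 1024M limit, i.e. 100.0% usage.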
---- Continue with comment 3

```
openshift@openshift-ubuntu:~/tmp$ rhc app tidy rb
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 55d486bfde2cc54dafe4f25d08b5e31c
```

The broker development.log shows that the node returned the expected response containing CLIENT_MESSAGE entries, etc. It also shows a stack trace dumped by the broker while processing the response, and Qiushui's output indicates that rhc never received the output produced by the node. Reassigning this to the broker team for further investigation, since the current evidence suggests no bug on the node side.

----

When all of an application's quota is filled up, the user sees a prompt like "Your application is out of disk space; please run "rhc app-tidy myapp"" when SSHing into the app. We should do something better here: either clean up the space, or print a useful and friendly hint message.

----

With this fix (https://github.com/openshift/origin-server/pull/4005), the following messages are returned by the REST API. The CLI/UI now need to display them; the CLI currently does not display the debug/warning messages.

```xml
<messages>
  <message>
    <severity>debug</severity>
    <text>Failed to execute: 'control stop' for /var/lib/openshift/526af118d4fc2bd249000001/ruby
Stopping Ruby cartridge
Error writing to temporary file
</text>
    <exit-code>1</exit-code>
    <field nil="true"></field>
    <index nil="true"></index>
  </message>
  <message>
    <severity>warning</severity>
    <text>Warning gear 526af118d4fc2bd249000001 is using 100.0% of disk quota
</text>
    <exit-code>1</exit-code>
    <field nil="true"></field>
    <index nil="true"></index>
  </message>
  <message>
    <severity>error</severity>
    <text>Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 646a3bb8ccb7ffbe4ca0ea69af3314f2</text>
    <exit-code>1</exit-code>
    <field nil="true"></field>
    <index nil="true"></index>
  </message>
</messages>
```

----

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/f7114ec7fbbf02a1a98138df883d4185701b485c
Fix for bug 1022444

----

Commit pushed to master at https://github.com/openshift/rhc

https://github.com/openshift/rhc/commit/c67a8c95544f4180b70fec2e8db7671c6493475d
Bug 1022444 - Should also print warnings when an error occurs

Currently warnings are only printed on success. Since warnings may be related to the failure of the user's action, we should also report them on errors.

----

Tested on devenv_3959:

```
openshift@openshift-ubuntu:~/tmp$ rhc app tidy rb
Warning gear 526f4693eda29ec29e000007 is using 100.0% of disk quota
RESULT:
rb cleaned up
```

The same now applies to "rhc app restart/stop/..." and "rhc snapshot save". Marking the bug as verified.
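The rhc commit above makes the client surface warnings on failure as well as on success. A minimal sketch of that idea follows; it is illustrative only, the `Message` struct and `render` method are hypothetical names rather than rhc's actual API, and the messages are severity-tagged like those in the REST response shown earlier:

```ruby
# Illustrative sketch (not rhc's actual code) of the fix in commit
# c67a8c9: warnings from the broker reply are printed on both the
# success and the error path, since a warning such as the disk-quota
# message may explain why the operation failed.
Message = Struct.new(:severity, :text)

# Returns the lines the client would print for a reply.
def render(messages, success)
  lines = []
  # Before the fix, this warning loop only ran on success.
  messages.each { |m| lines << m.text if m.severity == "warning" }
  if success
    lines << "RESULT:"
  else
    messages.each { |m| lines << m.text if m.severity == "error" }
  end
  lines
end

reply = [
  Message.new("warning", "Warning gear 526f4693eda29ec29e000007 is using 100.0% of disk quota"),
  Message.new("error",   "Unable to complete the requested operation due to: Failed to correctly execute all parallel operations."),
]
puts render(reply, false)
```

With this behavior, a failed `rhc app tidy` prints the quota warning before the error, giving the user the corrective hint that was previously dropped.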