Bug 846445 - Updating rhc-node rpm takes over an hour on nodes with a lot of gears
Status: CLOSED CURRENTRELEASE
Product: OpenShift Origin
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Assigned To: Rob Millner
QA Contact: libra bugs
Keywords: Triaged
Depends On:
Blocks:
Reported: 2012-08-07 16:07 EDT by Thomas Wiest
Modified: 2015-05-14 18:57 EDT

See Also:
Fixed In Version: libra_ami #2027
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-17 17:29:42 EDT
Type: Bug


Description Thomas Wiest 2012-08-07 16:07:39 EDT
Description of problem:
Updating the rhc-node rpm on nodes with a lot of gears is extremely slow. I didn't clock it, but it was definitely over an hour, possibly as much as two.

The output showed that many chcon commands were run, and while the update was in progress, pstree showed that libra-tc was running:

|      |-sshd---ruby---yum---sh---service---libra-tc


Version-Release number of selected component (if applicable):
rhc-node-0.96.14-1.el6_3.x86_64


How reproducible:
Very reproducible on nodes with a high number of gears.


Steps to Reproduce:
1. Create over 3k gears on an instance
2. Upgrade the rhc-node rpm (it'd be a good idea to run this using 'time')
3. Notice how long it takes to upgrade

  
Actual results:
Takes a _very_ long time to update the rpm.


Expected results:
It shouldn't take that long to update the rpm.
Comment 1 Thomas Wiest 2012-08-07 16:20:41 EDT
Here's an example of a chcon that was run during the update:

chcon -t libra_var_lib_t -l s0:c2,c560 -R /var/lib/stickshift/088b2f0e4a764ee7a492254406d7f657/[^.]*
Comment 2 Rob Millner 2012-08-07 16:54:08 EDT
A patch went in today which should fix the issue where restorecon sets the wrong selinux context.

The libra-tc script sets traffic control limits on the gears.  It's unclear where or why it calls chcon; I'll have a look.
Comment 3 Rob Millner 2012-08-07 20:23:49 EDT
The rhc-node %post script does an rhc-restorecon to fix selinux permissions in /var/lib/stickshift.

Libra-tc sets up traffic control.

Both iterate over each gear and are likely to suffer when there's a large number of gears on the node.
Comment 4 Mike McGrath 2012-08-10 14:46:20 EDT
One thing worth looking at is if these scripts even need to be run as a result of a simple update.
Comment 5 Rob Millner 2012-08-17 14:17:09 EDT
Created a node with 3000 gears and ran through the restarts in node's %post by hand with the following results:

cgconfig restart:          fast, but wipes out libra-cgroups
libra-cgroups restart:     2021 seconds
libra-tc restart:           226 seconds
rhc-restorecon:             749 seconds
rhc-ip-prep:               already controlled, didn't measure, takes a long time.
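
For scale, the three steps that were actually measured can be added up. The snippet below is only a convenience calculation on the numbers above (the variable names are illustrative); rhc-ip-prep is left out because it was not measured.

```shell
# Timings in seconds, copied from the measurements above (3000-gear node).
cgroups=2021      # libra-cgroups restart
tc=226            # libra-tc restart
restorecon=749    # rhc-restorecon
gears=3000

total=$((cgroups + tc + restorecon))

# Integer arithmetic only; results are truncated, not rounded.
echo "total: ${total}s ($((total / 60))m $((total % 60))s)"
echo "per gear: roughly $((total * 1000 / gears)) ms"
```

The measured steps alone come to just under 50 minutes, or about one second per gear, which is consistent with the "over an hour" reported once the unmeasured step is included.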


The conundrum is that changes to the cgroups and tc rules must take effect on upgrade, even on C9 nodes.

Similarly, either rhc-restorecon or a fixfiles should be run if the selinux file configuration policy is updated (libra.fc).

Maybe the correct path is to touch /.autorelabel if libra.fc is new or has changed.
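
A minimal sketch of that idea, assuming a copy of the previous libra.fc is available to compare against. The function name and both file paths are hypothetical (this report does not say where libra.fc is installed), and this is not the change that was eventually committed.

```shell
# Sketch only: request a full SELinux relabel on the next boot when the
# file-context policy file is new or differs from a saved copy.
relabel_if_changed() {
    new_fc=$1                  # newly installed libra.fc (hypothetical path)
    old_fc=$2                  # copy saved before the upgrade (hypothetical path)
    flag=${3:-/.autorelabel}   # overridable so the function can be tested safely

    # cmp -s exits non-zero when the files differ or the old copy is missing.
    if ! cmp -s "$new_fc" "$old_fc" 2>/dev/null; then
        touch "$flag"
    fi
}
```

This trades the long rhc-restorecon pass during the rpm transaction for a relabel at the next reboot, so it only helps if a reboot is acceptable after a policy change.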
Comment 6 Rob Millner 2012-08-17 14:33:07 EDT
rhc-restorecon does the wrong thing anyway, so we should remove the automatic invocation.
Comment 7 Rob Millner 2012-08-17 20:12:47 EDT
Commit 34af28a removes rhc-restorecon, and only runs cgconfig/libra-cgroups and libra-tc if they are not already initialized.
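
The general shape of a "run only if not already initialized" guard can be sketched as follows. This is an illustration, not the contents of the commit: the helper name is made up, and the check command is supplied by the caller because the checks the commit actually uses are not shown in this report.

```shell
# Sketch only: skip an expensive per-gear setup step when a cheap check
# reports that the subsystem is already set up.
run_unless_initialized() {
    check=$1    # shell command that exits 0 when already initialized
    shift       # remaining arguments are the command to run otherwise
    if sh -c "$check" >/dev/null 2>&1; then
        echo "already initialized, skipping: $*"
    else
        "$@"
    fi
}

# Hypothetical usage (the marker file below is an assumption):
#   run_unless_initialized 'test -e /var/run/example-tc-ready' service libra-tc restart
```

With a guard like this, a plain package update on a node that is already running pays only for the check, not for a pass over every gear.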
Comment 8 Rob Millner 2012-08-17 20:14:37 EDT
Pull request: https://github.com/openshift/li/pull/265
Comment 9 Rob Millner 2012-08-18 13:19:08 EDT
Pull request merged.
Comment 10 Johnny Liu 2012-08-21 07:25:25 EDT
Verified this bug with rhc-node-0.97.6-1.el6_3.x86_64.rpm, and PASS.

1. Start an old instance (devenv-stage_232)
2. Create an app
3. Run the following command to create a dummy test environment with about 2000 gears on this node.
$ for i in `seq 1 2000`; do useradd -b /var/lib/stickshift -c "libra guest" user$i; runuser -l user$i -s /bin/sh -c "cp -r /var/lib/stickshift/655c12fb14b14d9c820adb105e3c76e2/* /var/lib/stickshift/user${i}/"; done
4. Re-install rhc-node
# time yum -y reinstall rhc-node
<--snip-->
Running Transaction
  Installing : rhc-node-0.96.14-1.el6_3.x86_64                                                                                                                                       1/1
Stopping system message bus: [  OK  ]
Starting system message bus: [  OK  ]
Shutting down oddjobd: [  OK  ]
Starting oddjobd: [  OK  ]
<--snip-->
chcon -t libra_var_lib_t -l s0:c2,c200 -R /var/lib/stickshift/user1742/[^.]*
chcon -t libra_tmp_t -l s0:c2,c200 -R /var/lib/stickshift/user1742/.tmp/*
<--snip-->
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
  Verifying  : rhc-node-0.96.14-1.el6_3.x86_64         
<--snip-->
real    8m9.938s
user    0m47.861s
sys     5m6.053s


5. Download the latest rhc-node package, then install it to upgrade.
# time yum install -y rhc-node-0.97.6-1.el6_3.x86_64.rpm
Stopping system message bus: [  OK  ]
Starting system message bus: [  OK  ]
Shutting down oddjobd: [  OK  ]
Starting oddjobd: [  OK  ]
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
  Cleanup    : rhc-node-0.96.14-1.el6_3.x86_64                                                                                                                                       2/2
  Verifying  : rhc-node-0.97.6-1.el6_3.x86_64                                                                                                                                        1/2

  Verifying  : rhc-node-0.96.14-1.el6_3.x86_64                                                                                                                                       2/2

<--snip-->
Updated:
  rhc-node.x86_64 0:0.97.6-1.el6_3
<--snip-->
real    0m27.186s
user    0m15.123s
sys     0m4.418s

The elapsed time is clearly much shorter than before.
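
To put a number on the difference, the two "real" times above can be compared directly. This is only arithmetic on the figures already shown; the two runs are not identical operations (a reinstall of the old package versus an upgrade to the new one), so the ratio is indicative rather than a benchmark.

```shell
# Wall-clock ("real") times copied from the two runs above, in seconds.
before=489.938   # 8m9.938s, reinstall of rhc-node-0.96.14
after=27.186     # 0m27.186s, upgrade to rhc-node-0.97.6

# awk is used because POSIX shell arithmetic is integer-only.
speedup=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", b / a }')
echo "speedup: ${speedup}x"
```

On this roughly 2000-gear test node, that works out to about an 18-fold reduction in wall-clock time.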
