Bug 846445 - Updating rhc-node rpm takes over an hour on nodes with a lot of gears
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Rob Millner
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-07 20:07 UTC by Thomas Wiest
Modified: 2015-05-14 22:57 UTC (History)

Fixed In Version: libra_ami #2027
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-17 21:29:42 UTC
Target Upstream Version:
Embargoed:



Description Thomas Wiest 2012-08-07 20:07:39 UTC
Description of problem:
Updating the rhc-node rpm on nodes with a lot of gears is extremely slow. I didn't clock it, but it was definitely over an hour, possibly as much as two.

The output showed a lot of chcons were run, and while it was running, pstree showed that it was running libra-tc.

|      |-sshd---ruby---yum---sh---service---libra-tc


Version-Release number of selected component (if applicable):
rhc-node-0.96.14-1.el6_3.x86_64


How reproducible:
Very reproducible on nodes with a high number of gears.


Steps to Reproduce:
1. Create over 3k gears on an instance
2. Upgrade the rhc-node rpm (it'd be a good idea to run this using 'time')
3. Notice how long it takes to upgrade

  
Actual results:
Takes a _very_ long time to update the rpm.


Expected results:
It shouldn't take that long to update the rpm.

Comment 1 Thomas Wiest 2012-08-07 20:20:41 UTC
Here's an example of a chcon that was run during the update:

chcon -t libra_var_lib_t -l s0:c2,c560 -R /var/lib/stickshift/088b2f0e4a764ee7a492254406d7f657/[^.]*

Comment 2 Rob Millner 2012-08-07 20:54:08 UTC
A patch went in today which should fix the issue where restorecon sets the wrong selinux context.

The libra-tc script sets traffic control limits on the gears. It's unclear where or why it calls chcon; I'll have a look.

Comment 3 Rob Millner 2012-08-08 00:23:49 UTC
The rhc-node %post script does an rhc-restorecon to fix selinux permissions in /var/lib/stickshift.

Libra-tc sets up traffic control.

Both iterate over each gear and are likely to suffer when there's a large number of gears on the node.

Comment 4 Mike McGrath 2012-08-10 18:46:20 UTC
One thing worth looking at is if these scripts even need to be run as a result of a simple update.

Comment 5 Rob Millner 2012-08-17 18:17:09 UTC
Created a node with 3000 gears and ran through the restarts in node's %post by hand with the following results:

cgconfig restart:          fast, but wipes out libra-cgroups
libra-cgroups restart:     2021 seconds
libra-tc restart:           226 seconds
rhc-restorecon:             749 seconds
rhc-ip-prep:               already controlled, didn't measure, takes a long time.
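
Summing the three measured restarts gives a rough lower bound for the time spent in %post. This is only a sketch of the arithmetic: rhc-ip-prep and the cgconfig restart were not timed, so they are left out, and the variable names are made up for illustration.

```shell
# Timings from the manual run above, in seconds.
cgroups=2021      # libra-cgroups restart
tc=226            # libra-tc restart
restorecon=749    # rhc-restorecon

# Shell arithmetic is integer-only, so the minutes figure is rounded down.
total=$((cgroups + tc + restorecon))
echo "measured total: ${total}s (about $((total / 60)) min)"
```

That comes to 2996 seconds, a little under 50 minutes, before counting the untimed steps.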


The conundrum is that changes to the cgroups and tc rules must take effect on upgrade, even on C9 nodes.

Similarly, either rhc-restorecon or a fixfiles should be run if the selinux file configuration policy is updated (libra.fc).

Maybe the correct path is to touch /.autorelabel if libra.fc is new or has changed.
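
One way the "touch /.autorelabel only if libra.fc changed" idea could look is sketched here. This is not what was committed; the function name, the saved-copy location, and passing the marker path as an argument are all assumptions made so the logic can be tried outside a real node.

```shell
# Hypothetical helper: request a relabel only when the policy file changed.
#   $1 = current libra.fc
#   $2 = copy saved by the previous install (assumed to exist somewhere)
#   $3 = marker file to touch (/.autorelabel on a real node)
relabel_if_changed() {
    fc_file=$1
    saved_copy=$2
    marker=$3

    # Nothing to compare if the policy file itself is missing.
    [ -f "$fc_file" ] || return 1

    # cmp -s exits non-zero when the files differ or the saved copy is absent.
    if ! cmp -s "$fc_file" "$saved_copy" 2>/dev/null; then
        touch "$marker"
        cp "$fc_file" "$saved_copy"
    fi
}
```

On a first install there is no saved copy, so the marker is touched. Later runs with an unchanged libra.fc leave it alone.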

Comment 6 Rob Millner 2012-08-17 18:33:07 UTC
rhc-restorecon does the wrong thing anyway; we should remove the automatic invocation.

Comment 7 Rob Millner 2012-08-18 00:12:47 UTC
Commit 34af28a removes rhc-restorecon, and only runs cgconfig/libra-cgroups and libra-tc if they are not already initialized.
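
The "only run if not already initialized" pattern can be sketched roughly as below. This is not the code from the commit; the helper name and the idea of passing a probe command are assumptions for illustration.

```shell
# Hypothetical guard: run the slow start command only when a probe reports
# that the subsystem is not yet initialized.
#   $1 = probe command (exit status 0 means "already initialized")
#   $2 = start command
start_unless_initialized() {
    probe=$1
    start=$2
    if sh -c "$probe" >/dev/null 2>&1; then
        echo "already initialized, skipping: $start"
    else
        sh -c "$start"
    fi
}
```

A call might look like `start_unless_initialized "service libra-tc status" "service libra-tc start"`, assuming the init script offers `status` and `start` actions, which this report does not confirm.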

Comment 8 Rob Millner 2012-08-18 00:14:37 UTC
Pull request: https://github.com/openshift/li/pull/265

Comment 9 Rob Millner 2012-08-18 17:19:08 UTC
Pull request merged.

Comment 10 Johnny Liu 2012-08-21 11:25:25 UTC
Verified this bug with rhc-node-0.97.6-1.el6_3.x86_64.rpm: PASS.

1. Start an old instance (devenv-stage_232)
2. Create an app
3. Run the following command to create a dummy test environment with about 2000 gears on this node.
$ for i in `seq 1 2000`; do useradd -b /var/lib/stickshift -c "libra guest" user$i; runuser -l user$i -s /bin/sh -c "cp -r /var/lib/stickshift/655c12fb14b14d9c820adb105e3c76e2/* /var/lib/stickshift/user${i}/"; done
4. Re-install rhc-node
# time yum -y reinstall rhc-node
<--snip-->
Running Transaction
  Installing : rhc-node-0.96.14-1.el6_3.x86_64                                                                                                                                       1/1
Stopping system message bus: [  OK  ]
Starting system message bus: [  OK  ]
Shutting down oddjobd: [  OK  ]
Starting oddjobd: [  OK  ]
<--snip-->
chcon -t libra_var_lib_t -l s0:c2,c200 -R /var/lib/stickshift/user1742/[^.]*
chcon -t libra_tmp_t -l s0:c2,c200 -R /var/lib/stickshift/user1742/.tmp/*
<--snip-->
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
  Verifying  : rhc-node-0.96.14-1.el6_3.x86_64         
<--snip-->
real    8m9.938s
user    0m47.861s
sys     5m6.053s


5. Download the latest rhc-node package, then re-install it.
# time yum install -y rhc-node-0.97.6-1.el6_3.x86_64.rpm
Stopping system message bus: [  OK  ]
Starting system message bus: [  OK  ]
Shutting down oddjobd: [  OK  ]
Starting oddjobd: [  OK  ]
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
Stopping stickshift-proxy: [  OK  ]
Starting stickshift-proxy: [  OK  ]
  Cleanup    : rhc-node-0.96.14-1.el6_3.x86_64                                                                                                                                       2/2
  Verifying  : rhc-node-0.97.6-1.el6_3.x86_64                                                                                                                                        1/2

  Verifying  : rhc-node-0.96.14-1.el6_3.x86_64                                                                                                                                       2/2

<--snip-->
Updated:
  rhc-node.x86_64 0:0.97.6-1.el6_3
<--snip-->
real    0m27.186s
user    0m15.123s
sys     0m4.418s

The elapsed time is clearly much shorter than before.

