Bug 1156361

Summary: Some apps cannot be accessed on node2 and node3
Product: OpenShift Online
Reporter: Wenjing Zheng <wzheng>
Component: Containers
Assignee: Rajat Chopra <rchopra>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 2.x
CC: admiller, jhonce, jhou, jokerman, mmccomas, rchopra, xtian
Target Milestone: ---   
Target Release: 2.x   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1157643
Environment:
Last Closed: 2015-03-05 19:56:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1157643    

Description Wenjing Zheng 2014-10-24 09:48:53 UTC
Description:
a. Apps created on node3 cannot be accessed, even after a restart. This includes both migrated apps and newly created apps:
http://wordpress54-librat29.stg.rhcloud.com/
http://spy27-libra28t.stg.rhcloud.com/
http://jbossas7-libra28t.stg.rhcloud.com/
http://drupal53-libra28t.stg.rhcloud.com/
http://ews2storage-libra28t.stg.rhcloud.com/
http://wildflys-migrate.stg.rhcloud.com/
http://perldbjks-librat30.rhcloud.com/
http://storageapp-librat29.stg.rhcloud.com/
http://py27-migdm.stg.rhcloud.com/
http://jenkins1-librat29.stg.rhcloud.com/
http://php54s-librat30.stg.rhcloud.com/
http://ruby20dbjks-librat30.stg.rhcloud.com/
http://ruby19sha-librat29.stg.rhcloud.com/
b. Some apps on node1 return 404 when accessed:
https://rubytest1-last.stg.rhcloud.com/
https://wordpress-gusun1.stg.rhcloud.com/
https://ruby-example.stg.rhcloud.com/
Version-Release number of selected component (if applicable):
STG(devenv-stage_1078)

How reproducible:
always

Steps to Reproduce:
1. Create an app on node3
2. Access the app URL

Actual results:
Some apps cannot be accessed and return the error below:
The requested URL could not be retrieved

Expected results:
All apps should be accessible.

Additional info:

Comment 3 Rajat Chopra 2014-10-25 00:00:00 UTC
Fixed with pull request: https://github.com/openshift/origin-server/pull/5908

Summary -
Day-one bug. The cartridge configure call timed out on the mcollective side, but the thread on the node didn't die. It went on to configure the frontend Apache.
Meanwhile the broker sent an app-destroy, which deleted the vhost config directory.
The configure thread then woke up to create the conf file and failed when it found that the app's base dir was missing, leaving its conf file behind.

Fix: before configuring the vhost conf files, check whether the dir exists.
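
For illustration only, the guard described above can be sketched roughly as follows. This is not the code from the pull request (origin-server is written in Ruby); the function name, the exception class, and the layout of a `<name>.conf` file sitting next to a `<name>/` base directory are assumptions made for the example.

```python
import os


class MissingBaseDirError(RuntimeError):
    """Raised when the app's vhost base directory no longer exists."""


def write_vhost_conf(conf_root, app_token, fqdn, body):
    """Write <conf_root>/<app_token>.conf only if <conf_root>/<app_token>/ exists.

    Hypothetical helper: names and directory layout are assumptions,
    not the actual origin-server implementation.
    """
    base_dir = os.path.join(conf_root, app_token)
    if not os.path.isdir(base_dir):
        # A concurrent app-destroy may already have removed the directory,
        # so bail out before any conf file is created.
        raise MissingBaseDirError(
            "Base directory %s does not exist for the app: %s" % (base_dir, fqdn)
        )
    conf_path = base_dir + ".conf"
    with open(conf_path, "w") as handle:
        handle.write(body)
    return conf_path
```

A check like this narrows the window for the race rather than closing it completely, since the directory could still be removed between the check and the write.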

Comment 4 Rajat Chopra 2014-10-27 19:18:59 UTC
To fix the node and the corresponding broken apps, the orphaned config file, whose referenced directories have been deleted, needs to be removed manually from the node.

In this bug's case, these are the config files associated with gear ID 5449ce66dbd93cfa4d0004c6 on node3.
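
As a rough aid for that manual cleanup, a sketch like the one below could list candidate orphaned files. It assumes that each `<name>.conf` file in the vhost config directory has a sibling `<name>/` directory when healthy; that layout and the function name are assumptions for the example, and the script only reports files, it does not delete anything.

```python
import os


def find_orphaned_confs(conf_root):
    """Return sorted paths of *.conf files in conf_root with no matching directory.

    Hypothetical helper: assumes <name>.conf pairs with a sibling <name>/ dir.
    """
    orphans = []
    for name in sorted(os.listdir(conf_root)):
        path = os.path.join(conf_root, name)
        if not (name.endswith(".conf") and os.path.isfile(path)):
            continue
        # Strip the ".conf" suffix to get the expected base directory.
        base_dir = path[: -len(".conf")]
        if not os.path.isdir(base_dir):
            orphans.append(path)
    return orphans
```

Running it against the node's vhost config directory (for example the `/etc/httpd/conf.d/openshift` path seen in the error message in this bug) and reviewing the output by hand before removing anything would be the cautious way to use it.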

Comment 5 Wenjing Zheng 2014-10-28 13:09:10 UTC
Verified on devenv_5267 with the steps below:
1. Create an app;
2. Delete the base dir under /var/lib/openshift/.httpd.d, e.g. 544fc7f127c8334a0300002b_d_app2;
3. Connect the frontend of the app:
$ oo-devel-node frontend-connect --with-container-uuid 544fc7f127c8334a0300002b
4. The error below appears:
Base directory /etc/httpd/conf.d/openshift/544fc7f127c8334a0300002b_d_app2 does not exist for the app: app2-d.dev.rhcloud.com

Comment 6 Yujie Zhang 2014-10-29 07:31:56 UTC
Reopening this bug since the issue happens again on STG (devenv_stage_1083): applications created on node3 return a 404 error.

Comment 8 Jhon Honce 2014-11-11 22:33:51 UTC
rchopra says cannot reproduce.

Comment 9 Jianwei Hou 2014-11-12 11:35:10 UTC
This is fixed on STG. Both old and new apps can be accessed now.