Bug 719590 - Aeolus does not start properly after reboot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: aeolus-configure
Version: 0.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Mike Orazi
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-07 12:32 UTC by wes hayutin
Modified: 2012-01-26 12:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:


Attachments
ruby script to check services (1017 bytes, text/plain)
2011-07-07 12:32 UTC, wes hayutin

Description wes hayutin 2011-07-07 12:32:52 UTC
Created attachment 511684
ruby script to check services

BUG:  Ensure that *ALL* the aeolus services are started after a reboot

Recreate:
1. Get a build, install it, and configure it.
2. Ensure everything is working well.
3. Reboot the server.

After the reboot, notice that the following services do *NOT* start properly:
1. dbomatic
2. conductor-delayed_job
3. deltacloud-mock
4. imagefactory
5. aeolus-connector

Hrm.. OK.. this does not appear to be consistent.

On another box, a different set of services does *NOT* come up after rebooting:
1. deltacloud-mock  * dup
2. iwhd  * new
3. mongodb * new
The other services listed above are running on that box.




Adding a test script so you can check the services before and after a reboot.
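For readers without access to the attachment, the check can be sketched roughly as follows. This is a minimal illustration and NOT the attached script: the names `classify_status` and `report` are hypothetical, and it assumes the init scripts print the usual "(pid N) is running..." status line that appears in the output quoted later in this bug.

```ruby
# Hypothetical sketch, NOT the attached checkServices.rb.
# Assumes init-script status lines such as
#   "dbomatic (pid  2272) is running..."
#   "deltacloudd dead but pid file exists"

# Matches "(pid 1234) is running", with one or more PIDs inside the parentheses.
RUNNING_PATTERN = /\(pid\s+[\d\s]+\)\s+is running/

# Classify one status line as :running or :not_running.
def classify_status(output)
  RUNNING_PATTERN.match?(output) ? :running : :not_running
end

# Format a result in the same style as the checker output quoted in this bug.
def report(name, output)
  label = classify_status(output) == :running ? "Success" : "FAILURE"
  "Checking #{name} ...\n #{label}: #{output.strip}"
end
```

In practice the `output` argument could come from something like `` `service mongod status 2>&1` ``, run once before the reboot and once after it, so the two reports can be compared.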

Comment 1 wes hayutin 2011-07-07 12:39:15 UTC
some debug info..

IWHD

[root@dell-pe2950-01 ~]# 
[root@dell-pe2950-01 ~]# /etc/init.d/mongod start
Starting mongod: [  OK  ]
[root@dell-pe2950-01 ~]# /etc/init.d/iwhd start
waiting for mongod to listen on localhost:27017[FAILED]
[root@dell-pe2950-01 ~]# telnet localhost 27017
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root@dell-pe2950-01 ~]# /etc/init.d/mongod restart
Stopping mongod: [FAILED]
Starting mongod: [  OK  ]
[root@dell-pe2950-01 ~]# telnet localhost 27017
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root@dell-pe2950-01 ~]# /etc/init.d/iwhd start
waiting for mongod to listen on localhost:27017[FAILED]
[root@dell-pe2950-01 ~]# 
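The manual telnet probe above can also be scripted. The following is a minimal sketch, assuming only that a listening mongod accepts plain TCP connections on localhost:27017 as the iwhd init script expects; the helper names `port_open?` and `wait_for_port` are hypothetical and are not part of Aeolus or iwhd.

```ruby
require "socket"

# Hypothetical helper: true if something accepts TCP connections on host:port.
def port_open?(host, port, timeout = 2)
  # Socket.tcp closes the socket when the block returns and yields the block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, or host not resolvable.
  false
end

# Hypothetical helper: poll until the port opens or the attempts run out.
def wait_for_port(host, port, attempts: 10, delay: 1)
  attempts.times do
    return true if port_open?(host, port)
    sleep delay
  end
  false
end
```

A call such as `wait_for_port("localhost", 27017)` would return false in the situation shown above, where mongod reports OK on start but never listens.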


The failures above are caused by the mongodb lock file not getting cleaned up before shutdown.
We probably want to ensure that it is cleaned up, even though it's not our responsibility.

[root@dell-pe2950-01 ~]# rm -Rf /var/lib/mongodb/mongod.lock 
[root@dell-pe2950-01 ~]# /etc/init.d/mongod start
Starting mongod: [  OK  ]
[root@dell-pe2950-01 ~]# ps -ef | grep mongodb
mongodb  15321     1  0 08:38 ?        00:00:00 /usr/bin/mongod --quiet -f /etc/mongodb.conf run
root     15329 14663  0 08:38 pts/0    00:00:00 grep --color=auto mongodb
[root@dell-pe2950-01 ~]#
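A slightly safer version of the manual `rm` above would remove the lock only when no process owns it. The sketch below is an illustration under stated assumptions, not what aeolus-configure or the mongod init script does: it assumes the lock file holds the PID of the mongod that created it and is left empty after a clean shutdown, and the helper names are hypothetical. Removing the lock does not check the data files themselves.

```ruby
# Hypothetical helper: true if a process with this PID currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only tests for existence
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Hypothetical helper: remove a leftover lock file only if no live process owns it.
# Assumes the file contains the owning PID (empty after a clean shutdown).
# Returns :absent, :clean, :in_use, or :removed.
def remove_stale_lock(path)
  return :absent unless File.exist?(path)

  content = File.read(path).strip
  return :clean if content.empty?

  pid = content.to_i # non-numeric content yields 0 and is treated as stale
  return :in_use if pid > 0 && process_alive?(pid)

  File.delete(path)
  :removed
end
```

On the box above this would be called as `remove_stale_lock("/var/lib/mongodb/mongod.lock")` before `/etc/init.d/mongod start`.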

Comment 2 wes hayutin 2011-07-07 12:42:08 UTC
On the other box, once I started dbomatic, the processes for the other scripts started as well.

Comment 3 wes hayutin 2011-07-08 14:00:19 UTC
Moving to ON_QA.

Be sure to set SELinux to permissive before restarting.

Comment 4 wes hayutin 2011-07-08 16:35:01 UTC
working... in 


[root@sgi-xe310-02 ~]# ruby /root/checkServices.rb 

Checking aeolus-conductor ...
 Success: (pid  2130) is running...

Checking aeolus-connector ...
 Success: image_factory_connector (pid  1944) is running...

Checking condor ...
 Success: condor_master (pid  2115) is running...

Checking conductor-dbomatic ...
 Success: dbomatic (pid  2272) is running...

Checking conductor-delayed_job ...
 Success: delayed_job (pid  2318) is running...

Checking deltacloud-ec2-us-east-1 ...
 Success: deltacloudd (pid  1949) is running...

Checking deltacloud-ec2-us-west-1 ...
 Success: deltacloudd (pid  1962) is running...

Checking deltacloud-mock ...
 FAILURE: deltacloudd dead but pid file exists

Checking httpd ...
 Success: httpd (pid  1845) is running...

Checking imagefactory ...
 Success: imagefactory (pid  2333) is running...

Checking iwhd ...
 Success: iwhd (pid  1685) is running...

Checking libvirtd ...
 Success: libvirtd (pid  1985) is running...

Checking mongod ...
 Success: mongod (pid 1592) is running...

Checking ntpd ...
 Success: ntpd (pid  1758) is running...

Checking postgresql ...
 Success: postmaster (pid  1789) is running...

Checking qpidd ...
 Success: qpidd (pid  1875) is running...

Checking production solr ...
 Success: COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    1698 root   64u  IPv6  13930      0t0  TCP *:8983 (LISTEN)

Checking connector ...
 Success: COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
image_fac 1944 root   12u  IPv4  16297      0t0  TCP localhost:cfinger (LISTEN)

Checking condor_q ...
 Success: -- Submitter: sgi-xe310-02.rhts.eng.bos.redhat.com : <10.16.65.19:51801> : sgi-xe310-02.rhts.eng.bos.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held

Checking condor_status ...
 Success: 
[root@sgi-xe310-02 ~]# rpm -qa | grep aeolus
aeolus-conductor-0.3.0-0.el6.20110708135911gitdb1097c.noarch
rubygem-aeolus-cli-0.0.1-1.el6.20110708135911gitdb1097c.noarch
aeolus-all-0.3.0-0.el6.20110708135911gitdb1097c.noarch
aeolus-configure-2.0.1-0.el6.20110707131907gitfaa220b.noarch
aeolus-conductor-doc-0.3.0-0.el6.20110708135911gitdb1097c.noarch
aeolus-conductor-daemons-0.3.0-0.el6.20110708135911gitdb1097c.noarch
[root@sgi-xe310-02 ~]#

Comment 5 wes hayutin 2011-07-11 00:30:17 UTC
removing from tracker

Comment 6 wes hayutin 2011-08-01 20:00:55 UTC
release pending...

Comment 7 wes hayutin 2011-08-01 20:00:58 UTC
release pending...

Comment 8 wes hayutin 2011-08-01 20:01:09 UTC
release pending.. 2

Comment 10 wes hayutin 2011-12-08 13:59:52 UTC
perm close

Comment 11 wes hayutin 2011-12-08 14:02:40 UTC
closing out old bugs
