Bug 1624952 - Unhandled exception during appliance restart after upgrade
Summary: Unhandled exception during appliance restart after upgrade
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: GA
Target Release: 5.10.0
Assignee: Gregg Tanzillo
QA Contact: Dave Johnson
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-03 16:39 UTC by Jan Zmeskal
Modified: 2018-09-06 13:59 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-05 16:07:02 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments

Description Jan Zmeskal 2018-09-03 16:39:13 UTC
Description of problem:
I had an appliance with CFME 5.10.0.12 and upgraded it to 5.10.0.14. After that, I restarted the appliance using appliance_console. However, the restart was unsuccessful, and this appeared in the output of systemctl status evmserverd:

Sep 03 09:00:17 <hostname> abrt[13090]: detected unhandled Ruby exception in '/var/www/miq/vmdb/lib/workers/bin/evm_server.rb'
Sep 03 09:00:17 <hostname> abrt[13090]: can't communicate with ABRT daemon, is it running? No such file or directory - connect(2) for /var/run/abrt/abrt.socket
Sep 03 09:00:17 <hostname> systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
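
The systemd status above only shows the exit code. If more detail is needed, the full Ruby backtrace should be available in the appliance logs; a rough sketch of where to look, assuming the standard CFME appliance layout (paths are assumptions, adjust as needed):

journalctl -u evmserverd --no-pager | tail -n 50   # recent service output
tail -n 100 /var/www/miq/vmdb/log/evm.log          # EVM log, which usually contains the exception backtrace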

Version-Release number of selected component (if applicable):
5.10.0.12 -> 5.10.0.14

How reproducible:
100 %
It also happened to me when I provisioned a brand-new 5.10.0.12 appliance and upgraded it to 5.10.0.14, so it's probably not anything random.

Steps to Reproduce:
1. Have a running CFME 5.10.0.12 appliance
2. Put packages.repo (in attachment) into /etc/yum.repos.d
3. yum -y update
4. Wait until the yum update finishes
5. Activate the appliance console (by running "ap" in the terminal)
6. Choose option 16 - Restart appliance
7. After the action is finished, check the output of this command:
systemctl status evmserverd
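
For convenience, steps 2 through 7 correspond roughly to the following shell session on the appliance (a sketch only; the repo file name matches the attachment, and the restart option is selected interactively inside appliance_console):

cp packages.repo /etc/yum.repos.d/   # step 2: add the attached repo file
yum -y update                        # steps 3-4: update and wait for it to finish
ap                                   # steps 5-6: open appliance_console, choose option 16 (Restart appliance)
systemctl status evmserverd          # step 7: check the service after the appliance comes back up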

Actual results:
Soon after the restart, you get the output shown in this bug's description. After a while, the service stabilizes in the "running" state; however, only these two processes are actually running:
MIQ Server
/bin/rpm -qa --qf %{NAME} %{VERSION}-%{RELEASE}
If you try to connect to the web interface via a browser, you get ERR_CONNECTION_REFUSED.
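
A quick way to confirm the web UI symptom from the appliance itself (a hedged sketch; assumes the UI is served on the default HTTPS port):

curl -kI https://localhost/   # fails with a connection error, mirroring the browser's ERR_CONNECTION_REFUSED, while the UI is not being served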

Expected results:
MIQ server and all of its processes should be running.


Additional info:
- I am attaching several log files from the appliance. (You can see the systemctl output in the messages file, which is taken from /var/log/messages.)
- The attached logs are not from the appliance where I originally encountered the problem, but from a fresh one provisioned just for the purpose of reproduction.
- I suspect that updates between 5.10.0.X and 5.10.0.Y are probably not supported. However, this unhandled exception might be indicative of some other problem, and a user should probably never encounter an unhandled exception. Therefore I decided to report this bug.

Comment 4 Joe Rafaniello 2018-09-05 16:07:02 UTC
It is failing because there's an intermediate schema change missing.  

Because we're not at code freeze yet, we still allow schema changes from 5.10 to newer builds of 5.10. Therefore, you can't upgrade from one 5.10.x build to another yet without manually running migrations.

We code freeze on September 93 or so.  

[----] I, [2018-09-03T12:37:01.578289 #5782:434f7c]  INFO -- : MIQ(EvmDatabase.seed) Seeding NotificationType
[----] E, [2018-09-03T12:37:01.613245 #5782:434f7c] ERROR -- : [NoMethodError]: undefined method `link_to' for #<NotificationType:0x000000000c7ca7c0>  Method:[block (2 levels) in <class:LogProxy>]

The link_to column was added here:

https://github.com/ManageIQ/manageiq-schema/pull/263
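
A rough way to confirm that this migration has not been applied yet on the appliance (a sketch, assuming the standard /var/www/miq/vmdb layout; NotificationType is the model from the error above):

cd /var/www/miq/vmdb
bin/rails runner 'puts NotificationType.column_names.include?("link_to")'   # prints false while the link_to column is missing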

You need to run db:migrate after updating.
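
On the appliance, that roughly means the following (a sketch, assuming the standard appliance paths; the exact rake invocation may differ):

cd /var/www/miq/vmdb
bin/rake db:migrate            # apply the pending schema migrations, including the link_to column
systemctl restart evmserverd   # restart the server once the schema is up to date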

Once we code freeze and create a hammer branch, we'll no longer allow schema changes, and upgrades within 5.10 versions should work again.

Comment 5 Joe Rafaniello 2018-09-05 16:07:39 UTC
September 23rd, not 93.

Comment 6 Jan Zmeskal 2018-09-06 06:56:36 UTC
Joe, thank you for the clarification.

Jan

