Bug 1372672
Summary: | SSA Fails in Windows workloads but not in Linux ones on OSP9 | |||
---|---|---|---|---|
Product: | Red Hat CloudForms Management Engine | Reporter: | Victor Estival <vestival> | |
Component: | SmartState Analysis | Assignee: | Hui Song <hsong> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ido Ovadia <iovadia> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 5.6.0 | CC: | cbolz, dajohnso, iovadia, jhardy, jkeselma, obarenbo, roliveri, sbulage, simaishi, vestival, wrichter | |
Target Milestone: | GA | Keywords: | TestOnly, ZStream | |
Target Release: | 5.9.0 | |||
Hardware: | Unspecified | |||
OS: | Windows | |||
Whiteboard: | openstack:smartstate | |||
Fixed In Version: | 5.9.0.1 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1450514 1450515 1459235 (view as bug list) | Environment: | ||
Last Closed: | 2018-03-06 14:57:52 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | Openstack | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1437644 | |||
Bug Blocks: | 1450514, 1450515, 1459235 | |||
Attachments: |
Description
Victor Estival
2016-09-02 10:44:40 UTC
The two paste.fedoraproject.org files are not available. In addition, the attached screenshot has an error stating "Failed to create vm snapshot with EMS. Error: [NoMethodError]: [undefined method 'metadata' for nil:NilClass]". This does not match the error in your description of this BZ at all. Do you have any information related to the actual error you are reporting? Alternatively, do you have any background information related to the error in the screenshot?
Created attachment 1220961 [details]
New screenshot showing windows SSA error message
Created attachment 1220962 [details]
evm.log
Windows image to be analysed is the image from cloudbase.it: https://cloudbase.it/windows-cloud-images/
Wolfram,
This is also not the error in this BZ description. You stated above that this was a timeout issue. Are there three different issues here? If so, can you open separate BZs for each one? Thanks. Also, can you please provide access to the appliance and OpenStack provider in question? Thanks much.
Jerry,
understood that the information provided does not match the preliminary findings mentioned in the bug report. As background: Victor and I are building a demo, and SSA on the Windows image has never worked. The appliance instance that Victor used for the initial bug report has long been destroyed, so no logs can be retrieved for the original bug report. We can, however, provide screenshots, logs and access to three different OpenStack environments and CloudForms appliances where SSA on Windows does not work today. Do you suggest opening another Bugzilla to continue the analysis and closing this one, since we cannot provide any updates?
Created attachment 1221379 [details]
New Screenshot showing windows SSA status message showing results of both image and instance scans
Previous image contained only an image scan
Created attachment 1221380 [details]
evm log dated 2016-11-16
For an image scan, there is the error message mentioned in the previous bug report in the logs:
[----] W, [2016-11-16T12:57:32.424030 #53725:f53994] WARN -- : MIQ(VmScan#timeout!) Job: guid: [1620a31e-abec-11e6-a010-020000000111], job timed out after 3048.072594672 seconds of inactivity. Inactivity threshold [3000 seconds], aborting
[----] E, [2016-11-16T12:57:46.974131 #53717:f53994] ERROR -- : MIQ(VmScan#process_abort) job aborting, job timed out after 3048.072594672 seconds of inactivity. Inactivity threshold [3000 seconds]
[----] I, [2016-11-16T12:57:47.035964 #53717:f53994] INFO -- : MIQ(VmScan#process_finished) job finished, job timed out after 3048.072594672 seconds of inactivity. Inactivity threshold [3000 seconds]
[----] I, [2016-11-16T12:57:47.060489 #53717:f53994] INFO -- : MIQ(VmScan#dispatch_finish) Dispatch Status is 'finished'
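The warning above reports 3048.07 seconds of inactivity against a 3000-second threshold. As a rough illustration of that kind of check, here is a minimal Ruby sketch. The constant and method names are hypothetical and this is not the ManageIQ implementation; only the two numeric values come from the log.

```ruby
# Hypothetical illustration of an inactivity-timeout check; not ManageIQ code.
INACTIVITY_THRESHOLD = 3000 # seconds, the threshold reported in the log above

# Returns true when the time since the last recorded activity exceeds the threshold.
def timed_out?(last_activity_at, now, threshold = INACTIVITY_THRESHOLD)
  (now - last_activity_at) > threshold
end

# The start time is arbitrary; the elapsed value is the one reported in the log.
last_activity = Time.utc(2016, 11, 16, 12, 0, 0)
now = last_activity + 3048.072594672
puts timed_out?(last_activity, now)
```

With these values the elapsed time is past the threshold, so a job in this state would be aborted no matter why it stopped making progress.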
Wolfram,
I tested SSA on the same Windows VM. It failed with the following errors:
[----] E, [2017-04-18T03:19:44.370855 #38245:36112c] ERROR -- : MIQ(MiqQueue#deliver) Message id: [919000000161855], Error: [wrong number of arguments (given 2, expected 1)]
[----] E, [2017-04-18T03:19:44.371140 #38245:36112c] ERROR -- : [ArgumentError]: wrong number of arguments (given 2, expected 1) Method:[rescue in deliver]
[----] E, [2017-04-18T03:19:44.371228 #38245:36112c] ERROR -- : /var/www/miq/vmdb/app/models/storage.rb:725:in `perf_capture'
/var/www/miq/vmdb/app/models/storage.rb:717:in `perf_capture_hourly'
/var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
......
BZ https://bugzilla.redhat.com/show_bug.cgi?id=1437644 has already been opened for this problem. We have to wait until that fix is merged here. Updated the dependency.
Hui,
thanks for the analysis - what I find odd is that on CFME 5.6 SSA does work for Linux VMs, it only doesn't work for Windows. If the performance capture is on the code path I would assume that it should also fail for Linux VMs.
Cheers,
Wolfram
(In reply to Wolfram Richter from comment #12)
> Hui,
>
> thanks for the analysis - what I find odd is that on CFME 5.6 SSA does work
> for Linux VMs, it only doesn't work for Windows. If the performance capture
> is on the code path I would assume that it should also fail for Linux VMs.
>
> Cheers,
> Wolfram
Yes, you are right. This is not the root cause of SSA failing on Windows VMs. It only blocked us from reproducing the issue. I'll retest after the fix is patched.
Thanks,
Hui
Wolfram,
May I borrow your https://cloudforms.hailstorm2.coe.muc.redhat.com? I found some suspicious code and need to prove it's the root cause.
Thanks, you can use https://cloudforms.hailstorm1.coe.muc.redhat.com/ (CFME 5.7.0.17, same credentials). (Hailstorm2 is currently being reinstalled as a testbed for CFME 5.8 beta 2.)
Thank you, Wolfram. In this new appliance, the OpenStack provider's validation failed for some reason. Can you revalidate it?
Sorry, I see that all my OpenStack environments seem to have keystone problems - I'll report back when I have a usable env.
https://cloudforms.hailstorm2.coe.muc.redhat.com/ (CFME 5.8.0.10) and its OpenStack are in working condition again; the others (with CFME 5.7) will probably reappear tomorrow.
https://cloudforms.hailstorm3.coe.muc.redhat.com/ (CFME 5.7.2.1) and its OpenStack are also working again. The hailstorm2 env. will probably be of limited availability since I'm working on the underlying RHEL (hailstorm3 will be stable).
Wolfram,
The scan target here is an OpenStack image of Windows Server 2012. The root causes of the SSA failure are:
1. Its OS device signature is shorter than expected (22 vs. 60), which causes 'Unable to mount filesystem. Reason:[No root filesystem found.]';
2. vmConfig is undefined for MiqOpenstackeImage, which causes the job to hang and finally time out.
I'll send PRs to fix them. Thank you for your environments.
There are two PRs for this BZ:
https://github.com/ManageIQ/manageiq-gems-pending/pull/143
https://github.com/ManageIQ/manageiq-gems-pending/pull/144
They are merged into the master branch.
Verified
========
5.9.0.22
Hello Dave,
Worked with Ido Ovadia and cross-checked that SSA is gathering all required data from the Windows instance and image.
Thanks,
Satyajit Bulage.
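The second root cause listed above, a method that one kind of scan target defines and another does not, can be illustrated with a small Ruby sketch. Every class and method name below is a hypothetical stand-in. This is neither the ManageIQ code nor the fix from the PRs; it only shows how a guard can report the gap immediately instead of leaving the caller waiting until an inactivity timeout.

```ruby
# Hypothetical stand-ins for two kinds of scan target; not ManageIQ classes.
class MissingConfigError < StandardError; end

class InstanceTarget
  def vm_config
    { "os" => "windows" }
  end
end

class ImageTarget
  # Deliberately does not define vm_config.
end

# Returns the target's configuration, or raises a descriptive error right away
# when the target type does not provide one.
def fetch_config(target)
  unless target.respond_to?(:vm_config)
    raise MissingConfigError, "#{target.class} does not provide vm_config"
  end
  target.vm_config
end

puts fetch_config(InstanceTarget.new).inspect

begin
  fetch_config(ImageTarget.new)
rescue MissingConfigError => e
  puts "scan aborted: #{e.message}"
end
```

Failing fast with a named error is a design choice made for this sketch. It turns a silent hang into a log line that points at the missing method.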