Bug 1512631: failing vdo status commands should mention vdoconf.yml as a possible solution

Product: Red Hat Enterprise Linux 7
Component: kmod-kvdo
Version: 7.5
Hardware: x86_64
OS: Unspecified
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: Corey Marthaler <cmarthal>
Assignee: Corey Marthaler <cmarthal>
QA Contact: vdo-qe
CC: awalsh, bgurney, jkrysl, jshimkus, limershe
Fixed In Version: 6.1.0.85
Doc Type: If docs needed, set a value
Last Closed: 2018-01-03 20:57:49 UTC
Type: Bug
Description (Corey Marthaler, 2017-11-13 17:42:24 UTC)
There may have been a version mismatch here, due to BZ 1510176 causing the old module to not be unloaded after the "yum remove" and "yum install" phase. Also, BZ 1511096 covers the module not displaying its version in modinfo, which would have otherwise identified the old module. Let me know what the remove / create sequence looks like after the reboot.

Corey let me know that the issue survives after the reboot; however, the manual removal was incomplete, because there was still an entry in the /etc/vdoconf.yml file. Since it was the only entry, he removed the /etc/vdoconf.yml file, and the "Failed to make FileLayer" error message for the nonexistent device no longer appeared.

The remaining question: is there a message that the "vdo" command can relay off of the vdodumpconfig "Failed to make FileLayer... No such file or directory" message? It could be something to convey that there could be a configuration entry for a VDO volume stored on a device that no longer exists.

I'm not sure we can do anything special here. vdodumpconfig really has no knowledge of vdo manager or its config file, nor should it.

Could the message describe a probable cause for the error condition? Would that be useful or misleading?

Rereading the question here. Let me rephrase my answer. Is there something we could do? Sure. I'm just not sure it's the best idea. We could parse vdodumpconfig's stderr output for specific error messages and then try to relog them as more VDO-specific messages. But we would have to be very sure about what mappings to create. Also, we don't really do this sort of thing now with the other tools we use, vdoformat for instance; we just let the tool display whatever error it gets. This feels like it should be a PM or CEE decision.

At the minimum, we should make sure that our generic messages provide information about common potential causes of failures. If we can give the customer more direction with a couple of days of engineering effort, I think we should.
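The stderr-relogging idea discussed above could be sketched roughly as follows. This is a hypothetical illustration only, not part of the actual vdo manager code; the mapping table and function names are invented, and a real implementation would need the careful vetting of mappings the comment warns about.

```python
# Hypothetical sketch: intercept known vdodumpconfig stderr patterns
# and re-log them as VDO-manager-level hints. The pattern-to-hint
# mapping is invented for illustration.
import re

# Known stderr patterns paired with friendlier, actionable hints.
ERROR_HINTS = [
    (re.compile(r"Failed to make FileLayer from '(?P<dev>[^']+)' with "
                r"No such file or directory"),
     "Device '{dev}' no longer exists. The configuration file "
     "(/etc/vdoconf.yml) may still reference a VDO volume on a removed "
     "device; 'vdo remove --force' can clear the stale entry."),
]

def relog_stderr(stderr_text):
    """Return a list of hints for any recognized error messages."""
    hints = []
    for pattern, template in ERROR_HINTS:
        match = pattern.search(stderr_text)
        if match:
            hints.append(template.format(**match.groupdict()))
    return hints
```

Unrecognized messages would simply produce no hint, so the tool's original output would still be shown unchanged, matching the current behavior for tools like vdoformat.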
If it's more than that, we should think about putting more planning into doing it for a future release.

Here's another 'vdo status' failure after a successful creation (but with a leftover entry from a failed prior vdo creation) that again survives reboots.

[root@host-116 ~]# vdostats --human-readable
Device                    Size      Used  Available  Use%  Space saving%
/dev/mapper/origin       20.0G      4.0G      16.0G   20%            94%

[root@host-116 ~]# vdo status
vdo: ERROR - VDO volume PV previous operation (create) is incomplete

Nov 29 15:10:28 host-116 vdo: ERROR - VDO volume PV previous operation (create) is incomplete

After removing the invalid entry (caused by a prior failed create), vdo status worked again.

If vdo status is failing, we need to educate users (I'd argue in the failure message itself) about the /etc/vdoconf.yml file, if manually editing/cleaning it is going to be the only way to get the status command working again.

(In reply to Corey Marthaler from comment #9)
> If vdo status is failing we need to educate users (i'd argue in the failure
> message itself) about the /etc/vdoconf.yml file if manually editing/cleaning
> it is going to be the only way in which to have the status command work
> again.

If you have an entry in the config from a failed previous create, you shouldn't need to manually edit the config file (I would never suggest doing this ever).
You should be able to run vdo remove with the --force option to clear it from the config file.

I am not able to hit the error using vdo status. I reproduced it with vdo start by stopping the vdo, removing the lv under it, and starting it again. At this point there is only the /etc/vdoconf.yml entry, which makes vdo think this particular volume still exists.

# vdo start --name vdo
Starting VDO vdo
vdo: ERROR - Could not set up device mapper for vdo
vdo: ERROR - vdodumpconfig: Failed to make FileLayer from '/dev/mapper/vg-lv' with No such file or directory

Using 'vdo remove --name vdo --force' resolves this.
But there is no change to the vdodumpconfig error message, as Louis suggested, to direct the customer to this solution. Is it possible to check whether the underlying device still exists when this error is triggered and, if not, suggest the --force option?

Stepping outside the boundaries of defined management practices, one can create scenarios which are (barring bugs) not possible within those boundaries. For any such scenario we can know what the "correct" (meaning "what we want") response should be. This, though, is only because we know the totality of the specifically crafted scenario and the desired outcome.

This is not to say that such scenarios are impossible in the "real world." Given human fallibility, it is well within the realm of possibility that an error (whether of oversight or deliberate action) can arise. These real-world occurrences do not provide the complete view of the constructed scenarios. As a consequence, determinism as to the correct response is impossible to achieve.

Consider the situation described in Jakub's comment of 2017-12-15. We know what the "correct" response is because the scenario was crafted to evoke that response. In the case of a user erroneously removing the logical volume, that same response is incorrect. The user, hopefully being able to non-destructively reconstruct the logical volume's description, will want the vdo instance to remain.

As far as is possible, we should provide correct, precise information and advice to the user. Unfortunately, not all possible scenarios can be so handled; some require human intervention.

Corey, I'm assigning the bug to you because it's marked ON_QA and you reported it. If it should be assigned to someone else, I would appreciate it if you would do so. Thanks.

I'm moving this back to assigned for now, as the move to MODIFIED appears to have been invalid without an actual fix for this issue. Please correct me if I'm wrong. I think the best bet here is to have devel close this bug as either WONTFIX or NOTABUG.
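The device-existence check suggested earlier in the thread could look roughly like this. A hypothetical sketch only: the function name and advisory wording are invented, and this is not part of the actual vdo manager.

```python
# Hypothetical sketch of the suggestion above: when the
# "Failed to make FileLayer ... No such file or directory" error is
# triggered, check whether the backing device still exists and, if
# not, point the user at 'vdo remove --force'.
import os

def suggest_recovery(device_path):
    """Return an advisory string if the backing device is gone, else None."""
    if os.path.exists(device_path):
        # Device is present; the failure has some other cause, so do not
        # steer the user toward removing the volume's config entry.
        return None
    return ("Backing device '%s' does not exist. If this VDO volume was "
            "removed outside of vdo manager, 'vdo remove --force <name>' "
            "will clear its entry from /etc/vdoconf.yml." % device_path)
```

As the following comment argues, even this check cannot distinguish a deliberately destroyed device from one a user removed by mistake and hopes to reconstruct, which is why wording it as a suggestion rather than an automatic action matters.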
Originally it was thought that editing the vdoconf file was the only way to remedy this situation, but then it was learned that 'vdo remove --force' appears to work for these types of issues as well. If we come across a scenario in the future where the force doesn't work, then we can reopen this bug.

As agreed yesterday (2018-01-02) in #vdo, we're marking this as NOTABUG. In any specific test scenario one should attempt 'vdo remove --force' for the particular vdo, and if that fails, open a new bug (reopen this one only if changing the description). We are not specifically including a recommendation to use 'vdo remove --force' in the face of these scenarios, as differentiating between a test scenario and a failure/mistake in the field is not possible.