Resuming Corrupted Suspended Guests

My recent post on fixing terminator was months in the making, partly because I lost the environment I was using to double-check all the steps and partly because other things kept coming up. Several months back, when I found the original fix, it was only available as a patch in Bugzilla. I’d written a whole article about how to get the patch, update the spec file, build a new rpm, and so on, but in the midst of this I twice lost the virtual machine I was using to a suspend that would not resume. I also ran into some problems using yumdownloader to get the source rpm that I could not reliably reproduce or document.

Time went by and an updated source rpm that fixed vte became available, so it seemed silly to document all the tedious steps to patch the spec file and rebuild when that work was already done. Then I was thwarted again: I rebooted my laptop and forgot I had a guest running.

It was a frustrating situation from both the virt-manager GUI and the command line. My only options were resume (which didn’t work because of the error message below) and shut down (which did the opposite of what I wanted the virtual machine to do).

On Google I found scant references to this situation, so I turned to a company-wide mailing list at work where anyone can post technical questions. I had the solution in 30 minutes!

This was the cryptic error message I received in virt-manager when trying to resume the guest:

Error restoring domain: Unable to read from monitor: Connection reset by peer

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1050, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 510, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer

The error message made no sense to me: I can’t resume the domain because the monitor cannot be read? How about, “Unable to resume domain. If this persists, try ‘virsh managedsave-remove’ from a command line to remove the suspended session and reboot your machine.” Including the word “suspend” in the command instead of “managedsave” would also be more intuitive.

Here is the command to remove a corrupted suspended virtual machine session so you can boot your machine again. Naturally, you’ll lose the suspended session you had:

$ su -c 'virsh managedsave-remove NameOfDomain'
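
For anyone scripting this, here is a minimal sketch that wraps the two steps (remove the saved image, then start the guest) in a POSIX shell function. The function name reset_suspended_guest is made up for illustration; only virsh managedsave-remove and virsh start are real commands. It skips virsh start if the removal fails, and it refuses to run without a domain name, which is what the “requires option” error quoted in the comments is about.

```shell
#!/bin/sh
# Sketch: throw away a guest's libvirt managed-save image (the suspended
# session that will not resume) and then boot the guest fresh.
# Run as root (e.g. via su -c or sudo) for the system libvirt instance.

reset_suspended_guest() {
    # managedsave-remove needs a domain name; without one virsh reports
    # an error instead of removing anything.
    if [ -z "$1" ]; then
        echo "usage: reset_suspended_guest DOMAIN" >&2
        return 2
    fi
    # Delete only the saved session; the guest's disk images are not touched.
    virsh managedsave-remove "$1" || return 1
    # With no saved image left, this is an ordinary cold boot.
    virsh start "$1"
}

# Example (hypothetical domain name):
#   reset_suspended_guest NameOfDomain
```

The guest then boots as if it had lost power, so expect whatever the guest OS does after an unclean shutdown.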

I’ve also confirmed on RHEL 6.2 (beta) that rebooting the hypervisor with running guests smoothly suspends them and resumes them when the hypervisor comes back up.
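
As far as I know, this suspend-and-resume on host reboot comes from the libvirt-guests service on RHEL and Fedora, not from virt-manager; the post itself doesn’t name it. Below is a sketch of the two settings involved in /etc/sysconfig/libvirt-guests, with values matching the behavior described above. Setting ON_SHUTDOWN=shutdown instead avoids leaving managed-save images behind at all, at the cost of losing running sessions on every host reboot. Check the comments in your own copy of the file, since the available options vary by libvirt version.

```shell
# /etc/sysconfig/libvirt-guests (fragment; read by the libvirt-guests service)

# On host boot: "start" restarts guests that were running at shutdown,
# "ignore" leaves them alone (autostart guests are still started by libvirtd).
ON_BOOT=start

# On host shutdown: "suspend" managed-saves each running guest to disk
# (the image that virsh managedsave-remove deletes); "shutdown" asks each
# guest to shut down cleanly instead.
ON_SHUTDOWN=suspend
```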

John Poelstra

October 19, 2011

Software & Technology

config, corrupt, Fedora, file, guest, kvm, RHEL, snippet, suspend

67 Comments

Gary Halpin
April 21, 2021 at 12:47 am

Life saver, thanks

Brian M Smith
January 7, 2019 at 6:36 am

Good command line, excellent work.

Vijay
February 24, 2018 at 10:43 am

Thanks John. I am able to start my domain 🙂

Manish Kelkar
October 22, 2016 at 8:49 pm

Very helpful! Thanks for this post! Managed to start corrupted domains.

kish
May 19, 2016 at 8:55 pm

Wow 🙂
Awesome stuff, worked like a charm

john robert rojas
May 13, 2015 at 5:56 am

A thousand thanks

You saved the day

Jun Li
April 1, 2015 at 12:05 am

Thanks a lot! So glad the VMs are working again

Leonardo Kenji
March 26, 2015 at 7:50 am

thanks man, really useful stuff

William Fragakis
November 15, 2014 at 4:36 pm

At the risk of being just another “me, too”, many thanks. It’s been a long and frustrating day but this post let me eat dinner a happy man.

- John Poelstra
  November 15, 2014 at 6:29 pm
  
  “Me toos” are welcome any time! Glad it helped and sorry that confusing error message is still out there. I filed a bug a while ago.
  
Reggie
March 26, 2014 at 2:14 pm

Thanks John, this just worked for me on my CentOS server.
I have been getting this for months and couldn’t resolve it.
Once again thanks.

- John Poelstra
  March 26, 2014 at 2:18 pm
  
  Thanks for the feedback Reggie and I’m glad this post helped solve your problem!
  
Steve Dowe
August 6, 2013 at 5:58 am

John,

Thanks for taking the time to blog this. Exactly what I was looking for (running an XP VM in CentOS 6). Kudos and regards to the RH team 🙂

Steve

Blake McBride
August 5, 2013 at 11:22 am

I am using the latest Linux Mint. I had a VM working fine, but then it stopped working (it won’t start anymore) for no apparent reason. I have not fiddled with anything; I only used the default settings that came with Virtual Machine Manager. I have 8GB on the host and 2GB for the VM. There are no suspended images; I am attempting a clean startup. What I get is:

Error starting domain: Unable to read from monitor: Connection reset by peer

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 96, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 117, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1092, in startup
    self._backend.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 681, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer

Any help would be greatly appreciated.

Blake McBride

- KajMagnus
  May 12, 2014 at 12:39 am
  
  This happened to me too, moments ago. In my case the solution was to remove a Filesystem Passthrough hardware “device” I had added: the path it shared from the host was no longer available, which apparently caused this weird error message.
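
A minimal sketch of the check described in this comment: pull the host paths a guest refers to out of virsh dumpxml and report any that no longer exist. It matches `<source dir='...'>` (filesystem passthrough, as here) and `<source file='...'>` (file-backed disks) with sed, which assumes libvirt’s usual single-quoted attributes with one element per line. It is text matching, not real XML parsing, and it will not catch a disk-format problem like the one in the reply below. The function name check_guest_paths is hypothetical.

```shell
#!/bin/sh
# Sketch: report host paths referenced by a guest's XML that no longer exist.
check_guest_paths() {
    if [ -z "$1" ]; then
        echo "usage: check_guest_paths DOMAIN" >&2
        return 2
    fi
    # Crude text match on libvirt's single-quoted attributes, not an XML parser:
    #   <source dir='...'/>   filesystem passthrough (and directory-backed disks)
    #   <source file='...'/>  file-backed disk images
    paths=$(virsh dumpxml "$1" | sed -n \
        -e "s/.*<source dir='\([^']*\)'.*/\1/p" \
        -e "s/.*<source file='\([^']*\)'.*/\1/p")
    missing=0
    # Read one path per line so paths containing spaces survive intact.
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done <<EOF
$paths
EOF
    # 0 = every path exists, 1 = at least one is missing
    return "$missing"
}
```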
  
  - Héctor Bernal
    October 27, 2014 at 2:58 am
    
    Thanks for the hint; my problem was similar to what you describe, but in my case it was because virsh didn’t accept the qcow2 format of the attached device.
    
Ibra
August 1, 2013 at 9:18 pm

Is this problem related to the OS img file?

- John Poelstra
  August 2, 2013 at 7:38 am
  
  Sorry, I don’t know.
  
David Ramirez
August 1, 2013 at 3:08 pm

This just worked for me in a CentOS 6.4-based environment. I rebooted the server, with all the KVM instances going into suspended mode. When it came back up, one VM showed this behavior. *Luckily*, of all 13 VM instances, this one was the least critical. I gave the command from inside the virsh shell. All OK:

virsh # managedsave-remove ORION
Removed managedsave image for domain ORION

Thanks
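
When many guests were suspended at once, as in this comment, a sketch like the following tries each guest that still has a managed-save image and reports which ones fail, without deleting anything automatically. It assumes a virsh new enough to support the --with-managed-save and --name options of virsh list; older releases such as the RHEL 6.2-era one in the post may not have them. The function name resume_saved_guests is hypothetical.

```shell
#!/bin/sh
# Sketch: try to start every guest that has a managed-save image and report
# which ones fail. Nothing is removed automatically.
resume_saved_guests() {
    # One domain name per line; give up if virsh itself fails.
    names=$(virsh list --all --with-managed-save --name) || return 2
    failed=0
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # For a guest with a managed-save image, "virsh start" restores it.
        # stdin is /dev/null so virsh cannot consume the rest of the list.
        if virsh start "$name" >/dev/null 2>&1 </dev/null; then
            echo "resumed: $name"
        else
            echo "FAILED:  $name (candidate for: virsh managedsave-remove $name)"
            failed=1
        fi
    done <<EOF
$names
EOF
    return "$failed"
}
```

Removing the image stays a deliberate, per-guest decision, since it discards that guest’s session.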

Ibra
July 31, 2013 at 10:16 pm

I’m installing Feather Linux (a lightweight OS) because I have to run many VMs on a Xen hypervisor, so I used virt-manager to install the OS. It installs fine the first time, but when I start it again after shutting it down, it says no bootable device.

Ibra
July 31, 2013 at 2:13 am

I have installed a VM on a Xen hypervisor using virt-manager. When I start the VM after shutting it down, it gives me an error saying “no bootable device find”. What am I missing? Is it something in its settings?

- John Poelstra
  July 31, 2013 at 9:49 pm
  
  Are you booting a LiveCD? If so, that will only work once. After that it expects to boot from a regular device. I’ve always thought this was a really weird and unfinished part of virt-manager.
  
  - Ibra
    July 31, 2013 at 9:52 pm
    
    I gave virt-manager an ISO file. The OS installs, but it doesn’t boot when I start the VM the next time. The boot priority order is Hard disk.
    
    - Ibra
      July 31, 2013 at 9:56 pm
      
      And one more thing: I’m not booting from a Live CD.
      
      - John Poelstra
        July 31, 2013 at 10:13 pm
        
        Very strange. I guess I would try the whole process again. What OS are you installing? Are you setting a bootable partition with the installer? If it’s Fedora 18 or 19 the installer is very confusing.
        
        Probably not related, but I recently saw a problem where Fedora 18 installed, but would not boot on two different brand new Lenovo T430s. Fedora 19 did.
Ryan
July 29, 2013 at 9:45 am

This post really pulled me out of the fire. I’m involved in validation for server hardware at a major company, and my admin stations are VMs on a RHEL workstation. My host power cycled due to a fat finger on my part, and when it came back up one of my main admin VMs was broken. As far behind as we already are, I certainly didn’t have time to rebuild the VM, so this post played a role in making sure this project didn’t break down. Thanks!

- John Poelstra
  July 31, 2013 at 9:49 pm
  
  Love to hear it! Well, not that you had problems, but that “my pain was your gain.” Hope the rest of your project goes well.
  
brando
July 16, 2013 at 5:21 pm

Thanks a lot! It works for me. Thanks to Google too.

antman
June 18, 2013 at 1:32 pm

The fix did not work for me. I caused this error by accidentally checking “Read only” on IDE Disk 1 in the details view.

D. Kniep
April 9, 2013 at 8:46 am

This same message occurs if one overallocates memory to the VM. I got this message and it took a while before I realized that the issue was not a suspended machine, but overallocation of memory!!
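
A rough sketch of the check this comment suggests: compare the guest’s configured maximum memory (from virsh dominfo) with the host’s available memory in /proc/meminfo. Treat it as a heuristic only: KVM hosts can overcommit memory and use swap, and qemu needs some memory of its own beyond the guest’s, so passing or failing proves nothing by itself. The function name guest_fits_in_memory and the MEMINFO override (there only to make it testable) are hypothetical.

```shell
#!/bin/sh
# Sketch: does the guest's configured memory fit in what the host has free?
# MEMINFO is a hypothetical override for testing; it defaults to /proc/meminfo.
guest_fits_in_memory() {
    if [ -z "$1" ]; then
        echo "usage: guest_fits_in_memory DOMAIN" >&2
        return 2
    fi
    meminfo=${MEMINFO:-/proc/meminfo}
    # virsh dominfo prints e.g. "Max memory:     2097152 KiB"; keep the number.
    guest_kib=$(virsh dominfo "$1" | awk '/^Max memory:/ {print $3}')
    # MemAvailable needs Linux 3.14 or later; older kernels may only have
    # MemFree, which understates what the kernel could actually free up.
    host_kib=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
    [ -n "$host_kib" ] || host_kib=$(awk '/^MemFree:/ {print $2}' "$meminfo")
    if [ -z "$guest_kib" ] || [ -z "$host_kib" ]; then
        echo "could not read memory sizes" >&2
        return 2
    fi
    echo "guest needs ${guest_kib} KiB, host has ${host_kib} KiB available"
    # Exit status 0 if the guest fits, 1 if it does not.
    [ "$guest_kib" -le "$host_kib" ]
}
```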

John Poelstra
April 7, 2013 at 5:44 pm

I filed a bug for this issue because comments continue to come in on this blog post. Is there anyone out there who has a reliable reproducer to get into the error state?

https://bugzilla.redhat.com/show_bug.cgi?id=892007

Rich Megginson
February 27, 2013 at 10:05 am

This saved my work!!! Thanks!!!

Brenton Leanhardt
February 20, 2013 at 8:33 am

Seriously, thanks for posting this. Hit the error, googled, found this and had my problem fixed in under a minute. 🙂

Sebastian
February 15, 2013 at 8:51 am

In addition, I also get this error message:

Error restoring domain: operation failed: failed to read qemu header

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1050, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 511, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: operation failed: failed to read qemu header

Sebastian
February 15, 2013 at 8:50 am

I have a problem: when I run the command it says
error: unknown command: 'managedsave-remove'
The same problem appears whether I run it inside virsh or outside.

tomas
January 21, 2013 at 3:55 pm

Thank you for this information. It saved me a lot of time.

Linuxito
January 11, 2013 at 5:57 am

Thanks for sharing!!!

Sudheer
January 4, 2013 at 10:46 am

Awesome post. Helped resolve my long-lasting VM issue!

tianchen
November 8, 2012 at 12:42 am

I hit the same issue, but cannot solve it with this solution. Can anyone help me?
virsh start test001
error: Failed to start domain test001
error: Unable to read from monitor: Connection reset by peer
virsh managedsave-remove test001
Domain test001 has no manage save image; removal skipped

- James Penick
  November 19, 2012 at 1:30 pm
  
  Tianchen,
  Make sure you have enough ram to allocate another instance.
  
Daniel
November 1, 2012 at 12:51 pm

Another heartfelt thank you!

Joseph Price
October 17, 2012 at 7:59 am

I had an instance in the ‘Shutoff’ state according to virt-manager. It wouldn’t start, giving the same error. (This is after a reboot of the host.)

I had the exact same exception trace as you though and your answer also solved my issue! Thanks!

FTR, the only output from virsh was:
virsh # start NameOfInstance
error: Failed to start domain NameOfInstance
error: Unable to read from monitor: Connection reset by peer

- John Poelstra
  October 17, 2012 at 8:55 am
  
  Thanks for your feedback, Joseph. Glad this post helped solve your problem, and hopefully your comment will help other people too 🙂
  
John
August 21, 2012 at 2:14 pm

Thank you. This solved the issue of my guest not starting up but instead qemu spitting out these lines in /var/log/libvirt/qemu/my.domain.log:

qemu: warning: error while loading state section id 3
load of migration failed
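
A small sketch for spotting the symptom in this comment: search a guest’s qemu log for the two messages quoted above, which point at a saved state that could not be loaded. The log location follows the path given in the comment; the function name save_state_failed and the QEMU_LOG_DIR override are hypothetical. The log is appended to on every start, so a match may come from an older attempt; check the surrounding timestamps before acting on it.

```shell
#!/bin/sh
# Sketch: does a guest's qemu log contain the "failed to load saved state"
# messages quoted in the comment? QEMU_LOG_DIR is a hypothetical override.
save_state_failed() {
    if [ -z "$1" ]; then
        echo "usage: save_state_failed DOMAIN" >&2
        return 2
    fi
    log="${QEMU_LOG_DIR:-/var/log/libvirt/qemu}/$1.log"
    # Print matching lines; grep exits 0 on a match, 1 on none, and 2 on
    # error (for example, when the log file does not exist).
    grep -E 'error while loading state|load of migration failed' "$log"
}
```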

TJ
July 23, 2012 at 8:44 am

Thought I had this problem because I got the same errors when rebooting the host. Guests would not resume. Could only clear saved state and boot a new instance of each guest.

Turns out my images were stored someplace other than the default location (/var/lib/libvirt/images). Once I updated the pool location, the resume on host reboot worked as expected.

- TJ
  July 24, 2012 at 6:07 am
  
  Sorry, I posted this too quickly. Changing the storage location did not solve the problem.
  When rebooting the host, guests are paused, but they will not resume after the host reboot completes.
  I tried moving the guests to another host with the same OS and do not see this problem there, so I think it has something to do with this system, but I have not figured out what yet.
  
swap
July 19, 2012 at 1:00 pm

Thanks a lot. Hats off, John, you rock!

Joshua Hoblitt
July 19, 2012 at 10:49 am

Just hit this problem on SL 6.2 running kernel 2.6.32-220.23.1.el6.x86_64 and libvirt-0.9.10-21.el6.x86_64. Is this corruption on shutdown issue a problem with the guest OS, or kvm/qemu/kernel of the host?

- John Poelstra
  July 20, 2012 at 12:29 am
  
  It appears to be a “corruption on suspend” problem in the sense that you can’t resume the guest, but I have no idea what causes it. I’m running the latest RHEL 6.3 packages and haven’t seen the problem again for a long time. Sorry I couldn’t be of more help.
  
Cocolocko
July 15, 2012 at 4:10 pm

Thank you very much for sharing this info.
After updating CentOS 6.2 to 6.3, the guests were not automatically starting up again, but with this nice virsh command everything was quickly solved. Greetings.

erica
July 14, 2012 at 9:54 pm

Er, I meant I was glad to find the article, and chalk up my not knowing it needed more info to me being less experienced. Yup, I need caffeine.

erica
July 14, 2012 at 9:53 pm

No problem. I was glad to find it. Chalk it up to me being less experienced too 😉

erica
July 14, 2012 at 9:45 pm

Thanks so much Bruce! I will give that a whirl when I am a bit more caffeinated and less yawning!

- John Poelstra
  July 14, 2012 at 9:49 pm
  
  Sorry about that, Erica. My example was misleading (I’ve updated it) and Bruce has it right: you have to supply the name of the domain you want to manage.
  
Bruce
July 14, 2012 at 4:37 pm

Erica,
I ran into the same problem. I could reboot the host without shutting down the VMs on CentOS 6.2, so I figured I could do it on 6.3... not.
The option it is talking about is the name of the virtual machine.
I accomplished this by running the following commands:

[root@gwdt ~]# virsh
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # managedsave-remove win7
Removed managedsave image for domain win7
virsh #

(win7 is the name of the virtual machine that would not resume.)

Kamlesh Verma
July 13, 2012 at 10:16 pm

Thank you very much!! You saved my 3 VMs

erica
July 13, 2012 at 7:17 am

I should have mentioned RHEL 6.3

erica
July 13, 2012 at 7:10 am

Hi, I ran into this problem with a RHEL machine at work. Thank you for the fix.
I did run into an issue trying to run the fix:
error: command 'managedsave-remove' requires option

I am still researching it. Will let you know if I find it. Thanks!

info@southcomputers.com
July 11, 2012 at 7:53 pm

Thank you! Panic mode over.

Jeff Williams
July 1, 2012 at 11:09 am

Ran into this same issue as well while setting up Spacewalk. Very glad I don’t have to start from a previous snapshot or from scratch! Nice find!

Rajesh Sababhathi
June 5, 2012 at 12:52 pm

Thanks John!
It solved my issue in 2 different scenarios. It happened when I reset my KVM hosts (gracefully), but after the reboot the VMs didn’t come up. My question is: if you say it’s corrupted, how could we possibly retrieve the corrupted VM? So it’s not data corruption?

- John Poelstra
  June 5, 2012 at 4:37 pm
  
  Excellent! I’m glad to hear I’ve saved other people time and pain. 🙂
  
  My (limited technical) understanding is that the saved (suspended) session information is corrupted, but the VM’s main disk and machine configuration are not. It’s like suspending a notebook whose suspended session then won’t resume: after a hard reset the notebook boots normally again, but that session is lost.
  
Marcel Kraan
June 4, 2012 at 12:17 am

Thanks, this was my problem.
You fixed it and I am very happy now 🙂

coy
April 12, 2012 at 6:58 pm

Thank you! Good thing I landed here searching for a fix.

bfinal
January 28, 2012 at 10:04 am

I was able to solve this issue within minutes thanks to your post 😉

- John Poelstra
  January 30, 2012 at 7:19 am
  
  I’m glad! That’s why I like to post stuff like this, so other people don’t have to experience as much pain and frustration as I did.
  
Sam
November 28, 2011 at 5:49 am

THANK YOU. Busy setting up a demo for an exhibition and had this error, the last thing you want to see when you have 2 hours left to set up!!
