Virtual Machine Snapshots: The good, the bad, and the ugly side

What is a snapshot?

Let’s start by briefly defining what a snapshot actually is. A snapshot captures the current configuration, memory, and storage state of a virtual machine. The configuration and memory are fairly straightforward: they are copies of the VM's configuration file and RAM at that point in time. The storage system is where it gets confusing, so I am going to try to explain it further.

When the first snapshot of a VM is created, the hypervisor creates a difference (differencing) disk. All new writes go to this file until the next snapshot is created or the snapshot is deleted, and the main virtual disk is essentially left in a read-only state. These difference disks do not contain copies of the data on the current virtual drive(s); they record only the blocks of the disk that have changed. Every new snapshot creates new difference disk(s) linked to the previous snapshot's, forming a chain.
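To make the chain idea more concrete, here is a toy model of a base disk plus differencing disks. It is a minimal sketch under loudly stated assumptions: the class names (`DiskLayer`, `SnapshotChain`) and the dict-per-layer storage are hypothetical and do not reflect how any real hypervisor lays out its disk files. It only shows the copy-on-write behaviour described above: writes land in the newest layer, reads walk back down the chain, and the base disk is never modified.

```python
# Toy model of a snapshot chain (hypothetical names; not a real hypervisor
# format). Each layer stores only the blocks written while it was active.

class DiskLayer:
    """One file in the chain: the base disk or a differencing disk."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None for the base (original) virtual disk
        self.blocks = {}      # block number -> data, only blocks written here


class SnapshotChain:
    def __init__(self, base_blocks):
        self.base = DiskLayer("base")
        self.base.blocks.update(base_blocks)
        self.active = self.base  # layer that currently receives writes

    def take_snapshot(self, name):
        # The current layer becomes read-only; a new, empty differencing
        # disk linked to it receives all further writes.
        self.active = DiskLayer(name, parent=self.active)

    def write(self, block, data):
        # Copy-on-write: only the changed block is recorded in the active layer.
        self.active.blocks[block] = data

    def read(self, block):
        # Walk from the newest layer back toward the base until a layer
        # holds the block.
        layer = self.active
        while layer is not None:
            if block in layer.blocks:
                return layer.blocks[block]
            layer = layer.parent
        return None  # block was never written in any layer


chain = SnapshotChain({0: "boot", 1: "os", 2: "data-v1"})
chain.take_snapshot("snap1")
chain.write(2, "data-v2")
print(chain.read(2))             # data-v2, served by snap1's differencing disk
print(chain.read(0))             # boot, still served by the base disk
print(chain.base.blocks[2])      # data-v1, the base disk was left untouched
print(len(chain.active.blocks))  # 1, only the changed block was stored
```

Each further `take_snapshot` call adds another layer, so a read of an old, unchanged block has to walk past every differencing disk before it reaches the base.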

The good: What can we do with these?

Snapshots help with things like online VM backups and online VM cloning. For backups and cloning to work properly, the hypervisor needs to create a snapshot of the VM to provide a frozen state. For online VM backups, the backup program (e.g. Veeam) communicates with the hypervisor through its APIs. On the first run it grabs the full frozen VM image. On later runs it can grab only the blocks of the virtual hard drive that have changed since the last backup, but only if changed block tracking (CBT) has been enabled for the disk. This lets the VM keep running while the backup is in progress. Once the backup is done, the snapshot is committed back into the previous snapshot, or into the virtual hard drive if there are no previous snapshots.
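The idea behind changed block tracking can be sketched in a few lines. This is a hypothetical model, not Veeam's or VMware's actual API: the `ChangedBlockTracker` class and its methods are made-up names. It assumes the hypervisor marks every block the guest writes, so that after an initial full backup an incremental backup only needs to copy the marked blocks.

```python
# Hypothetical model of changed block tracking (CBT). Not a real backup
# or hypervisor API; it only illustrates full vs. incremental selection.

class ChangedBlockTracker:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.changed = set()  # blocks written since the last backup

    def on_write(self, block):
        # Called for every guest write; rewriting a block twice still
        # marks it only once.
        self.changed.add(block)

    def blocks_for_backup(self, first_backup):
        # A first (full) backup copies every block; later incremental
        # backups copy only the blocks marked as changed.
        if first_backup:
            return list(range(self.num_blocks))
        return sorted(self.changed)

    def reset(self):
        # After a successful backup, tracking starts over.
        self.changed.clear()


disk = {i: f"block-{i}" for i in range(8)}
cbt = ChangedBlockTracker(num_blocks=len(disk))

full = cbt.blocks_for_backup(first_backup=True)
cbt.reset()

for b in (3, 5, 3):  # the guest writes block 3 twice and block 5 once
    disk[b] = f"new-{b}"
    cbt.on_write(b)

incremental = cbt.blocks_for_backup(first_backup=False)
print(len(full), incremental)  # 8 [3, 5]
```

The incremental pass reads 2 blocks instead of 8. That is why enabling change tracking matters for large disks that change only a little between backups.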

Snapshots also let system administrators set a point to roll back to when testing new applications, services, or updates/upgrades on a VM. If a problem arises from these test deployments, the VM can be rolled back to its earlier state with a few simple clicks.
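Rolling back is cheap because the disk state at snapshot time is never overwritten. A minimal sketch, assuming a single snapshot and a hypothetical `TestableDisk` class that is not any hypervisor's API: reverting simply throws away the differencing layer that collected the test changes.

```python
# Hypothetical single-snapshot model of "revert to snapshot".

class TestableDisk:
    def __init__(self, blocks):
        self.base = dict(blocks)
        self.diff = None  # no snapshot taken yet

    def snapshot(self):
        # From now on, writes go to an empty differencing layer.
        self.diff = {}

    def write(self, block, data):
        target = self.diff if self.diff is not None else self.base
        target[block] = data

    def read(self, block):
        if self.diff is not None and block in self.diff:
            return self.diff[block]
        return self.base.get(block)

    def revert(self):
        # Discard every change made since the snapshot. The snapshot point
        # itself is kept, with a fresh empty differencing layer on top.
        self.diff = {}


disk = TestableDisk({0: "app-v1"})
disk.snapshot()
disk.write(0, "app-v2-broken")  # the upgrade under test
print(disk.read(0))             # app-v2-broken
disk.revert()
print(disk.read(0))             # app-v1
```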

As with most things, follow the best practices and white papers, and RTM; if you do, you’ll usually be OK.

The bad: There’s a bad part???

Yes, there are bad parts to snapshots. As with everything, you have to read the fine print, in this case RTM (read the manual). Each snapshot's difference disk can technically grow to the same size as the original virtual hard drive, because it may end up recording changes to every block. For example, one snapshot of a 100GB VM could need up to approximately 200GB of storage, two snapshots up to approximately 300GB, three snapshots up to approximately 400GB, and so on. As you can see, the potential storage adds up quickly.
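The arithmetic in that example is just the original disk plus one full-size difference disk per snapshot. Here is a small helper for that worst case. It is a sketch that assumes every difference disk grows to the full size of the original virtual disk, and it ignores the extra metadata overhead a real disk format adds.

```python
def worst_case_storage_gb(disk_size_gb, num_snapshots):
    """Upper bound on storage for one virtual disk plus its snapshots.

    Assumes each difference disk can grow to the full size of the
    original virtual disk; per-file metadata overhead is ignored.
    """
    return disk_size_gb * (num_snapshots + 1)


for n in range(4):
    print(n, worst_case_storage_gb(100, n))
# 0 100
# 1 200
# 2 300
# 3 400
```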

On top of that, the hypervisor may at times need to have every snapshot difference disk and the original virtual hard drive file open at once. This happens, for example, when searching for files or when working with an older file whose blocks are not all on the current difference disk.

For example: you have a Windows file server that stores a QuickBooks database. You took a snapshot at the beginning of the week before installing Windows updates. Now it’s Friday, and the accountant needs a backup copy of the QuickBooks data, say 20GB in size. You log in as admin, start a full backup, and it takes forever. Wait, did it just freeze up?! Why? During the week the bookkeeper was working in the database, making invoices and creating your paycheck, and changed some of its blocks, but not all of them. So instead of the database file being only on the VM's difference disk, it is spread across both the main virtual hard drive and the difference disk. The hypervisor coordinates all of this in the background, which eats up I/O and CPU cycles.
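The short sketch below shows why one logical file ends up split across two physical files. The block counts are made-up illustrative numbers, not measurements, and `layers_for_read` is a hypothetical helper. The file is assumed to occupy 1000 blocks, and the week's edits are assumed to have rewritten every seventh one. A full read of the file then has to pull each block from whichever layer holds its newest copy.

```python
# Illustrative only: which layer serves each block of one file after a
# partial rewrite. Block counts are invented for the example.

def layers_for_read(file_blocks, layers):
    """Count how many of file_blocks are served by each layer.

    `layers` is ordered newest first; each entry is (name, set_of_blocks).
    The first layer that contains a block serves it.
    """
    served = {name: 0 for name, _ in layers}
    for block in file_blocks:
        for name, blocks in layers:
            if block in blocks:
                served[name] += 1
                break
    return served


file_blocks = range(1000)            # the database file
diff_disk = set(range(0, 1000, 7))   # blocks rewritten since the snapshot
base_disk = set(range(1000))         # original copy of every block

served_by_layer = layers_for_read(
    file_blocks, [("snapshot diff", diff_disk), ("base disk", base_disk)]
)
print(served_by_layer)  # {'snapshot diff': 143, 'base disk': 857}
```

The backup program sees one sequential file, but underneath, the reads jump back and forth between two files. With more snapshots in the chain, there are more files to jump between.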

The ugly: What do you mean it gets worse???

When you remove a snapshot and commit it back to the original virtual hard drive, the hypervisor typically creates another snapshot for the VM to run on while the merge happens (usually in the background, without the user's knowledge). The snapshot's difference disk is then committed into its parent snapshot's difference disk or, if only one snapshot is left, into the original virtual hard drive file. Once that is done, the temporary background snapshot is itself committed back into the virtual hard drive file. How long it takes to commit all snapshots back to the original hard drive file depends on how many changed blocks have to be applied to it.
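Here is a simplified sketch of the commit (merge) step. It is a model with loud assumptions: each layer is a plain dict of changed blocks, `commit` is a hypothetical function, and the temporary helper snapshot the hypervisor uses to keep the VM running is left out. The child's blocks overwrite the parent's copies, and the child is dropped from the chain.

```python
# Simplified snapshot commit: merge a child layer into its parent.
# Layers are dicts of block -> data, base disk first. The helper snapshot
# a real hypervisor runs on during the merge is not modelled.

def commit(chain, index):
    """Merge chain[index] into chain[index - 1] and remove it from the chain.

    index must be >= 1. Returns the number of blocks written into the
    parent, which is what drives how long the commit takes.
    """
    child = chain[index]
    parent = chain[index - 1]
    parent.update(child)  # every changed block is written into the parent
    del chain[index]
    return len(child)


base = {0: "a", 1: "b", 2: "c"}
snap1 = {1: "b2"}
snap2 = {1: "b3", 2: "c2"}
chain = [base, snap1, snap2]

# Commit the newest snapshot first, then the one below it.
written = commit(chain, 2)   # snap2 -> snap1: 2 blocks
written += commit(chain, 1)  # snap1 -> base: 2 blocks
print(written, chain)        # 4 [{0: 'a', 1: 'b3', 2: 'c2'}]
```

Only three blocks differ from the original disk, yet four block writes were needed, because block 1 and block 2 were each written once into snap1 and again into the base. With a longer chain, the same data can be rewritten several times on its way down.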

And don’t forget the I/O performance hit. During a commit, the data in the snapshot difference disk(s) eventually has to be written into the main virtual hard drive of the VM. You are reading data from one file and writing it all into another file while a temporary file is also being written, often all in the same directory on the same storage, so you are going to take an I/O performance hit of some sort. Because that storage is shared, every VM running on that host can see some form of performance hit too. Makes sense, right?
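For a rough sense of scale, here is a back-of-the-envelope estimate of commit time. The throughput figures are hypothetical, and the model is deliberately optimistic. It assumes every changed block is read once and written once, sequentially, on the same storage, with read and write time simply added together and no competing I/O from other VMs. Real commits are usually slower than this.

```python
def estimate_commit_seconds(changed_gb, read_mb_s, write_mb_s):
    """Very rough, optimistic estimate of snapshot commit time.

    Assumes every changed block is read once from the difference disk and
    written once into the parent, sequentially, on the same storage, with
    no competing I/O from other VMs.
    """
    changed_mb = changed_gb * 1024
    return changed_mb / read_mb_s + changed_mb / write_mb_s


# Hypothetical example: 50 GB of changed blocks on storage that sustains
# 200 MB/s for both reads and writes.
seconds = estimate_commit_seconds(50, 200, 200)
print(round(seconds / 60, 1), "minutes")  # 8.5 minutes
```

Even under these ideal assumptions the storage is busy for minutes. Repeated merges through a chain of snapshots, or contention from other VMs on the same datastore, stretch that further.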

Let’s look at an Exchange server as an example. Making a couple of snapshots during the week doesn’t sound so bad, right? Well, it is. Exchange isn’t something you can shut down for hours (yes, I said hours) to do a snapshot commit. So you do it live, right? Yes... sort of, if you like coming into work early or staying late, or worse, rushing in to restore Exchange from a backup. By the way, before you started doing this, you did make sure you have a properly working battery backup (UPS) that can cover the commit time of the snapshot, right? You’re sure none of your hardware is failing: disks, power supplies, RAM? Your server room has a working A/C or HVAC unit that keeps the room well under 77°F (25°C), right? You are using Veeam, or something similar, to do your backups, right? Well, you’re working early, late, or both if you said yes to any of these.

VM virtual hard drive corruption can happen. Take ordinary hard drive corruption and add on the complex layers of hypervisor datastores and subsystems. Doesn’t sound like a lot of fun, does it?

So, some guidelines:

1) RTM: read the manual and the best practices, and follow them.
2) VM snapshots are not a backup, EVER! They are just one step in the backup and testing procedures.
3) Running on snapshots for an extended amount of time is a bad idea; don’t do it!
4) Hardware fails at the worst time, so never ignore rules 1 to 3.
5) Do not overcommit your storage. Remember this:
VM allocated storage × (number of snapshots + 1) = approximate theoretical maximum total size of the VM.

Author

Aaron Babitzke

I recently went back to university (fall 2013) to obtain my BSc in Computer Science. Before that, I spent 12 years in various positions in the IT field: IT consultant, network administrator, network technician, PC repair technician, and wireless internet installer for WISPs. I have experience with a range of technologies (networking, mobile, systems administration, security, and virtualization) on a broad array of operating systems: Android, Linux, Mac, iOS, UNIX, and Windows. My client base has included WISPs, SMBs, charter schools (K through junior high), and regular consumers.
