- Simplest approach: let VM take a single device or a partition
- Wastes resources if VMs don’t fully utilize their partitions
- Storage is virtualized by emulating multiple logical devices from a
single physical device
- For example, a VMDK (Virtual Machine Disk) file represents a virtual disk as seen by the VM
- Operations on the VMDK are translated into operations on the underlying storage device
- The stack:
- File system inside VM (ext4 on virtual disk)
- Virtual Disk inside VM
- File system on host (the VMDK file sits on this file system)
- Physical storage device on host
- Benefits
- You can overprovision your virtual disks. You can provide 5 1 TB virtual disks on top of a single 1 TB storage device. This works as long as the VMs don’t fully utilize their space.
- The VMDK starts out completely un-allocated.
- As the VM writes to a disk block, it becomes allocated on host storage device.
- You can snapshot a virtual disk simply by copying the VMDK file
- You can do de-duplication among multiple VMDKs since they are just files on the host
- All these layers introduce a number of performance problems
- Example: double journaling. Journaling inside the VM, and on the VMDK
file on the host
- If you are updating an inode in the guest file system, it is journaled (2X IO)
- If the host file system also uses journaling, the metadata of VMDK is also journaled (3X IO)
- Example: double journaling. Journaling inside the VM, and on the VMDK
file on the host
- File systems make assumptions about the storage device
- If not true, optimizations actually reduce performance
- The combination of the file system on the guest and the file system on
the host is really important
- The wrong combination can reduce throughput to 67% of max (reiserfs on ext2)
- When ext2 runs on top of ext3, throughput reduced by 10%
- When ext3 runs on top of ext3, throughput reduced by 40%
- For read only workloads, stacking file systems actually helps
- Why? Read-ahead issued by the host file system
- Ideally, host should not have smarts: use act as a simple on-demand allocator for guest file system
- Security depends upon a number of manual actions such as patching a machine
- Hard to do in VMs because of how many VMs there might be in an organization, and how easy it is to spin up new VMs
- Hard to understand state of the network
- VMs appearing and disappearing all the time
- You might think you have fixed all machines, but a vulnerable VM could simply be suspended
- What happens if you checkpoint VM and roll
back to an earlier state?
- The older state may not have security patches applied
- Random number generation in VMs may not be “fresh”
- Arbitrary time between generation and use
- Random numbers should be obtained from VMM instead of VM
- Diversity: a number of VMs may have OSes at different update
points
- This is hard for admin, who usually try to keep all machines updated and patched to the same point
- Identity: typically a real machine is identified by its MAC address
- What to do for VMs?
- In a non-virtualized setting, OS trusts the hardware
- In virtualized settings, VMs trust the VMM. However, is the VMM as trust-worthy as the hardware?
- Solution: Trusted Platform Module
- Can attest to integrity of software components
- Outside the CPU
- VMs introduce “introspection” capabilities, the ability to monitor at a fine-grained level what is going on inside the VM
- Time-based limited trials can be broken
- Encryption keys in software can be read
- Attack: VMM Rootkits.
- Transparently inserting a VMM under and OS
- Example from survey paper
- “Consider this example: Company A has a virtual server in an outsourced datacenter that undertakes financial transactions. Depending upon the contract with the data- center, it is likely that the datacenter does not have permission to view or alter any transactions undertaken (based on least need-to-know principles). However, because Company A does not control the underlying VMM, it has no way to ensure that the VMM has not altered transaction details or recorded credentials, a potential problem in many ways, as any local audit trails can be similarly compromised."
- Great paper on cloud security: Hey, You, Get Off my Cloud
- “Using the Amazon EC2 service as a case study, we show that it is possible to map the internal cloud infrastructure, identify where a particular target VM is likely to reside, and then instantiate new VMs until one is placed co-resident with the target. We explore how such placement can then be used to mount cross-VM side-channel attacks to extract information from a target VM on the same machine.”
- Can one determine where in the cloud infrastructure an instance is located? (Section 5)
- Can one easily determine if two instances are co-resident on the same physical machine? (Section 6)
- Can an adversary launch instances that will be co-resident with other user’s instances? (Section 7)
- Can an adversary exploit cross-VM information leakage once co-resident? (Section 8)
- Answer to all the above questions is yes!