Virtual Data Optimizer (VDO) is a block virtualization technology that provides transparent deduplication of data. By eliminating redundant chunks of data, VDO can greatly reduce actual used disk capacity. The CentOS implementation of VDO is quite good, but there are some caveats to be aware of, especially when you want filesystems on VDO to come up automatically at boot. If you do it wrong, your system will not boot! So make sure to read all the way to the end to learn how to avoid ending up in this situation!
VDO consists of two kernel modules and two commands:
- kvdo – This module loads into the Device Mapper layer and provides a block storage volume for deduplication.
- uds – This module is responsible for communication with the Universal Deduplication Index on the VDO disk.
- vdo – This command is used to create, remove, start, and stop VDO volumes, as well as performing other configuration changes.
- vdostats – This command is used to report on various aspects of VDO volumes, including effective reduction and physical volume utilization. Think of this as ‘df’ for VDO capacity.
Step 1: Install VDO
The first thing to do is to install the VDO kernel modules, commands, and dependencies.
# yum -y install vdo
...
Installed:
  vdo.x86_64 0:22.214.171.124-3.el7

Dependency Installed:
  PyYAML.x86_64 0:3.10-11.el7
  kmod-kvdo.x86_64 0:126.96.36.199-5.el7
  libyaml.x86_64 0:0.1.4-11.el7_0

Complete!
Note that installing VDO also installed several dependencies, namely PyYAML, kmod-kvdo, and libyaml; the YAML packages are required for VDO because the VDO configuration file is written in YAML.
Step 2: Create a VDO Device
Make sure that you have a spare disk – or at least a partition – available for use by VDO. Although it is possible to create a VDO volume on top of an LVM2 volume, you will almost certainly have boot order problems when you reboot your server. Going the other way works much better: one of the great benefits of VDO is that it will deduplicate data across filesystems if they are Logical Volumes in an LVM2 Volume Group built on top of the VDO volume, which is what will be demonstrated below.
In our demonstration environment, we have a 40GB spare disk called /dev/sdb:
# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   15G  0 disk
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   14G  0 part
  ├─centos-root 253:0    0 12.5G  0 lvm  /
  └─centos-swap 253:1    0  1.5G  0 lvm  [SWAP]
sdb               8:16   0   40G  0 disk
Next, we create the empty VDO volume on top of /dev/sdb:
# vdo create --name=vdolvm --device=/dev/sdb --vdoLogicalSize=120G --writePolicy=async
Creating VDO vdolvm
Starting VDO vdolvm
Starting compression on VDO vdolvm
VDO instance 0 volume is ready at /dev/mapper/vdolvm
Here is a breakdown of the various options above:
- create – As it sounds, this is telling the VDO command what operation we want to do. You would use “remove” to remove the VDO volume.
- name=vdolvm – This option tells VDO that the name we want to give to our volume is “vdolvm”. This can, of course, be any name you want to give it.
- device=/dev/sdb – This indicates on which underlying device we want to create the VDO volume.
- vdoLogicalSize=120G – Here we are telling VDO that the effective capacity we want to expose to the OS is 120GB. Remember from above that our physical device is only 40GB, so we are assuming that we will get at least a 3:1 reduction from deduplication. For most data, this is pretty conservative, but if your data does not deduplicate well, you should choose a lower ratio. Log files and other plain text files will generally deduplicate very well, and you may get 10:1 or even higher deduplication rates. But binary files, and especially pre-compressed data such as video, audio, or compressed archives, will get far less than 3:1, and in some cases no reduction at all (1:1)! Do not use VDO for this type of data.
- writePolicy=async – This indicates that writes should be sent to the physical device asynchronously. This will improve performance but may risk data loss under certain circumstances. Here is a brief description of the possible write policies for VDO:
- sync – Writes to the VDO volume are only acknowledged after data is written to the physical device. If you are using this option, make sure your back-end storage is also synchronous, otherwise, you will lose the benefits of synchronous writes.
- async – Writes are acknowledged after the data has been written to the cache. If the cache is not flushed prior to a device or power failure, you may experience data loss.
- auto – In this default mode, VDO will inspect the storage device and determine if it supports flushing. If so, VDO will use async mode. If not, it will use sync mode.
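The write policy does not have to be fixed at creation time. As a rough sketch, the helpers below wrap the `vdo status` and `vdo changeWritePolicy` subcommands. The function names are hypothetical (they are not part of VDO), and the assumption that the status report contains lines with the text "write policy" should be checked against the output on your own system.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print the write-policy lines from the status report of one VDO volume.
# Assumes the report labels them with the text "write policy".
show_write_policy() {
    vdo status --name="$1" | grep -i 'write policy'
}

# Switch a volume to sync, async, or auto; reject anything else
# before it reaches the vdo command.
set_write_policy() {
    case "$2" in
        sync|async|auto) ;;
        *) echo "invalid write policy: $2" >&2; return 1 ;;
    esac
    vdo changeWritePolicy --name="$1" --writePolicy="$2"
}
```

For the volume in this demonstration, `set_write_policy vdolvm auto` followed by `show_write_policy vdolvm` would hand the decision back to VDO and then show what it picked.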
Step 3: Investigate the New VDO Volume
As we saw in the output of the previous step, VDO has created a new Device Mapper device called /dev/mapper/vdolvm. When we create our volume group, this is the device we will use.
# ls -l /dev/mapper/vdolvm
lrwxrwxrwx. 1 root root 7 Dec 17 13:56 /dev/mapper/vdolvm -> ../dm-2
Let’s see what kind of information we can get about the new volume with vdostats!
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      4.0G     36.0G  10%           N/A
The --hu flag passed to vdostats is shorthand for “--human-readable” and presents the data in a format that is a bit easier to read. From this output, we can see the Device Mapper name of the device, the size of the back-end storage device, how much data is used, how much capacity is available, and the percentage of space deduplication is saving us.
At this time, since we have not written any data to the volume, the “Space saving%” field is “N/A”. When we write some data later, you will see more helpful information there.
But wait! We haven’t written any data yet, but there is already 4GB, or 10%, of the volume in use! This is because the Universal Deduplication Index has already been written to disk. This is basically a database that keeps a record of block fingerprints and their locations. This is what makes deduplication possible. You can see, then, that using VDO either on small back-end disks or with data that does not get at least 10% deduplication will actually be less efficient than using that storage as a regular volume.
The vdostats command also has a --verbose option that gives us a lot of information about our VDO volume. It is not really practical to show that output in its entirety here, but we should at least look at a few fields with this command:
# vdostats --verbose /dev/mapper/vdolvm | grep -B6 'saving percent'
  physical blocks                     : 10485760
  logical blocks                      : 31457280
  1K-blocks                           : 41943040
  1K-blocks used                      : 4219792
  1K-blocks available                 : 37723248
  used percent                        : 10
  saving percent                      : N/A
Here we see the same basic data we got from vdostats, but in a different format.
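These numbers also let us check how realistic the 120GB logical size is. The sketch below divides the logical size by the capacity that is actually available after the initial overhead. It assumes that VDO blocks are 4 KiB and that the field labels are exactly those shown in the output above; the function name is hypothetical.

```shell
# Hypothetical helper: read `vdostats --verbose` output on stdin and print
# the reduction ratio needed to fill the logical size without running out
# of physical space. Assumes 4 KiB VDO blocks.
required_ratio() {
    awk -F':' '
        {
            # Strip surrounding whitespace from the field label.
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
        }
        key == "logical blocks"      { logical_kib = $2 * 4 }
        key == "1K-blocks available" { avail_kib = $2 + 0 }
        END {
            if (avail_kib > 0)
                printf "%.2f\n", logical_kib / avail_kib
        }'
}
```

Running `vdostats --verbose /dev/mapper/vdolvm | required_ratio` against the figures above prints 3.34. In other words, because about 4GB of the 40GB disk is already taken, the data needs to reduce slightly better than 3:1 before the full 120GB can be used.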
Step 4: Use the VDO Volume as a Normal Disk Device
Now that we have our VDO device created, we can partition it and put a filesystem on the partition, or as we will do in this demonstration, put an LVM2 Volume Group on top of it. The specifics of creating a Volume Group are out of scope for this demonstration, so we will move quickly through to the parts that are more pertinent.
# pvcreate /dev/mapper/vdolvm
  Physical volume "/dev/mapper/vdolvm" successfully created.
# vgcreate vdovg /dev/mapper/vdolvm
  Volume group "vdovg" successfully created
# vgdisplay vdovg
  --- Volume group ---
  VG Name               vdovg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <120.00 GiB
  PE Size               4.00 MiB
  Total PE              30719
  Alloc PE / Size       0 / 0
  Free PE / Size        30719 / <120.00 GiB
  VG UUID               RlWiOw-5eZ2-lOsc-FSS3-c8xp-jvFh-m22o7p
As you can see above, LVM2 thinks that our underlying disk is 120GB, even though we know it is only 40GB. Since LVM2 has no idea what the size of the VDO back-end disk is, it is currently up to the system administrator to manage the disk capacity and ensure that the back-end disk does not fill up. In the event that the back-end disk does fill up with unique data, the LVM2 Logical Volumes will go offline.
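Since nothing in LVM2 will warn us, a small check on the vdostats output is one way to keep an eye on the back-end disk. This is only a sketch: the function name and the default threshold of 80% are assumptions, and it relies on Use% being the fifth column, as in the vdostats output shown earlier.

```shell
# Hypothetical helper: read `vdostats` output on stdin and print a warning
# for every device whose Use% is at or above the threshold (default 80).
check_vdo_usage() {
    awk -v limit="${1:-80}" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%$/, "", use)        # "20%" -> "20"
            if (use + 0 >= limit + 0)
                printf "WARNING: %s is %s%% full\n", $1, use
        }'
}
```

A call such as `vdostats --hu | check_vdo_usage 80` prints nothing while the volume is below the threshold, which makes it easy to drop into a scheduled job that mails any output to the administrator.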
Now let’s create three equally-sized Logical Volumes:
# lvcreate -n vdolv01 -L 35G vdovg
  Logical volume "vdolv01" created.
# lvcreate -n vdolv02 -L 35G vdovg
  Logical volume "vdolv02" created.
# lvcreate -n vdolv03 -L 35G vdovg
  Logical volume "vdolv03" created.
Throughout the LVM2 Volume Group and Logical Volume creation, there is nothing at all that is different than using an actual 120GB device. You can even put a Thin Provision (ThP) pool and volumes on a VDO device, but we will not discuss that in this demonstration.
Step 5: Create and Mount Filesystems
Normally, when a filesystem is created, it runs a trim operation on the device. When using VDO, this is not ideal since the disk capacity is allocated on-demand. So we want to tell mkfs to not discard blocks during filesystem creation. For XFS, use the -K option, and for EXT4, use “-E nodiscard”. In our demo, we will use XFS.
# mkfs.xfs -K /dev/vdovg/vdolv01
meta-data=/dev/vdovg/vdolv01     isize=512    agcount=4, agsize=2293760 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=9175040, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=4480, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mkfs.xfs -K /dev/vdovg/vdolv02
...
# mkfs.xfs -K /dev/vdovg/vdolv03
When we mount the new filesystems to their mount points, we want to tell XFS to discard blocks, since this will greatly speed up file deletion.
# mount -o discard /dev/vdovg/vdolv01 /data/01
# mount -o discard /dev/vdovg/vdolv02 /data/02
# mount -o discard /dev/vdovg/vdolv03 /data/03
And now that we have written just a tiny bit of filesystem data to the devices, we can inspect the VDO volume again to see if things have changed.
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      4.0G     36.0G  10%           98%
And sure enough, we can see now that our “Space saving%” has shot up to 98%! That’s a good start, but remember that all we have done so far is to write a tiny bit of filesystem metadata three times – once for each Logical Volume. That data is identical, so it deduplicates very well, driving up our savings.
Step 6: Put Some Data on the Filesystems and Inspect the VDO Volume
Now let’s put some data on the VDO volume and see what happens. Since this is a demonstration of deduplication technology, we will intentionally be using multiple copies of the same file to ensure that there is redundant data to deduplicate. Let’s use a copy of our favourite Linux distro, CentOS-7-x86_64-DVD-1810.iso!
# cp /root/CentOS-7-x86_64-DVD-1810.iso /data/01/
# df -h | grep data
/dev/mapper/vdovg-vdolv01   35G  4.4G   31G  13% /data/01
/dev/mapper/vdovg-vdolv02   35G   33M   35G   1% /data/02
/dev/mapper/vdovg-vdolv03   35G   33M   35G   1% /data/03
Not surprisingly, the ‘df’ output shows that the first file system is now using 4.4GB of space. But what about vdostats?
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      8.2G     31.8G  20%            3%
Now we see that the VDO back-end disk is using 8.2GB to store a file that is about 4.3GB in size. That’s not such a great return for all of this work! But remember from before that VDO started with 10%, or 4GB, overhead, and now we have added another 4.3GB. This tells us that VDO is working since the 4.3GB file is only using 4.2GB of space!
Let’s copy more redundant data and see how things change. To prove that VDO works across filesystems that are in the same Volume Group, we will copy the same file to /data/02.
# cp /root/CentOS-7-x86_64-DVD-1810.iso /data/02/
# df -h | grep data
/dev/mapper/vdovg-vdolv01   35G  4.4G   31G  13% /data/01
/dev/mapper/vdovg-vdolv02   35G  4.4G   31G  13% /data/02
/dev/mapper/vdovg-vdolv03   35G   33M   35G   1% /data/03
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      8.2G     31.8G  20%           51%
That’s better! Now we can see from ‘df’ that two filesystems are each storing a copy of the 4.3GB file, but the amount of space used on the VDO back-end disk has stayed at 8.2GB. All of this work is starting to pay off! Let’s do it again!
# cp /root/CentOS-7-x86_64-DVD-1810.iso /data/03/
# df -h | grep data
/dev/mapper/vdovg-vdolv01   35G  4.4G   31G  13% /data/01
/dev/mapper/vdovg-vdolv02   35G  4.4G   31G  13% /data/02
/dev/mapper/vdovg-vdolv03   35G  4.4G   31G  13% /data/03
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      8.2G     31.8G  20%           67%
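With the same data now on three different filesystems on the same VDO volume, we can see that we are storing about 13GB of data across three filesystems, but only using 8.2GB on the back-end disk. Of that 8.2GB, roughly 4GB is the overhead that was there before we wrote anything, so the three copies occupy only about 4.2GB. That is where the effective space savings of 67% comes from.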
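The 51% and 67% figures can be reproduced with a little arithmetic. The sketch below assumes that vdostats measures the saving against the data blocks alone, that is, the 8.2GB in use minus the roughly 4GB that was already used before any data was written (about 4.2GB). The function name is hypothetical, and the inputs are the rounded figures from this demonstration.

```shell
# Hypothetical helper: savings_percent LOGICAL_GB PHYSICAL_GB
# Prints the percentage of logical data that did not need physical space.
savings_percent() {
    awk -v logical="$1" -v physical="$2" \
        'BEGIN { printf "%.0f\n", (1 - physical / logical) * 100 }'
}

savings_percent 8.6 4.2     # two copies of the 4.3GB ISO   -> 51
savings_percent 12.9 4.2    # three copies of the 4.3GB ISO -> 67
```

Each extra copy of the same file raises the logical total while the physical total stays put, which is why the saving climbs with every copy.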
But what happens when we delete the data? Let’s try!
# rm -f /data/*/CentOS-7-x86_64-DVD-1810.iso
# df -h | grep data
/dev/mapper/vdovg-vdolv01   35G   33M   35G   1% /data/01
/dev/mapper/vdovg-vdolv02   35G   33M   35G   1% /data/02
/dev/mapper/vdovg-vdolv03   35G   33M   35G   1% /data/03
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      8.2G     31.8G  20%           64%
Here we can see that deleting the data from the filesystem did not remove it from the back-end disk, so we are still using 8.2GB of storage when only about 4GB of VDO metadata is actually needed. When we deleted the files from the filesystem, VDO only deleted the pointers to the deduplicated blocks, which are still on the back-end storage. If we were to copy the same ISO file again, our used capacity in vdostats would not increase. But if we copied a different file that had nothing in common with the original ISO, the back-end disk would have to hold that new data in addition to the orphaned blocks.
Clearly, this is not ideal.
To reclaim the capacity that has been orphaned by deleting the files, we use the command fstrim. The basic syntax is just ‘fstrim <mountpoint>’, but since we are dealing with multiple mount points and fstrim won’t allow us to use wildcards, we run it through a little script:
# for i in `ls /data/`; do
> echo "Trimming /data/$i"
> fstrim /data/$i
> done
Trimming /data/01
Trimming /data/02
Trimming /data/03
And now we see that our capacity on the VDO back-end volume has been reclaimed.
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      4.0G     36.0G  10%           98%
Since it’s no fun to manually monitor vdostats and run fstrim, and since fstrim takes a while to run on each filesystem, this is the kind of thing you want to throw into cron or an external scheduler to run once a day or so, depending on how often files are deleted.
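One way to package this for a scheduler is a small function that trims every directory below a base path. This is a sketch only: the function name is hypothetical, and it assumes that each directory directly under /data is a mounted filesystem on the VDO volume, as in this demonstration.

```shell
# Hypothetical helper: run fstrim on every directory directly below the
# given base directory (default /data).
trim_all() {
    for dir in "${1:-/data}"/*/; do
        [ -d "$dir" ] || continue     # the glob matched nothing
        dir="${dir%/}"                # drop the trailing slash
        echo "Trimming $dir"
        fstrim "$dir"
    done
}
```

Saved in a script that ends with a call to `trim_all /data` (for example under /usr/local/sbin, a path chosen here purely for illustration), it could be run from a file in /etc/cron.d with a line such as `0 3 * * * root /usr/local/sbin/vdo-trim.sh`. Depending on your util-linux version, `fstrim --all` may make the loop unnecessary, so check the fstrim man page first.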
Step 7: Fill Up the Filesystems!
Why would we do this? We already know that copying the same file over and over again will not use more space, right? Our reasons are two-fold: 1) to prove that hypothesis true or false, and 2) because taking things to the extreme is more fun!
After using a quick little script to copy the same data a bunch of times, we can see the filesystems are quite nearly full.
# df -h | grep data
/dev/mapper/vdovg-vdolv01   35G   35G  783M  98% /data/01
/dev/mapper/vdovg-vdolv02   35G   35G  783M  98% /data/02
/dev/mapper/vdovg-vdolv03   35G   35G  783M  98% /data/03
But what about the VDO back-end disk?
# vdostats --hu
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdolvm       40.0G      8.3G     31.7G  20%           95%
We see that we are still storing over 100GB of data using a mere 8.3GB of actual disk space on a 40GB disk. Why 8.3GB instead of 8.2GB like when we were storing only three copies of the data? The Universal Deduplication Index (UDI) is a metadata database, so the more data that is stored on the filesystems, the more chunks of data need to be tracked. This is the cause of the increase of around 100MB on the VDO back-end disk.
Step 8: The Boring (But IMPORTANT!) Stuff
As we said at the beginning, if you follow the guide above, you will have a nice new VDO volume working… until you reboot! Assuming you have put the filesystems built on the VDO volume into your /etc/fstab as you would a normal volume, you will be saddened to realize that when you reboot, you are left at the emergency mode prompt.
(NOTE: If you skipped this part before and find yourself at the emergency prompt, simply comment out the entries for the VDO filesystems in /etc/fstab and then reboot.)
The problem with a normal /etc/fstab entry is that the filesystems try to mount before the vdo.service has started. There are multiple ways around this, including adding ‘x-systemd.requires=vdo.service’ to the mount options in /etc/fstab. However, those options do not help with shutdown order, so you will likely experience hangs and maybe even unclean filesystems when rebooting. Therefore, we will create a systemd mount unit for each of the VDO-based filesystems, which provides a better mechanism for boot and shutdown order.
# vi /etc/systemd/system/data-01.mount
# cat /etc/systemd/system/data-01.mount
[Unit]
Description = Mount VDO file system on /data/01
Requires = vdo.service systemd-remount-fs.service
After = vdo.service multi-user.target
Conflicts = umount.target

[Mount]
What = /dev/vdovg/vdolv01
Where = /data/01
Type = xfs
Options = discard

[Install]
WantedBy = multi-user.target
# systemctl daemon-reload
Using the example above, you will need to modify the filename as well as the What= and Where= fields. In short, this file will ensure that the mount point will only be mounted after vdo.service and multi-user.target. Likewise, it will be unmounted before multi-user.target and vdo.service are stopped during shutdown.
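Rather than editing a copy by hand for each filesystem, the unit can be generated. The sketch below prints a unit that mirrors the data-01.mount example above for any device and mount point. Both function names are hypothetical, and the name helper only handles simple paths such as /data/02; paths that contain dashes or other special characters need proper escaping with systemd-escape.

```shell
# Hypothetical helper: print a mount unit for one Logical Volume.
# Usage: make_mount_unit /dev/vdovg/vdolv02 /data/02
make_mount_unit() {
    cat <<EOF
[Unit]
Description = Mount VDO file system on $2
Requires = vdo.service systemd-remount-fs.service
After = vdo.service multi-user.target
Conflicts = umount.target

[Mount]
What = $1
Where = $2
Type = xfs
Options = discard

[Install]
WantedBy = multi-user.target
EOF
}

# Hypothetical helper: derive the unit file name systemd expects for a
# simple mount path, e.g. /data/02 -> data-02.mount. No escaping is done.
mount_unit_name() {
    echo "$1" | sed -e 's|^/||' -e 's|/|-|g' -e 's|$|.mount|'
}
```

For the second filesystem, the output of `make_mount_unit /dev/vdovg/vdolv02 /data/02` would be written to /etc/systemd/system/data-02.mount, followed by `systemctl daemon-reload` and `systemctl enable data-02.mount` so that the unit is pulled in at boot.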
Now, you can mount and unmount the filesystem just as you would start and stop a service.
# systemctl status data-01.mount
● data-01.mount - Mount VDO file system on /data/01
   Loaded: loaded (/etc/systemd/system/data-01.mount; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2018-12-18 15:03:57 EST; 8s ago
    Where: /data/01
     What: /dev/vdovg/vdolv01
  Process: 6092 ExecUnmount=/bin/umount /data/01 (code=exited, status=0/SUCCESS)
  Process: 3918 ExecMount=/bin/mount /dev/vdovg/vdolv01 /data/01 -t xfs -o discard (code=exited, status=0/SUCCESS)
# systemctl start data-01.mount
# systemctl status data-01.mount
● data-01.mount - Mount VDO file system on /data/01
   Loaded: loaded (/etc/systemd/system/data-01.mount; enabled; vendor preset: disabled)
   Active: active (mounted) since Tue 2018-12-18 15:04:10 EST; 2s ago
    Where: /data/01
     What: /dev/mapper/vdovg-vdolv01
  Process: 6092 ExecUnmount=/bin/umount /data/01 (code=exited, status=0/SUCCESS)
  Process: 6101 ExecMount=/bin/mount /dev/vdovg/vdolv01 /data/01 -t xfs -o discard (code=exited, status=0/SUCCESS)
We have seen how to install and perform a basic configuration of VDO on LVM2 on CentOS 7. VDO provides native deduplication for Linux, reducing required storage capacity by eliminating redundant data at a sub-file level. Although VDO is surprisingly easy to use, it requires monitoring to ensure the back-end devices do not fill up, and it also requires special boot and shutdown ordering to ensure the server does not fail to boot.