Cryptz.Com - Limited Access

ESXi 6 to Ubuntu 56Gbps Infiniband ZFS Host




 
​​​

Background

  I have had a lab setup at home for some time: 2 ESXi servers backed by an Illumos-based ZFS server (Server Rack). The ZFS server presented a zvol via COMSTAR over directly connected 10Gb iSCSI to the ESXi hosts. Performance was adequate for my needs, but after purchasing a few $40,000 SANs for a customer, using 10Gb links, and being less than impressed, I wanted to see what I could accomplish on my own.

  ZFS has a tiered caching mechanism: you can use SSDs in conjunction with rotating drives to cache your most frequently read data on the SSDs (see ZFS ARC for more on ARC and L2ARC). For this project I wanted to forgo that and use 100% SSD-based pools. I ended up using 24 Samsung 840 Pros. To summarize my findings, let me state a few things:

1. It is not worth it for single guest access - Most corporate environments will find this irrelevant. In my situation I was more interested in how fast I could read and write to my fileserver VM. Total simultaneous IOPS to the storage infrastructure was not a concern for me (I am the only user), but it would be for your average company.

  I found out very quickly that I could only do about 500MBps read and 500MBps write to a single guest, while reading at about 5GBps and writing at about 3GBps locally on the SAN. I originally thought this was a limitation of the iSCSI infrastructure (although it was only about half of the theoretical limit, I thought perhaps that was just the reality of the 10Gb direct link). This is what led me to try 56Gbps Infiniband. I later came to realize that it is a limitation of ESXi. I was able to double single-VM disk performance by switching from the LSI SAS driver to the VMware Paravirtual driver. I confirmed overall infrastructure performance by running disk benchmarks in multiple VMs simultaneously and seeing 1GBps read and write from 3 guests at the same time. I have not tested Hyper-V, but I am told people generally see better performance for single guests with Hyper-V. For me the pros of VMware outweigh this limitation. I am content waiting for ESXi 6 or whatever it takes to overcome this limitation.

 

2. Without an Infiniband switch Illumos was not an option. Infiniband requires you to run a subnet manager (SM) on the network. Most managed IB switches have a subnet manager built in. Because I am directly connecting my hosts to the SAN, I was forced to run a software-based SM. OpenSM works great on Linux, so I ended up switching to Ubuntu. I have seen some people mention porting OpenSM to Illumos, but to date I have not found much information on this.

 

3. SSDs need bandwidth. My Dell R720 server came configured with a 24-drive, 2x 4-port SAS backplane. The backplane is likely fine for spinning disks, but a single SSD can basically saturate a 6Gbps bus. I ended up modifying the server (I will probably post that at a later time): I removed the Dell backplane and replaced it with 2 Adaptec 72405 RAID cards, which each provide 24 ports of 6Gbps connectivity. Removing the Dell backplane does stop the power button from working, but I am using an iDRAC Enterprise management card, so turning the server off and on is not really a hassle (and rarely done). The backplane removal can be undone relatively easily if you ever need to "send it back". Again, this is not a very corporate undertaking.

  Illumos cannot currently boot off of an Adaptec series 7 card (I am not sure if this is a permanent issue or just a problem with the latest driver). That was another factor that led me to use Ubuntu.

 

4. The Samsung 840 Pro drives are extremely overhyped. Their single-drive benchmarks look good, but they have problems in a RAID array. Most of these were only just fixed with the latest firmware revision. Prior to DXM05B0Q they were basically unusable in a RAID array. This was attributed to the drive's write cache being disabled in most RAID situations. The drive has a large write cache and essentially needs it to operate properly. Samsung's original stance was that these were consumer drives and were not supported in a RAID array. Fortunately they did eventually resolve this problem.

 

*** Update. I have since removed the 24 Samsung drives and switched to 3 Micron P420m PCIe SSDs. I am currently limited by the 56Gbps Infiniband links and waiting for 100Gbps Infiniband to be more commonly available.

 

Installation

  For the install I used Mellanox ConnectX-3 56Gbps cards. I chose to use SRP (SCSI RDMA Protocol) instead of iSCSI.

  • Installed Ubuntu 13.04 - Default Install + OpenSSH
  • Patch Install
    • apt-get update
    • apt-get upgrade

 

  • Install Infiniband Related Items
    • apt-get install infiniband-diags
    • modprobe ib_umad
    • apt-get install srptools
    • modprobe ib_srpt
    • apt-get install opensm
    • /etc/init.d/opensm start
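
The modprobe commands above only load the modules for the current boot. A minimal sketch of one way to keep them loaded after a reboot, assuming an Ubuntu release that reads /etc/modules at boot (the add_module helper is a hypothetical name, not part of any package):

```shell
#!/bin/sh
# Sketch: list kernel modules in a modules file so they are loaded at boot.
# Assumption: the distribution reads /etc/modules at boot, as Ubuntu does.

# add_module FILE MODULE: append MODULE to FILE unless it is already listed.
add_module() {
    file=$1
    module=$2
    touch "$file"
    # -x matches whole lines only, -F treats the name literally, -q is quiet
    if ! grep -qxF "$module" "$file"; then
        printf '%s\n' "$module" >> "$file"
    fi
}

# Example (run as root):
#   add_module /etc/modules ib_umad
#   add_module /etc/modules ib_srpt
```

The helper is safe to run repeatedly because it skips modules that are already listed.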

  Older versions of OpenSM only ran on the first Infiniband port. In my case I need to run OpenSM on both ports because each port has a direct connection to an ESXi server. The latest version of OpenSM does not have this limitation, and older versions can be altered to run on all ports: in my initial tests with Ubuntu 12.04 I had to edit /etc/init.d/opensm to enumerate all port GUIDs and run OpenSM in daemon mode for each one. I believe this will vary depending on what distribution and release you are running. You can confirm that your Infiniband links are up as follows:

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.11.500
        Hardware version: 0
        Node GUID: 0x0002c90300a4c990
        System image GUID: 0x0002c90300a4c993
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 2
                LMC: 0
                SM lid: 2
                Capability mask: 0x0259486a
                Port GUID: 0x0002c90300a4c991
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0259486a
                Port GUID: 0x0002c90300a4c992
                Link layer: InfiniBand

  ibstat will report Initializing if you have a physical link but no subnet manager is found. If the state is Active you are good to go.
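
Both the link check and the per-port OpenSM workaround for older releases come down to reading fields out of ibstat output. The following is a rough sketch, not the init script change I actually made: the function names are hypothetical, and it assumes ibstat output laid out like the listing above. It relies on the opensm options -g (bind to a port GUID) and -B (run as a daemon).

```shell
#!/bin/sh
# Sketch: read port states and port GUIDs from `ibstat` output on stdin.
# Reading stdin means the functions can be tried on saved output first.

# port_states: print "<port number> <state>" per port, e.g. "1 Active".
port_states() {
    awk '
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { sub(":", "", $2); port = $2 }
        $1 == "State:"                   { print port, $2 }
    '
}

# port_guids: print one port GUID per line.
port_guids() {
    awk '$1 == "Port" && $2 == "GUID:" { print $3 }'
}

# opensm_commands: print (not run) one opensm invocation per port GUID.
opensm_commands() {
    port_guids | while read -r guid; do
        printf 'opensm -B -g %s\n' "$guid"
    done
}
```

For example, `ibstat | port_states` gives a quick view of which ports are Active, and `ibstat | opensm_commands` prints the commands so they can be reviewed before being run as root.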

 

  •  Install ZFS
    • add-apt-repository ppa:zfs-native/stable
    • apt-get update
    • apt-get install zfs-dkms ubuntu-zfs

 

  • Configure ZFS Volume
    • zpool create PSC.Net mirror disk1 disk2
    • zpool add PSC.Net mirror disk3 disk4
    • zpool add PSC.Net mirror disk5 disk6 (and so on for each remaining pair of disks)
    • zfs create -V 4098G PSC.Net/IB.PSC.Net
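
With 24 drives the create/add pattern above repeats twelve times, so it may be easier to generate the commands. A minimal sketch, assuming the disks are passed in the order they should be paired (mirror_pool_commands is a hypothetical helper, and disk1, disk2, ... are placeholders for real device names):

```shell
#!/bin/sh
# Sketch: print the zpool commands for a pool built from two-way mirrors.
# Usage: mirror_pool_commands POOL DISK1 DISK2 [DISK3 DISK4 ...]
mirror_pool_commands() {
    pool=$1
    shift
    # Mirrors are built from pairs, so an odd disk count is an error.
    if [ $# -lt 2 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "need an even number of disks (at least 2)" >&2
        return 1
    fi
    verb=create                 # the first pair creates the pool
    while [ $# -ge 2 ]; do
        printf 'zpool %s %s mirror %s %s\n' "$verb" "$pool" "$1" "$2"
        verb=add                # later pairs are added as extra mirrors
        shift 2
    done
}
```

The function only prints the commands, so the output can be checked before any of them is run against real disks.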

 

  • Install SCST

You may find many people shifting to targetcli and the LIO target. LIO has gained support and is now the target included in the Linux kernel. Most people seem to feel it is still a bit unpolished and claim SCST is faster. I tried both SCST and LIO and found performance to be the same, and targetcli is a bit easier to configure. I ended up using SCST simply because, while running LIO, I was seeing a lot of errors in dmesg about unsupported commands coming from ESXi. I did not notice any symptoms from these alerts, though. I will probably revisit LIO when Ubuntu 14.04 is released next year.

    • cd /root
    • cd /root/scst
    • make scst scst_install scstadm scstadm_install srpt srpt_install
    • update-rc.d scst defaults
    • Sample /etc/scst.conf - Please note I have the target wide open (INITIATOR * accepts any initiator) due to the nature of the environment and the fact that there is no switch.

 

max_tasklet_cmd 10
setup_id 0x0
threads 16

HANDLER vdisk_blockio {
        DEVICE IB.PSC.Net {
                filename /dev/zvol/PSC.Net/IB.PSC.Net
                #threads_num 4

                # Non-key attributes
                blocksize 512
                nv_cache 0
                # Note: nv_cache 1 improved performance when not link bound.
                read_only 0
                removable 0
                rotational 0
                t10_dev_id 46d13b96-IB.PSC.Net
                thin_provisioned 1
                threads_pool_type per_initiator
                usn 46d13b96
                write_through 0
        }
}

TARGET_DRIVER ib_srpt {
        TARGET ib_srpt_target_0 {
                enabled 1
                rel_tgt_id 1


                # Non-key attributes
                addr_method PERIPHERAL
                cpu_mask ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
                io_grouping_type auto

                GROUP ESXi {
                        LUN 0 IB.PSC.Net {
                                # Non-key attributes
                                read_only 0
                        }

                        INITIATOR *

                        # Non-key attributes
                        addr_method PERIPHERAL
                        cpu_mask ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
                        io_grouping_type auto
                }
        }
}
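
The config above points SCST at a zvol device node, and that node only exists once the pool has been imported. As a hedged sketch (these helpers are hypothetical and not part of SCST), the filename entries can be checked before the SCST service is started:

```shell
#!/bin/sh
# Sketch: confirm that every backing device named in an scst.conf-style
# file exists before SCST is started.

# backing_files CONF: print each "filename" value found in CONF.
backing_files() {
    awk '$1 == "filename" { print $2 }' "$1"
}

# missing_backing_files CONF: print the paths that do not exist.
# Returns 1 if anything is missing, 0 otherwise.
# Paths containing spaces are not handled in this sketch.
missing_backing_files() {
    status=0
    for path in $(backing_files "$1"); do
        if [ ! -e "$path" ]; then
            printf '%s\n' "$path"
            status=1
        fi
    done
    return $status
}

# Example:
#   missing_backing_files /etc/scst.conf || echo "import the pool first"
```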
