12 Hardware
CloudLab can allocate experiments on any one of several federated clusters: three that belong to CloudLab itself, plus several more that belong to federated projects.
Additional information about these nodes can be found at https://www.cloudlab.us/portal-hardware.php.
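Profiles select these node types by name. As a minimal sketch (assuming the standard geni-lib portal API used by CloudLab profiles; the node name and chosen type are just examples), the following profile requests a single xl170 node. Any hardware type listed in this chapter can be substituted.

    # Minimal CloudLab profile sketch: pin one bare-metal node to a hardware type.
    import geni.portal as portal
    import geni.rspec.pg as pg

    # Create the portal context and an empty request RSpec.
    pc = portal.Context()
    request = pc.makeRequestRSpec()

    # Ask for one raw (bare-metal) PC and pin it to a type from this chapter.
    node = request.RawPC("node0")
    node.hardware_type = "xl170"

    # Emit the RSpec so the portal can instantiate the experiment.
    pc.printRequestRSpec(request)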
12.1 CloudLab Utah
The CloudLab cluster at the University of Utah is being built in partnership with HP and Dell. It currently consists of 4 Intel Ice Lake servers, 208 AMD EPYC Rome servers (two generations), 170 Intel Xeon E5 servers, 270 Xeon-D servers, and 270 64-bit ARM servers for a total of 9,336 cores. The cluster is housed in the University of Utah’s Downtown Data Center in Salt Lake City.
m400 | 270 nodes (64-bit ARM)
CPU | Eight 64-bit ARMv8 (Atlas/A57) cores at 2.4 GHz (APM X-GENE)
RAM | 64GB ECC Memory (8x 8 GB DDR3-1600 SO-DIMMs)
Disk | 120 GB of flash (SATA3 / M.2, Micron M500)
NIC | Dual-port Mellanox ConnectX-3 10 Gb NIC (PCIe v3.0, 8 lanes) (one port available for experiment use)
m510 | 270 nodes (Intel Xeon-D)
CPU | Eight-core Intel Xeon D-1548 at 2.0 GHz
RAM | 64GB ECC Memory (4x 16 GB DDR4-2133 SO-DIMMs)
Disk | 256 GB NVMe flash storage
NIC | Dual-port Mellanox ConnectX-3 10 Gb NIC (PCIe v3.0, 8 lanes) (one port available for experiment use)
For the m400 and m510 node types, nodes are packaged 45 to a chassis, and there are twelve chassis in total. Each chassis has two 45XGc switches; each node is connected to both switches, and each chassis switch has four 40Gbps uplinks, for a total of 320Gbps of uplink capacity from each chassis. One switch is used for control traffic, connecting to the Internet, etc. The other is used to build experiment topologies and should be used for most experimental purposes.
All chassis are interconnected through a large HP FlexFabric 12910 switch which has full bisection bandwidth internally.
Upon request and with help from CloudLab staff, users can allocate an entire chassis. When allocated in this mode, it is possible to have complete administrator control over the experiment switch in addition to the nodes.
In phase two we added 50 Apollo R2200 chassis each with four HPE ProLiant XL170r server modules. Each server has 10 cores for a total of 2000 cores.
xl170 | 200 nodes (Intel Broadwell, 10 core, 1 disk)
CPU | Ten-core Intel E5-2640v4 at 2.4 GHz
RAM | 64GB ECC Memory (4x 16 GB DDR4-2400 DIMMs)
Disk | Intel DC S3520 480 GB 6G SATA SSD
NIC | Two dual-port Mellanox ConnectX-4 25 Gb NICs (PCIe v3.0, 8 lanes) (two ports available for experiment use, one 10Gb and one 25Gb)
Each server is connected via a 10Gbps control link (Dell switches) and a 25Gbps experiment link to Mellanox 2410 switches in groups of 40 servers. Each of the five groups' experiment switches is connected to a Mellanox 2700 spine switch at 5x100Gbps. That switch in turn interconnects with the rest of the Utah CloudLab cluster via 6x40Gbps uplinks to the HP FlexFabric 12910 switch.
A unique feature of the phase two nodes is the addition of eight ONIE bootable "user allocatable" switches that can run a variety of Open Network OSes: six Dell S4048-ONs and two Mellanox MSN2410-BB2Fs. These switches and all 200 nodes are connected to two NetScout 3903 layer-1 switches, allowing flexible combinations of nodes and switches in an experiment. Note that links through the NetScout switches are only 10Gb, not 25Gb.
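As a rough sketch of how an experiment might combine nodes with one of these user-allocatable switches (assuming the Switch and L1Link request types described in the CloudLab manual's switch support; names such as "dell-s4048" are illustrative), a profile could look like this:

    # Sketch: one node wired to an allocatable switch through the layer-1 fabric.
    import geni.portal as portal
    import geni.rspec.pg as pg

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    # A bare-metal node with one experiment interface.
    node = request.RawPC("node0")
    node.hardware_type = "xl170"
    nodeif = node.addInterface("if0")

    # An ONIE-bootable switch; the type name here is illustrative.
    switch = request.Switch("mysw")
    switch.hardware_type = "dell-s4048"
    swif = switch.addInterface()

    # A layer-1 link through the NetScout switches (10Gb only).
    link = request.L1Link("link0")
    link.addInterface(nodeif)
    link.addInterface(swif)

    pc.printRequestRSpec(request)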
For phase two we also added 28 Dell AMD EPYC-based servers with dual 100Gb Ethernet ports.
d6515 | 28 nodes (AMD EPYC Rome, 32 core, 2 disk, 100Gb Ethernet)
CPU | 32-core AMD 7452 at 2.35GHz
RAM | 128GB ECC Memory (8x 16 GB 3200MT/s RDIMMs)
Disk | Two 480 GB 6G SATA SSDs
NIC | Dual-port Mellanox ConnectX-5 100 Gb NIC (PCIe v4.0) (both ports available for experiment use)
NIC | Dual-port Broadcom 57414 25 Gb NIC (one port available for experiment use)
Each server is connected via a 25Gbps control link (Dell S5224F switch), 2 x 100Gbps experiment links (Dell Z9264F-ON switch), and a 25Gbps experiment link (Dell S5248F-ON switch). The experiment switches are connected to the "phase two" Mellanox 2700 spine switch at 4x100Gbps (Z9264F) and 2x100Gbps (S5248F).
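To use both 100Gb experiment ports from a profile, request one link per port. The sketch below (hypothetical names, standard geni-lib portal API) builds two point-to-point links between a pair of d6515 nodes:

    # Sketch: two d6515 nodes with two separate experiment links, one per port.
    import geni.portal as portal
    import geni.rspec.pg as pg

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    # Request two d6515 nodes.
    nodes = []
    for i in range(2):
        node = request.RawPC("node%d" % i)
        node.hardware_type = "d6515"
        nodes.append(node)

    # One link per experiment port; the mapper places each link
    # on a separate physical interface.
    for j in range(2):
        link = request.Link("link%d" % j)
        for node in nodes:
            link.addInterface(node.addInterface("if%d" % j))

    pc.printRequestRSpec(request)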
In the initial installment of phase three (2021) we added 180 more AMD EPYC Rome servers in two configurations.
c6525-25g | 144 nodes (AMD EPYC Rome, 16 core, 2 disk, 25Gb Ethernet)
CPU | 16-core AMD 7302P at 3.00GHz
RAM | 128GB ECC Memory (8x 16 GB 3200MT/s RDIMMs)
Disk | Two 480 GB 6G SATA SSDs
NIC | Two dual-port Mellanox ConnectX-5 25 Gb NICs (PCIe v4.0) (two ports available for experiment use)
c6525-100g | 36 nodes (AMD EPYC Rome, 24 core, 2 disk, 25/100Gb Ethernet)
CPU | 24-core AMD 7402P at 2.80GHz
RAM | 128GB ECC Memory (8x 16 GB 3200MT/s RDIMMs)
Disk | Two 1.6 TB NVMe SSDs (PCIe v4.0)
NIC | Dual-port Mellanox ConnectX-5 25 Gb NIC (PCIe v4.0) (one port available for experiment use)
NIC | Dual-port Mellanox ConnectX-5 Ex 100 Gb NIC (PCIe v4.0) (one port available for experiment use)
The "-25g" variant nodes have 2 x 25Gb experiment links (Dell S5296F switches) and SATA-based SSDs, and are intended for general experimentation.
The "-100g" variant nodes have one 25Gb (Dell S5296) and one 100Gb (Dell Z9264) experiment link as well as two large NVMe-based SSDs and more cores, and are intended for network and storage intensive experimentation.
Each server is also connected via a 25Gbps control link (Dell S5296F switch).
The experiment switches are interconnected via a single Dell Z9332 using 4-8 100Gb links each.
In the second installment of phase three (early 2022), we added a small set of "expandable" nodes: 2U boxes with multiple PCIe slots available for add-in devices such as GPUs, FPGAs, or other accelerator cards.
d750 | 4 nodes (Intel Ice Lake, 16 core, 2 disk, 25Gb Ethernet)
CPU | 16-core Intel Xeon Gold 6326 at 2.90GHz
RAM | 128GB ECC Memory (16x 8 GB 3200MT/s RDIMMs)
Disk | 480 GB SATA SSD (PCIe v4.0)
Disk | 400 GB NVMe Optane P5800X SSD (PCIe v4.0)
NIC | Quad-port BCM57504 NetXtreme-E 25 Gb NIC (three ports available for experiment use)
Each server is also connected via a 25Gbps control link (Dell S5296F switch) and three 25Gbps experiment links (via another Dell S5296F switch).
These machines have four available full-length double-wide PCIe v4 x16 slots and 2400W power supplies capable of handling four enterprise GPUs or other accelerator cards.
They also have a 400GB Optane write-intensive SSD providing another level of storage hierarchy for experimentation.
The Utah CloudLab cluster includes a storage server for remote datasets. The server currently has 80TB available for allocation.
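Remote datasets on this server are attached from a profile as remote block stores. The sketch below follows the pattern used by CloudLab storage profiles; the dataset URN is a placeholder that you would replace with the URN shown on your dataset's page:

    # Sketch: mount a remote dataset on one node at /mydata.
    import geni.portal as portal
    import geni.rspec.pg as pg

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    node = request.RawPC("node0")
    node.hardware_type = "xl170"

    # Remote (network-attached) block store; the URN below is a placeholder.
    rbs = request.RemoteBlockstore("rbs0", "/mydata")
    rbs.dataset = "urn:publicid:IDN+utah.cloudlab.us:YOURPROJECT+ltdataset+YOURDATASET"

    # The dataset is reached over the experiment network, so give it a link.
    bslink = request.Link("bslink")
    bslink.addInterface(rbs.interface)
    bslink.addInterface(node.addInterface("if0"))
    bslink.best_effort = True
    bslink.vlan_tagging = True

    pc.printRequestRSpec(request)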
12.2 CloudLab Wisconsin
The CloudLab cluster at the University of Wisconsin is built in partnership with Cisco, Seagate, and HP. The cluster, which is in Madison, Wisconsin, has 568 servers with a total of 11,412 cores connected in a CLOS topology with full bisection bandwidth. It has 1,574 TB of storage, including SSDs on most nodes.
NOTE: In early 2024 the c220g5, c240g5, and c4130 nodes had their experiment networking redone. c220g5 nodes now have 2x10Gb links (up from one link), c240g5 nodes have 2x25Gb (up from 2x10Gb), and the c4130 nodes have 2x100Gb (up from 2x10Gb). These changes are reflected below.
c220g1 | 88 nodes (Haswell, 16 core, 3 disks)
CPU | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T)
RAM | 128GB ECC Memory (8x 16 GB DDR4 1866 MHz dual rank RDIMMs)
Disk | Two 1.2 TB 10K RPM 6G SAS SFF HDDs
Disk | One Intel DC S3500 480 GB 6G SATA SSD
NIC | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes) (both ports available for experiment use)
NIC | Onboard Intel i350 1Gb
c240g1 | 5 nodes (Haswell, 16 core, 14 disks)
CPU | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T)
RAM | 128GB ECC Memory (8x 16 GB DDR4 1866 MHz dual rank RDIMMs)
Disk | Two Intel DC S3500 480 GB 6G SATA SSDs
Disk | Twelve 3 TB HDDs donated by Seagate
NIC | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes) (both ports available for experiment use)
NIC | Onboard Intel i350 1Gb
c220g2 | 159 nodes (Haswell, 20 core, 3 disks)
CPU | Two Intel E5-2660 v3 10-core CPUs at 2.60 GHz (Haswell EP)
RAM | 160GB ECC Memory (10x 16 GB DDR4 2133 MHz dual rank RDIMMs)
Disk | One Intel DC S3500 480 GB 6G SATA SSD
Disk | Two 1.2 TB 10K RPM 6G SAS SFF HDDs
NIC | Dual-port Intel X520 10Gb NIC (PCIe v3.0, 8 lanes) (both ports available for experiment use)
NIC | Onboard Intel i350 1Gb
c240g2 | 3 nodes (Haswell, 20 core, 8 disks)
CPU | Two Intel E5-2660 v3 10-core CPUs at 2.60 GHz (Haswell EP)
RAM | 160GB ECC Memory (10x 16 GB DDR4 2133 MHz dual rank RDIMMs)
Disk | Two Intel DC S3500 480 GB 6G SATA SSDs
Disk | Two 1TB HDDs
Disk | Four 3TB HDDs
NIC | Dual-port Intel X520 10Gb NIC (PCIe v3.0, 8 lanes) (both ports available for experiment use)
NIC | Onboard Intel i350 1Gb
Phase two added 260 new nodes, 36 with one or more GPUs:
c220g5 | 219 nodes (Intel Skylake, 20 core, 2 disks)
CPU | Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
RAM | 192GB ECC DDR4-2666 Memory
Disk | One 1 TB 7200 RPM 6G SAS HDD
Disk | One Intel DC S3500 480 GB 6G SATA SSD
NIC | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes) (both ports available for experiment use)
NIC | Onboard Intel i350 1Gb
c240g5 | 32 nodes (Intel Skylake, 20 core, 2 disks, GPU)
CPU | Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
RAM | 192GB ECC DDR4-2666 Memory
Disk | One 1 TB 7200 RPM 6G SAS HDD
Disk | One Intel DC S3500 480 GB 6G SATA SSD
GPU | One NVIDIA 12GB PCI P100 GPU
NIC | Dual-port Mellanox 25Gb NIC (PCIe v3.0, 8 lanes) (both ports available for experiment use)
NIC | Onboard Intel i350 1Gb
c4130 | 4 nodes (Intel Broadwell, 16 core, 2 disks, 4 GPUs)
CPU | Two Intel Xeon E5-2667 8-core CPUs at 3.20 GHz
RAM | 128GB ECC Memory
Disk | Two 960 GB 6G SATA SSDs
GPU | Four NVIDIA 16GB Tesla V100 SMX2 GPUs
NIC | Dual-port 100-Gigabit NIC (both ports available for experiment use)
Phase three added 58 Dell and Supermicro nodes, including 28 with one or more GPUs and 200Gb networking:
sm110p | 20 nodes (Intel Ice Lake, 16 core, 5 disks)
CPU | One Intel Xeon Silver 4314 16-core CPU at 2.40 GHz
RAM | 128GB ECC DDR4-3200 Memory
Disk | One 960 GB Intel SATA 6G SSD (SSDSC2KG960G8)
Disk | Four 960 GB Samsung PCIe4 x4 NVMe (MZQL2960HCJR-00A07)
NIC | Dual-port Mellanox ConnectX-6 LX 25Gb NIC (not available for experiment use)
NIC | Dual-port Mellanox ConnectX-6 DX 100Gb NIC (both ports available for experiment use)
sm220u | 10 nodes (Intel Ice Lake, 32 core, 9 disks)
CPU | Two Intel Xeon Silver 4314 16-core CPUs at 2.40 GHz
RAM | 256GB ECC DDR4-3200 Memory
Disk | One 960 GB Intel SATA 6G SSD (SSDSC2KG960G8)
Disk | Eight 960 GB Samsung PCIe4 x4 NVMe (MZQL2960HCJR-00A07)
NIC | Dual-port Mellanox ConnectX-6 LX 25Gb NIC (not available for experiment use)
NIC | Dual-port Mellanox ConnectX-6 DX 100Gb NIC (both ports available for experiment use)
d7525 | 24 nodes (AMD EPYC Rome, 32 core, 3 disks, 1 GPU)
CPU | Two 16-core AMD 7302 at 3.00GHz
RAM | 128GB ECC Memory (8x 16 GB 3200MT/s RDIMMs)
Disk | Two 480 GB 6G SATA SSDs
Disk | One 1.6 TB PCIe4 x4 NVMe SSD
NIC | Dual-port Mellanox ConnectX-6 DX 100Gb NIC (one port at 200Gb available for experiment use)
GPU | One NVIDIA 24GB Ampere A30 GPU
d8545 | 4 nodes (AMD EPYC Rome, 48 core, 3 disks, 1 GPU)
CPU | Two 24-core AMD 7413 at 2.65GHz
RAM | 512GB ECC Memory (16x 32 GB 3200MT/s RDIMMs)
Disk | Two 480 GB 6G SATA SSDs
Disk | One 1.6 TB PCIe4 x4 NVMe SSD
NIC | Dual-port Mellanox ConnectX-6 DX 100Gb NIC (one port at 200Gb available for experiment use)
GPU | NVIDIA HGX A100 GPU (4x 40GB A100 SXM4 GPUs)
All nodes are connected to two Ethernet networks:
The “control network”: this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. Phase three nodes have a 25 Gbps connection; all others are 1 Gbps. You should not use this network as part of the experiments you run in CloudLab.
The “experiment network”: each node has one or two interfaces on this network. The nodes are directly connected to a number of HP and Dell leaf switches, which in turn are connected to two interconnected spine switches. Most nodes have 10 Gbps connections, but some phase three nodes have 100 or 200 Gbps connections.
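Experiment topologies should therefore be built from explicit links or LANs, which are mapped onto the experiment network; the control network carries no profile-defined links. A small sketch (hypothetical names, standard geni-lib portal API) that places three c220g5 nodes on a single experiment-network LAN:

    # Sketch: three nodes on one LAN over the experiment network.
    import geni.portal as portal
    import geni.rspec.pg as pg

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    lan = request.LAN("lan0")

    for i in range(3):
        node = request.RawPC("node%d" % i)
        node.hardware_type = "c220g5"
        iface = node.addInterface("if0")
        # Optional: static addresses on a private experiment subnet.
        iface.addAddress(pg.IPv4Address("10.10.1.%d" % (i + 1), "255.255.255.0"))
        lan.addInterface(iface)

    pc.printRequestRSpec(request)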
The Wisconsin CloudLab cluster includes a storage server for remote datasets. The server currently has 160TB available for allocation.
12.3 CloudLab Clemson
The CloudLab cluster at Clemson University has been built in partnership with Dell. The cluster so far has 345 servers with a total of 12,264 cores, 1,182TB of disk space, and 97TB of RAM. All nodes have at least 10Gb Ethernet (some 25Gb or 100Gb), and some have QDR Infiniband. It is located in Clemson, South Carolina.
c8220 | 96 nodes (Ivy Bridge, 20 core)
CPU | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge)
RAM | 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
NIC | Dual-port Intel 10GbE NIC (PCIe v3.0, 8 lanes)
NIC | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)
c8220x | 4 nodes (Ivy Bridge, 20 core, 20 disks)
CPU | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge)
RAM | 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
Disk | Eight 1 TB 7.2K RPM 3G SATA HDDs
Disk | Twelve 4 TB 7.2K RPM 3G SATA HDDs
NIC | Dual-port Intel 10GbE NIC (PCIe v3.0, 8 lanes)
NIC | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)
c6320 | 83 nodes (Haswell, 28 core)
CPU | Two Intel E5-2683 v3 14-core CPUs at 2.00 GHz (Haswell)
RAM | 256GB ECC Memory
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
NIC | Dual-port Intel 10GbE NIC (X520)
NIC | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)
c4130 | 2 nodes (Haswell, 28 core, two GPUs)
CPU | Two Intel E5-2680 v3 12-core processors at 2.50 GHz (Haswell)
RAM | 256GB ECC Memory
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
GPU | Two Tesla K40m GPUs
NIC | Dual-port Intel 1GbE NIC (i350)
NIC | Dual-port Intel 10GbE NIC (X710)
NIC | QLogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)
There is also one storage-intensive (270TB!) node that should only be used if you need a huge amount of volatile storage. This node has only 10Gb Ethernet.
dss7500 | 1 node (Haswell, 12 core, 270TB disk)
CPU | Two Intel E5-2620 v3 6-core CPUs at 2.40 GHz (Haswell)
RAM | 128GB ECC Memory
Disk | Two 120 GB 6Gbps SATA SSDs
Disk | Forty-five 6 TB 7.2K RPM 6Gbps SATA HDDs
NIC | Dual-port Intel 10GbE NIC (X520)
There are three networks at the Clemson site:
A 1 Gbps Ethernet “control network”: this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in CloudLab.
A 10 Gbps Ethernet “experiment network”: each node has one interface on this network. This network is implemented using three Force10 S6000 and three Force10 Z9100 switches. Each S6000 switch is connected to a companion Z9100 switch via a 480Gbps link aggregate.
A 40 Gbps QDR Infiniband “experiment network”: each node has one connection to this network, which is implemented using a large Mellanox chassis switch with full bisection bandwidth.
Phase two added 18 Dell C6420 chassis each with four dual-socket Skylake-based servers. Each of the 72 servers has 32 cores for a total of 2304 cores.
c6420 | 72 nodes (Intel Skylake, 32 core, 2 disk)
CPU | Two sixteen-core Intel Xeon Gold 6142 CPUs at 2.6 GHz
RAM | 384GB ECC DDR4-2666 Memory
Disk | Two Seagate 1TB 7200 RPM 6G SATA HDDs
NIC | Dual-port Intel X710 10GbE NIC
Each server is connected via a 1Gbps control link (Dell D3048 switches) and a 10Gbps experimental link (Dell S5048 switches).
These phase two machines do not include Infiniband.
Phase two also added 6 IBM Power System S822LC (8335-GTB) POWER8 servers. These machines are booted using the Linux-based OpenPOWER firmware (OPAL). They can run code in either little- or big-endian modes, but we only provide a little-endian standard system image ("UBUNTU18-PPC64LE").
ibm8335 | 6 nodes (POWER8NVL, 20 core, 256GB RAM, 2 GPUs)
CPU | Two ten-core (8 threads/core) IBM POWER8NVL CPUs at 2.86 GHz
RAM | 256GB 1600MHz DDR4 memory
Disk | Two Seagate 1TB 7200 RPM 6G SATA HDDs (ST1000NX0313)
NIC | One Broadcom NetXtreme II BCM57800 1/10 GbE NIC
GPU | Two NVIDIA GP100GL (Tesla P100 SMX2 16GB)
FPGA | One ADM-PCIE-KU3 (Xilinx Kintex UltraScale)
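To run on these machines, a profile selects the ibm8335 type and the little-endian image named above. In the sketch below, the image URN prefix is an assumption, so check the image's listing for the exact URN:

    # Sketch: one POWER8 node booted with the little-endian Ubuntu image.
    import geni.portal as portal
    import geni.rspec.pg as pg

    pc = portal.Context()
    request = pc.makeRequestRSpec()

    node = request.RawPC("power8")
    node.hardware_type = "ibm8335"
    # Assumed URN prefix for CloudLab-provided images; verify before use.
    node.disk_image = "urn:publicid:IDN+emulab.net+image+emulab-ops//UBUNTU18-PPC64LE"

    pc.printRequestRSpec(request)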
Phase three added 15 Dell R7525 servers, each with two 32-core AMD EPYC processors, two NVIDIA GPUs, and a dual-port Mellanox BlueField2 SmartNIC.
r7525 | 15 nodes (AMD EPYC Rome, 64 core, 512GB RAM, 2 x GPU)
CPU | Two 32-core AMD 7542 at 2.9GHz
RAM | 512GB ECC Memory (16x 32 GB 3200MHz DDR4)
Disk | One 2TB 7200 RPM 6G SATA HDD
NIC | Dual-port Mellanox ConnectX-5 25 Gb NIC (PCIe v4.0)
NIC | Dual-port Mellanox BlueField2 100 Gb SmartNIC
GPU | Two NVIDIA GV100GL (Tesla V100S PCIe 32GB)
The nodes have a 1Gb control network connection, one 25Gb experiment connection, and 2 x 100Gb connections via the BlueField2 card.
The latest addition to the cluster consists of 64 new machines in two configurations.
r650 | 32 nodes (Intel Ice Lake, 72 core, 256GB RAM, 1.6TB NVMe)
CPU | Two 36-core Intel Xeon Platinum 8360Y at 2.4GHz
RAM | 256GB ECC Memory (16x 16 GB 3200MHz DDR4)
Disk | One 480GB SATA SSD
Disk | One 1.6TB NVMe SSD (PCIe v4.0)
NIC | Dual-port Mellanox ConnectX-5 25 Gb NIC (PCIe v4.0)
NIC | Dual-port Mellanox ConnectX-6 100 Gb NIC (PCIe v4.0)
r6525 | 32 nodes (AMD EPYC Milan, 64 core, 256GB RAM, 1.6TB NVMe)
CPU | Two 32-core AMD 7543 at 2.8GHz
RAM | 256GB ECC Memory (16x 16 GB 3200MHz DDR4)
Disk | One 480GB SATA SSD
Disk | One 1.6TB NVMe SSD (PCIe v4.0)
NIC | Dual-port Mellanox ConnectX-5 25 Gb NIC (PCIe v4.0)
NIC | Dual-port Mellanox ConnectX-6 100 Gb NIC (PCIe v4.0)
Each of these servers is connected via a 25Gbps control link and a 100Gbps experimental link.
Most recently, two NVIDIA Grace Hopper nodes have been added to the cluster.
nvidiagh | 2 nodes (GH200 Superchip, 72 core, 480GB RAM, 2.8TB NVMe)
CPU | One 72-core Arm Neoverse V2 at 3.1GHz
RAM | 480GB LPDDR5X (384GB/sec)
RAM | 96GB HBM3 (4TB/sec)
Disk | One 960 GB and one 1.9 TB NVMe SSD (PCIe v4.0)
NIC | Dual-port Mellanox ConnectX-7 100 Gb NIC (PCIe v4.0) (one port available for experiment use)
GPU | One NVIDIA H100 (Hopper)
The Clemson CloudLab cluster includes a storage server for remote datasets. The server currently has 40TB available for allocation.
12.4 Apt Cluster
The main Apt cluster is housed in the University of Utah’s Downtown Data Center in Salt Lake City, Utah. It contains two classes of nodes:
r320 | 100 nodes (Sandy Bridge, 8 cores)
CPU | 1x Xeon E5-2450 processor (8 cores, 2.1 GHz)
RAM | 16GB Memory (4 x 2GB RDIMMs, 1.6 GHz)
Disks | 4 x 500GB 7.2K SATA Drives (RAID5)
NIC | 1GbE Dual-port embedded NIC (Broadcom)
NIC | 1 x Mellanox MX354A Dual-port FDR CX3 adapter w/ 1 x QSA adapter
c6220 | 56 nodes (Ivy Bridge, 16 cores)
CPU | 2 x Xeon E5-2650v2 processors (8 cores each, 2.6 GHz)
RAM | 64GB Memory (8 x 8GB DDR-3 RDIMMs, 1.86 GHz)
Disks | 2 x 1TB SATA 7.2K RPM hard drives
NIC | 4 x 1GbE embedded Ethernet ports (Broadcom)
NIC | 1 x Intel X520 PCIe Dual-port 10Gb Ethernet NIC
NIC | 1 x Mellanox FDR CX3 Single-port mezzanine card
All nodes are connected to three networks with one interface each:
A 1 Gbps Ethernet “control network”: this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in Apt.
A “flexible fabric” that can run at up to 56 Gbps and carries either FDR Infiniband or Ethernet: this fabric uses NICs and switches with Mellanox’s VPI technology, which means that we can, on demand, configure each port to be either FDR Infiniband or 40 Gbps (or even non-standard 56 Gbps) Ethernet. The fabric consists of seven edge switches (Mellanox SX6036G) with 28 connected nodes each. There are two core switches (also SX6036G), and each edge switch connects to both cores with a 3.5:1 blocking factor. This fabric is ideal if you need very low latency, Infiniband, or a few high-bandwidth Ethernet links.
A 10 Gbps Ethernet “commodity fabric”: on the r320 nodes, a port on the Mellanox NIC (permanently set to Ethernet mode) is used to connect to this fabric; on the c6220 nodes, a dedicated Intel 10 Gbps NIC is used. This fabric is built from two Dell Z9000 switches, each of which has 96 nodes connected to it. It is ideal for creating large LANs: each of the two switches has full bisection bandwidth for its 96 ports, and there is a 3.5:1 blocking factor between the two switches.
There is no remote dataset capability at the Apt cluster.
12.5 Mass
UMass and the Mass Open Cloud host a cluster at the Massachusetts Green High Performance Compute Center in Holyoke, Massachusetts.
rs440 | 5 nodes (Skylake, 32 cores)
CPU | 2 x Xeon Gold 6130 processors (16 cores each, 2.1 GHz)
RAM | 192GB Memory (12 x 16GB RDIMMs)
Disks | 1 x 240GB SATA SSD
NIC | 2 x 10GbE embedded Ethernet ports (Broadcom 57412)
These nodes are connected via two 10Gbps ports to a Dell S4048-ON switch. One port is used for control traffic and connectivity to the public Internet, and the other is used for the experiment network.
rs620 | 38 nodes (Sandy Bridge, 16 or 20 cores)
CPU | 2 x Xeon processors (8-10 cores each, 2.2 GHz or faster)
RAM | 128-384GB Memory (most have 256GB)
Disks | 1 x 900GB 10K SAS Drive
NIC | 1GbE Quad-port embedded NIC (Intel)
NIC | 1 x Solarflare Dual-port SFC9120 10G Ethernet NIC
rs630 | 38 nodes (Haswell, 20 cores)
CPU | 2 x Xeon E5-2660 v3 processors (10 cores each, 2.6 GHz or faster)
RAM | 256GB Memory (16 x 16GB DDR4 DIMMs)
Disks | 1 x 900GB 10K SAS Drive
NIC | 1GbE Quad-port embedded NIC (Intel)
NIC | 1 x Solarflare Dual-port SFC9120 10G Ethernet NIC
There is some variation within the rs620 and rs630 nodes, primarily with the CPUs.
On these nodes, the control/Internet connection is a 1Gbps port, and one of the 10Gbps interfaces on each node is used for the experiment network.
There is currently no remote dataset capability at the UMass cluster.
12.6 OneLab
The OneLab facility at Sorbonne University in Paris hosts a small cluster modeled after part of the Utah hardware, with one chassis of ARM64 servers. In addition to this cluster, which is available to all CloudLab users through the CloudLab interface, OneLab hosts a large number of other experiment environments, including clusters, IoT devices, and software defined networks. See the OneLab website for a complete list.
m400 | 45 nodes (64-bit ARM)
CPU | Eight 64-bit ARMv8 (Atlas/A57) cores at 2.4 GHz (APM X-GENE)
RAM | 64GB ECC Memory (8x 8 GB DDR3-1600 SO-DIMMs)
Disk | 120 GB of flash (SATA3 / M.2, Micron M500)
NIC | Dual-port Mellanox ConnectX-3 10 Gb NIC (PCIe v3.0, 8 lanes)
There is no remote dataset capability at the OneLab cluster.