The GARGI HPC Integrated Smart Rack Data Centre Cluster is a state-of-the-art facility established to enhance the institute’s computational and research capabilities. Designed with smart rack infrastructure, the data centre supports advanced simulations, data-driven research, and high-performance computing.
Racks: The data centre is built using a Vertiv Smart IT Rack System, which includes five 42U smart racks integrated with six in-row precision air conditioning (PAC) units for efficient and reliable cooling.
General Compute Nodes: The cluster comprises 20 general compute nodes based on the Tyrone Camarero SDA200A2N-18 system, each powered by 2x AMD EPYC 9554 processors (128 cores total per node) at 3.10 GHz with 256 MB of L3 cache per processor and 768 GB of DDR5 RAM.
High-Memory Compute Nodes: The cluster also includes 3 high-memory compute nodes, based on the same Tyrone Camarero SDA200A2N-18 system with 2x AMD EPYC 9554 processors (128 cores total per node) at 3.10 GHz and 256 MB of L3 cache per processor, but equipped with 2,304 GB of RAM for memory-intensive applications.
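Taken together, the general and high-memory partitions define GARGI's aggregate compute capacity. The short sketch below is purely illustrative arithmetic derived from the figures quoted above (node counts, 128 cores per node, and per-node memory); it is not part of any cluster tooling.

```python
# Illustrative tally of GARGI's compute partitions, using only the
# figures quoted above (20 general + 3 high-memory nodes, each with
# 2x AMD EPYC 9554 = 128 cores).
partitions = {
    "general":     {"nodes": 20, "cores_per_node": 128, "ram_gb_per_node": 768},
    "high_memory": {"nodes": 3,  "cores_per_node": 128, "ram_gb_per_node": 2304},
}

total_cores = sum(p["nodes"] * p["cores_per_node"] for p in partitions.values())
total_ram_tb = sum(p["nodes"] * p["ram_gb_per_node"] for p in partitions.values()) / 1024

print(f"Total compute cores : {total_cores}")          # 2944
print(f"Total compute RAM   : {total_ram_tb:.1f} TB")  # ~21.8 TB
```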
Master Node: The cluster includes a master node based on the Tyrone Camarero SDA200A2N-212 system, powered by 2x AMD EPYC 9354 processors (64 cores total per node) at 3.25 GHz with 256 MB of L3 cache per processor and 384 GB of DDR5 RAM.
Login Node: The cluster includes 1 login node, based on the Tyrone Camarero SDA200A2N-18 system, powered by 2x AMD EPYC 9354 processors (64 cores total per node) at 3.25 GHz with 256 MB of L3 cache per processor and 384 GB of DDR5 RAM.
Storage: The storage infrastructure is powered by a DDN EXAScaler 200NVX2 appliance with 24 NVMe slots and four HDR InfiniBand / 200 GbE ports for high-speed connectivity. It provides 120 TB of usable capacity configured in RAID 6 (8+2) for redundancy and reliability, with sustained write throughput of 15 GB/s and read throughput of at least 15 GB/s, giving fast access to large datasets for data-intensive research workflows.
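For readers unfamiliar with the "(8+2)" notation: each RAID 6 group stores data on 8 drives and parity on 2, so roughly 80% of the raw capacity of the protected drives is usable. The sketch below works through that arithmetic; the per-drive size and group count are illustrative assumptions, not published specifications of the appliance.

```python
# Illustrative RAID 6 (8+2) capacity arithmetic. The per-drive size and
# group count below are ASSUMPTIONS chosen only to show how "(8+2)"
# translates into usable capacity; they are not vendor figures.
data_drives_per_group = 8      # drives holding data in each RAID 6 group
parity_drives_per_group = 2    # drives holding parity in each group
drive_size_tb = 7.68           # assumed NVMe drive size (illustrative)
groups = 2                     # assumed number of RAID 6 groups (illustrative)

drives_used = groups * (data_drives_per_group + parity_drives_per_group)
raw_tb = drives_used * drive_size_tb
usable_tb = groups * data_drives_per_group * drive_size_tb

print(f"Drives used          : {drives_used}")           # 20 of the 24 NVMe slots
print(f"Raw capacity         : {raw_tb:.1f} TB")          # 153.6 TB
print(f"Usable (8+2) capacity: {usable_tb:.1f} TB")       # 122.9 TB, close to the quoted 120 TB
```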
Network Interconnect: The compute infrastructure is interconnected using a 200 Gbps Mellanox InfiniBand fabric, enabled by an NVIDIA MQM8700-HS2F managed switch, which offers 40 HDR 200G InfiniBand ports.
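A quick way to sanity-check the fabric sizing is to count the HDR ports the cluster consumes against the 40 ports the MQM8700 provides. The tally below assumes one HDR link per node and a single master node; both are illustrative assumptions rather than documented cabling details.

```python
# Illustrative HDR port budget for the MQM8700-HS2F (40 ports).
# One link per node and a single master node are ASSUMPTIONS.
port_consumers = {
    "general compute nodes": 20,
    "high-memory nodes": 3,
    "master node (assumed 1)": 1,
    "login node": 1,
    "DDN EXAScaler HDR/200GbE ports": 4,
}

used = sum(port_consumers.values())
print(f"Ports used : {used} of 40")   # 29 of 40
print(f"Ports free : {40 - used}")    # headroom for expansion
```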
Overall Impact: This integrated and energy-efficient HPC infrastructure significantly enhances the institute’s capacity to undertake cutting-edge computational and scientific research.
The High-Performance Computing (HPC) cluster, known as "Raman," is composed of the following components:
Racks: Raman is housed in 2 x Rittal RDHx liquid-cooled racks, containing a total of 64 compute nodes, 1 GPU node, 1 high-memory node, 4 I/O nodes, 1 master node, and 1 login node. The cluster also features NetApp storage.
Compute Nodes: Each of the 64 compute nodes is a PRIMERGY CX400 M4 server equipped with Intel Xeon® Gold 6126 12-core CPUs running at 2.60 GHz and 96 GB of memory.
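As a rough sense of scale, the sketch below tallies Raman's compute partition. It assumes two Xeon Gold 6126 sockets per compute node, matching the other dual-socket PRIMERGY nodes listed here; that socket count is an assumption, since the line above does not state it explicitly.

```python
# Illustrative tally of Raman's 64-node compute partition.
# ASSUMPTION: two 12-core Xeon Gold 6126 CPUs per node (not stated above).
nodes = 64
sockets_per_node = 2          # assumed
cores_per_socket = 12
ram_gb_per_node = 96

total_cores = nodes * sockets_per_node * cores_per_socket
total_ram_tb = nodes * ram_gb_per_node / 1024

print(f"Compute cores : {total_cores}")          # 1536
print(f"Compute RAM   : {total_ram_tb:.1f} TB")  # 6.0 TB
```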
GPU Node: The GPU node is a PRIMERGY RX2540 M4 server with 2 x Intel Xeon® Gold 6126 12-core CPUs at 2.60 GHz and 96 GB of memory.
High-Memory Node: The high-memory node is a PRIMERGY RX4770 M4 server featuring Intel Xeon® Platinum 8160 24-core CPUs running at 2.10 GHz and 1152 GB of memory.
I/O Nodes: Each of the 4 I/O nodes is a PRIMERGY CX2560 M4 server with 2 x Intel Xeon® Silver 4116 CPUs at 2.10 GHz and 64 GB of memory.
Master and Login Nodes: Both the master and login nodes are PRIMERGY RX2530 M4 servers, each with 2 x Intel Xeon® Gold 6126 12-core CPUs at 2.60 GHz and 96 GB of memory.
Storage: The cluster is equipped with NetApp E5700 storage running the Lustre parallel file system, delivering 6 GB/s of throughput and 104 TB of usable storage capacity.
Interconnects: Raman is interconnected by a 100 Gbps fabric built from 5 x 48-port Intel Omni-Path (OPA) switches, and by a 1 Gbps network via 3 x 48-port Gigabit Ethernet switches.