High-Performance Data-Analytics Cluster

Perfect Infrastructure for Big-Data and Machine Learning

Hardware

The CfADS Data-Analytics Cluster (DA-Cluster) consists of 18 physical server nodes that are internally connected over an InfiniBand network with a bandwidth of 56 Gbit/s. Combined with the Hadoop framework, the DA-Cluster is designed to process large amounts of data efficiently and to parallelize computations across its nodes. In addition, several state-of-the-art AI tools are available, so the cluster can cover the different approaches and requirements of a professional data science workflow. As a result, the DA-Cluster provides a highly adjustable environment for different types of data science projects.
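As a rough illustration of the map, shuffle, and reduce pattern that Hadoop distributes across nodes, the following single-process Python sketch counts words. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit one (word, 1) pair per word; on a cluster, each node maps its own data split."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values by key, as the framework does between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence inside a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle run on different nodes at the same time, which is the property a distributed framework exploits.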

Cluster CfADS

Hardware Facts

  • 1 file server:
    Dell PowerEdge R730xd
    Intel(R) Xeon(R) CPUs: 20 Cores / 40 Threads, 128 GB RAM, 72 TB HDD

  • 3 virtualization servers:
    2 Dell PowerEdge R730, 1 Dell PowerEdge R740
    Intel(R) Xeon(R) CPUs: 76 Cores / 152 Threads, 640 GB RAM, 40 TB HDD
    AMD GPU: 1792 GPU-Cores, 8 GB GPU-RAM

  • 7 data nodes:
    6 Dell PowerEdge R730, 1 Fujitsu Primergy RX2540 M5
    Intel(R) Xeon(R) CPUs: 156 Cores / 312 Threads, 1,024 GB RAM, 216 TB HDD

  • 7 GPU nodes:
    4 Dell PowerEdge R730, 3 Fujitsu Primergy RX2540 M5
    Intel(R) Xeon(R) CPUs: 192 Cores / 384 Threads, 2,560 GB RAM, 264 TB HDD
    NVIDIA GPUs (8 x Tesla P100, 6 x Tesla V100): 59,392 CUDA-Cores, 288 GB GPU-RAM

  • 1 NAS:
    Synology RS3617xs+
    Intel(R) Xeon(R) CPU: 6 Cores / 12 Threads, 8 GB RAM, 72 TB HDD

  • In-rack InfiniBand network:
    FDR 56 Gbit/s


In total:

  • CPUs: 450 Cores / 900 Threads
  • RAM: 4.36 TB
  • HDD: 664 TB
  • GPU-RAM: 296 GB
  • GPU-Cores: 61,184 (59,392 CUDA cores on the NVIDIA GPUs plus 1,792 cores on the AMD GPU)
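The totals above can be reproduced from the per-group figures in the hardware list. The following Python sketch is only a cross-check: the dictionary layout and the `cluster_totals` helper are hypothetical, and it assumes decimal units (1 TB = 1000 GB). The CPU, RAM, and HDD totals only match when the NAS is included, and the GPU core total includes the AMD GPU.

```python
# Per-group figures copied from the hardware list.
# Tuple layout: (cpu_cores, cpu_threads, ram_gb, hdd_tb, gpu_cores, gpu_ram_gb)
GROUPS = {
    "file server": (20, 40, 128, 72, 0, 0),
    "virtualization servers": (76, 152, 640, 40, 1792, 8),
    "data nodes": (156, 312, 1024, 216, 0, 0),
    "GPU nodes": (192, 384, 2560, 264, 59392, 288),
    "NAS": (6, 12, 8, 72, 0, 0),
}


def cluster_totals(groups):
    """Sum each column over all hardware groups."""
    cores, threads, ram_gb, hdd_tb, gpu_cores, gpu_ram_gb = (
        sum(column) for column in zip(*groups.values())
    )
    return {
        "cpu_cores": cores,
        "cpu_threads": threads,
        "ram_tb": ram_gb / 1000,  # assumption: decimal terabytes
        "hdd_tb": hdd_tb,
        "gpu_cores": gpu_cores,
        "gpu_ram_gb": gpu_ram_gb,
    }


if __name__ == "__main__":
    for name, value in cluster_totals(GROUPS).items():
        print(f"{name}: {value}")
```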


Design and Security

The DA-Cluster's architecture was designed to achieve a very high level of data security and integrity. For data protection purposes, four security tiers are set up, spanning from the user interface to the backup layer. Data transfer into the cluster is fully encrypted, and access permissions to stored data are kept as restrictive as possible. Consequently, the DA-Cluster is suited to the responsible handling of highly sensitive data. Moreover, the distributed, redundant file system (Hadoop HDFS) and the attached backup layer (NAS) protect against data loss.