

### DAPHNE: Integrated **D**ata **A**nalysis **P**ipelines for Large-Scale Data Management, **H**PC, and Machi**ne** Learning

### Patrick Damme

#### TU Graz & Know-Center GmbH

Oral communication @ ISPDC 2022, Basel, Switzerland, July 12, 2022



This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement number 957407.

https://daphne-eu.eu/

### Modern Data-driven Applications







# DM + ML + HPC

Data Management & query processing

Machine Learning training & scoring

High-Perf. Computing custom codes & simulations

#### **Example: ML-assisted simulation**



## Challenges

#### Deployment Challenges






## Project Consortium



#### **13 partner institutions** from 7 countries

- DM, ML, HPC
- Academia & industry
- Different application domains



- Know-Center GmbH (coordinator), Austria
- AVL List GmbH, Austria
- Deutsches Zentrum fuer Luft- und Raumfahrt e.V., Germany
- Eidgenoessische Technische Hochschule Zuerich, Switzerland
- Hasso-Plattner-Institut for Digital Engineering gGmbH, Germany
- Institute of Communication and Computer Systems, Greece
- Infineon Technologies Austria AG, Austria
- Intel Technology Poland sp. z o.o., Poland
- IT-Universitetet i København, Denmark
- Kompetenzzentrum Automobil- und Industrieelektronik GmbH, Austria
- Technische Universität Dresden, Germany
- Univerza v Mariboru, Slovenia
- Universität Basel, Switzerland

## Example Use Cases

- DLR Earth Observation
  - ESA Sentinel-1/2 datasets → 4PB/year
  - Training of local climate zone classifiers on So2Sat LCZ42 (15 experts, 400K instances, 10 labels each, ~55GB HDF5)
  - ML pipeline: preprocessing, ResNet-20, climate models

- IFAT Semiconductor Ion Beam Tuning
- KAI Semiconductor Material Degradation
- AVL Vehicle Development Process (ejector geometries, KPIs)
- ML-assisted simulations, data cleaning, augmentation
- Cleaning during exploratory query processing










### System Architecture



### Language Abstractions

#### Design Principles

- Frame and Matrix Operations (coarse-grained)
- Data Independence (abstract data types)
- Extensibility (data types, operations, HW)

#### DSL Operations

- **Basic built-in** operations (RA, LA)
- High-level built-in operations (e.g., SQL, PS, map on frames/matrices)
- MLIR SCF (loops, branches)
- Typed and untyped functions (hierarchy of composite primitives)
- UDFs and external libraries

#### **Python API DaphneLib**


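```
# npG: NumPy adjacency matrix of the input graph
dc = DaphneContext()
G = dc.from_numpy(npG)  # hand the NumPy array over to DAPHNE
G = (G != 0)            # binarize to 0/1 edge indicators
c = components(G, 100, True).compute()  # compute() triggers execution
```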

#### Domain-specific Language DaphneDSL

```
def components(G, maxi, verbose) {
  n = nrow(G); // get the number of vertexes
  c = seq(1, n); // init vertex IDs
  diff = inf; // init diff to +Infinity
  iter = 1;
  // iterative computation of connected components
  while(diff>0 & iter<=maxi) {
    u = max(rowMaxs(G * t(c)), c); // neighbor prop
    diff = sum(u != c); // # of changed vertexes
    c = u; // update assignment
    iter = iter + 1;
  }
  return c; // final component ID per vertex
}
```
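For readers without a DAPHNE installation, the same max-ID propagation can be sketched in plain NumPy. This is an illustrative re-implementation, not DAPHNE code: the function name mirrors the DSL example, and the small test graph `npG` is made up for the demonstration.

```python
import numpy as np

def components(G, maxi=100):
    """Label connected components by propagating the largest vertex ID."""
    G = (G != 0)                      # binarize the adjacency matrix
    n = G.shape[0]
    c = np.arange(1, n + 1)           # init vertex IDs 1..n
    for _ in range(maxi):
        # per vertex: largest ID among its neighbours, or its own ID
        u = np.maximum((G * c[np.newaxis, :]).max(axis=1), c)
        if np.array_equal(u, c):      # no vertex changed: converged
            break
        c = u
    return c

# Hypothetical undirected graph with components {0, 1, 2} and {3, 4}
npG = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    npG[i, j] = npG[j, i] = 1

labels = components(npG)
print(labels)
```

Each component ends up labelled with the largest vertex ID it contains, so vertices share a label exactly when they are connected.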

#### Multiple dispatch of functions/kernels

## Optimizing Compilation Chain

- **Goal:** systematic lowering from DaphneIR to kernels and LLVM
- Optimization Passes
  - MLIR Programming Language Rewrites (CSE, constant propagation, constant folding, branch removal, code motion/loop hoisting, function inlining / unrolling)
  - **Type and Property Inference** (e.g., types/schema, shapes/sparsity, symmetry)
  - Inter-Procedural Analysis (function specialization)
  - Algebraic Simplification Rewrites (e.g., relational/linear algebra rewrites)
  - **Operator Ordering** (e.g., join ordering/enumeration, matrix multiplication chain optimization, sum-product optimizations, data-flow-graph linearization)
  - Generation of Fused Operator Pipelines (selection of fused operators in DAGs, vectorization/tiling, and splitting/merging strategies of inputs/results)
  - Memory Management (update-in-place, reuse of allocations, garbage collection)
  - Execution Type Selection (local vs distributed incl. primitives caching/partitioning)
  - **Device Placement** (e.g., CPU/GPU/FPGA, multiple devices)
  - Physical Operator Selection (e.g., different join/group-by/matmult operators)
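As an illustration of the operator-ordering pass, the classic dynamic program for matrix multiplication chain optimization can be sketched as follows. This is the textbook algorithm, not DAPHNE's compiler pass; the function name and the example dimensions are assumptions chosen for the demonstration.

```python
def matmul_chain_order(dims):
    """Cheapest parenthesization of M1 @ M2 @ ... @ Mn.

    Matrix Mi has shape dims[i-1] x dims[i]; cost counts scalar multiplications.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = None
            for k in range(i, j):             # try every split point
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if cost[i][j] is None or c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k

    def paren(i, j):
        if i == j:
            return f"M{i + 1}"
        k = split[i][j]
        return f"({paren(i, k)} {paren(k + 1, j)})"

    return cost[0][n - 1], paren(0, n - 1)

# M1: 10x100, M2: 100x5, M3: 5x50
best_cost, plan = matmul_chain_order([10, 100, 5, 50])
print(best_cost, plan)
```

For these shapes, multiplying `M1 @ M2` first costs 7,500 scalar multiplications, versus 75,000 for the other order, which is why such a pass depends on shape inference running first.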

### Data Representations



- Data Types: Matrix, Frame, Scalar, (Tensor, List)
- Value Types: e.g., SI8, SI32, SI64, UI8, UI32, UI64, FP32, FP64
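To make the link between value types, sparsity, and physical representation concrete, here is a rough sketch that maps the listed value types to NumPy dtypes and compares dense against CSR memory footprints. The CSR size formula, the 8-byte index width, and the "pick the smaller one" rule are simplifying assumptions for illustration; the slides do not state how DAPHNE makes this decision.

```python
import numpy as np

# Value type names from the slide, mapped to NumPy equivalents
VALUE_TYPES = {
    "SI8": np.int8, "SI32": np.int32, "SI64": np.int64,
    "UI8": np.uint8, "UI32": np.uint32, "UI64": np.uint64,
    "FP32": np.float32, "FP64": np.float64,
}

def dense_bytes(rows, cols, value_type):
    """Memory of a dense rows x cols matrix."""
    return rows * cols * np.dtype(VALUE_TYPES[value_type]).itemsize

def csr_bytes(rows, nnz, value_type, index_bytes=8):
    """Memory of a CSR matrix: values + column indexes + row offsets (assumed layout)."""
    item = np.dtype(VALUE_TYPES[value_type]).itemsize
    return nnz * (item + index_bytes) + (rows + 1) * index_bytes

def pick_representation(rows, cols, nnz, value_type):
    """Hypothetical rule: choose whichever representation is smaller."""
    if csr_bytes(rows, nnz, value_type) < dense_bytes(rows, cols, value_type):
        return "CSR"
    return "dense"

print(pick_representation(1000, 1000, 10_000, "FP64"))   # 1% non-zeros
print(pick_representation(1000, 1000, 900_000, "FP64"))  # 90% non-zeros
```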



### Vectorized (Tiled) Execution

Figure: fused operator pipeline `(%9, %10) = fusedPipeline1(%X, %y, %colmu, %colsd)`

- Default parallelization: frame & matrix ops
- Locality-aware, multi-device scheduling
- **Fused operator pipelines** on tiles/scalars + codegen
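The idea of a fused pipeline over row tiles can be sketched in NumPy with a thread pool. The operator sequence (minus, div, cbind, then X<sup>T</sup>X and X<sup>T</sup>y) follows the labels visible in the figure; the appended column of ones, the tile size, and all function names are assumptions for this sketch and not DAPHNE's runtime API.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fused_pipeline_tile(X_tile, y_tile, colmu, colsd):
    """Run minus -> div -> cbind -> (X^T X, X^T y) on one row tile."""
    Z = (X_tile - colmu) / colsd                    # minus, div
    Z = np.hstack([Z, np.ones((Z.shape[0], 1))])    # cbind (assumed: intercept column)
    return Z.T @ Z, Z.T @ y_tile                    # partial results for this tile

def fused_pipeline(X, y, colmu, colsd, tile_rows=256, workers=4):
    """Split rows into tiles, process tiles in parallel, then add the partials."""
    tiles = [(X[s:s + tile_rows], y[s:s + tile_rows])
             for s in range(0, X.shape[0], tile_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda t: fused_pipeline_tile(t[0], t[1], colmu, colsd), tiles))
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    return XtX, Xty

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = rng.normal(size=1000)
colmu, colsd = X.mean(axis=0), X.std(axis=0)
XtX, Xty = fused_pipeline(X, y, colmu, colsd)
print(XtX.shape, Xty.shape)
```

Because both outputs are sums over rows, the per-tile partials can be added in any order, and the standardized intermediate is never materialized for the full input.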

## Vectorized (Tiled) Execution, cont.

### #1 Zero-copy Input Slicing

- Create view on sliced input (no-op)
- All kernels work on views

### #2 Sparse Intermediates

- Reuse dense/sparse kernels
- Sparse pipeline intermediates for free

### #3 Fine-grained Control

- Task sizes (dequeue, data access) vs data binding (cache-conscious ops)
- Scheduling for load balance (e.g., sparse operations)

### #4 Computational Storage

- Task queues connect eBPF programs, async I/O into buffers, and subsequent operator pipelines
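Points #1 and #3 can be illustrated together: tasks are row ranges of varying size, and each task operates on a view of the input rather than a copy. The sketch below uses guided self-scheduling as one example of a task-size policy; it is a stand-in chosen for illustration, since the slides do not name a specific scheme, and `guided_chunks` is a hypothetical helper.

```python
import math
import numpy as np

def guided_chunks(n_rows, workers, min_rows=1):
    """Row ranges whose sizes shrink as work runs out (guided self-scheduling)."""
    chunks, start = [], 0
    while start < n_rows:
        remaining = n_rows - start
        size = min(remaining, max(min_rows, math.ceil(remaining / workers)))
        chunks.append((start, start + size))
        start += size
    return chunks

X = np.arange(200, dtype=np.float64).reshape(100, 2)
chunks = guided_chunks(X.shape[0], workers=4)

# Basic slicing in NumPy returns views, so building the tasks copies no data
views = [X[a:b] for a, b in chunks]
zero_copy = all(np.shares_memory(X, v) for v in views)

# A kernel applied per view yields the same total as on the full input
total = sum(v.sum() for v in views)
print(chunks[:3], zero_copy, total)
```

Large early chunks keep dequeue overhead low, while small late chunks help even out the finish times of workers, which matters when per-row cost varies (for example with sparse inputs).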






## **Distributed Vectorized Execution**

- Federated matrices/frames + distribution primitives
- Hierarchical vectorized pipelines and scheduling
- Coordinator spawns the distributed fused pipeline `(%9, %10) = fusedPipeline1(%X, %y, %colmu, %colsd)` on the worker nodes (Node 1, Node 2), each with CPU workers and GPU/FPGA workers
- **#1** Prepare Inputs (N/A, repartition, broadcasts, slices, as necessary)
- **#2** Coarse-grained Tasks (tasks run vectorized pipeline)
- **#3** Combine Outputs (N/A, all-reduce, rbind/cbind)
- Example: X is row-partitioned across nodes (rows 1:100M on Node 1, 100M:200M on Node 2), colmu and colsd are broadcast, each node runs minus → div → cbind followed by dsyrk (X<sup>T</sup>X) and dgemv (X<sup>T</sup>y) on tiles, and the partial results are combined (Σ)

## Extensibility

#### Goals for Extensibility

- New data types and kernels (e.g., compressed, HW devices)
- New optimization passes and scheduling algorithms
- Integration with other MLIR dialects (e.g., linalg)

#### #1 Extension Catalog

- Register kernels/data types as shared libraries
- Type hierarchy, cost functions, constraints

#### #2 DSL-level Extensibility/Configuration

- Data representations, data/ops placement (constraints)
- Sideways Entry: daphnec takes DaphneDSL and DaphneIR

#### #3 System Internals

- Extended DaphneIR, new optimization passes, custom compilation chains



| Artifact   | Type      | Cost | Lib       |
|------------|-----------|------|-----------|
| compress   | K-Reorg   |      | ./clib.so |
| mm_asic    | K-Matmult |      | ./mma.so  |
| CompMatrix | D-Matrix  |      | ./clib.so |

`X = sparse(Y);` `X = compress(Y);` `X = device(Y, "/GPU:0");` `X = Y @_gpu Z;`
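The catalog idea (registered kernels with type signatures and costs, selected by multiple dispatch) can be sketched in a few lines of Python. Everything here is hypothetical: `Catalog`, `KernelEntry`, `mm_generic`, and `mm_comp` are illustrative names, the costs are made up, and the kernels are Python callables, whereas the slide describes registering shared libraries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

class Matrix:
    def __init__(self, rows):
        self.rows = rows

class CompMatrix(Matrix):
    """Stand-in for a compressed matrix type lower in the type hierarchy."""

@dataclass
class KernelEntry:
    name: str
    op: str
    arg_types: Tuple[type, ...]
    cost: float
    fn: Callable

class Catalog:
    def __init__(self):
        self._kernels: Dict[str, List[KernelEntry]] = {}

    def register(self, entry):
        self._kernels.setdefault(entry.op, []).append(entry)

    def dispatch(self, op, *args):
        """Pick the cheapest kernel whose signature matches the argument types."""
        matches = [k for k in self._kernels.get(op, [])
                   if len(k.arg_types) == len(args)
                   and all(isinstance(a, t) for a, t in zip(args, k.arg_types))]
        if not matches:
            raise LookupError(f"no kernel for {op}")
        best = min(matches, key=lambda k: k.cost)
        return best.name, best.fn(*args)

def mm_generic(a, b):
    return Matrix([[sum(x * y for x, y in zip(r, c)) for c in zip(*b.rows)]
                   for r in a.rows])

def mm_comp(a, b):
    return mm_generic(a, b)  # placeholder for a specialized kernel

catalog = Catalog()
catalog.register(KernelEntry("mm_generic", "matmult", (Matrix, Matrix), 10.0, mm_generic))
catalog.register(KernelEntry("mm_comp", "matmult", (CompMatrix, Matrix), 2.0, mm_comp))

name_c, out_c = catalog.dispatch("matmult", CompMatrix([[1, 2]]), Matrix([[3], [4]]))
name_d, out_d = catalog.dispatch("matmult", Matrix([[1, 2]]), Matrix([[3], [4]]))
print(name_c, out_c.rows, name_d, out_d.rows)
```

A compressed input matches both kernels through the type hierarchy, and the cost function decides between them; a plain matrix only matches the generic kernel.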

### **Experiments: Simple IDA Pipelines**

**Setup:** Single node w/ 2x Intel Xeon Gold 6238 (112 vcores, 7.7 TFLOP/s), 768 GB DDR4 RAM, 12x 2TB SSDs (data), NVIDIA **T4 GPU** (8.1 TFLOP/s, 16 GB), and Intel FPGA PAC D5005 (w/ **Stratix 10 SX FPGA**, 32 GB) since Dec 29

**P1:** TPC-H SF10 csv, query processing + linear regression training on CPUs



#### **P2:** So2Sat LCZ42 csv (testset), ResNet-20 scoring on GPU



### **Experiments: Vectorized Execution**



#### Ongoing Experiments

- FPGA kernels on D5005, CPU+GPU vectorized pipelines
- Distributed sparse runtime operations on Vega supercomputer
- Sparse vectorized pipelines and scheduling algorithms

### Summary



# DM + ML + HPC

#### Current Status

- System architecture and design
- Initial DSL and Python API
- Prototype of MLIR-based compiler and runtime
- Vectorized execution (fused pipelines, scheduling)
- GPU (and FPGA) integration, BLAS/DNN libraries, I/O primitives
- Standalone distributed runtime w/ different distribution primitives

# DAPHNE Overall Objective: Open and extensible system infrastructure

#### Joint Paper on System Architecture

- Published at CIDR 2022

#### DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Patrick Damme<sup>1</sup>, Marius Birkenbach<sup>10</sup>, Constantinos Bitsakos<sup>6</sup>, Matthias Boehm<sup>1</sup>,
Philippe Bonnet<sup>9</sup>, Florina Ciorba<sup>12</sup>, Mark Dokter<sup>1</sup>, Pawel Dowgiallo<sup>8</sup>, Ahmed Eleliemy<sup>12</sup>,
Christian Faerber<sup>8</sup>, Georgios Goumas<sup>6</sup>, Dirk Habich<sup>11</sup>, Niclas Hedam<sup>9</sup>, Marlies Hofer<sup>2</sup>,
Wenjun Huang<sup>3</sup>, Kevin Innerebner<sup>1</sup>, Vasileios Karakostas<sup>6</sup>, Roman Kern<sup>1</sup>, Tomaž Kosar<sup>13</sup>,
Alexander Krause<sup>11</sup>, Daniel Krems<sup>2</sup>, Andreas Laber<sup>7</sup>, Wolfgang Lehner<sup>11</sup>, Eric Mier<sup>11</sup>,
Marcus Paradies<sup>3</sup>, Bernhard Peischl<sup>2</sup>, Gabrielle Poerwawinata<sup>12</sup>, Stratos Psomadakis<sup>6</sup>,
Tilmann Rabl<sup>5</sup>, Piotr Ratuszniak<sup>8</sup>, Pedro Silva<sup>5</sup>, Nikolai Skuppin<sup>3b</sup>, Andreas Starzacher<sup>7</sup>,
Benjamin Steinwender<sup>10</sup>, Ilin Tolovski<sup>5</sup>, Pinar Tözün<sup>9</sup>, Wojciech Ulatowski<sup>8</sup>,
Yuanyuan Wang<sup>3b</sup>, Izajasz Wrosz<sup>8</sup>, Aleš Zamuda<sup>13</sup>, Ce Zhang<sup>4</sup>, Xiao Xiang Zhu<sup>3b</sup>

<sup>1</sup> Know-Center GmbH/TU Graz, Austria; <sup>2</sup> AVL List GmbH, Austria; <sup>3</sup> DLR, <sup>3b</sup> DLR/TU Munich, Germany;
 <sup>4</sup> ETH Zurich, Switzerland; <sup>5</sup> HPI/Uni Potsdam, Germany; <sup>6</sup> ICCS/NTUA, Greece; <sup>7</sup> Infineon, Austria;
 <sup>8</sup> Intel, Poland; <sup>9</sup> ITU Copenhagen, Denmark; <sup>10</sup> KAI GmbH, Austria; <sup>11</sup> TU Dresden, Germany;
 <sup>12</sup> University of Basel, Switzerland; <sup>13</sup> University of Maribor, Slovenia


### **Further Information**

### DAPHNE is open-source software

- <u>https://github.com/daphne-eu/daphne</u>
- Apache v2 license
- Towards an inclusive dev community
- ➔ Potential for collaboration in 2022-2024



Enable researchers to experiment with new prototypes and extensions

- Check out our website
  - <u>https://daphne-eu.eu</u>
- Follow us on Twitter
  - <u>@daphne\_eu</u>




# Backup

