The aim of this one-day workshop is to bring together researchers who are interested in optimizing database performance on modern computing infrastructure by designing new data management techniques and tools.
The continued evolution of computing hardware and infrastructure poses new challenges and creates new bottlenecks for program performance. As a result, traditional database architectures that focus solely on I/O optimization increasingly fail to utilize hardware resources efficiently. Multi-core CPUs, GPUs, FPGAs, new memory and storage technologies (such as flash and non-volatile memory), and low-power hardware pose great challenges to optimizing database performance. Consequently, exploiting the characteristics of modern hardware has become an important topic of database systems research.
The goal is to make database systems adapt automatically to the sophisticated hardware characteristics, thus maximizing performance transparently to applications. To achieve this goal, the data management community needs interdisciplinary collaboration with computer architecture, compiler, operating systems and storage researchers. This involves rethinking traditional data structures, query processing algorithms, and database software architectures to adapt to the advances in the underlying hardware infrastructure.
We seek submissions bridging the area of database systems to computer architecture, compilers, and operating systems. In particular, submissions covering topics from the following non-exclusive list are encouraged:
We invite submissions to two tracks:
Full papers: A full paper must be no longer than 6 pages excluding the bibliography. There is no limit on the length of the bibliography. Full papers describe complete work in the area of data management for new hardware. Accepted full papers will be given 10 pages (plus bibliography) for the camera-ready version and a long presentation slot during the workshop.
Short papers: Short papers must not exceed 2 pages excluding the bibliography. Short papers describe very early-stage work or summaries of mature systems. Accepted short papers will be included in the proceedings, given 4 pages (plus bibliography) for the camera-ready version, and may be given a short presentation slot during the workshop.
All accepted papers (full and short) will also be presented as posters during a workshop poster session.
This year all accepted DaMoN papers will be considered for a best paper award.
We intend to invite extended versions of a selection of DaMoN'24 papers for submission to the VLDB Journal. Extended papers that are accepted by the VLDB Journal will appear in a special “Best of DaMoN 2024” section within one of the regular VLDBJ issues.
Paper submission: March 22nd, 2024, 11:59pm PST (extended from March 15th, 2024)
Notification of acceptance: April 29th, 2024 (postponed from April 26th, 2024)
Camera-ready copies: May 15th, 2024, 11:59pm PST (extended from May 10th, 2024)
Workshop: June 10th, 2024
Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum. Manuscripts should be submitted electronically as PDF files using the latest ACM paper format consistent with the ACM SIGMOD formatting guidelines to the DaMoN 2024 CMT site, at https://cmt3.research.microsoft.com/DaMoN2024. Submissions will be reviewed in a single-blind manner. Submissions that are 2 pages or shorter excluding the bibliography will be reviewed as short papers. Submissions that are 6 pages or shorter excluding the bibliography will be reviewed as full papers. Submissions that are longer than 6 pages excluding the bibliography will be desk-rejected.
Accepted papers will be included in the informal online proceedings on the workshop website. Additionally, all accepted papers will be published online in the ACM Digital Library; therefore, papers must include the standard ACM copyright notice on the first page.
10:10-11:00 Keynote (Philippe Bonnet)
11:00-11:15 SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions. Gang Liao (University of Maryland); Ye Liu (Bytedance); Yonghua Ding (Bytedance); Le Cai (Bytedance); Jianjun Chen (Bytedance).
11:15-11:30 NULLS!: Revisiting Null Representation in Modern Columnar Formats. Xinyu Zeng (Tsinghua University); Ruijun Meng (Tsinghua University); Andrew Pavlo (Carnegie Mellon University); Wes McKinney (Posit PBC); Huanchen Zhang (Tsinghua University).
11:30-11:45 Efficient Data Access Paths for Mixed Vector-Relational Search. Viktor Sanca (EPFL); Anastasia Ailamaki (EPFL).
11:45-12:00 Simple, Efficient, and Robust Hash Tables for Join Processing. Altan Birler (TU Munich); Tobias Schmidt (TU Munich); Philipp Fent (CedarDB); Thomas Neumann (TU Munich).
12:00-12:10 In situ Neighborhood Sampling for Large-Scale GNN Training. Yuhang Song (Boston University); Po Hao Chen (Brown University); Yuchen Lu (Boston University); Naima Abrar Shami (Boston University); Vasiliki Kalavri (Boston University).
14:00-14:30 Invited Talk (David Patterson)
14:30-14:45 So Far and yet so Near - Accelerating Distributed Joins with CXL. Alexander Baumstark (TU Ilmenau); Marcus Paradies (TU Ilmenau); Kai-Uwe Sattler (TU Ilmenau); Steffen Kläbe (Actian); Stephan Baumann (Actian).
14:45-14:55 Seamless: Transparent Storage Access Through Smart Switches. Simon Binder (TU Darmstadt); Matthias Jasny (TU Darmstadt); Tobias Ziegler (TU Darmstadt).
14:55-15:10 How Does Software Prefetching Work on GPU Query Processing? Yangshen Deng (Southern University of Science and Technology); Shiwen Chen (Southern University of Science and Technology); Zhaoyang Hong (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology).
15:10-15:25 How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching. Roland Kühn (TU Dortmund); Jan Mühlig (TU Dortmund); Jens Teubner (TU Dortmund).
16:00-16:30 Fresh Thinking Talk (Manos Athanassoulis)
16:30-16:45 The Price of Privacy: A Performance Study of Confidential Virtual Machines for Database Systems. Lina Qiu (Boston University); Rebecca Taft (Cockroach Labs); Alexander Shraer (Cockroach Labs); George Kollios (Boston University).
16:45-16:55 DuckDB-SGX2: The Good, The Bad and The Ugly within Confidential Analytical Query Processing. Ilaria Battiston (CWI); Charlotte C Felius (CWI); Sam Ansmink (DuckDB Labs); Laurens Kuiper (CWI, DuckDB Labs); Peter Boncz (CWI).
16:55-17:10 Heterogeneous Intra-Pipeline Device-Parallel Aggregations. Artem Kroviakov (TU Munich); Petr Kurapov (Intel); Christoph Anneser (TU Munich); Jana Giceva (TU Munich).
17:10-17:20 Performance or Efficiency? A Tale of Two Cores for DB Workloads. Rathijit Sen (Microsoft).
17:20-17:35 Accelerating GPU Data Processing using FastLanes Compression. Azim Afroozeh (CWI); Charlotte C Felius (CWI); Peter Boncz (CWI).
Panelists: Anastasia Ailamaki, Manos Athanassoulis, Peter Boncz, Philippe Bonnet
Abstract: NVMe is synonymous with modern storage. It was introduced as a means to efficiently expose Solid-State Drives as PCIe 3.0 peripherals. With NVMe, I/Os were no longer the bottleneck. Initially, the challenge for operating system and database system designers was to accommodate radically faster storage devices. Then, SSDs evolved to meet a range of cost/performance requirements. Accordingly, NVMe 2.0 introduced new transport models, storage models and cross-layer optimizations. This diversity introduced new challenges. Today, NVMe passthru and Flexible Data Placement enable data systems designers to shape how data is stored, instead of designing their systems around the characteristics of opaque storage devices. Computational storage was supposed to further improve the ability of system designers to specialize storage devices to fit their workloads. However, device memory management became a challenge. We discuss the proposed standard and speculate on the role NVMe may play in future data systems, in a context where CXL emerges, PCIe 7.0 is being standardized and power consumption is the bottleneck.
Abstract: We start with a review of the instability of modern hardware, given the
Data is becoming more critical than compute due to its increasing cost and slowing capacity curves for memory and storage. Data location and movement are now central to cost and performance. To build robust systems in light of these changes, we must shift the focus of hardware and software design from processing to the memory, storage, and network components.
Abstract: What if we could access any layout and ship only the relevant data through the memory hierarchy by transparently converting rows to (arbitrary groups of) columns? We capitalize on the reinvigorated trend of hardware specialization to propose Relational Fabric, a near-data vertical partitioner that allows memory or storage components to perform on-the-fly transparent data transformation. By exposing an intuitive API, Relational Fabric pushes vertical partitioning to the hardware, which has a profound impact on the process of designing and building data systems. (A) There is no need for data duplication and layout conversion, making hybrid systems viable using a single layout. (B) It simplifies the memory and storage manager. (C) It reduces unnecessary data movement through the memory hierarchy, allowing for better hardware utilization and, ultimately, better performance. In this talk, I will introduce the Relational Fabric vision and present our initial results on in-memory systems. I will also share some of the challenges of building this hardware and the opportunities it brings for simplicity and innovation in the data system software stack, including physical design, query processing, and concurrency control, and conclude with ongoing work on data transformation for general workloads, including matrix and tensor processing.
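To make the idea of converting rows to an arbitrary group of columns concrete, here is a minimal software sketch of the logical transformation only, assuming a fixed-width, packed row layout. The schema, the helper names (`pack_rows`, `project_column_group`), and the byte layout are hypothetical illustrations. They are not Relational Fabric's API, and the talk describes performing this transformation in memory or storage hardware rather than in a software loop.

```python
import struct

# Hypothetical fixed-width schema: (column name, struct format code).
# q = 8-byte int, d = 8-byte float, i = 4-byte int, ? = 1-byte bool.
SCHEMA = [("id", "q"), ("price", "d"), ("qty", "i"), ("flag", "?")]


def row_format(schema):
    # '<' selects little-endian with no padding, so each row has a fixed, packed width.
    return "<" + "".join(code for _, code in schema)


def pack_rows(schema, rows):
    """Serialize tuples into a single row-major byte buffer."""
    fmt = struct.Struct(row_format(schema))
    return b"".join(fmt.pack(*row) for row in rows)


def project_column_group(buf, schema, wanted):
    """Return a packed buffer holding only the `wanted` columns, row by row.

    Emulates the logical row-to-column-group transformation in software:
    the row-major source is scanned once and a narrower layout containing
    just the requested columns (kept in schema order) is produced.
    """
    src = struct.Struct(row_format(schema))
    idx = [i for i, (name, _) in enumerate(schema) if name in wanted]
    dst = struct.Struct(row_format([schema[i] for i in idx]))
    out = bytearray()
    for rec in src.iter_unpack(buf):  # one tuple per fixed-width row
        out += dst.pack(*(rec[i] for i in idx))
    return bytes(out)
```

For example, packing the two rows `(1, 9.5, 3, True)` and `(2, 4.25, 7, False)` yields a 42-byte row buffer (21 bytes per row), while projecting the `{"id", "qty"}` group yields only 24 bytes (12 bytes per row). That reduction in bytes moved is the effect the abstract attributes to pushing vertical partitioning toward the data; the sketch says nothing about how the hardware achieves it.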
SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions
Gang Liao (University of Maryland); Ye Liu (Bytedance); Yonghua Ding (Bytedance); Le Cai (Bytedance); Jianjun Chen (Bytedance)
The Price of Privacy: A Performance Study of Confidential Virtual Machines for Database Systems
Lina Qiu (Boston University); Rebecca Taft (Cockroach Labs); Alexander Shraer (Cockroach Labs); George Kollios (Boston University)
Heterogeneous Intra-Pipeline Device-Parallel Aggregations
Artem Kroviakov (TU Munich); Petr Kurapov (Intel); Christoph Anneser (TU Munich); Jana Giceva (TU Munich)
Simple, Efficient, and Robust Hash Tables for Join Processing
Altan Birler (TU Munich); Tobias Schmidt (TU Munich); Philipp Fent (CedarDB); Thomas Neumann (TU Munich)
How Does Software Prefetching Work on GPU Query Processing?
Yangshen Deng (Southern University of Science and Technology); Shiwen Chen (Southern University of Science and Technology); Zhaoyang Hong (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology)
Efficient Data Access Paths for Mixed Vector-Relational Search
Viktor Sanca (EPFL); Anastasia Ailamaki (EPFL)
So Far and yet so Near - Accelerating Distributed Joins with CXL
Alexander Baumstark (TU Ilmenau); Marcus Paradies (TU Ilmenau); Kai-Uwe Sattler (TU Ilmenau); Steffen Kläbe (Actian); Stephan Baumann (Actian)
Accelerating GPU Data Processing using FastLanes Compression
Azim Afroozeh (CWI); Charlotte C Felius (CWI); Peter Boncz (CWI)
How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching
Roland Kühn (TU Dortmund); Jan Mühlig (TU Dortmund); Jens Teubner (TU Dortmund)
NULLS!: Revisiting Null Representation in Modern Columnar Formats
Xinyu Zeng (Tsinghua University); Ruijun Meng (Tsinghua University); Andrew Pavlo (Carnegie Mellon University); Wes McKinney (Posit PBC); Huanchen Zhang (Tsinghua University)
In situ Neighborhood Sampling for Large-Scale GNN Training
Yuhang Song (Boston University); Po Hao Chen (Brown University); Yuchen Lu (Boston University); Naima Abrar Shami (Boston University); Vasiliki Kalavri (Boston University)
Performance or Efficiency? A Tale of Two Cores for DB Workloads
Rathijit Sen (Microsoft)
Seamless: Transparent Storage Access Through Smart Switches
Simon Binder (TU Darmstadt); Matthias Jasny (TU Darmstadt); Tobias Ziegler (TU Darmstadt)
DuckDB-SGX2: The Good, The Bad and The Ugly within Confidential Analytical Query Processing
Ilaria Battiston (CWI); Charlotte C Felius (CWI); Sam Ansmink (DuckDB Labs); Laurens Kuiper (CWI, DuckDB Labs); Peter Boncz (CWI)
TU Darmstadt, Germany
carsten.binnig@cs.tu-darmstadt.de
Intel Labs and MIT, USA
tatbul@csail.mit.edu
EPFL, Switzerland
anastasia.ailamaki@epfl.ch
CWI, Netherlands
boncz@cwi.nl
CWI, Netherlands
stefan.manegold@cwi.nl
Columbia University, USA
kar@cs.columbia.edu