Reaching consensus despite faulty or corrupted nodes is a central question in distributed computing; it has received renewed attention in recent years because of its importance for cryptocurrencies and blockchain networks. Modern consensus protocols in this space rely on a number of different methods for the nodes to influence protocol decisions. These methods include (1) traditional voting, where each node has one vote, (2) weighted voting, where voting power is proportional to stake in an underlying asset, and (3) proof-of-X, which demonstrates a cryptographically verifiable investment of a resource X, such as storage space, time waited, or computational work.
This talk first gives an overview of blockchain consensus methods. It then presents recent results on the "Snow" consensus protocol family, which is used by the Avalanche blockchain; Avalanche and its AVAX token are among the top-10 cryptocurrencies today. Snow consensus differs from all other protocols used in cryptocurrencies and has strong connections to consensus dynamics, a question studied earlier in the theory of distributed computing.
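To give a flavor of the sampling dynamic at the core of the Snow family, the following minimal Python sketch implements the Slush loop described in the Avalanche whitepaper: every node repeatedly samples k random peers and adopts a color once at least alpha of the sampled answers agree. The synchronous rounds and the parameter values here are simplifications chosen for illustration.

    import random

    def slush_round(colors, k=10, alpha=8):
        """One synchronous Slush round: every node samples k peers
        (with replacement) and adopts a color that reaches the
        threshold alpha in its sample."""
        n = len(colors)
        new_colors = list(colors)
        for i in range(n):
            sample = [colors[random.randrange(n)] for _ in range(k)]
            for c in set(sample):
                if sample.count(c) >= alpha:
                    new_colors[i] = c
        return new_colors

    # Start from a moderate imbalance; the dynamic tips toward one color.
    colors = ["red"] * 55 + ["blue"] * 45
    while len(set(colors)) > 1:
        colors = slush_round(colors)
    print("consensus on", colors[0])

Once a color gains a slight edge, sampling a supermajority of that color becomes more likely, which is intuitively why the process converges and why the Snow family connects to classical consensus dynamics.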
This talk is based on joint work with Ignacio Amores Sesar, Philipp Schneider, and Enrico Tedeschi, and on publications at OPODIS 2022 [2] and SIROCCO 2024 [1].
Combinatorial Multi-Armed Bandit (CMAB) techniques have been widely used to solve many real-world problems. This paper presents a geometric version of a stochastic facility allocation problem and proposes a novel approach using CMAB to solve it. The problem is concerned with optimally locating facilities in a 2-dimensional space with spatially varying population density, where the aim is to maximize the expected attraction of the population to the facility points over multiple rounds. We propose an algorithm based on the Upper Confidence Bound (UCB) principle and leverage Voronoi diagrams to model the attraction of the population to facilities, considering both Manhattan and Euclidean distances in the attraction model. The model has a broad range of potential applications, making the solution a feasible approach for similar optimization problems in dynamic and uncertain environments. To the best of our knowledge, this is the first work that applies CMAB models to 2-dimensional spaces with uncertain underlying distributions. Our findings show the potential of machine-learning-based solutions, such as the CMAB approach, in advancing the design and implementation of distributed computing systems. Furthermore, we derive regret bounds for the proposed algorithm and complement this analysis with numerical simulations, on both real-world traces and synthetic data, demonstrating the efficiency and effectiveness of our method.
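To hint at how a UCB-style rule operates in such a setting, the sketch below implements the classical UCB1 index for choosing among a small set of candidate facility sites with unknown mean attraction; the paper's combinatorial, Voronoi-based formulation is considerably richer, and the Bernoulli reward used here is a stand-in.

    import math, random

    def ucb1(n_rounds, n_arms, pull):
        """Classical UCB1: play each arm once, then repeatedly pick
        the arm maximizing empirical mean plus an exploration bonus."""
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for t in range(1, n_rounds + 1):
            if t <= n_arms:
                arm = t - 1   # initialization: try every arm once
            else:
                arm = max(range(n_arms), key=lambda a:
                          sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
        return counts

    # Stand-in reward: each arm is a candidate site whose mean
    # attraction is unknown to the learner.
    means = [0.3, 0.5, 0.7]
    counts = ucb1(5000, 3,
                  lambda a: 1.0 if random.random() < means[a] else 0.0)
    print(counts)   # the best site should dominate the pull counts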
Blockchain governance plays a crucial role in projects utilizing blockchain technology, as it defines decision-making and the evolution of the project over time. In this paper, we are particularly interested in liquid democracy governance models, which have recently gained attention and which allow voters to delegate their votes. We empirically study existing deployments of these models on Gitcoin and the Internet Computer and measure the concentration of voting power. We observe a highly skewed distribution, which contrasts with the otherwise highly decentralized nature of the applications. We hope that our preliminary insights can lead to follow-up work in the community and nourish the discussion on different governance designs.
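For readers unfamiliar with how such concentration is quantified, the sketch below computes two standard measures over a vector of per-voter voting power: the Gini coefficient and the Nakamoto coefficient (the minimum number of voters that together control a majority). These are common metrics for this purpose; the paper's exact choice of measures may differ.

    def gini(power):
        """Gini coefficient of a non-negative power distribution
        (0 = perfectly equal, close to 1 = highly concentrated)."""
        xs = sorted(power)
        n, total = len(xs), sum(xs)
        cum = sum((i + 1) * x for i, x in enumerate(xs))
        return (2 * cum) / (n * total) - (n + 1) / n

    def nakamoto(power):
        """Smallest number of voters whose combined power exceeds 50%."""
        xs = sorted(power, reverse=True)
        half, acc = sum(xs) / 2, 0.0
        for k, x in enumerate(xs, start=1):
            acc += x
            if acc > half:
                return k

    power = [500, 200, 100, 50, 50, 40, 30, 20, 5, 5]
    print(gini(power), nakamoto(power))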
Recently, a new communication abstraction called Mutual Broadcast has been proposed for message-passing distributed systems in which some processes may fail by crashing. It is a one-to-all broadcast abstraction providing an ordering property that makes it computationally equivalent to atomic registers. This paper proposes an adaptation of this abstraction, Causal Mutual Byzantine Broadcast (CMB-Broadcast for short), for message-passing systems in which some processes may experience Byzantine faults, a more severe failure model than crashes: a Byzantine process can behave arbitrarily. After defining this new communication abstraction, we show how it can be used to emulate atomic registers and how it can be implemented using quorums and Bracha's well-known Byzantine reliable broadcast abstraction. We also prove a necessary condition on the size of the quorums.
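While the paper's necessary condition is its own contribution, the classical Byzantine-quorum arithmetic helps situate it: with n processes of which up to f may be Byzantine, uniform quorums of size q must pairwise intersect in at least f+1 processes (so the intersection contains a correct one), which forces q >= ceil((n+f+1)/2) and, together with availability, n >= 3f+1. The sketch below checks exactly this textbook condition, not the condition proved in the paper.

    from math import ceil

    def quorum_size(n, f):
        """Minimum uniform quorum size such that any two quorums
        intersect in at least f+1 processes (classical
        dissemination-quorum condition)."""
        if n < 3 * f + 1:
            raise ValueError("need n >= 3f+1 for Byzantine quorums")
        q = ceil((n + f + 1) / 2)
        assert 2 * q - n >= f + 1   # pairwise intersection guarantee
        assert q <= n - f           # a quorum of correct processes exists
        return q

    for n, f in [(4, 1), (7, 2), (10, 3)]:
        print(n, f, quorum_size(n, f))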
The advent of the computing continuum, i.e., the blending of all existing computational tiers, calls for novel techniques and methods that consider its complex dynamics. This work presents the DeepSLO, a novel design paradigm for defining and structuring Service Level Objectives (SLOs) for distributed computing continuum systems. When multiple stakeholders are involved, the DeepSLO allows them to plan the overarching behaviors of the system. Further, the techniques employed (Bayesian networks, Markov blankets, active inference) provide autonomy and decentralization to each SLO, while the DeepSLO hierarchy accounts for dependencies between objectives. Finally, both DeepSLOs and individual SLOs are represented graphically, enabling human interpretation of system performance.
In this short paper, we present our ongoing research on a lightweight blockchain based on a solution for state machine replication with proofs. More precisely, we show how to adapt a simple median rule for the stabilizing consensus problem [4] to obtain a stabilizing solution for state machine replication, and how to extend it with a Merkle hash forest so that clients can easily prove, for any of their committed commands, that it has indeed been committed.
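To give a flavor of how a client can prove commitment, the sketch below builds a binary Merkle tree over committed commands and verifies an inclusion proof against the root hash; the paper's Merkle hash forest generalizes this basic mechanism, so the sketch is only illustrative.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root_and_proof(leaves, index):
        """Return the Merkle root over `leaves` and the sibling path
        proving inclusion of the leaf at `index`."""
        level = [h(x) for x in leaves]
        proof = []
        while len(level) > 1:
            if len(level) % 2:            # duplicate last node if odd
                level.append(level[-1])
            proof.append((level[index ^ 1], index % 2 == 0))
            level = [h(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
            index //= 2
        return level[0], proof

    def verify(leaf, proof, root):
        acc = h(leaf)
        for sibling, sibling_is_right in proof:
            acc = h(acc + sibling) if sibling_is_right else h(sibling + acc)
        return acc == root

    cmds = [b"cmd0", b"cmd1", b"cmd2", b"cmd3"]
    root, proof = merkle_root_and_proof(cmds, 2)
    print(verify(b"cmd2", proof, root))   # True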
One building block toward the goal of zero traffic fatalities per year is the development of advanced assistance systems for vulnerable road users (VRUs) such as bicyclists. We focus on the dooring problem, i.e., car doors being opened inattentively into the path of an approaching cyclist. We extended our vehicle-to-everything (V2X) communication-enabled virtual cycling environment for dooring experiments. Our system extends toolkits that are widely used in the V2X research community. We showcase how such a system may be used to realize and evaluate distributed algorithms for VRU safety solutions such as dooring prevention.
Concurrent data structures are essential building blocks for applications in almost every domain. However, differences among domains make it difficult to share results. Some testing frameworks model specific workloads, while others emphasize stress tests, but there is no easy way to evaluate whether a new data structure will accelerate an application based only on such benchmarks. A confounding factor is the huge space of configuration options.
We address these issues with a new benchmarking tool for concurrent data structures in Java and C++. Our tool emphasizes a declarative description of workloads, a modular approach to defining components, and heterogeneity throughout.
As preliminary evidence of the effectiveness of our tool, we show that it can implement important and realistic workloads. We present six different workloads that produce all six possible relative performance orderings of the three most popular binary search trees written in Java.
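As an impression of what a declarative workload description could look like, here is our own mock-up (not the tool's actual input format, and rendered in Python for brevity even though the tool targets Java and C++): the workload is plain data, and a generic driver interprets it against any data structure exposing the named operations.

    import random, threading

    # Hypothetical declarative workload: thread count, operation mix,
    # and key range are data, not code.
    workload = {
        "threads": 4,
        "ops_per_thread": 10_000,
        "key_range": 10_000,
        "mix": {"insert": 0.2, "remove": 0.2, "contains": 0.6},
    }

    class LockedSet:
        """Toy stand-in for a concurrent set under test."""
        def __init__(self):
            self._s, self._lock = set(), threading.Lock()
        def insert(self, k):
            with self._lock: self._s.add(k)
        def remove(self, k):
            with self._lock: self._s.discard(k)
        def contains(self, k):
            with self._lock: return k in self._s

    def run(structure, w):
        """Generic driver: each thread draws operations from the mix."""
        ops, weights = zip(*w["mix"].items())
        def worker():
            for _ in range(w["ops_per_thread"]):
                op = random.choices(ops, weights)[0]
                getattr(structure, op)(random.randrange(w["key_range"]))
        ts = [threading.Thread(target=worker) for _ in range(w["threads"])]
        for t in ts: t.start()
        for t in ts: t.join()

    run(LockedSet(), workload)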
The rapid advancement of machine learning (ML) techniques has led to significant achievements in various robotic tasks. Deploying these ML approaches on real-world robots requires fast and energy-efficient inference of their deep neural network (DNN) models. Distributed inference, which spreads inference across multiple powerful GPU devices, has emerged as a promising optimization for improving inference performance in modern data centers. However, when deployed on real-world robots, existing parallel methods cannot simultaneously meet the robots' latency and energy requirements, raising significant challenges.
This paper reveals and evaluates the problems hindering the application of these parallel methods in robotic IoT: the failure of data parallelism, the unacceptable communication overhead of tensor parallelism, and the significant transmission bottlenecks of pipeline parallelism. By raising awareness of these new problems, we aim to stimulate research toward new parallel methods that achieve fast and energy-efficient distributed inference in robotic IoT.
This study focuses on methods to improve the reliability of persistent memory systems. Using Montage (ICPP'21) as our base system, we identify areas with potential for enhancement, particularly its vulnerability to data loss in certain failure scenarios.
Our research directed our attention to snapshotting and its role in enhancing system resilience. We explore different consistency models for snapshotting methods and provide a novel definition of snapshotting consistency called Buffered-Durable Consistency. Montage provides resilience against system-wide crash failures; nevertheless, persistent memory failures can still result in significant data loss. In response to this challenge, we provide two methods for creating snapshots of memory-mapped files: a stop-the-world and an online snapshotting mechanism. These approaches replicate only the data updated since the last snapshot, thereby greatly lowering the amount of copying required during snapshot operations. Our online snapshotting approach allows modifications to chunks that are not currently being copied by the snapshotter, enhancing system responsiveness. In addition, we developed an algorithm that deactivates the reader locks while no snapshot is ongoing. To demonstrate the efficacy of our methods, we provide an experimental evaluation of throughput across several scenarios.
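The following sketch conveys the incremental, chunk-based idea in its simplest (stop-the-world) form: track which chunks were dirtied since the last snapshot and copy only those. It is our simplified illustration, not the Montage-integrated mechanism, and it omits the online mode's coordination with concurrent writers.

    CHUNK = 4096

    class SnapshottedBuffer:
        """Track dirty chunks of a byte buffer and copy only those
        into the shadow image on the next snapshot."""
        def __init__(self, size):
            self.data = bytearray(size)
            self.shadow = bytearray(size)   # last snapshot image
            self.dirty = set()

        def write(self, offset, payload):
            self.data[offset:offset + len(payload)] = payload
            first = offset // CHUNK
            last = (offset + len(payload) - 1) // CHUNK
            self.dirty.update(range(first, last + 1))

        def snapshot(self):
            """Stop-the-world: copy only chunks dirtied since last time."""
            for c in self.dirty:
                lo = c * CHUNK
                self.shadow[lo:lo + CHUNK] = self.data[lo:lo + CHUNK]
            copied, self.dirty = len(self.dirty), set()
            return copied

    buf = SnapshottedBuffer(1 << 20)
    buf.write(5000, b"hello")
    print(buf.snapshot())   # copies only the single chunk touched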
In summary, our work enhances the fault tolerance capabilities of Montage and provides valuable insights and new avenues for further study and optimization in the area of persistent memory, particularly for memory-mapped file replication.
Distributed tracing is a method used to monitor applications by tracking and visualizing requests as they move across the various components and services of a distributed system. Despite being widely adopted in major cloud-computing applications, to the best of our knowledge, distributed tracing has not been employed in Distributed Shared Memory (DSM) emulations. In such emulations, a set of networked nodes (servers) typically maintain copies of the memory data, and a set of clients (readers/writers) access the data by sending messages to the servers. The main challenge in this environment is to maintain the consistency of the data despite asynchrony and failures. Traditionally, the latency of operations in DSM implementations has been evaluated through simple log-based strategies providing only a high-level performance analysis.
This paper introduces distributed tracing to DSM in an attempt to provide a fine-grained performance analysis that helps identify performance bottlenecks. To this end, we use Ares as a case study: Ares is a crash-tolerant DSM algorithm that provides atomic consistency and supports dynamic participation of networked nodes. Our approach employs a set of tracing tools: OpenTelemetry for code instrumentation, Jaeger for telemetry data collection, and Grafana for visualization.
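As a hint of the instrumentation style, the sketch below wraps a hypothetical two-phase read operation in OpenTelemetry spans using the Python API; the actual work instruments Ares, and the phase names, the console exporter (standing in for a Jaeger/OTLP exporter), and the operation structure here are illustrative assumptions.

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import (
        SimpleSpanProcessor, ConsoleSpanExporter)

    # Console exporter keeps the sketch self-contained; a real
    # deployment would configure a Jaeger/OTLP exporter instead.
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("dsm-client")

    def read():
        # Hypothetical two-phase read: query servers, then propagate.
        with tracer.start_as_current_span("read") as span:
            span.set_attribute("dsm.operation", "read")
            with tracer.start_as_current_span("query-phase"):
                pass   # send queries, await a quorum of replies
            with tracer.start_as_current_span("propagate-phase"):
                pass   # write back the discovered value

    read()

Each nested span records its own start and end timestamps, which is precisely what enables per-phase latency breakdowns instead of a single end-to-end number.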
Efficiently mapping software threads to hardware threads has become steadily more important for maximizing the performance of multi-threaded applications. Systematically combining computational resources, flexible runtime thread redistribution, and selective processor dedication improves the performance of multi-threaded applications. This paper introduces a novel pinning strategy, OCC pinning (Optimizing Concurrent Computations through Thread Pinning), which leverages operating-system integration and granular affinity binding to distribute threads across cores. Furthermore, it introduces a reusable pattern to bridge the gap between language-level concurrency and hardware. To study the performance of OCC pinning, we consider three concurrent data structures (Contention Adapting Binary Search Tree, Striped Hash Set, and Lazy List) and three pinning strategies (NUMA pinning, no pinning, and OCC pinning). Across different workloads, OCC pinning exhibits an average speedup of 1.16x over NUMA pinning and 1.26x over no pinning. Our technique also optimizes cache locality, helping achieve significant performance gains compared to traditional methods and unlocking the full potential of concurrent platforms by effectively utilizing the entire spectrum of computing resources.
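For illustration, per-thread core pinning can be expressed directly on Linux; the sketch below is not the OCC implementation itself, merely the underlying affinity-binding primitive: on Linux, os.sched_setaffinity with PID 0 applies to the calling thread, so each worker pins itself before doing its work.

    import os, threading

    def pinned_worker(core, work):
        """Pin the calling thread to `core`, then run `work`."""
        os.sched_setaffinity(0, {core})   # PID 0 = calling thread on Linux
        work()

    def work():
        sum(range(1_000_000))   # placeholder compute kernel

    cores = sorted(os.sched_getaffinity(0))
    threads = [threading.Thread(target=pinned_worker,
                                args=(cores[i % len(cores)], work))
               for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()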
This paper explores the limitations of failure atomicity in Intel's Double-width Compare-and-Swap (DWCAS), an essential instruction for lock-free and wait-free algorithms in concurrent computing. DWCAS, known as cmpxchg16b on the Intel platform, can manipulate a pair of consecutive (and properly aligned) 8-byte words in a manner that makes updates to these memory locations visible atomically, at least in traditional systems with volatile main memory. At the same time, Intel's technical documents warn that only 8-byte stores are guaranteed to be power-fail atomic in systems equipped with non-volatile memory, meaning that only such stores cannot be torn by a power failure. We review the implications of this limitation for the integrity of algorithms that assume DWCAS is both atomic in the traditional sense and failure-atomic. Through this analysis, we aim to clarify misconceptions and advocate for robust software-based alternatives that ensure true power-fail atomicity for DWCAS operations.
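One widely used software-based pattern that restores failure atomicity without relying on 16-byte atomic stores is to persist the full record out of place and then publish it with a single 8-byte write. The file-backed sketch below illustrates only the ordering discipline (payload, persist, 8-byte commit word, persist); it does not model Intel's persistence semantics or non-volatile memory precisely, and it is not necessarily the alternative the paper advocates.

    import os, struct

    def publish(fd, record: bytes, slot_offset: int):
        """Persist the record out of place, then publish it by
        overwriting a single 8-byte commit word at offset 0.
        A crash before the second fsync leaves the previously
        committed record intact and reachable."""
        os.pwrite(fd, record, slot_offset)   # 1: payload out of place
        os.fsync(fd)                         # 2: payload is durable
        os.pwrite(fd, struct.pack("<Q", slot_offset), 0)   # 3: commit word
        os.fsync(fd)                         # 4: commit word is durable

    fd = os.open("log.bin", os.O_RDWR | os.O_CREAT, 0o644)
    os.ftruncate(fd, 4096)
    publish(fd, b"16-byte-payload!", 64)
    committed_at = struct.unpack("<Q", os.pread(fd, 8, 0))[0]
    print(os.pread(fd, 16, committed_at))
    os.close(fd)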
Increasing amounts of data are sensed at the edge of the Edge-to-Cloud (E2C) continuum, enabling the rapid development of data-driven applications based on, e.g., machine learning. This is especially true for Vehicular Cyber-Physical Systems (VCPSs), networks of connected vehicles equipped with high-bandwidth sensors, where Big Data originating on the vehicles is crucial for advancing autonomous driving, developing new cars, and more. Limited bandwidth and storage mean that moving this vehicular Big Data from the edge to central processing poses increasing challenges. In this work, we present our research on alleviating these challenges by efficiently localizing data at the edge, selecting relevant data in a data stream, and distributing the processing of data within a VCPS.