At first sight, distributed computing and machine learning are two distant areas of computer science. However, there are many connections, for instance in the area of graphs, which are the focus of my talk. Distributed computing has studied distributed graph algorithms for many decades. Meanwhile, in machine learning, graph neural networks are picking up steam. When it comes to dealing with graph inputs, one can almost claim that graph neural networks are an application of distributed algorithms. I will introduce central concepts in learning such as underreaching and oversquashing, which have been known in the distributed computing community for decades through the LOCAL and CONGEST models. In addition, I am going to present some algorithmic insights, and a software framework that helps with explaining what is learned. Generally speaking, I would like to present a path into learning for those who are familiar with distributed message-passing algorithms. This talk is based on a number of papers recently published at learning conferences such as ICML and NeurIPS, co-authored with Pál András Papp and Karolis Martinkus.
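The correspondence the talk builds on can be made concrete with a minimal sketch (illustrative only, not code from the talk) of one synchronous message-passing round, the primitive shared by LOCAL-model algorithms and graph neural network layers:

```python
def message_passing_round(graph, state, message, combine):
    """graph:   dict mapping node -> list of neighbours
    state:   dict mapping node -> current value
    message: fn(neighbour_state) -> message sent along an edge
    combine: fn(own_state, list_of_messages) -> new state"""
    inbox = {v: [message(state[u]) for u in graph[v]] for v in graph}
    return {v: combine(state[v], inbox[v]) for v in graph}

# Example: each node learns the maximum value in its 1-hop neighbourhood,
# one step of a simple LOCAL algorithm. A GNN layer has the same shape,
# but replaces `message` and `combine` with learned functions.
g = {0: [1], 1: [0, 2], 2: [1]}
s = {0: 5, 1: 1, 2: 7}
s = message_passing_round(g, s, lambda x: x,
                          lambda own, msgs: max([own] + msgs))
# s == {0: 5, 1: 7, 2: 7}
```

Repeating the round k times lets information travel exactly k hops, which is where phenomena like underreaching become visible.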
Intelligent IoT is a prerequisite for societal priorities such as a smart power grid, smart urban infrastructure, and smart highways. These applications bring requirements such as real-time guarantees, data and action consistency, fault tolerance, high availability, temporal data indexing, scalability, and even self-organization and self-stabilization. Existing platforms are oriented towards asynchronous, out-of-band upload of data to the cloud: important functionality, but not enough to address the need. Cornell's Cascade project seeks to close the gap by creating a new platform for hosting ML and AI, optimized to achieve sharply lower delay and substantially higher bandwidth than any existing platform. At the same time, Cascade introduces much stronger guarantees, a mix that we believe will be particularly appealing in applications where events should trigger a quick and trustworthy response. This short paper is intended as a brief overview of the effort, with details to be published elsewhere.
The use of approximation is fundamental in computational science. Almost all computational methods adopt approximations in some form in order to obtain a favourable cost/accuracy trade-off, and there are usually many approximations that could be used. As a result, when a researcher wishes to measure a property of a system with a computational technique, they are faced with an array of options. Current computational workflow frameworks focus on helping researchers automate a sequence of steps on a particular platform, where the aim is often to obtain a computational measurement of a property. However, these frameworks are unaware that there may be a large number of ways to do so. As such, they cannot support researchers in making these choices during development or at execution time.
We argue that computational workflow frameworks should be designed to be approximation-aware, that is, to support the fact that a given workflow description represents a task that could be performed in different ways. This is key to unlocking the potential of computational workflows to accelerate discovery tasks, particularly those involving searches of large entity spaces. It will enable efficiently obtaining measurements of entity properties, given a set of constraints, by directly leveraging the space of available choices. In this paper we describe the basic functions that an approximation-aware workflow framework should provide, show how those functions can be realized in practice, and illustrate some of the powerful capabilities such a framework would enable, including approximate memoization, surrogate model support, and automated workflow composition.
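To make the idea of approximate memoization tangible, here is a minimal sketch; the caching policy and names are our illustration, not the framework's actual design. A cached measurement is reused whenever it was computed at an accuracy level at least as strict as the one now requested:

```python
class ApproxMemo:
    """Illustrative approximate-memoization cache: reuse a cached result
    if it was computed at a tolerance at least as strict as requested."""

    def __init__(self, compute):
        self.compute = compute          # fn(entity, tolerance) -> value
        self.cache = {}                 # entity -> (tolerance, value)

    def measure(self, entity, tolerance):
        hit = self.cache.get(entity)
        if hit and hit[0] <= tolerance:  # lower tolerance = stricter run
            return hit[1]                # reuse: no recomputation needed
        value = self.compute(entity, tolerance)
        self.cache[entity] = (tolerance, value)
        return value

calls = []
def expensive(entity, tol):
    calls.append(entity)                 # count actual computations
    return len(entity) + tol             # stand-in for a real simulation

memo = ApproxMemo(expensive)
memo.measure("mol-1", 0.01)    # computed
memo.measure("mol-1", 0.10)    # reused: the 0.01 run is strict enough
memo.measure("mol-1", 0.001)   # recomputed: stricter tolerance requested
# len(calls) == 2
```

In a large entity-space search, this kind of reuse avoids repeating expensive runs whenever a looser accuracy requirement is already covered by earlier work.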
IoT (Internet of Things) sensor devices are becoming ubiquitous in diverse smart environments, including smart homes, smart cities, smart laboratories, and others. To handle their IoT sensor data, distributed edge-cloud infrastructures are emerging to capture, distribute, and analyze them and deliver important services and utilities to different communities. However, there are several challenges for these IoT-edge-cloud infrastructures to provide efficient and effective services to users: (1) how to deliver real-time distributed services across diverse IoT devices, including cameras, meteorological instruments, and other sensors; (2) how to provide robust and resilient distributed services within the IoT-edge-cloud infrastructure that can withstand failures or attacks; (3) how to handle AI workloads in an efficient manner under constrained network conditions. To address these challenges, we present DARTS, which is composed of different IoT, edge, and cloud services addressing application portability, real-time robust data transfer, and AI-driven capabilities. We benchmark and evaluate these services to showcase a practical deployment of DARTS catering to application-specific constraints.
Rapid technological developments in autonomous unmanned aerial vehicles (or drones) could soon lead to their large-scale implementation in the last-mile delivery of products. However, drones have a number of limitations, such as a restricted energy budget and carrying capacity. Trucks, on the other hand, have a larger carrying capacity, but they cannot reach all places easily. Intriguingly, last-mile delivery cooperation between drones and trucks can synergistically improve delivery efficiency.
In this paper, we present a drone-truck cooperative delivery framework under time-varying dynamics. Our framework minimizes the total delivery time while treating low energy consumption as a secondary objective. The empirical results support our claim and show that our algorithm helps complete deliveries in a time-efficient manner while saving energy.
We are reaching the limits of von Neumann computing architectures, and with them the end of the Moore's-law era, as there is no longer a free ride of performance growth from simply shrinking transistor features. As one consequence, we are experiencing the rise of highly specialized architectures, ranging from neuromorphic to quantum computing, that exploit completely different physical phenomena and demand the development of entirely new architectures, which, however, can perform their computations within a fraction of the energy needed by the von Neumann architecture. Thus, we are experiencing a paradigm shift from the generalized architectures of the von Neumann era to highly specialized architectures in the post-Moore era, where we expect the coexistence of multiple types of architectures specialized for different types of computation. In this paper, we discuss the implications of the post-Moore era for distributed systems.
Many researchers have reported considerable delays of up to a few seconds when invoking serverless functions for the first time. This phenomenon, known as a cold start, has an even greater impact when users run multiple serverless functions orchestrated in a workflow. Moreover, in many cases users need to instantly spawn numerous serverless functions, usually as part of parallel loops. In this paper, we introduce the spawn start and analyze the behavior of three Function-as-a-Service (FaaS) providers, AWS Lambda, Google Cloud Functions, and IBM Cloud Functions, when running parallel loops, with both warm and cold starts. We conducted a series of experiments and observed three insights that are beneficial for the research community. Firstly, the cold start on IBM Cloud Functions, which adds up to 2 s of delay compared to the warm start, is negligible compared to the spawn start, because the latter adds a further 20 s of delay. Secondly, Google Cloud Functions' cold start is "warmer" than the warm start of the same serverless function. Finally, while Google Cloud Functions and IBM Cloud Functions run the same serverless function faster than AWS Lambda at low concurrency, the spawn-start effect on Google Cloud Functions and IBM Cloud Functions makes AWS the preferred provider when spawning numerous serverless functions.
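The spawn-start measurement described above can be sketched with a simple harness; this is a toy illustration, not the paper's benchmark code, and a locally simulated function stands in for a real FaaS endpoint (the real experiments would call each provider's invocation API instead):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def invoke(fn):
    """Time a single function invocation."""
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

def spawn(fn, n):
    """Invoke fn n times concurrently (a parallel loop) and return the
    latency of the slowest invocation, which bounds the completion time
    of the whole loop."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return max(pool.map(invoke, [fn] * n))

# Stand-in for a serverless function; a real harness would issue an
# HTTP trigger or SDK call against the provider here.
simulated = lambda: time.sleep(0.01)

single = invoke(simulated)     # one warm invocation
loop = spawn(simulated, 50)    # 50-way parallel loop
# The gap between `loop` and `single` is the spawn-start overhead
# in this toy setup.
```

Running the same harness as the first-ever invocation versus a repeat invocation separates the cold-start, warm-start, and spawn-start components.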
We present QUANTAS: a simulator that enables quantitative performance analysis of distributed algorithms. It has a number of attractive features. QUANTAS is an abstract simulator; therefore, the obtained results are not affected by the specifics of a particular network or operating system architecture. QUANTAS allows distributed algorithms researchers to quickly investigate a potential solution and collect data about its performance. QUANTAS programming is relatively straightforward and accessible to theoretical researchers working in this area. To demonstrate QUANTAS's capabilities, we implement and compare the behavior of two representative examples from each of four major classes of distributed algorithms: blockchains, distributed hash tables, consensus, and reliable data-link message transmission.
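What makes an abstract simulator's results independent of any real network or OS is its lockstep round structure. A minimal sketch of such a core loop (our illustration, not QUANTAS's actual API, which is C++-based):

```python
class Peer:
    """A simulated peer: per-round computation plus message queues."""
    def __init__(self, pid):
        self.pid, self.inbox, self.outbox = pid, [], []

    def step(self, round_no):
        raise NotImplementedError

def simulate(peers, rounds):
    for r in range(rounds):
        for p in peers:
            p.step(r)                    # compute and queue messages
        for p in peers:                  # synchronous delivery phase
            for dest, msg in p.outbox:
                peers[dest].inbox.append(msg)
            p.outbox = []

# Example: flooding a token along a line of peers, one hop per round;
# rounds-to-completion is the kind of metric such a simulator reports.
class Flood(Peer):
    def __init__(self, pid, n):
        super().__init__(pid)
        self.n, self.has = n, (pid == 0)
    def step(self, r):
        if self.inbox:
            self.has, self.inbox = True, []
        if self.has and self.pid + 1 < self.n:
            self.outbox.append((self.pid + 1, "token"))

peers = [Flood(i, 4) for i in range(4)]
simulate(peers, 4)
# all(p.has for p in peers) -> True after 4 rounds on a 4-peer line
```

Swapping the `Peer` subclass swaps the algorithm under study while the measurement loop stays fixed, which is what makes cross-algorithm comparisons straightforward.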
Virtual and Augmented Reality are among the key driving technologies of the 4th Industrial Revolution, which is expected to radically disrupt almost every business sector and transform the way we live and interact with our environment and each other. End-user devices will soon enable users to immerse themselves in 3D Virtual Environments (VEs) that offer access to remote services such as health care, training and education, entertainment, and social interaction. The advent of fast, highly available network connectivity, in combination with affordable 3D hardware (GPUs, VR/AR HMDs, etc.), has made Networked Virtual Environments (NVEs) possible and available to multiple simultaneous end-users beyond the confines of expensive purpose-built 3D facilities and laboratories.
However, the algorithms that make today's NVEs possible are already reaching their limits: they prove unreliable, suffer from asynchronies, and run over an inherently fault-prone network infrastructure. Current distributed architectures used in NVEs handle concurrency either by providing weak consistency guarantees (e.g., eventual consistency) or by relying on the bounded life span of inconsistent states. Although sufficient for non-critical yet time-sensitive applications, those solutions will be incapable of handling the next generation of interactive Virtual Environments, where precise consistency guarantees will be required. Thus, new scalable, robust, and responsive strategies that can support the needs of the NVEs of tomorrow are necessary.
Recent scientific works are shifting the viewpoint on the practicality of strongly consistent distributed storage by proposing latency-efficient algorithms for atomic R/W Distributed Shared Memory (DSM) with provable consistency guarantees. In this work we focus on transforming the theoretical findings on DSMs into tangible implementations and on investigating the practicality of those algorithmic solutions in Virtual Environments.
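The algorithms in question refine the classic quorum-based emulation of an atomic read/write register over message-passing replicas. A minimal sketch of that baseline pattern, with failures and real networking omitted (our illustration, not the papers' optimized protocols):

```python
class Replica:
    """One server replica holding a timestamped copy of the register."""
    def __init__(self):
        self.ts, self.val = 0, None
    def read(self):
        return self.ts, self.val
    def write(self, ts, val):
        if ts > self.ts:                 # keep only newer values
            self.ts, self.val = ts, val

def majority(replicas):
    return replicas[: len(replicas) // 2 + 1]   # any majority quorum

def dsm_write(replicas, val):
    # Query phase: learn the highest timestamp, then write a newer one.
    ts = max(r.read()[0] for r in majority(replicas)) + 1
    for r in majority(replicas):
        r.write(ts, val)

def dsm_read(replicas):
    # Read the freshest value from a majority, then write it back so
    # that any later read observes it (this gives atomicity).
    ts, val = max((r.read() for r in majority(replicas)),
                  key=lambda p: p[0])
    for r in majority(replicas):
        r.write(ts, val)
    return val

rs = [Replica() for _ in range(5)]
dsm_write(rs, "avatar@(3,4)")
# dsm_read(rs) -> "avatar@(3,4)", even if a minority of replicas is stale
```

The latency-efficient variants studied in this line of work reduce the number of communication round-trips this pattern needs, which is exactly what matters for interactive Virtual Environments.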
Research on multi-core algorithms is adapting rapidly to the new opportunities and challenges posed by persistent memory. One of these challenges is the fundamental problem of formalizing the behaviour of concurrent objects in the presence of crash failures, and giving precise meaning to the semantics of recovery from such failures. Li and Golab (DISC'21) recently proposed a sequential specification for such recoverable objects, called the detectable sequential specification or DSS. Building on their work, we explore examples of how DSS-based objects can be used by a sample application, and examine more closely the division of labour between the application's environment, the application code, and the recoverable object used by the application. We also propose an alternative formal definition of correctness, called the unified detectable sequential specification (UDSS), that simplifies both the object's interface and the application code. Using a black box transformation, we show how a UDSS-based object can be implemented from one that conforms to Li and Golab's specification. Finally, we present experiments conducted using Intel Optane persistent memory to quantify the performance overhead of our transformation.
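The detectability idea underlying such specifications can be illustrated with a toy, volatile-memory example (our construction, not Li and Golab's, and without the persistence, flushing, and recovery machinery a real persistent-memory object needs): each operation carries a unique ID, and the object records each process's last completed operation so that, after a crash, the caller can detect whether its interrupted operation took effect.

```python
class DetectableCounter:
    """Toy detectable counter: operations are tagged with IDs so that
    a recovering process can tell whether its last operation completed,
    and retries are idempotent."""

    def __init__(self):
        self.value = 0
        self.last = {}                   # pid -> (op_id, response)

    def increment(self, pid, op_id):
        if self.last.get(pid, (None,))[0] == op_id:
            return self.last[pid][1]     # duplicate retry: don't re-apply
        self.value += 1
        self.last[pid] = (op_id, self.value)
        return self.value

    def detect(self, pid, op_id):
        """After recovery: did operation op_id by pid complete?"""
        entry = self.last.get(pid)
        return entry if entry and entry[0] == op_id else None

c = DetectableCounter()
c.increment(pid=1, op_id="a")    # returns 1
# ... suppose process 1 crashes here, unsure whether "a" completed ...
c.detect(1, "a")                 # ("a", 1): it did, so no retry needed
c.increment(1, "a")              # a blind retry is safe: value stays 1
```

The division of labour questions studied in the paper concern exactly who generates these operation IDs and who runs the detection step: the environment, the application, or the object itself.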
The vast amounts of data collected and processed by technologies such as Cyber-Physical Systems require new processing paradigms that can keep up with the increasing data volumes. Edge computing and stream processing are two such paradigms that, combined, allow users to process unbounded datasets in an online manner, delivering high-throughput, low-latency insights. Moving stream processing to the edge introduces challenges related to the heterogeneity and resource constraints of the processing infrastructure. In this work, we present state-of-the-art research results that enrich Stream Processing Engines (SPEs) with data provenance, custom scheduling, and other techniques that support the usability and performance of streaming applications, spanning edge and cloud contexts as needed.
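Of the techniques mentioned, data provenance is perhaps the least familiar; a minimal sketch of the idea (our illustration, not any SPE's actual API) is an operator that tags each output with the identifiers of the source tuples that contributed to it:

```python
def source(records):
    """Emit tuples tagged with unique source identifiers."""
    for i, r in enumerate(records):
        yield {"id": f"src-{i}", "data": r}

def windowed_sum(stream, size):
    """Tumbling-window sum that forwards provenance: each result
    carries the IDs of the source tuples it was derived from."""
    buf = []
    for t in stream:
        buf.append(t)
        if len(buf) == size:
            yield {"data": sum(x["data"] for x in buf),
                   "provenance": [x["id"] for x in buf]}
            buf = []

out = list(windowed_sum(source([1, 2, 3, 4]), size=2))
# out[0] == {"data": 3, "provenance": ["src-0", "src-1"]}
```

In a real SPE this bookkeeping must be cheap enough not to hurt throughput, which is one reason provenance support is a research topic rather than a solved feature.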