The LTN cloud is a specialized global cloud providing real-time, broadcast-quality video transport and processing services supporting diverse workflows for the media industry (www.ltnglobal.com). The system is governed by a persistent control plane that relies on database replication with strict consistency semantics, while also adhering to stringent latency requirements. Notably, the end-to-end latency, from the moment an update is initiated anywhere in the world to the point at which it is reflected globally, is approximately a quarter of a second.
The talk describes the path from an abstract algorithm to a deployed service. This includes (1) the overlay network and group communication architecture that effectively provides a near-perfect communication service over the asynchronous and error-prone Internet, (2) the algorithmic refinements that were necessary to support the latency requirement in the face of server recoveries and network partition repairs, and (3) a thoughtful setup of data centers hosting replicas, striking a balance between strong resilience and low-latency total ordering.
Since its deployment in 2019, the persistent control plane service has experienced a three orders of magnitude increase in the volume of updates. This growth is attributed to the broader adoption of the service as an essential building block for various LTN cloud applications, surpassing its original intended use case.
In this study, we show the advantages of incorporating multi-source knowledge from publicly available sources, such as ChatGPT and Wikipedia, into existing datasets to enhance the performance of machine learning models for routine tasks, such as classification. specifically, we propose the utilization of supplementary data from external sources and demonstrate the utility of widely accessible knowledge in the context of the Forest Cover Type Prediction task launched by the Roosevelt National Forest of Northern Colorado. Additionally, we exhibit an improvement in classification accuracy for the Isolated Letter Speech Recognition dataset when incorporating information on regional accents in the prediction of spoken English letter names.
This article will take you on a journey to the core of blockchains, their Byzantine consensus engine, where HotStuff emerged as a new algorithmic foundation for the classical Byzantine generals consensus problem. The first part of the article underscores the theoretical advances HotStuff enabled, including several models in which HotStuff-based solutions closed problems which were opened for decades. The second part focuses on HotStuff performance in real life setting, where its simplicity drove adoption of HotStuff as the golden standard for blockchain design, and many variants and improvements built on top of it. Both parts of this document are meant to describe lessons drawn from HotStuff as well as dispel certain myths.
The current method for building software infrastructure in disaggregated data centers involves creating new virtual machine monitors or operating system kernels that make the underlying hardware appear as a group of logical servers. Although this approach is effective for maintaining backward compatibility, we suggest that it would be more beneficial to invest in redesigning the applications themselves to be disaggregated along physical boundaries. We propose uniservices, a new programming paradigm that utilizes the actor model and is highly specialized for a single type of hardware resource for disaggregated architecture. Applications would be built from these uniservices, communicating over fast interconnects and a shared log. This approach simplifies operating systems while making better use of available hardware resources.
In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program transformations can combine, step by step, to convert a single-node program into a variety of distributed design points that offer the same semantics with different performance and deployment characteristics.
Oblivious Transfer (OT) protocol is nowadays recognised as a generic primitive for all cryptographic protocols since it is one of the strongest primitives for secure multiparty computation (MPC). To design hardware-assisted OT protocols, Physically Unclonable Functions (PUFs) have been thoroughly studied in the literature. PUF-based OT protocols, however, necessitate the physical transfer of PUF devices, rendering them inappropriate in situations where the physical transfer of devices is not secure. Additionally, two-party secure computation protocols, in general, are performed between two distrusting parties. Yet, the PUF-based OT protocols require the PUF receiver to trust the PUF sender to behave honestly. Recent works have considered malicious PUF models resulting in more complex protocols. To avoid these drawbacks, we in this work rely on the concept of Physically Related Functions (PReFs), a pair of PUF devices that allow two PUF devices to generate similar responses over a "related" input set. XOR composition of PReFs, also called XOR_PReF, is an improved construction that eliminates the need for any trusted party and generates a "related" input set securely and independently. With the help of this new primitive, we propose 1-out-of-2 OT protocols using XOR_PReF in semi-honest as well as malicious settings keeping the protocol secure even after switching the roles of OT-sender and OT-receiver. Our protocols can be integrated into hardware-assisted MPC protocols and provide a safe and valuable solution for all OT-based techniques, notably Private Set Intersection (PSI), Private Information Retrieval (PIR), and Password Authenticated Key Exchange (PAKE).
Ever since the advent of "multiserver jobs" (jobs that require more than one server or core simultaneously), practitioners have been faced with the question of how to pack these jobs into a compute cluster. While many policies have been proposed, including First-Come-First-Served (FCFS), BackFilling, MaxWeight, and Most Servers First, it is not well understood which policies simultaneously achieve (1) throughput-optimality and also (2) both low and theoretically predictable mean queueing times.
This paper reviews some very recent work from [9, 10] on an alternative packing policy called ServerFilling (SF) and some extensions of this policy. The SF policy achieves both goals (1) and (2) above. This paper discusses and evaluates existing policies in comparison to SF, in order to prompt discussion on the tradeoffs between different scheduling policies.
Distributed applications on the cloud are being developed and deployed as microservices as opposed to the monolithic architecture. Service Meshes have emerged as a way of specifying communication policies between microservices. Service Meshes have the potential to abstract the networking requirements of distributed applications from the application logic. However, current service mesh frameworks introduce significant performance and resource overheads. We study the overheads of service meshes and make a case for redesigning both the control plane and data plane for service meshes. First, we propose the notion of Application Defined Middleboxes, which makes it possible for the mesh control planes to reduce the overheads by optimizing where to implement application policies. Second, we demonstrate preliminary ideas on accelerating the data plane to further reduce the overheads.
Byzantine fault tolerant consensus protocols are implemented with consecutive broadcasts but suffer from a low throughput at large geographical scale or planetary scale. A reason for this inefficiency is believed to be their all-to-all communication complexity, which led researchers to design new consensus protocols with more consecutive one-to-all broadcasts but cumulatively fewer messages.
We show, through a step-by-step evaluation, ranging from LAN/WAN broadcast benchmarks to a state machine replication (SMR) application, that this intuition can be misleading. In particular, we identify two underestimated factors that can impact consensus performance much more at a large scale: (i) the good-put of the broadcast as the rate at which bits are delivered to the application and (ii) the hiccup or waiting time between consecutive broadcast phases. Finally, we show that a leaderless SMR with O(n4) complexity can outperform a leader-based SMR with O(n3) complexity by 20x.
This work promotes a new family of byzantine consensus protocols exclusively based on all-to-all broadcasts that take into account these two factors. Our result promises to impact the design of blockchain systems that aim at performing well in WANs at a planetary scale.
In this paper, we investigate the interplay between early exit mechanisms in deep neural networks and privacy preservation in the context of federated learning. Our primary objective is to assess how early exits impact privacy during the learning and inference phases. Through experiments, we demonstrate that models equipped with early exits perceivably boost privacy against membership inference attacks. Our findings suggest that the inclusion of early exits in neural models can serve as a valuable tool in mitigating privacy risks while, at the same time, retaining their original advantages of fast inference.
HotStuff is a Byzantine fault-tolerant state machine replication protocol that incurs linear communication costs to achieve consensus. This linear scalability promoted the protocol to be adopted as the consensus mechanism in permissioned blockchains. This paper discusses the architecture, testing, and evaluation of our extensible framework to implement HotStuff and its variants. The framework already contains three HotStuff variants and other interchangeable components for cryptographic operations and leader selection.
Inspired by the Twins approach, we also provide a testing framework for validating protocol implementations by inducing Byzantine behaviors. Test generation is protocol-agnostic; new protocols can execute the test suite with little-to-no modifications. We report relevant insights on how we benefited from Twins for validation and test-driven development. Leveraging our deployment tool, we evaluated our implementation in various configurations.
Reliable distributed systems require replication and consensus among distributed processes to tolerate process and communication failures. Understanding and assuring the correctness of protocols for replication and consensus have been a significant challenge. This paper describes the precise specification and runtime checking of Derecho, a more recent, sophisticated protocol for fast replication and consensus for cloud services.
A precise specification must fill in missing details and resolve ambiguities in English and pseudocode algorithm descriptions while also faithfully following the descriptions. To help check the correctness of the protocol, we also performed careful manual analysis and increasingly systematic runtime checking. We obtain a complete specification that is directly executable, and we discover and fix a number of issues in the pseudocode. These results were facilitated by the already detailed pseudocode of Derecho and made possible by using DistAlgo, a language that allows distributed algorithms to be easily and clearly expressed and directly executed.
We present the first performance comparison of EdDSA and BLS signatures in committee-based consensus protocols through large-scale geo-distributed benchmarks. Contrary to popular beliefs, we find that small deployments (less than 40 validators) can benefit from the small storage footprint of BLS multi-signatures while larger deployments should favor EdDSA to improve performance. As an independent contribution, we present a novel way for committee-based consensus protocols to verify BLS multi-signed certificates by manipulating the aggregated public key using pre-computed values.
Due to the increasing complexity of robot swarm algorithms, analyzing their performance theoretically is often very difficult. Instead, simulators are often used to benchmark the performance of robot swarm algorithms. However, we are not aware of simulators that take advantage of the naturally highly parallel nature of distributed robot swarms. This paper presents ParSwarm, a parallel C++ framework for simulating robot swarms at scale on multicore machines. We demonstrate the power of ParSwarm by implementing two applications, task allocation and density estimation, and running simulations on large numbers of agents.