
White paper

Condensation: a distributed data system with conflict-free synchronization and end-to-end security

Inspired by blockchain systems, email, and git version control, Condensation is a distinctive approach to building scalable, modern applications while providing the features needed to protect digital rights. Because the system does not need to trust the Cloud, it can ensure both the security and the ownership of data.

Condensation extends the Cloud with distributed storage servers

The emergence of condensed systems

While being entirely distributed, Condensation can store and retrieve data like a database or a file system, send data to other users or devices like a messaging system, and share and synchronize data like cloud services. To do so, Condensation follows a distributed actor-based message-passing approach and encrypts all data end-to-end.

Condensation bridges the simplicity of immutable data and the ease of implementing mutable documents. On the client, documents are split efficiently into smaller immutable units that can be transferred freely across the network. These units are then condensed back into a document when they arrive on the receiver's device. Condensed systems do not need to trust a third-party server, and features that create trust between clients, such as data certification, come by design.
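The split-and-condense idea can be illustrated with a minimal sketch. It is not Condensation's actual format: the fixed chunk size, the use of SHA-256 as the unit identifier, and the names `split` and `condense` are assumptions made for illustration, and encryption is left out entirely.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example produces several units


def split(document: bytes, store: dict) -> list:
    """Split a document into immutable units keyed by their SHA-256 digest."""
    hashes = []
    for start in range(0, len(document), CHUNK_SIZE):
        chunk = document[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store[digest] = chunk  # identical content always maps to the same key
        hashes.append(digest)
    return hashes


def condense(hashes: list, store: dict) -> bytes:
    """Rebuild the document on the receiver, verifying every unit."""
    parts = []
    for digest in hashes:
        chunk = store[digest]
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("unit does not match its hash")
        parts.append(chunk)
    return b"".join(parts)


store = {}  # stands in for any untrusted storage or transport
document = b"hello condensed world"
hashes = split(document, store)
restored = condense(hashes, store)
```

Because each unit is addressed by the hash of its content, the receiver can detect a modified unit without trusting the storage that delivered it.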

This shift from merging data on the server to merging it on the client opens many new possibilities for application design. It is also significantly more efficient than existing solutions, which need to build several application layers on top of existing systems to achieve the same functionality. Before explaining the mechanics, this section reviews the history of databases and what makes Condensation the next step in their evolution.

Bridging the gap between mutable and immutable data structures

The structure of today's file and database systems dates back to the 1970s, when storage space was extremely scarce and computers were few. These systems were designed to run on a single machine, and mostly on a single disk.

While both databases and file systems have greatly evolved over time, their main structure has hardly changed. Database systems are based on tables with mutable records (rows), while file systems use a hierarchy of folders with mutable files inside. In both systems, data can be modified with little effort and at any time. This mutable design also has the advantage of being very efficient with regard to storage space. Data synchronization, however, is notoriously difficult and error-prone.

In today's connected world, data is used on different devices or shared with other people. For most applications, storage space is no longer a limiting factor. Hence, efficient data synchronization is key.

A historical evolution of data systems

Apart from file and database systems, revision control systems have been developed and used since the 1980s. Some of them, such as git or hg, are fully distributed and do not require any central server whatsoever. Each user has their own version of the data and can merge changes from other users. Such systems allow for efficient and provably correct data synchronization.

While they are great for source code management, current version control systems are not suited as general-purpose data systems. To benefit from such systems, the user needs a certain understanding of branches, merging, and conflict resolution, which is far beyond the knowledge of an average computer user. In addition, occasional merge conflicts are inevitable and prevent such systems from being used in a transparent way.

Condensation has been designed from the ground up to address this. The result is a general-purpose data system with lightweight transactions and efficient data synchronization in a completely distributed setting. Merge conflicts are impossible by design, hence no user intervention is required during the synchronization process. The data itself is end-to-end encrypted and may be spread across multiple storage systems.

The evolution of architectures from online to offline and serverless systems

The structure of the data directly influences how much a system depends on a server and what role that server plays. With SQL and NoSQL databases, a centralized server is needed to synchronize data. Accordingly, it is where the application logic runs. As a result, the system only works while online. Also, because the data is read and processed by the server, it is vulnerable to data breaches.

Working offline became possible later, by storing documents in the application and defining schemas for synchronization when the application goes back online. However, this process is complex and requires handling logic on both the client and the server side. Moreover, scaling a central database was a major issue. Many systems adopted distributed designs for horizontal scaling, but only in a controlled data center setup where all servers are trusted and available.

As described previously, Condensation only transfers immutable data over the network, which makes it possible to build fully distributed yet very simple systems. The following overview compares the existing system types.

Architectures
Condensation shifts intelligence to the client side and turns servers into simple encrypted storage.

Compared architectures: Online-only (2000-2015), Offline-first (2015-2020), Condensated (2021)

Code base
  Online-only: Small
  Offline-first: Large
  Condensated: Small

Architecture
  Online-only: Simple architecture, easy to understand
  Offline-first: Relatively complex architecture
  Condensated: Simple architecture, but requires a "distributed mindset"

Structure
  Online-only: Centralized with full trust in cloud
  Offline-first: Centralized with full trust in cloud
  Condensated: Distributed/Federated

Synchronization
  Online-only: No synchronization necessary
  Offline-first: Correct (two-way) data synchronization is hard. Potentially different database schemas on client and server.
  Condensated: Based on synchronization. Direct device-to-device sync possible.

Security
  Online-only: Transport encryption only
  Offline-first: Transport encryption and engineered SPOF mitigation
  Condensated: End-to-end encryption

Authentication
  Online-only: Login required
  Offline-first: Login required
  Condensated: Login not necessary (but sometimes desired)

Condensation leverages the advantages of immutable objects. They never have to be locked, which greatly improves concurrency. They also simplify persistence and make it easy to certify that data has not been compromised and is exactly the same as at the source. Furthermore, they reduce memory usage, as existing objects can be reused to create new trees.

Developers can benefit from features available by design, such as data certification with user signatures, versioning with a transaction history, and conflict-free merging based on CRDTs.
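To make the idea of a conflict-free merge concrete, here is a minimal sketch of one well-known CRDT, a last-writer-wins map. The text above only says that merging is based on CRDTs; which CRDT Condensation uses is not stated, so the data layout and the `merge` function below are assumptions for illustration only.

```python
def merge(a: dict, b: dict) -> dict:
    """Merge two replicas of a last-writer-wins map.

    Each key maps to a (timestamp, value) pair. The larger pair wins, so a
    tie on the timestamp is broken deterministically by the value.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged


# Two devices edited the same document independently.
phone = {"title": (1, "Draft"), "body": (3, "Hello")}
laptop = {"title": (2, "Final"), "tags": (1, "demo")}

left = merge(phone, laptop)   # phone receives the laptop's changes
right = merge(laptop, phone)  # laptop receives the phone's changes
```

Both devices end up with the same state regardless of the order in which changes arrive, and merging the same state twice changes nothing. These properties are what allow synchronization without user intervention.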

In short, Condensation has a hybrid data structure: it merges data into mutable documents stored locally and transfers immutable objects through stores. The stores can be managed on a single server or in a purely distributed manner without introducing additional complexity.

In the following sections, the data structure is first described technically to explain how Condensation transforms a document into an immutable Merkle tree. Next, to better understand how Condensation securely manages data on the network, the flow of data through a decentralized network is described.
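As a rough illustration of what turning a document into an immutable Merkle tree can look like, the sketch below stores every node under the hash of its content, with parent nodes referring to children by hash. The JSON serialization, SHA-256, and the `put` and `build` helpers are hypothetical choices for this example and do not describe Condensation's actual object format.

```python
import hashlib
import json


def put(store: dict, node: dict) -> str:
    """Serialize a node canonically and store it under its SHA-256 hash."""
    data = json.dumps(node, sort_keys=True).encode()
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def build(store: dict, value) -> str:
    """Turn a nested dict into a Merkle tree and return the root hash."""
    if isinstance(value, dict):
        children = {key: build(store, child) for key, child in value.items()}
        return put(store, {"children": children})
    return put(store, {"leaf": value})


store = {}
v1 = {"profile": {"name": "Ada", "city": "Basel"}, "notes": {"n1": "buy milk"}}
v2 = {"profile": {"name": "Ada", "city": "Bern"}, "notes": {"n1": "buy milk"}}

root1 = build(store, v1)
count_after_v1 = len(store)
root2 = build(store, v2)
new_objects = len(store) - count_after_v1
```

The first version produces six objects. The second version adds only three: the changed leaf, its parent, and a new root. The unchanged subtrees are reused, and the root hash identifies the whole document version.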

Data structure technical specifications

Learn more about how Condensation transforms a database document into an immutable Merkle tree.

coming soon, first notes available here

Data flow technical specifications

Learn more about how Condensation sends data: creating an actor with their key pairs, creating a store, and passing data using envelopes.

coming soon, first notes available here
The technical specifications will be published in the coming weeks.