Since its genesis in late 2008 [@Nakamoto2008], Bitcoin has grown rapidly in terms of participation, number of transactions and market value. This success is mostly due to an innovative use of existing technologies to build a trusted ledger called a blockchain. A blockchain system allows its participants (agents) to collectively build a distributed economic, social and technical system that anyone can join (or leave) and in which they can transact with one another without needing to trust each other, to rely on a trusted third party, or to have a global view of the system. It does so by maintaining a public, immutable and ordered log of transactions, which provides an auditable trusted ledger accessible to anyone.
Technically speaking, all agents store unconfirmed transactions in their memory pools and confirmed transactions in their blockchains. User agents create transactions with a fee and broadcast them across the blockchain network to be confirmed (i.e., totally ordered and cryptographically linked to the blockchain). After receiving a certain number of transactions, block creator agents try to confirm them as a block by using a consensus algorithm (e.g., a hash-based proof-of-work, which requires solving a computational puzzle of predefined difficulty, or a practical Byzantine fault tolerance protocol). The successful block creator agent(s) broadcast(s) the next block to the network so that it can be chained to the blockchain. This process is not trivial, however, because a blockchain system is open and dynamic. Both types of agents therefore have to take uncertain constraints into account (e.g., the global merit, the transaction confirmation times, the transaction fees, the delays in the network, and the topology of the network) during their decision-making in order to balance their objectives carefully. Failing to do so can have serious consequences. For instance, a growing backlog of unconfirmed transactions may degrade the service and reduce the participation of user agents [@Gurcan2017]. In the extreme, no user agents remain in the system, block creator agents have no transactions to confirm, and the whole system eventually comes to a halt. Striking this balance is a challenging task that has not been covered by the formal studies conducted so far [@Garay2015;@Eyal2014;@Sapirshtein2016;@Carlsten2016;@Pass2016analysis].
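The confirmation flow described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the protocol of any particular blockchain: the names `select_transactions`, `mine_block` and `block_hash` are hypothetical, the fee-priority selection rule is an assumption, and the puzzle difficulty is expressed here as a number of leading zero hex digits in a SHA-256 digest.

```python
import hashlib
import json


def block_hash(prev_hash, transactions, nonce):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"prev": prev_hash, "txs": transactions, "nonce": nonce},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def select_transactions(mempool, block_size):
    """Assumed policy: a block creator picks the highest-fee transactions."""
    return sorted(mempool, key=lambda tx: tx["fee"], reverse=True)[:block_size]


def mine_block(prev_hash, transactions, difficulty):
    """Toy proof-of-work: search for a nonce whose digest starts with
    `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(prev_hash, transactions, nonce)
        if digest.startswith(prefix):
            return {"prev": prev_hash, "txs": transactions,
                    "nonce": nonce, "hash": digest}
        nonce += 1


# Hypothetical data: three unconfirmed transactions waiting in a memory pool.
GENESIS = "0" * 64
mempool = [
    {"id": "tx1", "fee": 5},
    {"id": "tx2", "fee": 1},
    {"id": "tx3", "fee": 9},
]

chosen = select_transactions(mempool, block_size=2)
block = mine_block(GENESIS, chosen, difficulty=2)

# Confirmed transactions leave the memory pool; the rest keep waiting.
mempool = [tx for tx in mempool if tx not in chosen]
```

Under this assumed selection rule, the low-fee transaction stays unconfirmed in the memory pool, which is the kind of fee and confirmation-time trade-off that user agents have to reason about.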
A promising approach to this kind of problem is reinforcement learning [@Tan1993], in which agents learn how to behave in an unknown environment by performing actions and observing the results, with the aim of maximizing their individual cumulative returns (rewards) [@Busoniu2010]. While reinforcement learning agents have achieved some successes in a variety of domains [@Riedmiller2009;@Diuk2008;@Tesauro1995], their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces [@Mnih2015]. In other words, traditional reinforcement learning algorithms have difficulty with domains featuring
Blockchain systems, however, are too complex for humans to predetermine the correct actions with hand-designed solutions. Furthermore, the agents operating in these systems have limited observability, and the state and parameter spaces are vast and change dynamically. Consequently, agents that can learn to tackle such complex real-world domains are needed.
Based on this observation, we hypothesize that deep reinforcement learning [@Gupta2017], in which each agent is equipped with a deep neural network, holds the key to scaling reinforcement learning to complex tasks for agents acting in blockchain systems. Deep reinforcement learning has advanced considerably over the last decade [@Silver2016;@Mnih2013]. However, most of its successes have been in single-agent domains, where the behavior of other agents is of little relevance. In blockchain systems, on the other hand, the interaction between multiple agents, which can cooperate or compete, is critical. Few studies have focused on such topics so far [@Hou2019;@Wang2019;@Zheng2020].
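As a much simpler stand-in for the deep, multi-agent setting hypothesized above, the basic reinforcement learning loop (act, observe a reward, update) can be illustrated with tabular Q-learning on a toy fee-selection problem. Everything in this sketch is an assumption made for illustration: the two congestion states, the two fee levels, the reward values, and the rule that a low fee is only confirmed when the network is quiet. It is a single-agent table-based learner, not a deep neural network and not the approach proposed in this thesis.

```python
import random

# Hypothetical toy environment.
# States: 0 = quiet network, 1 = congested network.
# Actions: 0 = pay a low fee, 1 = pay a high fee.
N_STATES, N_ACTIONS = 2, 2
FEES = [1.0, 3.0]
TX_VALUE = 10.0        # assumed benefit of a confirmed transaction
DELAY_PENALTY = 2.0    # assumed cost of a transaction left unconfirmed


def step(state, action, rng):
    """Return (reward, next_state) for a user agent's fee choice."""
    confirmed = state == 0 or action == 1
    reward = TX_VALUE - FEES[action] if confirmed else -DELAY_PENALTY
    next_state = rng.randrange(N_STATES)  # congestion changes at random
    return reward, next_state


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)                         # explore
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])  # exploit
        reward, next_state = step(state, action, rng)
        # Q-learning update: move the estimate toward the bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = train()
# Greedy policy: the preferred fee level in each congestion state.
policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a])
          for s in range(N_STATES)]
```

In this toy the agent observes the congestion state directly and the table has only four entries. The blockchain setting discussed here has neither property, which is why function approximation with deep networks is considered instead.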
This thesis seeks to answer the following two research questions:
Concretely, the objective of this thesis is to investigate the uncertain constraints of blockchain systems and to propose a deep reinforcement learning decision-making approach based on utility and rewards for both user and block creator agents.
PhD in Computer Engineering, 2023
Université de Montpellier