Master's Thesis Proposal (Graph Analysis and Blockchain)

by Marco Viviani

Ethereum is one of the most widely used blockchain platforms, where every transaction generates complex economic interactions among different actors. With the adoption of Proposer-Builder Separation (PBS), these dynamics have become even more intricate, particularly in the context of Maximal Extractable Value (MEV), where builders and validators play a central role in transaction ordering and block construction. To better understand concentration, fairness, and efficiency within this ecosystem, it is useful to reconstruct economic flows as a directed, weighted graph representing "who pays whom" relationships. Graph-based community detection techniques, such as Leiden, then make it possible to uncover dominant roles, clusters of actors, and recurring value extraction patterns.

The thesis aims to reconstruct, for the month of February 2024, the network of economic flows related to Ethereum transactions, with particular attention to interactions among users (externally owned accounts, EOAs), builders, and validators under the PBS regime. The objective is to generate a directed and weighted "who-pays-whom" graph with edge weights in USD, distinguish PBS payments (builder → validator) from ordinary transfers, and analyze the community structure to measure flow concentration, dominant roles, and possible MEV-related patterns. The expected contribution is both methodological (a reproducible pipeline for a specific month) and empirical (clear quantitative indicators of the concentration and centrality of actors).

To this end, the thesis will extract executed transactions from the Ethereum blockchain for February 2024 (including logs and internal traces), normalize addresses by distinguishing EOAs from smart contracts, and reconstruct the net value flow of each transaction in USD using historical prices. PBS payments (builder → validator) will be identified through known fee-recipient/relay addresses in order to separate them from ordinary transfers. From the cleaned dataset, a directed and weighted graph will be built (nodes = actors; edges = payments between actors, weighted by total USD value, with the frequency of interactions kept as an additional edge attribute), on which centrality metrics will be calculated and the Leiden algorithm will be applied to discover communities. The analysis will cover top edges, top revenues (in-strength), concentration, and a comparison between PBS and non-PBS flows. Expected outputs include tables of edges/nodes with features, a reproducible notebook, and visualizations (e.g., Sankey diagrams, heatmaps, stacked bar charts).
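
To give a rough idea of the graph-construction and concentration steps described above, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the pipeline the thesis will adopt: the transfer records, the address labels, the `KNOWN_BUILDERS` and `KNOWN_VALIDATORS` sets, and the function names are all hypothetical, and the Herfindahl-Hirschman index is just one possible concentration measure. Community detection with Leiden is left out, since it would require a graph library (for example, the `leidenalg` package on top of `igraph`).

```python
from collections import defaultdict

# Hypothetical transfers: (payer, payee, value in ETH, ETH/USD price at block time).
# Labels and amounts are illustrative placeholders, not real February 2024 data.
TRANSFERS = [
    ("user_a", "builder_1", 0.50, 2500.0),
    ("user_b", "builder_1", 0.25, 2500.0),
    ("user_a", "builder_1", 0.25, 2400.0),
    ("builder_1", "validator_x", 0.40, 2500.0),
    ("builder_2", "validator_x", 0.10, 2500.0),
    ("user_b", "user_a", 1.00, 2500.0),
]

# Assumption: curated address sets are available, e.g. derived from
# known fee-recipient/relay data.
KNOWN_BUILDERS = {"builder_1", "builder_2"}
KNOWN_VALIDATORS = {"validator_x"}


def build_edges(transfers):
    """Aggregate transfers into directed edges keyed by (payer, payee).

    Each edge stores the total USD value, the number of transfers, and a
    flag marking builder -> validator (PBS) payments.
    """
    edges = defaultdict(lambda: {"usd": 0.0, "count": 0, "pbs": False})
    for payer, payee, eth, price in transfers:
        edge = edges[(payer, payee)]
        edge["usd"] += eth * price
        edge["count"] += 1
        edge["pbs"] = payer in KNOWN_BUILDERS and payee in KNOWN_VALIDATORS
    return dict(edges)


def in_strength(edges):
    """Total USD received by each node (sum of incoming edge weights)."""
    totals = defaultdict(float)
    for (_, payee), attrs in edges.items():
        totals[payee] += attrs["usd"]
    return dict(totals)


def hhi(values):
    """Herfindahl-Hirschman index: sum of squared shares of the total.

    Ranges from 1/n (evenly spread over n actors) to 1.0 (one actor holds
    everything). Returns 0.0 when there is no value at all.
    """
    total = sum(values)
    if total == 0:
        return 0.0
    return sum((v / total) ** 2 for v in values)


if __name__ == "__main__":
    edges = build_edges(TRANSFERS)
    revenues = in_strength(edges)
    # Rank actors by revenue (in-strength), highest first.
    for node, usd in sorted(revenues.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{node}: {usd:.2f} USD received")
    # Concentration of PBS payments across the builders that pay them.
    pbs_by_builder = defaultdict(float)
    for (payer, _), attrs in edges.items():
        if attrs["pbs"]:
            pbs_by_builder[payer] += attrs["usd"]
    print("HHI of PBS payments by builder:", hhi(list(pbs_by_builder.values())))
```

Keeping both the USD total and the transfer count on each edge makes it possible to rank edges by value or by frequency without rebuilding the graph, and the `pbs` flag supports the PBS versus non-PBS comparison directly.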

Students interested in this thesis topic are welcome to contact both Marco Viviani (marco.viviani@unimib.it) and Davide Mancino (davide.mancino@unimib.it) for further information.