Skip to content

robyBorelli/CS-Master-Portfolio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Publications, Academic Projects and Thesis

In this repository, I showcase my academic journey during my Master's degree in Computer Science. It includes my thesis, publications, and various projects. The material spans across multiple areas of computer science such as algorithms, automata theory, logic and parallel computing.

Publications


DOI

Slides
On cascades of Reset Automata Type: Conference Paper (STACS 2025)
Language: English
Authors: Roberto Borelli, Luca Geatti, Marco Montali, Angelo Montanari
Abstract: The Krohn-Rhodes decomposition theorem is a pivotal result in automata theory. It introduces the concept of cascade product, where two semiautomata, that is, automata devoid of initial and final states, are combined in a feed-forward fashion. The theorem states that any semiautomaton can be decomposed into a sequence of permutation-reset semiautomata. For the counter-free case, this decomposition consists entirely of reset components with two states each. This decomposition has significantly impacted recent research in various areas of computer science, including the identification of a class of transformer encoders equivalent to star-free languages and the conversion of Linear Temporal Logic formulas into past-only expressions (pastification). The paper revisits the cascade product in the context of reset automata, thus considering each component of the cascade as a language acceptor. First, we give regular expression counterparts of cascades of reset automata. We then establish several expressiveness results, identifying hierarchies of languages based on the restriction of the height (number of components) of the cascade or of the number of states in each level. We also show that any cascade of reset automata can be transformed, with a quadratic increase in height, into a cascade that only includes two-state components. Finally, we show that some fundamental operations on cascades, like intersection, union, negation, and concatenation with a symbol to the left, can be directly and efficiently computed by adding a two-state component.

DOI
The kth nearest neighbor method for estimation of entropy changes from molecular ensembles Type: Publication
Language: English
Authors: Federico Fogolari, Roberto Borelli, Agostino Dovier, Gennaro Esposito
Abstract: All processes involving molecular systems entail a balance between associated enthalpic and entropic changes. Molecular dynamics simulations of the end-points of a process provide in a straightforward way the enthalpy as an ensemble average. Obtaining absolute entropies is still an open problem and most commonly pathway methods are used to obtain free energy changes and thereafter entropy changes. The kth nearest neighbor (kNN) method has been first proposed as a general method for entropy estimation in the mathematical community 20 years ago. Later, it has been applied to compute conformational, positional–orientational, and hydration entropies of molecules. Programs to compute entropies from molecular ensembles, for example, from molecular dynamics (MD) trajectories, based on the kNN method, are currently available. The kNN method has distinct advantages over traditional methods, namely that it is possible to address high-dimensional spaces, impossible to treat without loss of resolution or drastic approximations with, for example, histogram-based methods. Application of the method requires understanding the features of: the kth nearest neighbor method for entropy estimation; the variables relevant to biomolecular and in general molecular processes; the metrics associated with such variables; the practical implementation of the method, including requirements and limitations intrinsic to the method; and the applications for conformational, position/orientation and solvation entropy. Coupling the method with general approximations for the multivariable entropy based on mutual information, it is possible to address high dimensional problems like those involving the conformation of proteins, nucleic acids, binding of molecules and hydration.

DOI

Code
Data Structures and Algorithms for k-th Nearest Neighbours Conformational Entropy Estimation Type: Publication
Language: English
Authors: Roberto Borelli, Agostino Dovier and Federico Fogolari
Abstract: Entropy of multivariate distributions may be estimated based on the distances of nearest neighbours from each sample from a statistical ensemble. This technique has been applied on biomolecular systems for estimating both conformational and translational/rotational entropy. The degrees of freedom which mostly define conformational entropy are torsion angles with their periodicity. In this work, tree structures and algorithms to quickly generate lists of nearest neighbours for periodic and non-periodic data are reviewed and applied to biomolecular conformations as described by torsion angles. The effect of dimensionality, number of samples, and number of neighbours on the computational time is assessed. The main conclusion is that using proper data structures and algorithms can greatly reduce the complexity of nearest neighbours lists generation, which is the bottleneck step in nearest neighbours entropy estimation.

Thesis


Thesis

Slides
Master's thesis: Learning from Answer Sets via ASP Encodings Type: Master's thesis
Language: English
Abstract: In recent years, Learning from Answer Sets (LAS) has gained increasing attention as a symbolic AI technique well-suited for explainable machine learning. The need for explainability has become a key requirement in modern AI applications, making LAS an appealing framework due to its strong logical foundations and interpretability. LAS provides a formal approach to inductive logic programming (ILP), where the goal is to learn hypotheses in the form of logic programs that explain given positive and negative examples under the answer set semantics. This thesis presents a comprehensive study of LAS, analyzing its fundamental algorithms and complexity results. A particular focus is given to the satisfiability problem of an LAS task- that is, an instance of the problem of Learning from Answer Sets- which is known to be ΣP 2-complete. To address this problem, we develop novel Answer Set Programming (ASP) encodings that translate an LAS task into an ASP program whose stable models correspond one-to-one with the solutions of the original task. We propose two alternative encodings. The first, simpler but less efficient, relies on an ASP program of exponential size. The second encoding, which introduces disjunctive rules, makes use of the saturation technique by Eiter and Gottlob. This approach ensures a linear encoding size when the background knowledge combined with the hypothesis space forms a tight program. A key contribution of this work is the development of LASCO (LAS to ASP Compiler), a tool that implements both encodings and translates LAS tasks into ASP code executable by standard solvers. LASCO supports ground and non-ground tasks and handles common ASP extensions such as arithmetic expressions and cardinality constraints. Finally, we evaluate our approach on a suite of representative benchmarks and compare its performance to ILASP, the state-of-the-art LAS solver. While ILASP generally demonstrates superior performance, largely due to its maturity and extensive optimizations, our newly developed tool LASCO outperforms all versions of ILASP on certain benchmark instances (in particular on tight tasks) and remains competitive in many others. These results highlight the potential of our approach, suggesting that the disjunctive encoding is not only of theoretical interest but also has practical relevance, and motivates further exploration.

Thesis

Slides
Bachelor's thesis: Algorithms for neighbourhood searches Type: Bachelor's thesis
Language: Italian
Abstract:

Academic Projects


Slides
On The Expressiveness Of Masked Hard-Attention Transformers Type: Seminar for the course "Foundations of Neural Networks"
Language: English
Abstract: This work presents the paper by Yang et al. which characterizes the expressiveness of a particular class of transformers with hard attention, where attention is focused on exactly one position at a time. It is shown how to compile a transformer model into the language B-RASP. Furthermore, it is established that B-RASP is equivalent to star-free languages. The proof proceeds in two directions, employing two distinct characterizations of star-free languages: linear temporal logic over finite traces and cascades of reset automata. This study offers a deeper understanding of the transformer formalism, revealing that (i) both the feed-forward and self-attention sublayers play crucial roles, and (ii) increasing the number of layers in a transformer enhances its expressive power. This latter result contrasts with the universal approximation theorem for standard feedforward neural networks.

Report

Slides
The 3SUM problem admits subquadratic solutions Type: Seminar for the course "Advanced Algorithms"
Language: English
Abstract: In this work, I consider the 3sum problem. Recent years’ studies have shown that the problem admits a subquadratic solution. The 3sum problem has been used in the area of fine-grained complexity to establish lower bounds to a wide range of other problems (which have shown to be 3sum-hard) for example in the computational geometry area. In this paper, I examine the Freund approach to obtain a subquadratic algorithm. To obtain a saving in the complexity, several tricks have been applied and in particular it has been shown how to efficiently enumerate the so-called chunks through a correspondence with paths in a matrix and then all pairs of blocks agreeing with such derived chunks are obtained through a reduction to the dominance-merge problem.

Report

Slides
Expressiveness of temporal logic Type: Seminar for the course "Automatic system Verification: Theory and Applications"
Language: English
Abstract: In this work, I consider the expressive power of various temporal logics. First, I recall some basic results about expressiveness of first order logic. Then I consider the case of LTL and I show a theorem that can be used to prove that the concept of parity is not definable in this context. I discuss a counterexample that proves that the mentioned theorem doesn’t directly apply to LTL+P and I briefly highlight how a possible investigation may lead to a generalization of the theorem to the LTL+P case. Next, I relate first order definable languages with LTL ones and I present an extension to LTL which allows us to increase the expressive power and capture regular languages without changing the complexity of the decision procedure. Finally, I move to the more interesting case of interval logic. I introduce the notion of bisimulation and its use in modal logic and, in particular, I show how to apply it to prove that the logic PNL is strictly more expressive than its future fragment A.

Slides
Logic for context-free languages Type: Seminar for the course "Logic for Computer Science"
Language: Italian
Abstract:

Report

Code
Parallel Automata Minimization Type: Project for the course "Programming on Parallel Architectures"
Language: Italian
Authors: Roberto Borelli (OpenMP), Stefano Rocco (CUDA)
Abstract: The minimization problem of an automaton is central in automata theory and has various practical implications. In this work, we aim to develop a parallel version of the well-known Moore's algorithm, which classically runs in quadratic time. We will review the fundamental concepts and problems in the field, analyze the serial algorithm by examining its code, theoretical properties, and time complexity. Using the OpenMP programming model, we will develop six different parallel versions of the algorithm. The first four, more efficient versions, are based on dividing the main loop into parallel tasks. The fifth version addresses the issue of merging multiple iterations of the refinement loop. The sixth and most scalable version will attempt an approach based on the parallelization of RadixSort and, ultimately, CountingSort, which will then be further developed in CUDA. We will divide CountingSort into three phases, proposing various implementation solutions for each phase using this programming model. We will test and compare the OpenMP and CUDA implementations on a significant set of instances.

Report

Code
Scheduling of competitions with automated reasoning techniques Type: Final project for the course "Automated Reasoning"
Language: Italian
Abstract:

Report

Slides
XGBoost Type: Seminar for the course "Applied Statistics and Data Analysis"
Language: Italian
Abstract:

About

Projects I'm working on during my master degree.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published