Here's my Google Scholar.

Selected Publications

PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning

Simon Holk*, Daniel Marta*, and Iolanda Leite
HRI'24

This paper utilizes the zero-shot capabilities of LLMs to increase the granularity of preference-based feedback, extending my previous work. By letting the user give an optional auxiliary textual description of their preference, we ensure that the reward function aligns with what the human actually wants, and we improve sample efficiency by avoiding over-sampling.
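
As a rough illustration of the mechanism (not the paper's actual pipeline), the sketch below shows how a zero-shot LLM call could map a user's free-text explanation onto the states it praises or criticizes. `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompt wording is my own.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion API call;
    # returns a canned response here so the sketch runs end to end.
    return '{"3": 1, "7": -1}'

def extract_highlights(preference_text, state_descriptions):
    """Zero-shot: ask the LLM which states the explanation refers to, and how."""
    prompt = (
        "A user compared two robot trajectories and preferred the first, saying:\n"
        f'"{preference_text}"\n\n'
        "States of the preferred trajectory:\n"
        + "\n".join(f"{i}: {s}" for i, s in enumerate(state_descriptions))
        + "\n\nReturn JSON mapping state indices to +1 (praised) or -1 (criticized), "
          "covering only the states the explanation actually mentions."
    )
    return json.loads(call_llm(prompt))

highlights = extract_highlights(
    "I liked that it slowed down near the person, but it got too close to the wall.",
    [f"state {i}" for i in range(10)],
)
print(highlights)  # {'3': 1, '7': -1}
```

The returned weights could then be used to credit specific states in the reward-learning loss, rather than spreading credit over the whole trajectory.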

POLITE: Preferences Combined with Highlights in Reinforcement Learning

Simon Holk, Daniel Marta, and Iolanda Leite
ICRA'24 - Nominated for Best HRI Paper, Best Student Paper, and Best Conference Paper.

This paper improves the granularity of preference-based feedback by adding temporal trajectory segmentation that highlights positive and negative parts. This avoids assigning responsibility uniformly across the whole trajectory when only part of it was actually preferred. The highlights are then optimized as an auxiliary task, ensuring an improved shared representation.
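
A minimal PyTorch sketch of how highlights can serve as an auxiliary task (the architecture and weighting below are my assumptions, not the paper's exact setup): one encoder is shared between a Bradley-Terry preference loss over segment returns and a per-step highlight classifier, so the highlight labels directly shape the representation the reward head uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighlightRewardModel(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.reward_head = nn.Linear(hidden, 1)     # per-step reward
        self.highlight_head = nn.Linear(hidden, 3)  # per-step: negative / neutral / positive

    def forward(self, traj):  # traj: (T, obs_dim)
        z = self.encoder(traj)  # shared representation
        return self.reward_head(z).sum(), self.highlight_head(z)

def preference_plus_highlight_loss(model, seg_a, seg_b, pref, hl_a, hl_b, aux_weight=0.5):
    """Bradley-Terry preference loss on segment returns, plus an auxiliary
    per-step highlight classification loss on the shared encoder."""
    ret_a, logits_a = model(seg_a)
    ret_b, logits_b = model(seg_b)
    # P(A preferred) = softmax over summed rewards; pref is 0 if A won, 1 if B won
    pref_loss = F.cross_entropy(torch.stack([ret_a, ret_b]).unsqueeze(0),
                                torch.tensor([pref]))
    aux_loss = F.cross_entropy(logits_a, hl_a) + F.cross_entropy(logits_b, hl_b)
    return pref_loss + aux_weight * aux_loss
```

Sharing the encoder is the point: the per-step highlight labels refine the same features the reward head reads from.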

SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation

Daniel Marta*, Simon Holk*, Christian Pek, Jana Tumova, and Iolanda Leite
ICRA'24

By utilizing semi-supervised learning, we improve sample efficiency by augmenting the preferences provided by humans. We train a VAE to learn a latent representation and then synthesize new queries from existing ones by interpolating between trajectory pairs. The idea is that if a user prefers trajectory A over trajectory B, they would still prefer it after the pair is moved slightly closer together in latent space (say, by 10%). This helps the feedback generalize better.
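
A minimal sketch of the interpolation step, assuming a trained VAE object exposing `encode` and `decode` (the interface and names are my assumptions):

```python
def synthesize_query(vae, traj_a, traj_b, alpha=0.1):
    """Given a labeled pair (traj_a preferred over traj_b), create a new pair by
    moving each trajectory `alpha` of the way toward the other in latent space.
    For small alpha, the original preference label is assumed to carry over."""
    z_a, z_b = vae.encode(traj_a), vae.encode(traj_b)
    z_a_new = (1 - alpha) * z_a + alpha * z_b  # move A 10% toward B
    z_b_new = (1 - alpha) * z_b + alpha * z_a  # move B 10% toward A
    # the decoded pair inherits the label: "A preferred over B"
    return vae.decode(z_a_new), vae.decode(z_b_new)
```

Each synthesized pair reuses the human's original label, so one real query can be augmented into several training pairs at no extra labeling cost.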

VARIQuery: VAE Segment-based Active Learning for Query Selection in Preference-based Reinforcement Learning

Daniel Marta*, Simon Holk*, Christian Pek, Jana Tumova, and Iolanda Leite
IROS'23

This paper addresses the often-overlooked aspect of query selection, which is closely related to active learning (AL). We propose a novel query selection approach that leverages variational autoencoder (VAE) representations of state sequences. In this manner, we formulate queries that are diverse in nature while simultaneously taking into account reward model estimations.
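
A rough sketch of this kind of selection rule, using my own simplification of the idea: cluster the VAE latents to enforce diversity, then pick the within-cluster pair that the current reward model scores most differently. Everything below (k-means, the min/max pairing) is an illustrative stand-in, not the paper's exact criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_queries(latents, reward_estimates, n_queries):
    """latents: (N, d) VAE encodings of candidate segments;
    reward_estimates: (N,) current reward-model scores."""
    clusters = KMeans(n_clusters=n_queries, n_init=10).fit_predict(latents)
    pairs = []
    for c in range(n_queries):
        idx = np.where(clusters == c)[0]
        if len(idx) < 2:
            continue  # a singleton cluster cannot form a pair
        # within each (diverse) cluster, pair the two segments the reward
        # model rates most differently -- a likely informative comparison
        i = idx[np.argmin(reward_estimates[idx])]
        j = idx[np.argmax(reward_estimates[idx])]
        pairs.append((int(i), int(j)))
    return pairs
```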

Aligning Human Preferences with Baseline Objectives in Reinforcement Learning

Daniel Marta, Simon Holk, Christian Pek, Jana Tumova, and Iolanda Leite
ICRA'23

By assuming baseline objectives are designed beforehand, we are able to narrow down the policy space, requesting human attention only when their input matters most. To allow control over how the different objectives are optimized, our approach adopts a multi-objective setting. We achieve human-compliant policies by sequentially training an optimal policy from a baseline specification and then collecting queries on pairs of trajectories.
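
A minimal sketch of the sequential, multi-objective idea (the mixing weight `w` and all function names are my assumptions, not the paper's formulation):

```python
def combined_reward(state, action, baseline_reward, preference_reward, w=0.5):
    """Multi-objective reward: a hand-designed baseline specification mixed
    with the reward model learned from human preferences."""
    return (1 - w) * baseline_reward(state, action) + w * preference_reward(state, action)

# Stage 1: train a policy on baseline_reward alone; this narrows the policy space.
# Stage 2: continue training the same policy on combined_reward, collecting
# preference queries only over trajectory pairs from this already-competent
# policy, i.e. only where human input matters most.
```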