We consider the problem of stochastic linear bandit with multiple users where a user graph characterizes the affinity between users is available. The goal is to design an asymptotic optimal and computational light algorithm with improved finite-time regret guarantee. The question to answer is: On the basis of existing provably asymptotic optimal algorithms, could the user graph be exploited to improve the finite-time behaviour of asymptotic optimal algorithms, while keeping the computational complexity low.
Laplacian-regularized Estimator Error Analysis
We provide a theoretical analysis of the representation learning problem aimed at learning the latent variables (design matrix) Θ of observations Y with the knowledge of the coefficient matrix X. The design matrix is learned under the assumption that the latent variables Θ are smooth with respect to a (known) topological structure G. To learn such latent variables, we study a graph Laplacian regularized estimator, which is the penalized least squares estimator with penalty term proportional to a Laplacian quadratic form.
Optimal Graph Bandit