Improved Regret Bounds for Tracking Experts with Memory,  J. Robinson, M. Herbster, NeurIPS 2021.  
Online Learning of Facility Locations,  S. Pasteris, T. He, F. Vitale, S. Wang, and M. Herbster, ALT 2021.   [paper] 
Online Multitask Learning with Long-Term Memory,  M. Herbster, S. Pasteris, L. Tse, NeurIPS 2020.  
Online Matrix Completion with Side Information,  M. Herbster, S. Pasteris, L. Tse, NeurIPS 2020.  
Online Prediction of Switching Graph Labelings with Cluster Specialists,  M. Herbster and J. Robinson, NeurIPS 2019.   [paper]  [poster]
MaxHedge: Maximising a Maximum Online,   S. Pasteris, F. Vitale, K. Chan, S. Wang, M. Herbster, AISTATS 2019.
Service Placement with Provable Guarantees in Heterogeneous Edge Computing Systems,  S. Pasteris, S. Wang, M. Herbster, T. He, INFOCOM 2019.  
On Pairwise Clustering with Side Information,  S. Pasteris, F. Vitale, C. Gentile, M. Herbster, ALT 2018.  
Mistake Bounds for Binary Matrix Completion,  M. Herbster, S. Pasteris, M. Pontil, NIPS 2016.  
Online Prediction at the Limit of Zero Temperature,  M. Herbster, S. Pasteris, S. Ghosh, NIPS 2015.  
Predicting a Switching Sequence of Graph Labelings,  M. Herbster, S. Pasteris, M. Pontil, Journal of Machine Learning Research, pp. 2003-2022, September 2015.  
Online Similarity Prediction of Networked Data from Known and Unknown Graphs,  C. Gentile, M. Herbster, S. Pasteris, COLT 2013.  [paper]
Online Sum-Product Computation over Trees,  M. Herbster, S. Pasteris, F. Vitale, NIPS 2012.  [paper]
Efficient Prediction for Tree Markov Random Fields in a Streaming Model,  M. Herbster, S. Pasteris, F. Vitale, NIPS Workshop on Discrete Optimization in Machine Learning (DISCML) 2011: Uncertainty, Generalization and Feedback.   [paper] 
A Triangle Inequality for p-Resistance,  M. Herbster, Networks Across Disciplines: Theory and Applications, Workshop @ NIPS 2010.   [paper] 
Predicting the Labelling of a Graph via Minimum p-Seminorm Interpolation,  M. Herbster and G. Lever, COLT 2009.   [paper]  [slides]
Fast Prediction on a Tree,  M. Herbster, M. Pontil, S. Rojas Galeano, NIPS 22, 2008.   [paper]  [slides]
Online Prediction on Large Diameter Graphs,  M. Herbster, G. Lever, and M. Pontil, NIPS 22, 2008.   [paper]  [1-slide]
Exploiting cluster-structure to predict the labeling of a graph,  M. Herbster, Proceedings of the 19th International Conference on Algorithmic Learning Theory (ALT'08), 2008.   [paper]  [slides]
A Linear Lower Bound for the Perceptron for Input Sets of Constant Cardinality,  M. Herbster, Research Note, Dept. of Computer Science, UCL, March 2008. (Updated 18 May 08)
A fast method to predict the labeling of a tree,  S. R. Galeano and M. Herbster, Graph Labeling Workshop (ECML-2007), 2007.
Prediction on a graph with a perceptron,  M. Herbster and M. Pontil, NIPS 20, 2006.  
Combining graph Laplacians for semi-supervised learning,  A. Argyriou, M. Herbster, and M. Pontil, NIPS 19, 2005.  
Online learning over graphs,  M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005.  [paper]  [slides]
Relative Loss Bounds and Polynomial-time Predictions for the K-LMS-NET Algorithm,  M. Herbster, Proceedings of the 15th International Conference on Algorithmic Learning Theory, October 2004.
Relative loss bounds for predicting almost as well as any function in a union of Gaussian reproducing kernel spaces with varying widths,  M. Herbster, Poster at Mathematical Foundations of Learning Theory, June 2004.
Tracking the best linear predictor,  Mark Herbster and Manfred Warmuth, Journal of Machine Learning Research, pp. 281-309, September 2001.
Learning additive models online with fast evaluating kernels,  Mark Herbster. An abstract appeared in Proceedings of the 14th Annual Conference on Computational Learning Theory, pp. 444-460, July 2001.
An online algorithm is given whose hypothesis class is a union
of parameterized kernel spaces, for example the set of spaces induced by
the Gaussian kernel as its width is varied. We give relative loss
bounds and a tractable algorithm for specific kernels.
We extend the results of "Tracking the best expert" (see below) to
linear combinations.
We show that for a single neuron with the logistic function
as the transfer function, the number of local minima of the error function
based on the square loss can grow exponentially in the input dimension.
We generalize the recent worst-case loss bounds for on-line
algorithms where the additional loss of the algorithm on the whole sequence
of examples over the loss of the best expert is bounded. The generalization
allows the sequence to be partitioned into segments and the goal is to
bound the additional loss of the algorithm over the sum of the losses of
the best experts of each segment. This is to model situations in which
the examples change and different experts are best for certain segments
of the sequence of examples. In the single expert case the additional loss
is proportional to $\log n$, where $n$ is the number of experts and the
constant of proportionality depends on the loss function. When the number
of segments is at most $k+1$ and the sequence is of length $\ell$, then we
can bound the additional loss of our algorithm over the best partitioning
by $O(k \log n + k \log(\ell/k))$. Note that it takes the same order of
bits to denote the sequence of experts and the boundaries of the segments.
When the loss per trial is bounded by one then we obtain additional loss
bounds that are independent of the length of the sequence. The bound becomes
$O(k\log n+ k \log(L/k))$, where $L$ is the loss of the best partition
into $k+1$ segments. Our algorithms for tracking the best expert are simple
adaptations of Vovk's original algorithm for the single best expert case.
These algorithms keep one weight per expert and spend $O(1)$ time per weight
in each trial.
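The weight-sharing scheme described above can be illustrated with a minimal Fixed-Share-style sketch. This is a hypothetical illustration, not the paper's code or notation: the names `fixed_share`, `eta` (learning rate), and `alpha` (share fraction) are chosen here for clarity. Each trial applies Vovk's multiplicative loss update and then redistributes a small fraction of the weight uniformly, so an expert that was best on an earlier segment can recover quickly after a switch.

```python
import math

def fixed_share(losses, eta=1.0, alpha=0.05):
    """Track the best expert under switching (Fixed-Share-style sketch).

    losses: list of trials, each a list with one loss in [0, 1] per expert.
    Returns the algorithm's cumulative mix loss. The update keeps one
    weight per expert and spends O(1) time per weight in each trial.
    """
    n = len(losses[0])
    w = [1.0 / n] * n  # one weight per expert, initially uniform
    total = 0.0
    for trial in losses:
        # Mix loss of the weighted ensemble on this trial.
        total += -math.log(
            sum(wi * math.exp(-eta * li) for wi, li in zip(w, trial))
        ) / eta
        # Loss update: multiply each weight by exp(-eta * loss), renormalize.
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, trial)]
        z = sum(w)
        w = [wi / z for wi in w]
        # Share update: redistribute a fraction alpha uniformly, so no
        # expert's weight decays below roughly alpha / n between segments.
        w = [(1 - alpha) * wi + alpha / n for wi in w]
    return total
```

On a sequence where expert 0 is best for the first segment and expert 1 for the second, the cumulative loss stays close to that of the best partition rather than growing linearly like the loss of either single expert, consistent with the $O(k \log n + k \log(\ell/k))$ bound sketched above.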
A new method of discovering the common secondary structure
of a family of homologous RNA sequences using Gibbs sampling and stochastic
context-free grammars is proposed. Given an unaligned set of sequences,
a Gibbs sampling step simultaneously estimates the secondary structure
of each sequence and a set of statistical parameters describing the common
secondary structure of the set as a whole. These parameters describe a
statistical model of the family. After the Gibbs sampling has produced
a crude statistical model for the family, this model is translated into
a stochastic context-free grammar, which is then refined by an Expectation
Maximization (EM) procedure to produce a more complete model. A prototype
implementation of the method is tested on tRNA, pieces of 16S rRNA and
on U5 snRNA with good results.
maintained by Mark Herbster / M.Herbster@cs.ucl.ac.uk