...or what I do between surfing the internet, ferrying around kids,
and drinking coffee.
In order to avoid having to
deal with my proposal about 10 more times (2 kids :D), the Austrian Science Fund (FWF)
granted me a START-project called 'Optimisation Principles,
Models & Algorithms for Dictionary Learning' (Y760), which
started 1st of June 2015.
For more details have a look at the START project page.
Before that I was investigating another FWF-project called 'Dictionary Learning for Biometric Data',
see Dictionary Learning.
If dictionary learning is not your thing, I'm still happy to talk about
the Old Sparse Stuff,
if that is more interesting to you.
In general, I'm quite happy to mess around with linear algebra, probability/measure theory,
functional analysis, optimisation and numerics, severely traumatised (but sometimes
strangely fascinated) by combinatorics, differential geometry and topology, and absolutely
useless for all other areas of math.
If you have read any compressed sensing or sparse approximation papers, you will most likely have run into the statement
'you have a signal y which is sparse in a given basis/dictionary'. The question is just who gave you that basis/dictionary :D?
In dictionary learning the task is exactly this. You have a bunch of signals, let's say K of them stored in a matrix Y,
and you want to find a dictionary D with N atoms that approximates each of your signals using S atoms. So you want to find a
factorisation of the matrix Y into the dictionary D and coefficients X, with S nonzeros in every column of X, such that
the error E is small:
Y = D*X + E,   with Y (d x K), D (d x N), X (N x K), E (d x K).
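To make this factorisation concrete, here is a minimal, purely illustrative sketch of the alternating scheme most dictionary learning algorithms follow: sparse coding via simple thresholding, followed by a MOD-style least-squares dictionary update. All sizes and parameters below are toy choices of mine, not a particular published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

d, N, K, S = 16, 32, 500, 3  # signal dim d, atoms N, signals K, sparsity S

# ground-truth dictionary with unit-norm atoms and S-sparse coefficients
D_true = rng.standard_normal((d, N))
D_true /= np.linalg.norm(D_true, axis=0)
X_true = np.zeros((N, K))
for k in range(K):
    supp = rng.choice(N, size=S, replace=False)
    X_true[supp, k] = rng.standard_normal(S)
Y = D_true @ X_true  # noiseless for simplicity, so E = 0

def sparse_code(D, Y, S):
    """Thresholding: keep the S atoms most correlated with each signal."""
    C = D.T @ Y
    X = np.zeros_like(C)
    for k in range(Y.shape[1]):
        supp = np.argsort(-np.abs(C[:, k]))[:S]
        # least squares on the selected support
        X[supp, k] = np.linalg.lstsq(D[:, supp], Y[:, k], rcond=None)[0]
    return X

def mod_update(Y, X):
    """MOD-style dictionary update: D = Y X^+ (least squares), renormalised."""
    D = Y @ np.linalg.pinv(X)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0  # guard against unused atoms
    return D / norms

# alternate sparse coding and dictionary update from a random initialisation
D = rng.standard_normal((d, N))
D /= np.linalg.norm(D, axis=0)
for _ in range(20):
    X = sparse_code(D, Y, S)
    D = mod_update(Y, X)

err = np.linalg.norm(Y - D @ sparse_code(D, Y, S)) / np.linalg.norm(Y)
print(f"relative approximation error: {err:.3f}")
```

Whether (and how fast) such an alternating scheme actually recovers the generating dictionary is exactly the kind of question the theoretical results below are about.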
Then you can vary the question: how many atoms do I need for a given sparsity level and error?
Which sparsity level can I achieve, given the number of atoms and the error? You can also become more
philosophical: how is the coherence of the atoms related to all this, and how can we characterise the usefulness
of an atom?
As mentioned before these are all hard problems! Why? They are hard to treat theoretically because
they are so non-linear and they are even hard to treat numerically as at some point you need to find
sparse approximations to calculate the error, which is time consuming and depends on the algorithm you used.
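As an example of such a sparse approximation step, here is a minimal sketch of Orthogonal Matching Pursuit (OMP), one standard greedy choice; the dictionary and signal below are made-up toy data.

```python
import numpy as np

def omp(D, y, S):
    """Orthogonal Matching Pursuit: greedily pick S atoms for signal y."""
    residual = y.copy()
    support = []
    for _ in range(S):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        support.append(idx)
        # re-solve least squares on the enlarged support
        x_s, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x_s
    x = np.zeros(D.shape[1])
    x[support] = x_s
    return x

rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
y = D @ x_true

x_hat = omp(D, y, 3)
print("residual norm:", np.linalg.norm(y - D @ x_hat))
```

Running this inside the inner loop of a learning algorithm, once per signal and per iteration, is precisely what makes the numerics time consuming.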
The good news is that more and more people are interested, e.g.
the partners of the SMALL Project,
and that the theory is catching up with the algorithms. So there are now theoretical
results for l_1 minimisation (our old paper,
John Wright et al., and
Remi's new paper)
and, I have to admit I'm proud of that, a theoretical result for the K-SVD minimisation principle,
Karin's new paper.
If you read the new papers you can probably guess
that a lot of the techniques can be recycled for other minimisation principles,
which is what I'm working on now, and
who knows maybe we can also tackle MOD, MAP, etc.
Dictionary identification has officially become a hot topic with lots of
new developments, first this June'12 paper by
Spielman et al.,
which I missed last time I updated the research page,
and then we have Arora et al. from August'13,
Agarwal et al. from September'13,
Agarwal et al. from October'13,
Arora et al. from January'14,
and last week I finally managed to write up
my May results.
Dictionary identification results are popping up like mushrooms.
I have made one last attempt at collecting results (guaranteed incompleteness):
A personal introduction to theoretical
dictionary learning explains the topic
and can be enjoyed with the whole family.
This is what Massimo
was paying me to work on, more details should still be available at
START-Project - "Sparse Approximation and Optimization in High Dimensions".
Assume you want to learn a very high-dimensional function f(x) that maps the d-dimensional Euclidean ball
to the real line. Well, without any other assumption you are in trouble: you will need a number of samples
exponential in d to approximate it - not good.
However, if you assume some additional structure on the function, things look a
lot better. So if your function can be written as f(x) = g(Ax), where A is a k x d matrix,
k much smaller than d, with
orthonormal, compressible rows, and g some function from the k-dimensional Euclidean ball to the real line, you
can find some suggestions for learning it here.
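One classical idea for this setting (sketched here under my own assumptions, not necessarily the method from the linked paper) is to estimate the row space of A from approximate gradients of f: by the chain rule every gradient of f lies in the span of the rows of A, so an SVD of a batch of gradients recovers that span.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 30, 2

# hidden k x d matrix A with orthonormal rows
A = np.linalg.qr(rng.standard_normal((d, k)))[0].T  # shape (k, d)

def f(x):
    # f(x) = g(Ax) with g a smooth toy function of k variables
    z = A @ x
    return np.sin(z[0]) + z[1] ** 2

def grad(x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    g = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# gradients at random points all lie (approximately) in the row span of A
G = np.stack([grad(rng.standard_normal(d)) for _ in range(50)])

# the top-k right singular vectors of G estimate the row space of A
_, _, Vt = np.linalg.svd(G, full_matrices=False)
A_hat = Vt[:k]

# sanity check: projecting the rows of A onto span(A_hat) changes (almost) nothing
P = A_hat.T @ A_hat
print("row-space mismatch:", np.linalg.norm(A - A @ P))
```

With the row space in hand, what remains is the k-dimensional problem of learning g, which no longer suffers from the exponential dependence on d.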
If you are in the lucky case that your A is actually part of the identity matrix, i.e. g depends only on k out
of d coordinates, an even more efficient strategy can be found here.
If you are interested in the things I did during my PhD, you are now obliged to read my thesis
since EPFL restructured their webpage and got rid of junk like my old research page.