Objective
This workshop addresses the problem of learning from data that are
not independently and identically distributed (IID), knowing that
IIDness is a common assumption made in statistical machine learning.
While this assumption helps in studying the properties of learning
procedures (e.g. generalization ability) and guides the design of new
algorithms, there are many real-world situations where it does not
hold. This is particularly the case for many challenging tasks of
machine learning that have recently received much attention, such as
(but not limited to): ranking, active learning, hypothesis testing,
learning with graphical models, prediction on graphs, mining (social)
networks, multimedia or language processing. The goal of this workshop
is to bring together research aimed at identifying problems
where the assumption of identical distribution, the assumption of
independence, or both, is violated, and where it is anticipated that
carefully taking into account the non-IIDness is of primary importance.
Examples of such problems are:
- Bipartite ranking or, more generally, pairwise classification, where pairing up IID variables entails non-IIDness: while the data may still be identically distributed, they are no longer independent;
- Active learning, where labels for specific data are requested by the learner: the independence assumption is also violated;
- Learning with covariate shift, where the training and test marginal distributions of the data differ: the identically distributed assumption does not hold;
- Online learning with streaming data, where the distribution of the incoming examples changes over time: the examples are not identically distributed.
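To make the covariate-shift example above concrete, here is a minimal illustrative sketch (not taken from any workshop paper) of importance weighting: when only the marginal distribution of the inputs changes between training and test, reweighting each training example by the density ratio p_test(x) / p_train(x) recovers test-distribution expectations from training data. The toy densities below are assumptions chosen for the illustration.

```python
import random

def importance_weight(x, p_test, p_train):
    # Under covariate shift the input marginals differ, so each training
    # example is reweighted by the density ratio p_test(x) / p_train(x).
    return p_test(x) / p_train(x)

# Toy 1-D example: training inputs are uniform on [0, 1], while the
# test distribution is tilted toward large x (density 2x on [0, 1]).
p_train = lambda x: 1.0
p_test = lambda x: 2.0 * x

random.seed(0)
train_x = [random.random() for _ in range(10000)]

# The self-normalized weighted mean of f(x) = x over the *training*
# sample approximates its expectation under the *test* distribution,
# which here is 2/3, rather than the training mean 1/2.
weights = [importance_weight(x, p_test, p_train) for x in train_x]
weighted_mean = sum(w * x for w, x in zip(weights, train_x)) / sum(weights)
```

The unweighted sample mean would estimate the training expectation 1/2; the weighted estimate corrects for the shifted marginal without requiring any labeled test data.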
We see the workshop not only as a venue for the presentation
of papers that focus on carefully dealing with non-IID data, but also as
a forum for sharing ideas across different application domains.
Hence, it will be an opportunity to discuss methods that
address non-IIDness from the following standpoints:
- Theoretical: results on generalization bounds and learnability, contributions that mathematically formalize the types of non-IIDness encountered, results on the extent to which non-IIDness does not harm the validity of theoretical results built on the IID assumption, helpfulness of the online learning framework;
- Algorithmic: theoretically motivated algorithms designed to handle non-IID data, approaches that make it possible for classical learning results to carry over, online learning procedures;
- Practical: successful applications of non-IID learning methods to learning from streaming data, web data, biological data, multimedia, natural language, and social network mining.
Submission format
Please send a full paper of at most 8 pages, in PDF or PostScript in the LNCS format, to lniid09[at]liste.lif.univ-mrs.fr for one of the tracks below:
- Oral presentations,
- Poster spotlights,
- Posters.
Organizers
Massih-Reza Amini, National Research Council, Canada
Amaury Habrard, University of Marseille, France
Liva Ralaivola, University of Marseille, France
Nicolas Usunier, University Pierre et Marie Curie, France
Program Committee
Shai Ben-David, University of Waterloo, Canada
Gilles Blanchard, Fraunhofer FIRST (IDA), Germany
Stéphan Clémençon, Télécom ParisTech, France
François Denis, University of Provence, France
Claudio Gentile, Università dell'Insubria, Italy
Balaji Krishnapuram, Siemens Medical Solutions, USA
François Laviolette, Université Laval, Canada
Xuejun Liao, Duke University, USA
Richard Nock, University Antilles-Guyane, France
Daniil Ryabko, Institut National de Recherche en Informatique et Automatique, France
Marc Sebban, University of Saint-Etienne, France
Ingo Steinwart, Los Alamos National Labs, USA
Masashi Sugiyama, Tokyo Institute of Technology, Japan
Nicolas Vayatis, École Normale Supérieure de Cachan, France
Zhi-Hua Zhou, Nanjing University, China
Keynote Speakers
Shai Ben-David, University of Waterloo, Canada
Title: Towards theoretical understanding of domain adaptation learning
Abstract: Machine learning enjoys a deep and powerful theory that has led to a wide variety of highly successful practical tools. However, most of this theory is developed under some simplifying assumptions that clearly fail in the real world. In particular, a fundamental assumption of the theory is that the data available for training and the data of the target application come from the same source. When this assumption fails, the learner is faced with a “domain adaptation” challenge.
In the past few years, the range of machine learning applications has expanded to include various tasks requiring domain adaptation. Such applications have been addressed by several heuristic paradigms. However, the common theoretical models fall short of providing a useful analysis of these techniques.
The key to domain adaptation is the similarity between the training and target domains. In this talk I will discuss several parameters along which task similarity can be defined and measured, and examine to what extent they can be utilized to direct learning algorithms and guarantee their success. Recent work can provide theoretical justification for some existing practical heuristics, as well as guide the development of novel algorithms for handling some types of data discrepancies. However, our current understanding leaves much to be desired. I shall devote the last part of the talk to describing some of the challenges and open questions that will have to be addressed before one can claim a satisfactory understanding of learning in the presence of training-test discrepancies. The talk is based on joint work with John Blitzer, Koby Crammer and Fernando Pereira, and with my students David Pal, Teresa Luu and Tyler Lu.
Nicolas Vayatis, École Normale Supérieure de Cachan, France
Title: Empirical risk minimization with statistics of higher order with examples from bipartite ranking
Abstract: Statistical learning theory was mainly developed in the framework of binary classification under the assumption that observations in the training set form an i.i.d. sample. The techniques involved in order to provide statistical guarantees for state-of-the-art learning algorithms are borrowed from the theory of empirical processes. This is made possible not only because of the "i.i.d." assumption on the data but also because of the nature of the performance measures, such as classification error or margin error, which are statistics of order one. In the talk, I will discuss a variety of questions which arise in the theory when more involved criteria are considered. The problem of bipartite ranking through ROC curve optimization provides a prolific source of optimization functionals which are statistics of order strictly larger than one and several examples will be presented.
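As a concrete illustration of a "statistic of order strictly larger than one" mentioned in the abstract above, the empirical AUC used in bipartite ranking averages a function of *pairs* of observations (a U-statistic of order two), unlike the classification error, which averages a function of single observations. The sketch below is an assumed minimal implementation for illustration, not code from the talk.

```python
from itertools import product

def empirical_auc(pos_scores, neg_scores):
    """Empirical AUC: the fraction of (positive, negative) score pairs
    that the scorer orders correctly, with ties counting one half.
    Each term in the average depends on a pair of observations, so this
    is an order-two statistic: its summands are not independent, since
    distinct pairs can share an observation."""
    pairs = list(product(pos_scores, neg_scores))
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p, n in pairs)
    return correct / len(pairs)

# Three positives and two negatives: 5 of the 6 pairs are correctly
# ordered, so the empirical AUC is 5/6.
auc = empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

Because pairs overlap in their constituent points, the usual IID tools of empirical process theory do not apply directly, which is precisely the source of the questions the abstract raises.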
Important Dates
Paper submission deadline: June 10, 2009 (extended to June 21, 2009)
Notification of acceptance: June 30, 2009
Final camera ready submissions: August 15, 2009
Workshop: September 7, 2009