Research

I am interested in the general problem of accessing, mining and learning from large (text) collections through machine learning models and methods. I work on both fundamental problems (developing new models that explain different characteristics of large-scale collections and networks) and applications related to computational linguistics and information retrieval. In computational linguistics, I have developed machine learning and corpus-based approaches, designing probabilistic models for bilingual lexicon extraction from corpora as well as for information extraction and machine translation. In information retrieval (IR), I have studied the theoretical foundations of the main IR models and their relation to properties of textual collections. I also proposed the first analytical version of heuristic IR constraints, in different settings (ad hoc IR, cross-lingual IR and relevance feedback). In machine learning (ML) for text mining, I have worked in particular on latent topic models and their link to matrix factorization, and have contributed to a range of other methods and models, including a theoretical analysis of hierarchical text categorization and extensions of latent topic models that use copulas to explicitly model dependencies between variables (e.g. words and topics). With George Paliouras, I have also developed a series of challenges and workshops on Large Scale Hierarchical Text Classification (LSHTC). Lastly, I have recently explored different causal graphs for representing time series (such as extended summary causal graphs), as well as the associated causal discovery and inference methods one can deploy on them.

I am currently focusing on overcoming the limitations of current deep neural networks and on learning causally grounded representations of textual sequences.

  • Here's a short video to explain what textual information access is (in French).
Scientific animation

I was a member of the Executive Board of the European Association for Computational Linguistics from 2007 to 2010 and a member of the Computer Science panel of the European Research Council for Starting/Consolidator Grants from 2007 to 2013, and I have been a member of the Advisory Board of SIGDAT since 2005.
I have also served as:

  • Program co-chair for ICTIR 2022,
  • Chair for CNIA 2022,
  • Co-chair for ACM SIGIR 2019,
  • Co-chair for ECIR 2018,
  • Co-chair for IEEE DSAA 2015,
  • Program chair for CORIA 2015,
  • Workshop co-chair for EMNLP 2014,
  • Program co-chair for EMNLP 2006,
  • Area chair for several years for SIGIR and ECIR,
  • Co-chair and co-organizer of several international workshops (such as LSHTC1 at ECIR 2010, LSHTC2 at ECML 2011 and LSHTC3 at ECML 2013).
Lastly, I am a member of the editorial boards of the journals Information Retrieval and Document Numérique, and a past member of the editorial boards of the journals Information, International Journal of Data Science and Analytics, Traitement automatique des langues, Computational Linguistics and International Journal of Corpus Linguistics.

Projects
  • CSPR (regional project) (July 2021-Aug. 2023)
  • MIAI (3IA ANR project) (Sept. 2019-Dec. 2023)
  • COST (ANR project) (March 2019-March 2023)
  • AI4EU, European project (Jan. 2019-Dec. 2021)
  • AISUA (regional project) (April 2017-March 2022)
  • Locust (ANR project) (April 2015-September 2020)
  • Smart Support Centers (FUI) (March 2015-Sept. 2018)
  • Graphical models for modeling the dynamics of content networks (regional project) (Sept. 2014-Sept. 2017)
  • New theoretical frameworks in metric learning (regional project) (Sept. 2013-Sept. 2016)
  • Khronos, Persyval (labex) project on data mining of temporal data (Sept. 2013-Sept. 2017)
  • CNRS Mastodons project (2012-2014)
  • CLASS-Y (ANR project) (February 2011-February 2015)
  • BioASQ (Eur. project) (Oct. 2012-Oct. 2014)
  • PASCAL2 European network of excellence (2009-2013)
  • MeTRICC (ANR project) (December 2008-December 2011)
  • FRAGRANCES (ANR project) (December 2008-December 2011)
  • LASCAR (LArge Scale CAtegoRization - UJF project) (January 2008-December 2009)
  • INFOM@GIC (French project) (2005-2006 for my participation)
  • PASCAL European Network of Excellence (2004-2006)
  • REVEAL THIS (European project) (2004-2007)
  • KerMIT (European project) (2001-2004)
  • Outiller les Alliances (French project) (2001-2003)
  • MuchMore (European project) (1999-2002)
  • EUROTRA-63 (European project) (1992-1995)
PhD students
  • Anna Laskina, co-supervised with G. Calvary, funding from Univ. Grenoble Alpes; Text mining; (2021-)
  • Anis Shami, co-supervised with V. Blum, CIFRE Dawex; Data double materiality rating; (2021-)
  • Lei Zan, co-supervised with E. Devijver, CIFRE Coservit; Causal discovery; (2021-)
  • Minghan Li, grant from the Chinese government; Information retrieval, machine learning; (2020-)
  • Mariia Garavenko, co-supervised with H. Mirisaee, CIFRE Skopai; Text mining, computational linguistics; (2019-2022)
  • Quentin Grail, co-supervised with J. Perez, CIFRE NaverLabs Europe; Computational linguistics; (2018-2021)
  • Karim Assaad, co-supervised with E. Devijver, CIFRE Coservit; Causal discovery; (2018-2021)
  • Maziar Moradi Fard, co-supervised with A. Douzal and T. Thonet, ANR funding; Text mining, machine learning; (2017-2020)
  • Diana Popa, co-supervised with J. Henderson and J. Perez, CIFRE XRCE/NaverLabs; Computational linguistics, machine learning; (2014-2019)
  • Yagmur Cinar, FUI funding; Information retrieval, machine learning; (2015-2018)
  • Adrien Dulac, co-supervised with C. Largeron, regional funding; Social network, machine learning; (2014-2018)
  • Hesam Amoualian, co-supervised with M.-R. Amini and M. Clausel, French national funding MESR; Machine learning; (2014-2017)
  • Théo Trouillon, co-supervised with G. Bouchard, CIFRE XRCE; Machine learning; (2014-2017)
  • Abdelkader El Mahdaouy, co-supervised with S. Ouatik, joint (cotutelle) PhD with Univ. de Fès, Morocco; Computational linguistics, information retrieval; (2013-2017)
  • Irina Nicolae, co-supervised with M. Sebban, regional funding; Machine learning; (2013-2016)
  • Saeid Soheily Khah, co-supervised with A. Douzal, industrial funding; Data analysis; (2013-2016)
  • Hamid Mirisaee, co-supervised with A. Termier, French national funding MESR; Data mining, social network analysis and mining; (2012-2015)
  • François Kawala, co-supervised with A. Douzal, CIFRE Best of Media; Social network analysis and mining; (2011-2015)
  • Rohit Babbar, co-supervised with M.-R. Amini, ANR funding; Machine learning; (2011-2014)
  • Parantapa Goswami, co-supervised with M.-R. Amini, French national funding MESR; Information retrieval, machine learning; (2011-2014)
  • Cédric Lagnier, French national funding MESR; Social network analysis and mining; (2009-2013)
  • Clément Grimal, co-supervised with G. Bisson, ANR funding; Machine learning; (2009-2012)
  • Bo Li, ANR funding; Computational linguistics; (2009-2012)
  • Franck Meyer, Orange Labs; Machine learning; (2007-2012)
  • Stéphane Clinchant, CIFRE XRCE; Information retrieval; (2008-2011)
  • Ali Mustafa Qamar, French national funding MESR; Machine learning; (2007-2010)
  • Leila Kefi, co-supervised with C. Berrut, French national funding MNRT; Information retrieval; (2002-2006)
  • François Trouilleux, co-supervised with G. Bes and A. Zaenen, CIFRE XRCE; Linguistics; (1998-2001)
Postdoctoral researchers
  • Diana Popa (2020-2021)
  • Mahdieh Khosravi (2019-2020)
  • Thibaut Thonet (2018-2019), currently at NaverLabs Europe
  • Yagmur Cinar (2018-2019), currently at Amazon UK
  • Hamid Mirisaee (2017-2018), currently at Skopaï
  • Parantapa Goswami (2016-2017), currently at Rakuten
  • Ioannis Partalas (2012-2014), currently at Expedia
  • Sujeevan Aseervatham (2008-2009), currently at Société Générale
Publications

Most of my publications are available either on DBLP or on Google Scholar (recent ones are also available on HAL); most of my patents are available on the USPTO site.

  • The paper A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation, co-written with C. Goutte, received the Test of Time award at ECIR (European Conference on Information Retrieval) 2016.
  • The paper Task Composition in Crowdsourcing, co-written with S. Amer-Yahia, V. Leroy, J. Pilourdault, R. M. Borromeo and M. Toyama, received an honorable mention award at DSAA (International IEEE Conference on Data Science and Advanced Analytics) 2016.
  • The paper A Theoretical Analysis of Pseudo-Relevance Feedback Models, co-written with S. Clinchant, received the best paper award at ICTIR (International Conference on the Theory of Information Retrieval) 2013.
  • The paper Modèles d'information pour la recherche multilingue, co-written with B. Li, received the best paper award at CORIA (Conférence pour la Recherche d'information et ses Applications) 2012.
  • The paper Information-based models for ad-hoc IR, co-written with S. Clinchant, was nominated for the best paper award at SIGIR 2010.
  • The French version of this paper, entitled Modèles de RI fondés sur l'information, received the best paper award at CORIA (Conférence pour la Recherche d'information et ses Applications) 2010.
Books