Marie Candito - Research/Publications

Université Paris Cité - LLF


I work at the LLF laboratory in the area of natural language processing.
More precisely, my current research topics are syntactic and semantic resources, syntactic and semantic parsing, and the syntax-semantics interface.


Current and past PhD students

  • Maria Andueza Rodriguez (co-supervised with Richard Huyghe, Université de Fribourg)
  • Anna Mosolova (co-supervised with Carlos Ramisch, Aix-Marseille Université)
  • David Kletz (co-supervised with Pascal Amsili, Université Sorbonne Nouvelle)
  • Vincent Segonne (defended in Dec 2021, co-supervised with Benoît Crabbé, Paris Diderot)
  • Hazem Al Saied (defended in Dec 2019, co-supervised with Mathieu Constant, Université de Lorraine)
  • Marianne Djemaa (defended in June 2017)
  • Enrique Henestroza Anguiano (defended in June 2013, co-supervised with Alexis Nasr)



  • Co-chair of the 18th International Workshop on Treebanks and Linguistic Theories (TLT 2019), held within the first-ever SyntaxFest (August 26-30, 2019).


(The full list of publications is available here.)
2023
David Kletz, Pascal Amsili and Marie Candito. 2023,
The Self-Contained Negation Test Set, Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pp. 212-221. Singapore.
David Kletz, Marie Candito and Pascal Amsili. 2023,
Probing structural constraints of negation in pretrained language models, Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023), pp. 541-554. Tórshavn, Faroe Islands, 2023.
2022
Marie Candito. 2022,
Auxiliary tasks to boost Biaffine Semantic Dependency Parsing, Findings of ACL 2022 (short paper), pp. 2422-2429.
2020
Marie Candito, Mathieu Constant, Carlos Ramisch, Agata Savary, Bruno Guillaume, Yannick Parmentier and Silvio Ricardo Cordeiro. 2020,
A French corpus annotated for multiword expressions and named entities, Journal of Language Modelling, 8(2), pp. 415-479.
2019
Marie Candito, Mark Liberman. 2019,
Numéro spécial sur les corpus annotés - Special issue on annotated corpora, Revue TAL, 60(2), pp. 7-17.
Silvio Ricardo Cordeiro, Marie Candito. 2019,
Syntax-based identification of light-verb constructions, Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa 2019), Turku, Finland, 2019.
Hazem Al Saied, Marie Candito, Mathieu Constant. 2019,
Comparing linear and neural models for competitive MWE identification, Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa 2019), Turku, Finland, 2019.
Lucie Barque, Marie Candito and Richard Huyghe. 2019,
La classification des verbes réfléchis à l’épreuve d’une annotation en corpus, Langages, 2019/4 (no. 216).
Behrang QasemiZadeh, Miriam R. L. Petruck, Regina Stodden, Laura Kallmeyer, Marie Candito. 2019,
SemEval-2019 Task 2: Unsupervised Lexical Frame Induction, Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Minneapolis, USA, 2019.
Vincent Segonne, Marie Candito, Benoît Crabbé. 2019,
Using Wiktionary as a resource for WSD: the case of French verbs, Proceedings of the 13th International Conference on Computational Semantics (IWCS), Gothenburg, Sweden, 2019.
2018
Agata Savary, Marie Candito, Verginica Barbu Mititelu, Eduard Bejček, Fabienne Cap et al. 2018,
PARSEME multilingual corpus of verbal multiword expressions, Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop, 2018.
Djamé Seddah, Éric Villemonte de La Clergerie, Benoît Sagot, Hector Martinez Alonso, Marie Candito. 2018,
Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.
2017
Candito M., Guillaume B., Perrier G. and Seddah D. 2017,
Enhanced UD Dependencies with Neutralized Diathesis Alternation, Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), Pisa, Italy, 2017.
Enhanced treebanks and conversion rules
Al Saied H., Candito M. and Constant M. 2017,
The ATILF-LLF System for the PARSEME Shared Task: a transition-based verbal multiword expression tagger, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) - shared task, Valencia, Spain, 2017.
Agata Savary, Carlos Ramisch, Silvio Cordeiro, Federico Sangati, Veronika Vincze, Behrang QasemiZadeh, Marie Candito, Fabienne Cap, Voula Giouli, Ivelina Stoyanova and Antoine Doucet. 2017,
The PARSEME Shared Task on Automatic Identification of Verbal Multi-word Expressions, Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), Valencia, Spain, 2017.
Candito M., Constant M., Ramisch C., Savary A., Parmentier Y., Pasquer C. and Antoine J.-Y. 2017,
Annotation d'expressions polylexicales verbales en français, Proceedings of TALN 2017 (short papers), Orléans, France, 2017.
French dataset
2016
Michalon O., Ribeyre C., Candito M. and Nasr A. 2016,
Deeper syntax for better semantic parsing, Proceedings of the 26th International Conference on Computational Linguistics (Coling), Osaka, Japan, 2016.
Djemaa, M., Candito, M., Muller P. and Vieu L. 2016,
Corpus annotation within the French Framenet: methodology and results, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC), Portorož, Slovenia, 2016.
Vieu L., Muller P., Djemaa M. and Candito M. 2016,
A General Framework for the Annotation of Causality Based on FrameNet, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC), Portorož, Slovenia, 2016.
Seddah D. and Candito M. 2016,
Hard Time Parsing Questions: Building a QuestionBank for French, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC), Portorož, Slovenia, 2016.
[corpus in original scheme]
[corpus in UD format]
2014
Candito M. and Constant M., 2014,
Strategies for Multiword Expression Analysis and Dependency Parsing, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14), Baltimore, USA.
Ribeyre C., Candito M. and Seddah D., 2014,
Semi-Automatic Deep Syntactic Annotations of the French Treebank. Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13), Tübingen, Germany.
Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah and Eric de la Clergerie, 2014,
Deep Syntax Annotation of the Sequoia French Treebank, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland, 2014.
Online Annotation guide
Download from corpus site
Candito, M., Amsili, P., Barque, L., Benamara, F., Chalendar, G., Djemaa, M., Haas, P., Huyghe, R., Mathieu, Y., Muller, P., Sagot, B. & Vieu, L., 2014,
Developing a French FrameNet: methodology and first results, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland, 2014.
Seddah D., Candito M. and Henestroza Anguiano E., 2014,
A word clustering approach to domain adaptation: Robust parsing of source and target domains, Journal of Logic and Computation (2013), 24(2), pp. 395-411.

2013
Constant M., Candito M. and Seddah D., 2013,
The LIGM-Alpage architecture for the SPMRL 2013 Shared Task: Multiword Expression Analysis and Dependency Parsing, Proceedings of the Fourth SPMRL Workshop, Seattle, USA.
Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Yuval Marton, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński and Alina Wróblewska, 2013,
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages, Proceedings of the Fourth SPMRL Workshop, Seattle, USA.
2012
Candito M. and Seddah D., 2012,
Effectively long-distance dependencies in French: annotation and parsing evaluation, Proceedings of TLT'11, Lisbon, Portugal.
Seddah D., Sagot B. and Candito M., 2012,
The Alpage Architecture at the SANCL 2012 Shared Task: Robust Pre-Processing and Lexical Bridging for User-Generated Content Parsing, in Notes of the First Workshop on Syntactic Analysis of Non-Canonical Languages (SANCL'2012), co-located with NAACL'2012, Montreal, Canada.

Seddah D., Candito M., Crabbé B. and Henestroza Anguiano E., 2012,
Ubiquitous Usage of a Broad Coverage French Corpus: Processing the Est Republicain corpus, Proceedings of LREC 2012, Istanbul, Turkey.
Candito M.-H. and Seddah D., 2012,
Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical, Proceedings of TALN'2012, Grenoble, France
Download Sequoia Treebank [pdf]
2011
Candito M.-H., Henestroza Anguiano E. and Seddah D., 2011,
A Word Clustering Approach to Domain Adaptation: Effective Parsing of Biomedical Texts, Proceedings of the 12th International Conference on Parsing Technologies (IWPT'2011) - short paper, Dublin, Ireland
Henestroza Anguiano E. and Candito M.-H., 2011,
Resolving Difficult Syntactic Attachments with Parse Correction, Proceedings of EMNLP'2011 (poster session), Edinburgh, Scotland
2010
Candito M.-H., Nivre J., Denis P. and Henestroza Anguiano E., 2010,
Benchmarking of Statistical Dependency Parsers for French, Proceedings of COLING'2010 (poster session), Beijing, China
Candito M.-H. and Seddah D., 2010,
Parsing word clusters, Proceedings of the NAACL/HLT First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, USA
Tsarfaty R., Seddah D., Goldberg Y., Kuebler S., Versley Y., Candito M., Foster J., Rehbein I. and Tounsi L., 2010,
Statistical Parsing of Morphologically Rich Languages (SPMRL) What, How and Whither, Proceedings of the NAACL/HLT First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, USA
Seddah D., Chrupała G., Cetinoglu O., van Genabith J. and Candito M.-H., 2010,
Lemmatization and Statistical Lexicalized Parsing of Morphologically-Rich Languages. Proceedings of the NAACL/HLT Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, USA
Candito M.-H., Crabbé B., and Denis P., 2010,
Statistical French dependency parsing: treebank conversion and first results, Proceedings of LREC'2010, Valletta, Malta
2009
Seddah, D., Candito M.-H. and Crabbé B., 2009,
Crossparser evaluation and tagset variation: a French treebank study. Proceedings of IWPT'09, Paris, France
Candito M.-H. and Crabbé B., 2009,
Improving generative statistical parsing with semi-supervised word clustering. Proceedings of IWPT'09 - short paper, Paris, France
Candito M.-H., Crabbé B., Denis P. and Guérin F., 2009,
Analyse syntaxique du français : des constituants aux dépendances. Proceedings of TALN 2009, Senlis, France
Candito M.-H., Crabbé B. and Seddah D., 2009,
On statistical parsing of French with supervised and semi-supervised strategies. Proceedings of the EACL 2009 Workshop: Grammatical Inference for Computational Linguistics, Athens, Greece
2008
Crabbé B. and Candito M.-H., 2008,
Expériences d'analyse syntaxique statistique du français. Proceedings of TALN 2008, Avignon, France
Some time ago...
Candito, M.-H., 1999,
Organisation modulaire et paramétrable de grammaires électroniques lexicalisées. Application au français et à l'italien. PhD thesis, Université Paris 7.
Candito M.-H. and Kahane S. 1998,
Can the derivation tree represent a semantic graph? An answer in the light of Meaning-Text Theory. Proceedings of TAG+4. Philadelphia, USA