Automatic Understanding of Unwritten Languages, 2017. ,
Inducing Bilingual Lexicons from Small Quantities of Sentence-Aligned Phonemic Transcriptions, 12th International Workshop on Spoken Language Translation (IWSLT), 2015. ,
Learning a Translation Model from Word Lattices, Proceedings of INTERSPEECH, pp.2518-2522, 2016. ,
Learning a Lexicon and Translation Model from Phoneme Lattices, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.2377-2382, 2016. ,
Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation, 2018. ,
URL : https://hal.archives-ouvertes.fr/halshs-01709648
Breaking the Unwritten Language Barrier: The Bulb Project, Proceedings of SLTU (Spoken Language Technologies for Under-Resourced Languages), 2016. ,
URL : https://hal.archives-ouvertes.fr/halshs-01428027
Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information, Proceedings of the Second Conference on Machine Translation, pp.108-117, 2017. ,
On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers, pp.177-185, 2018. ,
Méthodes statistiques pour la traduction automatique, éditeurs: Modèles statistiques pour l'accès à l'information textuelle, vol.7, pp.271-356, 2011. ,
Éléments de Description de l'orungu, Langue Bantu Du Gabon (B11b), 2007. ,
Computational Tools for Endangered Language Documentation, 2019. ,
Tied Multitask Learning for Neural Speech Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.82-91, 2018. ,
Spoken Term Discovery for Language Documentation using Translations, Proceedings of the Workshop on Speech-Centric Natural Language Processing, pp.53-58, 2017. ,
Leveraging translations for speech transcription in low-resource settings, 2018. ,
Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems, Ann. Statist, vol.2, issue.6, pp.1152-1174, 1974. ,
Word Formation in Generative Grammar, 1976. ,
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, 2018. ,
Unsupervised Neural Machine Translation, 2018. ,
Automatic annotation of media field recordings, Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2010. ,
The Cambridge Handbook of Endangered Languages. Cambridge Handbooks in Language and Linguistics, 2011. ,
Neural machine translation by jointly learning to align and translate, 2014. ,
Towards speech-to-text translation without speech recognition, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol.2, pp.474-479, 2017. ,
Low-Resource Speech-to-Text Translation, Interspeech 2018, pp.1298-1302, 2018. ,
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation, 2018. ,
The Mboshi noun class system, Journal of West African Languages, vol.26, issue.1, pp.27-47, 1996. ,
Painless Unsupervised Learning with Features, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.582-590, 2010. ,
Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, Proceedings of the 30th International Conference on Machine Learning, vol.28, pp.115-123, 2013. ,
Towards Speech Translation of Non Written Languages, Spoken Language Technology Workshop, pp.222-225, 2006. ,
Factored Language Models and Generalized Parallel Backoff, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol.2, pp.4-6, 2003. ,
Unsupervised decomposition of phoneme strings into variable-length sequences, by multigrams, International Conference of Phonetic Sciences (ICPhS), 1995. ,
A Scalable Method for Preserving Oral Literature from Small Languages, 2010. ,
Bootstrapping the language archive: New prospects for natural language processing in preserving linguistic heritage, Linguistic Issues in Language Technology, vol.6, pp.1-16, 2011. ,
Machine Translation for Language Preservation, The COLING 2012 Organizing Committee, pp.125-134, 2012. ,
Aikuma: A mobile app for collaborative language documentation, 2014. ,
Parallel Speech Collection for Under-resourced Language Studies Using the LIG-AIKUMA Mobile Device App, Procedia Computer Science, vol.81, pp.61-66, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01350065
Ferguson Distributions Via Polya Urn Schemes, Ann. Statist, vol.1, issue.2, pp.353-355, 1973. ,
, Linguarum Africae Australis, Copticae, Semiticarum Aliarumque Sexualium... apud Adolphum Marcum, 1851
, Latent Dirichlet Allocation. Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.
Word-Based Morphology, Journal of Linguistics, vol.42, issue.03, pp.531-573, 2006. ,
A small Griko-Italian speech translation corpus, Proceedings of SLTU, 2018. ,
Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars, Transactions of the Association of Computational Linguistics, vol.2, pp.93-104, 2014. ,
Adaptor Grammars for Learning Non-Concatenative Morphology, EMNLP, pp.345-356, 2013. ,
An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery, Machine Learning, vol.34, pp.71-105, 1999. ,
Language Diversity Endangered, Trends in Linguistics. Studies and Monographs, 2008. ,
Groupe d'experts spécial de l'UNESCO sur les langues en danger, 2003. ,
The mathematics of statistical machine translation: Parameter estimation, Computational linguistics, vol.19, issue.2, pp.263-311, 1993. ,
Alignment Models and Algorithms for Statistical Machine Translation, 2010. ,
Morphology-Aware Alignments for Translation to and from a Synthetic Language, Proceedings of the International Workshop on Spoken Language Translation, IWSLT'15, pp.188-195, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01635005
Learning Morphological Normalization for Translation from and into Morphologically Rich Languages, Prague Bulletin of Mathematical Linguistics, issue.108, pp.49-60, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01618382
A Teacher-Student Framework for Zero-Resource Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1925-1935, 2017. ,
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp.103-111, 2014. ,
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01433235
Empirical evaluation of gated recurrent neural networks on sequence modeling, NIPS 2014 Workshop on Deep Learning, 2014. ,
Unsupervised Tokenization for Machine Translation, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp.718-726, 2009. ,
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces, Advances in Neural Information Processing Systems, vol.31, pp.7365-7375, 2018. ,
Variational inference for adaptor grammars, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.564-572, 2010. ,
Incorporating structural alignment biases into an attentional neural translation model, 2016. ,
Character-based Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.357-361, 2016. ,
Elements of Information Theory, 2006. ,
Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp.280-287, 2003. ,
Unsupervised Discovery of Morphemes, Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning, pp.21-30, 2002. ,
Induction of a Simple Morphology for Highly-Inflecting Languages, Proceedings of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology, pp.43-51, 2004. ,
Inducing the morphological lexicon of a natural language from unannotated text, Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05), 2005. ,
Unsupervised Models for Morpheme Segmentation and Morphology Learning, ACM Trans. Speech Lang. Process, vol.4, issue.1, 2007. ,
Language Death, 2000. ,
On the impact of morphology in English to Spanish statistical MT, Speech Communication, vol.50, issue.11-12, pp.1034-1046, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00499216
Minimum Bayes Risk Combination of Translation Hypotheses from Alternative Morphological Decompositions, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pp.73-76, 2009. ,
Linguistic Structure As Composition and Perturbation, Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, ACL '96, pp.335-341, 1996. ,
Morphemes as Necessary Concept for Structures Discovery from Untagged Corpora, Proceedings of the Workshop on Paradigms and Grounding in Natural Language Learning, pp.295-299, 1998. ,
Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams, Acoustics, Speech, and Signal Processing, vol.1, pp.169-172, 1995. ,
Variable-Length Sequence Matching for Phonetic Transcription Using Joint Multigrams, Fourth European Conference on Speech Communication and Technology, 1995. ,
Selection of Multiphone Synthesis Units and Grapheme-to-Phoneme Transcription Using Variable-Length Modeling of Strings, Data-Driven Techniques in Speech Synthesis, pp.125-147, 2001. ,
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, vol.39, issue.1, pp.1-38, 1977. ,
Why Generative Phrase Models Underperform Surface Heuristics, Proceedings on the Workshop on Statistical Machine Translation, pp.31-38, 2006. ,
Sampling Alignment Structure under a Bayesian Translation Model, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp.314-323, 2008. ,
HMM Word and Phrase Alignment for Statistical Machine Translation, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp.47-49, 2005. ,
Fast and Robust Neural Network Joint Models for Statistical Machine Translation, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1370-1380, 2014. ,
A high speed transcription interface for annotating primary linguistic data, Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp.7-12, 2012. ,
Practical Language Development: Whose Mission? Language, vol.85, pp.619-629, 2009. ,
On the ambiguous segmental status of nasals in homorganic NC sequences, The Internal Organization of Phonological Segments, pp.183-216, 2005. ,
Graphical Models over Multiple Strings, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol.1, pp.101-110, 2009. ,
Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model, Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.616-627, 2011. ,
Latent-variable Modeling of String Transductions with Finite-state Methods, Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pp.1080-1089, 2008. ,
The Zero Resource Speech Challenge, Automatic Speech Recognition and Understanding, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01687504
An attentional model for speech translation without transcription, Proceedings of NAACL-HLT, pp.949-959, 2016. ,
The pay-offs of preprocessing for German-English Statistical Machine Translation, Proceedings of the Seventh International Workshop on Spoken Language Translation (IWSLT), pp.251-258, 2010. ,
Using a maximum entropy model to build segmentation lattices for MT, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.406-414, 2009. ,
A Simple, Fast, and Effective Reparameterization of IBM Model 2, Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.644-648, 2013. ,
The "Noisier Channel": Translation from Morphologically Complex Languages, Proceedings of the Second Workshop on Statistical Machine Translation, pp.207-211, 2007. ,
Linguistics for Students of Asian and African Languages, 2004. ,
Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp.900-910, 2016. ,
The myth of language universals: Language diversity and its importance for cognitive science, Behavioral and Brain Sciences, vol.32, issue.05, p.429, 2009. ,
Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language, Current Anthropology, vol.46, issue.4, pp.621-646, 2005. ,
Simultaneous Word-Morpheme Alignment for Statistical Machine Translation, HLT-NAACL, pp.32-40, 2013. ,
Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model, 2016. ,
Linguistically Motivated Unsupervised Segmentation for Machine Translation, Proceedings of the Language Resources and Evaluation Conference, 2010. ,
Reversing Language Shift: Theoretical and Empirical Foundations of Assistance to Threatened Languages. Multilingual Matters, 1991. ,
, Proc. of CMCL, 2013.
Measuring Word Alignment Quality for Statistical Machine Translation, Computational linguistics, vol.33, issue.3, pp.293-303, 2007. ,
Modeling Inflection and Word-Formation in SMT, Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp.664-674, 2012. ,
Posterior regularization for structured latent variable models, The Journal of Machine Learning Research, vol.99, 2001. ,
, Convolutional Sequence to Sequence Learning, 2017.
What does Attention in Neural Machine Translation Pay Attention to?, 2017. ,
Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, vol.9, pp.249-256, 2010. ,
Typologie Pour l'alignement Multilingue, vol.3, 2014. ,
Enlightening the Bulb : Unsupervised learning of morphology for word and subword alignments, 2016. ,
Annie Rialland, and François Yvon. Preliminary Experiments on Unsupervised Word Discovery in Mboshi, Proceedings of Interspeech, 2016. ,
François Yvon, and Marcely Zanon Boito. A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments, Proceedings of LREC, 2018. ,
Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages, Proceedings of the 15th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology (SIGMORPHON), 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01910757
Unsupervised Learning of Word Segmentation: Does Tone Matter?, Proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLING), 2018. ,
Unsupervised Word Segmentation from Speech with Attention, Proceedings of Interspeech, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01818092
Unsupervised Learning of the Morphology of a Natural Language, Computational Linguistics, vol.27, issue.2, pp.153-198, 2001. ,
Nonparametric Bayesian Models of Lexical Acquisition, 2006. ,
Improving Statistical MT through Morphological Analysis, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp.676-683, 2005. ,
Contextual Dependencies in Unsupervised Word Segmentation, Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp.673-680, 2006. ,
Interpolating Between Types and Tokens by Estimating Power-Law Generators, Advances in Neural Information Processing Systems, vol.18, pp.459-466, 2006. ,
A Bayesian Framework for Word Segmentation: Exploring the Effects of Context, Cognition, vol.112, issue.1, pp.21-54, 2009. ,
Improving bilingual sub-sentential alignment by sampling-based transpotting, Proceedings of the International Workshop on Spoken Language Translation, 2013. ,
Learning tractable word alignment models with complex constraints, Computational Linguistics, vol.36, issue.3, pp.481-504, 2010. ,
Expectation Maximization and Posterior Constraints, NIPS, vol.20, pp.569-576, 2007. ,
Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.1177-1185, 2014. ,
Universal Neural Machine Translation for Extremely Low Resource Languages, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.344-354, 2018. ,
The Classification of the Bantu Languages, 1948. ,
Comparative Bantu: An Introduction to the Comparative Linguistics and Prehistory of the Bantu Languages, 1967. ,
Arabic Preprocessing Schemes for Statistical Machine Translation, Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pp.49-52, 2006. ,
Documenting Endangered Languages: Achievements and Perspectives, 2011. ,
On endangered languages and the safeguarding of diversity, Language, vol.68, issue.1, pp.1-3, 1992. ,
, Endangered Languages. Language, vol.68, pp.1-42, 1992.
Focus marking and the unavailability of inversion structures in the Bantu language Bàsàá, Lingua, vol.154, pp.35-64, 2015. ,
Unsupervised Learning of Morphology, Computational Linguistics, vol.37, issue.2, pp.309-350, 2011. ,
Large-scale text collection for unwritten languages, Proceedings of the 6th International Joint Conference on Natural Language Processing, pp.1134-1138, 2013. ,
From Phoneme to Morpheme, Language, vol.31, issue.2, pp.190-222, 1955. ,
When Languages Die: The Extinction of the World's Languages and the Erosion of Human Knowledge, Oxford Studies in Sociolinguistics Series, 2007. ,
Should linguistic diversity be conserved like biodiversity?, 2012. ,
The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?, Science, vol.298, issue.5598, pp.1569-1579, 2002. ,
Language Universals, Markedness Theory, and Natural Phonetic Processes, 1986. ,
Iterative Bayesian word segmentation for unsupervised vocabulary discovery from phoneme lattices, Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On, pp.4057-4061, 2014. ,
Language documentation: What is it and what is it good for?, Jost Gippert, Nikolaus P. Himmelmann, and Ulrike Mosel (editors), 2006. ,
Approaches to and Strategies for Language Revitalization, The Oxford Handbook of Endangered Languages, 2018. ,
Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997. ,
Refining the SED Heuristic for Morpheme Discovery: Another Look at Swahili, Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition, pp.28-35, 2005. ,
Grammatical Tone: Sorting out the Differences, Tonal Aspects of Languages, pp.6-11, 2016. ,
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition, 2013. ,
Unsupervised Word Segmentation for Sesotho Using Adaptor Grammars, Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology, pp.20-27, 2008. ,
Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure, Proceedings of ACL-08: HLT, pp.398-406, 2008. ,
Unsupervised phonemic Chinese word segmentation using Adaptor Grammars, Proceedings of the 23rd International Conference on Computational Linguistics, pp.528-536, 2010. ,
Improving Nonparameteric Bayesian Inference: Experiments on Unsupervised Word Segmentation with Adaptor Grammars, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.317-325, 2009. ,
Bayesian Inference for PCFGs via Markov Chain Monte Carlo, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, pp.139-146, 2007. ,
Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models, Advances in Neural Information Processing Systems, vol.19, pp.641-648, 2007. ,
Modelling Function Words Improves Unsupervised Word Segmentation, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp.282-292, 2014. ,
Unsupervised Neural and Bayesian Models for Zero-Resource Speech Processing, 2016. ,
Visually Grounded Cross-Lingual Keyword Spotting in Speech, The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, pp.248-252, 2018. ,
Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model, Sixteenth Annual Conference of the International Speech Communication Association, 2015. ,
Zipf's Law of Abbreviation and the Principle of Least Effort: Language users optimise a miniature lexicon for efficient communication, Cognition, vol.165, pp.45-52, 2017. ,
Discovering the phoneme inventory of an unwritten language: A machine-assisted approach, Speech Communication, vol.56, pp.152-166, 2014. ,
Structured Attention Networks, 5th International Conference on Learning Representations, p.21, 2017. ,
Trivial Transfer Learning for Low-Resource Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers, pp.244-252, 2018. ,
Statistical Machine Translation, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-01433972
, Neural Machine Translation, 2017.
Empirical methods for compound splitting, EACL '03: Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics, pp.187-193, 2003. ,
Six Challenges for Neural Machine Translation, Proceedings of the First Workshop on Neural Machine Translation, pp.28-39, 2017. ,
Statistical phrase-based translation, Proceedings of the 2003 Conference of the North American Chapter, vol.1, pp.48-54, 2003. ,
Open source toolkit for statistical machine translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp.177-180, 2007. ,
Semi-Supervised Learning of Concatenative Morphology, Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pp.78-86, 2010. ,
Variations de Formes Dans La Langue Mbochi (Bantu C25), 2014. ,
The world's languages in crisis, Language, vol.68, issue.1, pp.4-10, 1992. ,
Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings, 2017. ,
, Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, 2018.
Morpho Challenge Competition 2005-2010: Evaluations and Results, Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pp.87-95, 2010. ,
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001. ,
Unsupervised Machine Translation Using Monolingual Corpora Only, ICLR 2018, p.14, 2018. ,
Hierarchical sub-sentential alignment with anymalign, Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT 2012), pp.279-286, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00747385
Practical Very Large Scale CRFs, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp.504-513, 2010. ,
Continuous Space Translation Models with Neural Networks, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.39-48, 2012. ,
Unsupervised lexicon discovery from acoustic input, Transactions of the Association for Computational Linguistics, vol.3, pp.389-403, 2015. ,
Documentation and analysis of an endangered language: Aspects of the grammar of Griko, 2013. ,
, Ethnologue: Languages of the World. SIL international, 2009.
, Ethnologue: Languages of the World. SIL international, 2015.
, Ethnologue: Languages of the World. SIL international, 2018.
Assessing Endangerment: Expanding Fishman's GIDS. Revue roumaine de linguistique, vol.55, pp.102-120, 2010. ,
The length of words reflects their conceptual complexity, Cognition, vol.153, pp.182-195, 2016. ,
Alignment by Agreement, Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp.104-111, 2006. ,
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation, 2018. ,
, Neural Machine Translation with Supervised Attention, 2016.
Statistical machine translation, ACM Computing Surveys, vol.40, issue.3, pp.1-49, 2008. ,
Une méthode non-supervisée pour la segmentation morphologique et l'apprentissage de morphotactique à l'aide de processus de Pitman-Yor, 2016. ,
Prosodic boundary information helps unsupervised word segmentation, Annual Conference of the North American Chapter of the ACL, pp.953-963, 2015. ,
Effective Approaches to Attention-based Neural Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.1412-1421, 2015. ,
A Phrase-Based, Joint Probability Model for Statistical Machine Translation, Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pp.133-139, 2002. ,
, Manual Annotation of Translational Equivalence: The Blinker Project. arXiv preprint cmp-lg/9805005, 1998.
Supervised Attentions for Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.2283-2288, 2016. ,
Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit, Language Documentation & Conservation, vol.12, pp.393-429, 2018. ,
URL : https://hal.archives-ouvertes.fr/halshs-01841979
Recurrent neural network based language model, INTERSPEECH, pp.1045-1048, 2010. ,
Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol.1, pp.100-108, 2009. ,
Improving IBM word-alignment model 1, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p.518, 2004. ,
Language Adaptive DNNs for Improved Low Resource Speech Recognition, pp.3878-3882, 2016. ,
Towards Phoneme Inventory Discovery for Documentation of Unwritten Languages, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017. ,
Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.895-904, 2011. ,
Vanishing Voices: The Extinction of the World's Languages, 2000. ,
Correct Parallelization for Blocked Gibbs Sampling, 2014. ,
Rapid Adaptation of Neural Machine Translation to New Languages, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.875-880, 2018. ,
Learning a language model from continuous speech, INTERSPEECH, pp.1053-1056 ,
An Unsupervised Model for Joint Phrase Alignment and Extraction, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.632-641, 2011. ,
Machine Translation without Words through Substring Alignment, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.165-174, 2012. ,
XNMT: The eXtensible Neural Machine Translation Toolkit, Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA), 2018. ,
Nonparametric Word Segmentation for Machine Translation, Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pp.815-823, 2010. ,
Toward hierarchical models for statistical machine translation of inflected languages, Proceedings of the Workshop on Data-Driven Methods in Machine Translation, vol.14, pp.1-8, 2001. ,
Improved statistical alignment models, Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, pp.440-447, 2000. ,
A Systematic Comparison of Various Statistical Alignment Models, Computational Linguistics, vol.29, issue.1, pp.19-51, 2003. ,
Oxford Handbooks Online, 2015. ,
Fragment Grammars: Exploring Computation and Reuse in Language, 2009. ,
Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation, pp.25-32, 2007. ,
Bayesian Models for Unit Discovery on a Very Low Resource Language, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01709589
A Simple View of Linguistic Complexity, Second Language Research, vol.31, issue.1, pp.117-134, 2015. ,
Structural aspects of language endangerment, The Cambridge Handbook of Endangered Languages, Cambridge Handbooks in Language and Linguistics, pp.100-119, 2011. ,
BLEU: A Method for Automatic Evaluation of Machine Translation, Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pp.311-318, 2002. ,
Automatic differentiation in PyTorch, 2017. ,
Generating Alignments Using Target Foresight in Attention-Based Neural Machine Translation, The Prague Bulletin of Mathematical Linguistics, vol.108, issue.1, pp.27-36, 2017. ,
Word lengths are optimized for efficient communication, Proceedings of the National Academy of Sciences, vol.108, issue.9, pp.3526-3529, 2011. ,
Improved Speech-to-Text Translation with the Fisher and Callhome Spanish-English Speech Translation Corpus, Proceedings of the International Workshop on Spoken Language Translation (IWSLT), 2013. ,
Using Word Vectors to Improve Word Alignments for Low Resource Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.2, pp.524-528, 2018. ,
How intonations interact with tones in Embosi (Bantu C25), a two-tone language without downdrift, Intonation in African Tone Languages, vol.24, 2016. ,
Dropping of the class-prefix consonant, vowel elision and automatic phonological mining in Embosi, Proceedings of the 44th ACAL Meeting, pp.221-230, 2015. ,
URL : https://hal.archives-ouvertes.fr/halshs-01251202
, Parallel Corpora in Mboshi, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01710043
Zero-shot Neural Transfer for Cross-lingual Entity Linking, Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019. ,
, Jorma Rissanen. Stochastic Complexity in Statistical Inquiry. Series in Computer Science, vol.15, 1989.
Language Endangerment and Language Death, The Routledge Handbook of Ecolinguistics, 2017. ,
Statistical Learning by 8-Month-Old Infants, Science, vol.274, issue.5294, pp.1926-1928, 1996. ,
Linguistic Fieldwork: A Student Guide. Cambridge Textbooks in Linguistics, 2012. ,
Temporal Attention Model for Neural Machine Translation, 2016. ,
Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop, IEEE International Conference on Acoustics, Speech and Signal Processing, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01709578
Building an ASR System for Mboshi Using A Cross-Language Definition of Acoustic Units Approach, The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, pp.167-171, 2018. ,
Probabilistic part-of-speech tagging using decision trees, Proceedings of the International Conference on New Methods in Language Processing, vol.12, pp.44-49, 1994. ,
, Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, vol.45, issue.11, pp.2673-2681, 1997.
Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1715-1725, 2016. ,
A Mathematical Theory of Communication, The Bell System Technical Journal, vol.27, issue.3, pp.379-423, 1948. ,
Minimally-Supervised Morphological Segmentation Using Adaptor Grammars, Transactions of the Association for Computational Linguistics, vol.1, pp.255-266, 2013. ,
Morfessor 2.0: Toolkit for Statistical Morphological Segmentation, The 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, 2014. ,
Cross-lingual Propagation for Morphological Analysis, Proceedings of the 23rd National Conference on Artificial Intelligence, vol.2, pp.848-854, 2008. ,
Unsupervised Multilingual Learning for Morphological Segmentation, Proceedings of ACL-08: HLT, pp.737-745, 2008. ,
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014. ,
Word segmentation through cross-lingual word-to-phoneme alignment, Spoken Language Technology Workshop (SLT), pp.85-90, 2012. ,
Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment, Computer Speech & Language, 2014. ,
Human Translations Guided Language Discovery for ASR Systems, 10th International Conference on Speech Science and Speech Technology, pp.1-4, 2009. ,
Innovative Technologies for Under-Resourced Language Documentation: The Bulb Project, Proceedings of CCURL (Collaboration and Computing for Under-Resourced Languages : Toward an Alliance for Digital Language Diversity), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01350124
Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems, pp.3104-3112, 2014. ,
Dublin City University and Association for Computational Linguistics, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.2326-2334, 2014. ,
A hierarchical Bayesian language model based on Pitman-Yor processes, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp.985-992, 2006. ,
Bitext Alignment. Synthesis Lectures on Human Language Technologies, 2011. ,
Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.461-466, 2011. ,
Language Endangerment and Language Revitalization: An Introduction, 2006. ,
Modeling Coverage for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.76-85, 2016. ,
Using POS Information for SMT into Morphologically Rich Languages, 10th Conference of the European Chapter of the Association for Computational Linguistics, 2003. ,
From Characters to Words to in Between: Do We Capture Morphology?, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.2016-2027, 2017. ,
, Attention Is All You Need, 2017.
The zero resource speech challenge, Proc. of Interspeech, 2015. ,
Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner, Proceedings of the Machine Translation Summit XI, pp.46-47, 2007. ,
Applying Morphological Decompositions to Statistical Machine Translation, Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pp.195-200, 2010. ,
PESA: Phrase Pair Extraction as Sentence Splitting, 2005. ,
HMM-based word alignment in statistical translation, Proceedings of the 16th Conference on Computational Linguistics, pp.836-841, 1996. ,
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Neural Computation, vol.1, issue.2, pp.270-280, 1989. ,
Defining documentary linguistics, vol.1, pp.35-51, 2003. ,
Language Documentation, The Cambridge Handbook of Endangered Languages, Cambridge Handbooks in Language and Linguistics, pp.159-186, 2011. ,
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora, Computational Linguistics, vol.23, issue.3, pp.377-403, 1997. ,
Multilingual Structural Projection across Interlinear Text, The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pp.452-459, 2007. ,
Neural Cross-Lingual Named Entity Recognition with Minimal Resources, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.369-379, 2018. ,
Partitioning Parallel Documents Using Binary Segmentation, Proceedings on the Workshop on Statistical Machine Translation, pp.78-85, 2006. ,
Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation, Proceedings of the 22nd International Conference on Computational Linguistics, pp.1017-1024, 2008. ,
, Neural Machine Translation with Recurrent Attention Modeling, 2016.
Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL'10, pp.454-464, 2010. ,
Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models, Automatic Speech Recognition and Understanding, 2017. ,
Online adaptor grammars with hybrid inference, Transactions of the Association for Computational Linguistics, vol.2, pp.465-476, 2014. ,
Competitive Grouping in Integrated Phrase Segmentation and Alignment Model, Proceedings of the ACL Workshop on Building and Using Parallel Texts, pp.159-162, 2005. ,
An efficient phrase-to-phrase alignment model for arbitrarily long phrase and large corpora, Proceedings of the 10th Conference of the European Association for Machine Translation (EAMT-05), pp.30-31, 2005. ,
Integrated phrase segmentation and alignment algorithm for statistical machine translation, International Conference on Natural Language Processing and Knowledge Engineering, pp.567-573, 2003. ,
Transfer Learning for Low-Resource Neural Machine Translation, Proceedings of the 2016 Conference on ,