O. Adams, Automatic Understanding of Unwritten Languages, 2017.

O. Adams, G. Neubig, T. Cohn, and S. Bird, Inducing Bilingual Lexicons from Small Quantities of Sentence-Aligned Phonemic Transcriptions, 12th International Workshop on Spoken Language Translation (IWSLT), 2015.

O. Adams, G. Neubig, T. Cohn, and S. Bird, Learning a Translation Model from Word Lattices, Proceedings of INTERSPEECH, pp.2518-2522, 2016.

O. Adams, G. Neubig, T. Cohn, S. Bird, Q. Truong-do et al., Learning a Lexicon and Translation Model from Phoneme Lattices, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.2377-2382, 2016.

O. Adams, T. Cohn, G. Neubig, H. Cruz, S. Bird et al., Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation, 2018.
URL : https://hal.archives-ouvertes.fr/halshs-01709648

G. Adda, S. Stüker, M. Adda-decker, O. Ambouroue, L. Besacier et al., Breaking the Unwritten Language Barrier: The Bulb Project, Proceedings of SLTU (Spoken Language Technologies for Under-Resourced Languages), 2016.
URL : https://hal.archives-ouvertes.fr/halshs-01428027

T. Alkhouli and H. Ney, Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information, Proceedings of the Second Conference on Machine Translation, pp.108-117, 2017.

T. Alkhouli, G. Bretschner, and H. Ney, On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers, pp.177-185, 2018.

A. Allauzen and F. Yvon, Méthodes statistiques pour la traduction automatique, in E. Gaussier and F. Yvon (eds.), Modèles statistiques pour l'accès à l'information textuelle, vol.7, pp.271-356, 2011.

C. Amboulou, Le Mbochi: Langue Bantoue Du Congo Brazzaville (Zone C, Groupe C20), 1998.

O. Ambouroue, Éléments de Description de l'orungu, Langue Bantu Du Gabon (B11b), 2007.

A. Anastasopoulos, Computational Tools for Endangered Language Documentation, 2019.

A. Anastasopoulos and D. Chiang, Tied Multitask Learning for Neural Speech Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.82-91, 2018.

A. Anastasopoulos, S. Bansal, D. Chiang, S. Goldwater, and A. Lopez, Spoken Term Discovery for Language Documentation using Translations, Proceedings of the Workshop on Speech-Centric Natural Language Processing, pp.53-58, 2017.

A. Anastasopoulos and D. Chiang, Leveraging translations for speech transcription in low-resource settings, 2018.

C. E. Antoniak, Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems, Ann. Statist, vol.2, issue.6, pp.1152-1174, 1974.

M. Aronoff, Word Formation in Generative Grammar, 1976.

M. Artetxe and H. Schwenk, Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, 2018.

M. Artetxe, G. Labaka, E. Agirre, and K. Cho, Unsupervised Neural Machine Translation, 2018.

E. Auer, P. Wittenburg, H. Sloetjes, O. Schreer, S. Masneri et al., Automatic annotation of media field recordings, Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2010.

P. K. Austin and J. Sallabank, The Cambridge Handbook of Endangered Languages, Cambridge Handbooks in Language and Linguistics, 2011.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

S. Bansal, H. Kamper, A. Lopez, and S. Goldwater, Towards speech-to-text translation without speech recognition, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol.2, pp.474-479, 2017.

S. Bansal, H. Kamper, K. Livescu, A. Lopez, and S. Goldwater, Low-Resource Speech-to-Text Translation, Interspeech 2018, pp.1298-1302, 2018.

S. Bansal, H. Kamper, K. Livescu, A. Lopez, and S. Goldwater, Pre-training on high-resource speech recognition improves low-resource speech-to-text translation, 2018.

P. Bedrosian, The Mboshi noun class system, Journal of West African Languages, vol.26, issue.1, pp.27-47, 1996.

T. Berg-kirkpatrick, A. Bouchard-côté, J. Denero, and D. Klein, Painless Unsupervised Learning with Features, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.582-590, 2010.

J. Bergstra, D. Yamins, and D. Cox, Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, Proceedings of the 30th International Conference on Machine Learning, vol.28, pp.115-123, 2013.

L. Besacier, B. Zhou, and Y. Gao, Towards Speech Translation of Non Written Languages, Spoken Language Technology Workshop, pp.222-225, 2006.

J. A. Bilmes and K. Kirchhoff, Factored Language Models and Generalized Parallel Backoff, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol.2, pp.4-6, 2003.

F. Bimbot, S. Deligne, and F. Yvon, Unsupervised decomposition of phoneme strings into variable-length sequences, by multigrams, International Congress of Phonetic Sciences (ICPhS), 1995.

S. Bird, A Scalable Method for Preserving Oral Literature from Small Languages, 2010.

S. Bird, Bootstrapping the language archive: New prospects for natural language processing in preserving linguistic heritage, Linguistic Issues in Language Technology, vol.6, pp.1-16, 2011.

S. Bird and D. Chiang, Machine Translation for Language Preservation, Proceedings of COLING 2012: Posters, pp.125-134, 2012.

S. Bird, F. R. Hanke, O. Adams, and H. Lee, Aikuma: A mobile app for collaborative language documentation, 2014.

D. Blachon, E. Gauthier, L. Besacier, G. Kouarata, M. Adda-decker et al., Parallel Speech Collection for Under-resourced Language Studies Using the LIG-AIKUMA Mobile Device App, Procedia Computer Science, vol.81, pp.61-66, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01350065

D. Blackwell and J. B. Macqueen, Ferguson Distributions Via Polya Urn Schemes, Ann. Statist, vol.1, issue.2, pp.353-355, 1973.

W. H. I. Bleek, De Nominum Generibus Linguarum Africae Australis, Copticae, Semiticarum Aliarumque Sexualium... apud Adolphum Marcum, 1851.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

J. P. Blevins, Word-Based Morphology, Journal of Linguistics, vol.42, issue.03, pp.531-573, 2006.

M. Zanon Boito, A. Anastasopoulos, M. Lekakou, A. Villavicencio, and L. Besacier, A small Griko-Italian speech translation corpus, Proceedings of SLTU, 2018.

B. Börschinger and M. Johnson, Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars, Transactions of the Association for Computational Linguistics, vol.2, pp.93-104, 2014.

J. A. Botha and P. Blunsom, Adaptor Grammars for Learning Non-Concatenative Morphology, EMNLP, pp.345-356, 2013.

M. R. Brent, An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery, Machine Learning, vol.34, pp.71-105, 1999.

M. Brenzinger, Language Diversity Endangered, Trends in Linguistics. Studies and Monographs, De Gruyter, 2008.

M. Brenzinger, A. Dwyer, T. De-graaf, C. Grinevald, M. Krauss et al., Groupe d'experts spécial de l'UNESCO sur les langues en danger, 2003.

P. F. Brown, V. J. Della Pietra, S. A. Della Pietra, and R. L. Mercer, The mathematics of statistical machine translation: Parameter estimation, Computational Linguistics, vol.19, issue.2, pp.263-311, 1993.

J. Brunning, Alignment Models and Algorithms for Statistical Machine Translation, 2010.

F. Burlot and F. Yvon, Morphology-Aware Alignments for Translation to and from a Synthetic Language, Proceedings of the International Workshop on Spoken Language Translation, IWSLT'15, pp.188-195, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01635005

F. Burlot and F. Yvon, Learning Morphological Normalization for Translation from and into Morphologically Rich Languages, Prague Bulletin of Mathematical Linguistics, issue.108, pp.49-60, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01618382

Y. Chen, Y. Liu, Y. Cheng, and V. O. K. Li, A Teacher-Student Framework for Zero-Resource Neural Machine Translation, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1925-1935, 2017.

K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp.103-111, 2014.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, NIPS 2014 Workshop on Deep Learning, 2014.

T. Chung and D. Gildea, Unsupervised Tokenization for Machine Translation, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp.718-726, 2009.

Y. Chung, W. Weng, S. Tong, and J. Glass, Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces, Advances in Neural Information Processing Systems, vol.31, pp.7365-7375, 2018.

S. B. Cohen, D. M. Blei, and N. A. Smith, Variational inference for adaptor grammars, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.564-572, 2010.

T. Cohn, C. Hoang, E. Vymolova, K. Yao, C. Dyer et al., Incorporating structural alignment biases into an attentional neural translation model, 2016.

M. R. Costa-jussà and J. A. R. Fonollosa, Character-based Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.357-361, 2016.

T. M. Cover and J. A. Thomas, Elements of Information Theory, 2006.

M. Creutz, Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp.280-287, 2003.

M. Creutz and K. Lagus, Unsupervised Discovery of Morphemes, Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning, pp.21-30, 2002.

M. Creutz and K. Lagus, Induction of a Simple Morphology for Highly-Inflecting Languages, Proceedings of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology, pp.43-51, 2004.

M. Creutz and K. Lagus, Inducing the morphological lexicon of a natural language from unannotated text, Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05), 2005.

M. Creutz and K. Lagus, Unsupervised Models for Morpheme Segmentation and Morphology Learning, ACM Trans. Speech Lang. Process, vol.4, issue.1, 2007.

D. Crystal, Language Death, 2000.

A. De-gispert and J. B. Mariño, On the impact of morphology in English to Spanish statistical MT, Speech Communication, vol.50, issue.11-12, pp.1034-1046, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00499216

A. De-gispert, S. Virpioja, M. Kurimo, and W. Byrne, Minimum Bayes Risk Combination of Translation Hypotheses from Alternative Morphological Decompositions, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pp.73-76, 2009.

C. de Marcken, Linguistic Structure As Composition and Perturbation, Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, ACL '96, pp.335-341, 1996.

H. Déjean, Morphemes as Necessary Concept for Structures Discovery from Untagged Corpora, Proceedings of the Workshop on Paradigms and Grounding in Natural Language Learning, pp.295-299, 1998.

S. Deligne and F. Bimbot, Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams, Acoustics, Speech, and Signal Processing, vol.1, pp.169-172, 1995.

S. Deligne, F. Yvon, and F. Bimbot, Variable-Length Sequence Matching for Phonetic Transcription Using Joint Multigrams, Fourth European Conference on Speech Communication and Technology, 1995.

S. Deligne, F. Yvon, and F. Bimbot, Selection of Multiphone Synthesis Units and Grapheme-to-Phoneme Transcription Using Variable-Length Modeling of Strings, Data-Driven Techniques in Speech Synthesis, pp.125-147, 2001.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, vol.39, issue.1, pp.1-38, 1977.

J. Denero, D. Gillick, J. Zhang, and D. Klein, Why Generative Phrase Models Underperform Surface Heuristics, Proceedings on the Workshop on Statistical Machine Translation, pp.31-38, 2006.

J. Denero, A. Bouchard-côté, and D. Klein, Sampling Alignment Structure under a Bayesian Translation Model, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp.314-323, 2008.

Y. Deng and W. Byrne, HMM Word and Phrase Alignment for Statistical Machine Translation, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp.47-49, 2005.

J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz et al., Fast and Robust Neural Network Joint Models for Statistical Machine Translation, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1370-1380, 2014.

M. Dingemanse, J. Hammond, H. Stehouwer, A. Somasundaram, and S. Drude, A high speed transcription interface for annotating primary linguistic data, Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp.7-12, 2012.

L. M. Dobrin and J. Good, Practical Language Development: Whose Mission?, Language, vol.85, pp.619-629, 2009.

L. J. Downing, On the ambiguous segmental status of nasals in homorganic NC sequences, The Internal Organization of Phonological Segments, pp.183-216, 2005.

M. Dreyer and J. Eisner, Graphical Models over Multiple Strings, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol.1, pp.101-110, 2009.

M. Dreyer and J. Eisner, Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model, Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.616-627, 2011.

M. Dreyer, J. R. Smith, and J. Eisner, Latent-variable Modeling of String Transductions with Finite-state Methods, Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pp.1080-1089, 2008.

E. Dunbar, X. N. Cao, J. Benjumea, J. Karadayi, M. Bernard et al., The Zero Resource Speech Challenge, Automatic Speech Recognition and Understanding, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01687504

L. Duong, A. Anastasopoulos, D. Chiang, S. Bird, and T. Cohn, An attentional model for speech translation without transcription, Proceedings of NAACL-HLT, pp.949-959, 2016.

I. Durgar El-Kahlout and F. Yvon, The pay-offs of preprocessing for German-English Statistical Machine Translation, Proceedings of the Seventh International Workshop on Spoken Language Translation (IWSLT), pp.251-258, 2010.

C. Dyer, Using a maximum entropy model to build segmentation lattices for MT, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.406-414, 2009.

C. Dyer, V. Chahuneau, and N. A. Smith, A Simple, Fast, and Effective Reparameterization of IBM Model 2, Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.644-648, 2013.

C. J. Dyer, The "Noisier Channel": Translation from Morphologically Complex Languages, Proceedings of the Second Workshop on Statistical Machine Translation, pp.207-211, 2007.

H. Eifring and R. Theil, Linguistics for Students of Asian and African Languages, 2004.

R. Eskander, O. Rambow, and T. Yang, Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp.900-910, 2016.

N. Evans and S. C. Levinson, The myth of language universals: Language diversity and its importance for cognitive science, Behavioral and Brain Sciences, vol.32, issue.05, p.429, 2009.

D. L. Everett, Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language, Current Anthropology, vol.46, issue.4, pp.621-646, 2005.

E. Eyigöz, D. Gildea, and K. Oflazer, Simultaneous Word-Morpheme Alignment for Statistical Machine Translation, HLT-NAACL, pp.32-40, 2013.

S. Feng, S. Liu, M. Li, and M. Zhou, Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model, 2016.

M. Fishel and H. Kirik, Linguistically Motivated Unsupervised Segmentation for Machine Translation, Proceedings of the Language Resources and Evaluation Conference, 2010.

J. A. Fishman, Reversing Language Shift: Theoretical and Empirical Foundations of Assistance to Threatened Languages, Multilingual Matters, 1991.

A. Fourtassi, B. Börschinger, M. Johnson, and E. Dupoux, WhyisEnglishsoeasytosegment?, Proc. of CMCL, 2013.

A. Fraser and D. Marcu, Measuring Word Alignment Quality for Statistical Machine Translation, Computational linguistics, vol.33, issue.3, pp.293-303, 2007.

A. Fraser, M. Weller, A. Cahill, and F. Cap, Modeling Inflection and Word-Formation in SMT, Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp.664-674, 2012.

K. Ganchev, J. V. Graça, J. Gillenwater, and B. Taskar, Posterior regularization for structured latent variable models, The Journal of Machine Learning Research, vol.11, pp.2001-2049, 2010.

J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, Convolutional Sequence to Sequence Learning, 2017.

H. Ghader and C. Monz, What does Attention in Neural Machine Translation Pay Attention to?, 2017.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, vol.9, pp.249-256, 2010.

P. Godard, Typologie Pour l'alignement Multilingue, vol.3, 2014.

P. Godard and F. Yvon, Enlightening the Bulb : Unsupervised learning of morphology for word and subword alignments, 2016.

P. Godard, G. Adda, M. Adda-decker, A. Allauzen, L. Besacier et al., Preliminary Experiments on Unsupervised Word Discovery in Mboshi, Proceedings of Interspeech, 2016.

P. Godard, G. Adda, M. Adda-decker, J. Benjumea, L. Besacier et al., A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments, Proceedings of LREC, 2018.

P. Godard, L. Besacier, F. Yvon, M. Adda-decker, G. Adda et al., Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages, Proceedings of the 15th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology (SIGMORPHON), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01910757

P. Godard, K. Loser, A. Allauzen, L. Besacier, and F. Yvon, Unsupervised Learning of Word Segmentation: Does Tone Matter?, Proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLING), 2018.

P. Godard, M. Zanon Boito, L. Ondel, A. Berard, F. Yvon et al., Unsupervised Word Segmentation from Speech with Attention, Proceedings of Interspeech, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01818092

J. Goldsmith, Unsupervised Learning of the Morphology of a Natural Language, Computational Linguistics, vol.27, issue.2, pp.153-198, 2001.

S. Goldwater, Nonparametric Bayesian Models of Lexical Acquisition, 2006.

S. Goldwater and D. Mcclosky, Improving Statistical MT through Morphological Analysis, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp.676-683, 2005.

S. Goldwater, T. L. Griffiths, and M. Johnson, Contextual Dependencies in Unsupervised Word Segmentation, Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp.673-680, 2006.

S. Goldwater, T. L. Griffiths, and M. Johnson, Interpolating Between Types and Tokens by Estimating Power-Law Generators, Advances in Neural Information Processing Systems, vol.18, pp.459-466, 2006.

S. Goldwater, T. L. Griffiths, and M. Johnson, A Bayesian Framework for Word Segmentation: Exploring the Effects of Context, Cognition, vol.112, issue.1, pp.21-54, 2009.

L. Gong, A. Max, and F. Yvon, Improving bilingual sub-sentential alignment by sampling-based transpotting, Proceedings of the International Workshop on Spoken Language Translation, 2013.

J. Graça, K. Ganchev, and B. Taskar, Learning tractable word alignment models with complex constraints, Computational Linguistics, vol.36, issue.3, pp.481-504, 2010.

J. Graça, K. Ganchev, and B. Taskar, Expectation Maximization and Posterior Constraints, NIPS, vol.20, pp.569-576, 2007.

S. Grönroos, S. Virpioja, P. Smit, and M. Kurimo, Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.1177-1185, 2014.

J. Gu, H. Hassan, J. Devlin, and V. O. K. Li, Universal Neural Machine Translation for Extremely Low Resource Languages, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.344-354, 2018.

M. Guthrie, The Classification of the Bantu Languages, 1948.

M. Guthrie, Comparative Bantu: An Introduction to the Comparative Linguistics and Prehistory of the Bantu Languages, 1967.

N. Habash and F. Sadat, Arabic Preprocessing Schemes for Statistical Machine Translation, Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pp.49-52, 2006.

G. Haig, N. Nau, S. Schnell, and C. Wegener, Documenting Endangered Languages: Achievements and Perspectives, 2011.

K. Hale, On endangered languages and the safeguarding of diversity, Language, vol.68, issue.1, pp.1-3, 1992.

K. Hale, M. Krauss, L. J. Watahomigie, A. Y. Yamamoto, C. Craig et al., Endangered Languages, Language, vol.68, pp.1-42, 1992.

F. Hamlaoui and E. Makasso, Focus marking and the unavailability of inversion structures in the Bantu language Bàsàá, Lingua, vol.154, pp.35-64, 2015.

H. Hammarström and L. Borin, Unsupervised Learning of Morphology, Computational Linguistics, vol.37, issue.2, pp.309-350, 2011.

F. R. Hanke and S. Bird, Large-scale text collection for unwritten languages, Proceedings of the 6th International Joint Conference on Natural Language Processing, pp.1134-1138, 2013.

Z. S. Harris, From Phoneme to Morpheme, Language, vol.31, issue.2, pp.190-222, 1955.

K. D. Harrison, When Languages Die: The Extinction of the World's Languages and the Erosion of Human Knowledge, Oxford Studies in Sociolinguistics Series, 2007.

M. Haspelmath, Should linguistic diversity be conserved like biodiversity?, 2012.

M. D. Hauser, N. Chomsky, and W. Fitch, The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?, Science, vol.298, issue.5598, pp.1569-1579, 2002.

R. K. Herbert, Language Universals, Markedness Theory, and Natural Phonetic Processes, 1986.

J. Heymann, O. Walter, R. Haeb-umbach, and B. Raj, Iterative Bayesian word segmentation for unsupervised vocabulary discovery from phoneme lattices, Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On, pp.4057-4061, 2014.

N. P. Himmelmann, Language documentation: What is it and what is it good for?, in J. Gippert, N. P. Himmelmann, and U. Mosel (eds.), Essentials of Language Documentation, Mouton de Gruyter, 2006.

L. Hinton, Approaches to and Strategies for Language Revitalization, The Oxford Handbook of Endangered Languages, 2018.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

Y. Hu, I. Matveeva, J. Goldsmith, and C. Sprague, Refining the SED Heuristic for Morpheme Discovery: Another Look at Swahili, Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition, pp.28-35, 2005.

L. M. Hyman, Grammatical Tone: Sorting out the Differences, Tonal Aspects of Languages, pp.6-11, 2016.

A. Jansen, E. Dupoux, S. Goldwater, M. Johnson, S. Khudanpur et al., A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition, 2013.

M. Johnson, Unsupervised Word Segmentation for Sesotho Using Adaptor Grammars, Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology, pp.20-27, 2008.

M. Johnson, Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure, Proceedings of ACL-08: HLT, pp.398-406, 2008.

M. Johnson and K. Demuth, Unsupervised phonemic Chinese word segmentation using Adaptor Grammars, Proceedings of the 23rd International Conference on Computational Linguistics, pp.528-536, 2010.

M. Johnson and S. Goldwater, Improving Nonparameteric Bayesian Inference: Experiments on Unsupervised Word Segmentation with Adaptor Grammars, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp.317-325, 2009.

M. Johnson, T. Griffiths, and S. Goldwater, Bayesian Inference for PCFGs via Markov Chain Monte Carlo, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, pp.139-146, 2007.

M. Johnson, T. L. Griffiths, and S. Goldwater, Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models, Advances in Neural Information Processing Systems, vol.19, pp.641-648, 2007.

M. Johnson, A. Christophe, E. Dupoux, and K. Demuth, Modelling Function Words Improves Unsupervised Word Segmentation, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp.282-292, 2014.

H. Kamper, Unsupervised Neural and Bayesian Models for Zero-Resource Speech Processing, 2016.

H. Kamper and M. Roth, Visually Grounded Cross-Lingual Keyword Spotting in Speech, The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, pp.248-252, 2018.

H. Kamper, A. Jansen, and S. Goldwater, Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model, Sixteenth Annual Conference of the International Speech Communication Association, 2015.

J. Kanwal, K. Smith, J. Culbertson, and S. Kirby, Zipf's Law of Abbreviation and the Principle of Least Effort: Language users optimise a miniature lexicon for efficient communication, Cognition, vol.165, pp.45-52, 2017.

T. Kempton and R. K. Moore, Discovering the phoneme inventory of an unwritten language: A machine-assisted approach, Speech Communication, vol.56, pp.152-166, 2014.

Y. Kim, C. Denton, L. Hoang, and A. Rush, Structured Attention Networks, 5th International Conference on Learning Representations, p.21, 2017.

T. Kocmi and O. Bojar, Trivial Transfer Learning for Low-Resource Neural Machine Translation, Proceedings of the Third Conference on Machine Translation: Research Papers, pp.244-252, 2018.

P. Koehn, Statistical Machine Translation, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01433972

P. Koehn, Neural Machine Translation, 2017.

P. Koehn and K. Knight, Empirical methods for compound splitting, EACL '03: Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics, pp.187-193, 2003.

P. Koehn and R. Knowles, Six Challenges for Neural Machine Translation, Proceedings of the First Workshop on Neural Machine Translation, pp.28-39, 2017.

P. Koehn, F. J. Och, and D. Marcu, Statistical phrase-based translation, Proceedings of the 2003 Conference of the North American Chapter, vol.1, pp.48-54, 2003.

P. Koehn, H. Hoang, A. Birch, C. Callison-burch, M. Federico et al., Moses: Open source toolkit for statistical machine translation, Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp.177-180, 2007.

O. Kohonen, S. Virpioja, and K. Lagus, Semi-Supervised Learning of Concatenative Morphology, Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pp.78-86, 2010.

G. Kouarata, Variations de Formes Dans La Langue Mbochi (Bantu C25), 2014.

M. Krauss, The world's languages in crisis, Language, vol.68, issue.1, pp.4-10, 1992.

S. Kuang, J. Li, A. Branco, W. Luo, and D. Xiong, Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings, 2017.

T. Kudo, Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, 2018.

M. Kurimo, S. Virpioja, V. Turunen, and K. Lagus, Morpho Challenge Competition 2005-2010: Evaluations and Results, Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, pp.87-95, 2010.

J. Lafferty, A. Mccallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001.

G. Lample, A. Conneau, L. Denoyer, and M. Ranzato, Unsupervised Machine Translation Using Monolingual Corpora Only, ICLR 2018, p.14, 2018.

A. Lardilleux, F. Yvon, and Y. Lepage, Hierarchical sub-sentential alignment with anymalign, Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT 2012), pp.279-286, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747385

T. Lavergne, O. Cappé, and F. Yvon, Practical Very Large Scale CRFs, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp.504-513, 2010.

H. Le, A. Allauzen, and F. Yvon, Continuous Space Translation Models with Neural Networks, Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.39-48, 2012.

C. Lee, T. J. O'donnell, and J. Glass, Unsupervised lexicon discovery from acoustic input, Transactions of the Association for Computational Linguistics, vol.3, pp.389-403, 2015.

M. Lekakou, V. Baldissera, and A. Anastasopoulos, Documentation and analysis of an endangered language: Aspects of the grammar of Griko, 2013.

M. Lewis, Ethnologue: Languages of the World, SIL International, 2009.

M. Lewis, Ethnologue: Languages of the World, SIL International, 2015.

M. Lewis, Ethnologue: Languages of the World, SIL International, 2018.

P. Lewis and G. Simons, Assessing Endangerment: Expanding Fishman's GIDS, Revue roumaine de linguistique, vol.55, pp.102-120, 2010.

M. L. Lewis and M. C. Frank, The length of words reflects their conceptual complexity, Cognition, vol.153, pp.182-195, 2016.

P. Liang, B. Taskar, and D. Klein, Alignment by Agreement, Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp.104-111, 2006.

J. Lin, X. Sun, X. Ren, M. Li, and Q. Su, Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation, 2018.

L. Liu, M. Utiyama, A. Finch, and E. Sumita, Neural Machine Translation with Supervised Attention, 2016.

A. Lopez, Statistical machine translation, ACM Computing Surveys, vol.40, issue.3, pp.1-49, 2008.

K. Löser and A. Allauzen, Une méthode non-supervisée pour la segmentation morphologique et l'apprentissage de morphotactique à l'aide de processus de Pitman-Yor, 2016.

B. Ludusan, G. Synnaeve, and E. Dupoux, Prosodic boundary information helps unsupervised word segmentation, Annual Conference of the North American Chapter of the ACL, pp.953-963, 2015.

T. Luong, H. Pham, and C. D. Manning, Effective Approaches to Attention-based Neural Machine Translation, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.1412-1421, 2015.

D. Marcu and D. Wong, A Phrase-Based, Joint Probability Model for Statistical Machine Translation, Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pp.133-139, 2002.

I. D. Melamed, Manual Annotation of Translational Equivalence: The Blinker Project, arXiv preprint cmp-lg/9805005, 1998.

H. Mi, Z. Wang, and A. Ittycheriah, Supervised Attentions for Neural Machine Translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.2283-2288, 2016.

A. Michaud, O. Adams, T. Cohn, G. Neubig, and S. Guillaume, Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit, Language Documentation & Conservation, vol.12, pp.393-429, 2018.
URL : https://hal.archives-ouvertes.fr/halshs-01841979

T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur, Recurrent neural network based language model, INTERSPEECH, pp.1045-1048, 2010.

D. Mochihashi, T. Yamada, and N. Ueda, Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol.1, pp.100-108, 2009.

R. C. Moore, Improving IBM word-alignment model 1, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p.518, 2004.

M. Müller, S. Stüker, and A. Waibel, Language Adaptive DNNs for Improved Low Resource Speech Recognition, pp.3878-3882, 2016.

M. Müller, J. Franke, S. Stüker, and A. Waibel, Towards Phoneme Inventory Discovery for Documentation of Unwritten Languages, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

J. Naradowsky and K. Toutanova, Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.895-904, 2011.

D. Nettle and S. Romaine, Vanishing Voices: The Extinction of the World's Languages, 2000.

G. Neubig, Simple, Correct Parallelization for Blocked Gibbs Sampling, 2014.

G. Neubig and J. Hu, Rapid Adaptation of Neural Machine Translation to New Languages, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.875-880, 2018.

G. Neubig, M. Mimura, S. Mori, and T. Kawahara, Learning a language model from continuous speech, INTERSPEECH, pp.1053-1056, 2010.

G. Neubig, T. Watanabe, E. Sumita, S. Mori, and T. Kawahara, An Unsupervised Model for Joint Phrase Alignment and Extraction, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol.1, pp.632-641, 2011.

G. Neubig, T. Watanabe, S. Mori, and T. Kawahara, Machine Translation without Words through Substring Alignment, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.165-174, 2012.

G. Neubig, M. Sperber, X. Wang, M. Felix, A. Matthews et al., XNMT: The eXtensible Neural Machine Translation Toolkit, Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA), 2018.

T. Nguyen, S. Vogel, and N. A. Smith, Nonparametric Word Segmentation for Machine Translation, Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pp.815-823, 2010.

S. Nießen and H. Ney, Toward hierarchical models for statistical machine translation of inflected languages, Proceedings of the Workshop on Data-Driven Methods in Machine Translation, vol.14, pp.1-8, 2001.

F. J. Och and H. Ney, Improved statistical alignment models, Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, pp.440-447, 2000.

F. J. Och and H. Ney, A Systematic Comparison of Various Statistical Alignment Models, Computational Linguistics, vol.29, issue.1, pp.19-51, 2003.

D. Odden, Oxford Handbooks Online, 2015.

T. J. O'donnell, J. B. Tenenbaum, and N. D. Goodman, Fragment Grammars: Exploring Computation and Reuse in Language, 2009.

K. Oflazer and I. Durgar El-Kahlout, Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation, Proceedings of the Second Workshop on Statistical Machine Translation, pp.25-32, 2007.

L. Ondel, P. Godard, L. Besacier, E. Larsen, M. Hasegawa-johnson et al., Bayesian Models for Unit Discovery on a Very Low Resource Language, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01709589

G. Pallotti, A Simple View of Linguistic Complexity, Second Language Research, vol.31, issue.1, pp.117-134, 2015.

N. Palosaari and L. Campbell, Structural aspects of language endangerment, The Cambridge Handbook of Endangered Languages, Cambridge Handbooks in Language and Linguistics, pp.100-119, 2011.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: A Method for Automatic Evaluation of Machine Translation, Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pp.311-318, 2002.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, 2017.

J. Peter, A. Nix, and H. Ney, Generating Alignments Using Target Foresight in Attention-Based Neural Machine Translation, The Prague Bulletin of Mathematical Linguistics, vol.108, issue.1, pp.27-36, 2017.

S. T. Piantadosi, H. Tily, and E. Gibson, Word lengths are optimized for efficient communication, Proceedings of the National Academy of Sciences, vol.108, issue.9, pp.3526-3529, 2011.

M. Post, G. Kumar, A. Lopez, D. Karakos, C. Callison-burch et al., Improved Speech-to-Text Translation with the Fisher and Callhome Spanish-English Speech Translation Corpus, Proceedings of the International Workshop on Spoken Language Translation (IWSLT), 2013.

N. Pourdamghani, M. Ghazvininejad, and K. Knight, Using Word Vectors to Improve Word Alignments for Low Resource Machine Translation, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.2, pp.524-528, 2018.

A. Rialland and M. Embanga Aborobongui, How intonations interact with tones in Embosi (Bantu C25), a two-tone language without downdrift, Intonation in African Tone Languages, vol.24, 2016.

A. Rialland, G. M. Embanga Aborobongui, M. Adda-decker, and L. Lamel, Dropping of the class-prefix consonant, vowel elision and automatic phonological mining in Embosi, Proceedings of the 44th ACAL Meeting, pp.221-230, 2015.
URL : https://hal.archives-ouvertes.fr/halshs-01251202

A. Rialland, M. Adda-decker, G. Kouarata, G. Adda, L. Besacier et al., Parallel Corpora in Mboshi, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01710043

S. Rijhwani, J. Xie, G. Neubig, and J. Carbonell, Zero-shot Neural Transfer for Cross-lingual Entity Linking, Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.

J. Rissanen, Stochastic Complexity in Statistical Inquiry, Series in Computer Science, vol.15, 1989.

S. Romaine, Language Endangerment and Language Death, The Routledge Handbook of Ecolinguistics, 2017.

J. R. Saffran, R. N. Aslin, and E. L. Newport, Statistical Learning by 8-Month-Old Infants, Science, vol.274, issue.5294, pp.1926-1928, 1996.

J. Sakel and D. L. Everett, Linguistic Fieldwork: A Student Guide, Cambridge Textbooks in Linguistics, 2012.

B. Sankaran, H. Mi, Y. Al-onaizan, and A. Ittycheriah, Temporal Attention Model for Neural Machine Translation, 2016.

O. Scharenborg, L. Besacier, A. W. Black, M. Hasegawa-johnson, F. Metze et al., Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop, IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01709578

O. Scharenborg, P. Ebel, M. Hasegawa-johnson, and N. Dehak, Building an ASR System for Mboshi Using A Cross-Language Definition of Acoustic Units Approach, The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, pp.167-171, 2018.

H. Schmid, Probabilistic part-of-speech tagging using decision trees, Proceedings of the International Conference on New Methods in Language Processing, vol.12, pp.44-49, 1994.

M. Schuster and K. K. Paliwal, Bidirectional Recurrent Neural Networks, IEEE Transactions on Signal Processing, vol.45, issue.11, pp.2673-2681, 1997.

R. Sennrich, B. Haddow, and A. Birch, Neural Machine Translation of Rare Words with Subword Units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1715-1725, 2016.

C. E. Shannon, A Mathematical Theory of Communication, The Bell System Technical Journal, vol.27, issue.3, pp.379-423, 1948.

K. Sirts and S. Goldwater, Minimally-Supervised Morphological Segmentation Using Adaptor Grammars, Transactions of the Association for Computational Linguistics, vol.1, pp.255-266, 2013.

P. Smit, S. Virpioja, S. Grönroos, and M. Kurimo, Morfessor 2.0: Toolkit for Statistical Morphological Segmentation, The 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, 2014.

B. Snyder and R. Barzilay, Cross-lingual Propagation for Morphological Analysis, Proceedings of the 23rd National Conference on Artificial Intelligence, vol.2, pp.848-854, 2008.

B. Snyder and R. Barzilay, Unsupervised Multilingual Learning for Morphological Segmentation, Proceedings of ACL-08: HLT, pp.737-745, 2008.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

F. Stahlberg, T. Schlippe, S. Vogel, and T. Schultz, Word segmentation through cross-lingual word-to-phoneme alignment, Spoken Language Technology Workshop (SLT), pp.85-90, 2012.

F. Stahlberg, T. Schlippe, S. Vogel, and T. Schultz, Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment, Computer Speech & Language, 2014.

S. Stüker, L. Besacier, and A. Waibel, Human Translations Guided Language Discovery for ASR Systems, 10th International Conference on Speech Science and Speech Technology, pp.1-4, 2009.

S. Stüker, G. Adda, M. Adda-decker, O. Ambouroue, L. Besacier et al., Innovative Technologies for Under-Resourced Language Documentation: The Bulb Project, Proceedings of CCURL (Collaboration and Computing for Under-Resourced Languages: Toward an Alliance for Digital Language Diversity), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01350124

I. Sutskever, O. Vinyals, and Q. Le, Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems, pp.3104-3112, 2014.

G. Synnaeve, I. Dautriche, B. Börschinger, M. Johnson, and E. Dupoux, Unsupervised Word Segmentation in Context, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp.2326-2334, 2014.

Y. Teh, A hierarchical Bayesian language model based on Pitman-Yor processes, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp.985-992, 2006.

J. Tiedemann, Bitext Alignment, Synthesis Lectures on Human Language Technologies, 2011.

K. Toutanova and M. Galley, Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.461-466, 2011.

T. Tsunoda, Language Endangerment and Language Revitalization: An Introduction, 2006.

Z. Tu, Z. Lu, Y. Liu, X. Liu, and H. Li, Modeling Coverage for Neural Machine Translation, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.76-85, 2016.

N. Ueffing and H. Ney, Using POS Information for SMT into Morphologically Rich Languages, 10th Conference of the European Chapter of the Association for Computational Linguistics, 2003.

C. Vania and A. Lopez, From Characters to Words to in Between: Do We Capture Morphology?, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.2016-2027, 2017.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention Is All You Need, 2017.

M. Versteegh, R. Thiolliere, T. Schatz, X. N. Cao, X. Anguera et al., The zero resource speech challenge, Proc. of Interspeech, 2015.

S. Virpioja, J. J. Väyrynen, M. Creutz, and M. Sadeniemi, Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner, Proceedings of the Machine Translation Summit XI, pp.46-47, 2007.

S. Virpioja, J. Väyrynen, A. Mansikkaniemi, and M. Kurimo, Applying Morphological Decompositions to Statistical Machine Translation, Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pp.195-200, 2010.

S. Vogel, PESA: Phrase Pair Extraction as Sentence Splitting, 2005.

S. Vogel, H. Ney, and C. Tillmann, HMM-based word alignment in statistical translation, Proceedings of the 16th Conference on Computational Linguistics, pp.836-841, 1996.

R. J. Williams and D. Zipser, A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Neural Computation, vol.1, issue.2, pp.270-280, 1989.

A. C. Woodbury, Defining documentary linguistics, vol.1, pp.35-51, 2003.

A. C. Woodbury, Language Documentation, The Cambridge Handbook of Endangered Languages, Cambridge Handbooks in Language and Linguistics, pp.159-186, 2011.

D. Wu, Stochastic inversion transduction grammars and bilingual parsing of parallel corpora, Computational linguistics, vol.23, issue.3, pp.377-403, 1997.

F. Xia and W. Lewis, Multilingual Structural Projection across Interlinear Text, The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pp.452-459, 2007.

J. Xie, Z. Yang, G. Neubig, N. A. Smith, and J. Carbonell, Neural Cross-Lingual Named Entity Recognition with Minimal Resources, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.369-379, 2018.

J. Xu, R. Zens, and H. Ney, Partitioning Parallel Documents Using Binary Segmentation, Proceedings on the Workshop on Statistical Machine Translation, pp.78-85, 2006.

J. Xu, J. Gao, K. Toutanova, and H. Ney, Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation, Proceedings of the 22nd International Conference on Computational Linguistics, pp.1017-1024, 2008.

Z. Yang, Z. Hu, Y. Deng, C. Dyer, and A. Smola, Neural Machine Translation with Recurrent Attention Modeling, 2016.

R. Yeniterzi and K. Oflazer, Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL'10, pp.454-464, 2010.

M. Zanon Boito, A. Bérard, A. Villavicencio, and L. Besacier, Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models, Automatic Speech Recognition and Understanding, 2017.

K. Zhai, J. Boyd-graber, and S. B. Cohen, Online adaptor grammars with hybrid inference, Transactions of the Association for Computational Linguistics, vol.2, pp.465-476, 2014.

Y. Zhang and S. Vogel, Competitive Grouping in Integrated Phrase Segmentation and Alignment Model, Proceedings of the ACL Workshop on Building and Using Parallel Texts, pp.159-162, 2005.

Y. Zhang and S. Vogel, An efficient phrase-to-phrase alignment model for arbitrarily long phrase and large corpora, Proceedings of the 10th Conference of the European Association for Machine Translation (EAMT-05), pp.30-31, 2005.

Y. Zhang, S. Vogel, and A. Waibel, Integrated phrase segmentation and alignment algorithm for statistical machine translation, International Conference on Natural Language Processing and Knowledge Engineering, pp.567-573, 2003.

B. Zoph, D. Yuret, J. May, and K. Knight, Transfer Learning for Low-Resource Neural Machine Translation, Proceedings of the 2016 Conference on