![]() Such a problem occurs with "animal names, colors, and times of day," and "was also an issue with adjectives, but we observed few such errors with verbs. ![]() While it is easy to collect billions of example sentences for English, and over a hundred million example sentences for Icelandic, for example, the language kalaallisut, spoken by about 56,000 people in Greenland, has fewer than a million extant example sentences readily available in online texts and the Kelantan-Pattani Malay language, spoken by about five million people in Malaysia and Thailand, has fewer than 10,000 example sentences readily available.Īnd then there are what the authors term "characteristic error modes" in translations, such as "translating nouns that occur in distributionally similar contexts in the training data," such as substituting "relatively common nouns like 'tiger' with another kind of animal word, they note, "showing that the model learned the distributional context in which this noun occurs, but was unable to acquire the exact mappings from one language to another with enough detail within this category." The paper describes a project to create a data set of over a thousand languages, including so-called low-resource languages, those that have very few documents to use as samples for training machine learning.Īlso: DeepMind: Why is AI so good at language? It's something in language itself "Despite tremendous progress in low-resource machine translation, the number of languages for which widely-available, general-domain MT systems have been built has been limited to around 100, which is a small fraction of the over 7000+ languages that are spoken in the world today," write lead author Ankur Bapna and colleagues.
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |