Machine Translation - How It Works, What Users Expect, and What They Get
Machine translation (MT) systems are now ubiquitous. This ubiquity is the result of a combination of increased demand for translation in today's global marketplace and an exponential increase in computing power that has made such systems viable. And under the right circumstances, MT systems are a powerful tool. They offer low-quality translations in situations where low-quality translation is better than no translation at all, or where a rough translation of a large document delivered within minutes is more useful than a good translation delivered in three weeks' time.
Unfortunately, despite the widespread availability of MT, it is clear that the purpose and limitations of such systems are often misunderstood, and their capability widely overestimated. In this article, I want to give a brief overview of how MT systems work and how they can be put to best use. Then, I'll present some data on how Internet-based MT is being used right now, and show that there is a chasm between the intended and actual use of such systems, and that users still need educating on how to use MT systems effectively.
How machine translation works
You might expect that a computer translation program would use grammatical rules of the languages in question, combining them with some kind of in-memory "dictionary" to produce the resulting translation. And indeed, that's essentially how some earlier systems worked. But most modern MT systems actually take a statistical approach that is quite "linguistically blind". Essentially, the system is trained on a corpus of example translations. The result is a statistical model that incorporates information such as:
- "when the words (a, b, c) occur in succession in a sentence, there is an X% chance that the words (d, e, f) will occur in succession in the translation" (N.B. there needn't be the same number of words in each pair);
- "given two successive words (a, b) in the target language, if word (a) ends in -X, there is an X% chance that word (b) will end in -Y".
Given a huge body of such observations, the system can then translate a sentence by considering various candidate translations-- produced by stringing words together more or less arbitrarily (in reality, via some more informed "narrowing-down" process)-- and choosing the statistically most likely option.
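To make the idea concrete, the following is a minimal sketch, in Python, of how candidate translations might be scored against phrase-translation statistics and a target-side bigram model. The phrase table, bigram probabilities and function names are invented for illustration; this shows the principle only, not any particular MT engine's implementation.

import math
from itertools import product

# P(target phrase | source phrase), as might be estimated from example translations
phrase_table = {
    "il": {"he": 0.7, "it": 0.3},
    "reviendra": {"will return": 0.8, "will returned": 0.05, "returns": 0.15},
}

# P(next word | previous word) in the target language (bigram model)
bigram = {
    ("he", "will"): 0.2, ("it", "will"): 0.1,
    ("will", "return"): 0.3, ("will", "returned"): 0.001,
    ("he", "returns"): 0.05, ("it", "returns"): 0.05,
}

def lm_score(words):
    # Log-probability of a target word sequence under the bigram model;
    # unseen word pairs get a tiny floor probability.
    return sum(math.log(bigram.get(pair, 1e-6)) for pair in zip(words, words[1:]))

def translate(source_phrases):
    # Enumerate candidate translations phrase by phrase and keep the
    # statistically most likely one.
    best, best_score = None, float("-inf")
    for choice in product(*(phrase_table[p].items() for p in source_phrases)):
        words = " ".join(target for target, _ in choice).split()
        total = sum(math.log(p) for _, p in choice) + lm_score(words)
        if total > best_score:
            best, best_score = " ".join(words), total
    return best

print(translate(["il", "reviendra"]))  # -> "he will return"

Even in this toy version, the ungrammatical candidate "he will returned" loses out simply because the word pair (will, returned) is rare in the model, not because the system "knows" any grammar.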
On hearing this high-level description of how MT works, many people are surprised that such a "linguistically blind" approach works at all. What is even more surprising is that it typically works better than rule-based systems. This is partly because relying on grammatical analysis itself introduces errors into the equation (automated analysis is not completely accurate, and humans don't always agree on how to analyse a sentence). And training a system on "bare text" allows it to be based on far more data than would otherwise be possible: corpora of grammatically analysed texts are small and few and far between; pages of "bare text" are available in their trillions.
However, what this approach means is that the quality of translations depends very much on how well elements of the source text are represented in the data originally used to train the system. If you accidentally type he'll returned or vous avez demander (instead of he will return or vous avez demandé), the system will be hampered, because sequences such as will returned are unlikely to have occurred many times in the training corpus (or worse, may have occurred with a completely different meaning, as in they needed his will returned to the solicitor). Because the system has little notion of grammar (to work out, for example, that returned is a form of return, and that "the infinitive is likely after he will"), it in effect has little to go on.
Similarly, you may ask it to translate a sentence that is perfectly grammatical and common in everyday use, but which happens to include features that were not common in the training corpus. MT systems are typically trained on the types of text for which human translations are readily available, such as technical or business documents, or transcripts of meetings of multilingual parliaments and conferences. This gives MT systems a natural bias towards certain types of formal or technical text. And even if everyday vocabulary is still covered by the training corpus, the grammar of everyday speech (such as using tu rather than usted in Spanish, or using the present tense instead of the future tense in various languages) may not be.
MT systems in reality
Researchers and developers of computer translation systems have always been aware that one of the greatest dangers is public misperception of their purpose and limitations. Somers (2003)[1], observing the use of MT on the web and in chat rooms, comments that: "This increased visibility of MT has had a number of side effects. [...] There is certainly a need to educate the general public about the low quality of raw MT, and, importantly, why the quality is so low." Observing MT in use in 2009, there is sadly little evidence that users' awareness of these issues has improved.
As an example, I will present a small sample of data from a Spanish-English MT service that I offer at the Espanol-Ingles web site. The service works by taking the user's input, applying some "cleanup" processes (such as correcting some common orthographical errors and decoding common instances of "SMS-speak"), and then looking for translations in (a) a bank of examples from the site's Spanish-English dictionary, and (b) an MT engine. Currently, Google Translate is used as the MT engine, although a custom engine may be used in the future. The figures I present here are from an analysis of 549 Spanish-English queries submitted to the system from machines in Mexico[2]-- in other words, we assume that most users are translating from their native language.
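In outline, the pipeline looks something like the following Python sketch. The names used here (SMS_SPEAK, EXAMPLE_BANK, mt_engine_translate and so on) are hypothetical illustrations of the steps just described, not the site's actual code, and the external MT engine call is left as a placeholder.

SMS_SPEAK = {"q": "que", "xq": "porque", "tb": "también"}   # example expansions
EXAMPLE_BANK = {"cuarto para": "quarter to"}                # examples from the site's dictionary

def clean_up(query: str) -> str:
    # Normalise case and whitespace, and decode common "SMS-speak".
    words = query.lower().strip().split()
    return " ".join(SMS_SPEAK.get(w, w) for w in words)

def mt_engine_translate(text: str) -> str:
    # Placeholder for the external MT engine (currently Google Translate).
    raise NotImplementedError

def translate_query(query: str) -> str:
    cleaned = clean_up(query)
    # (a) prefer a curated example from the dictionary's example bank ...
    if cleaned in EXAMPLE_BANK:
        return EXAMPLE_BANK[cleaned]
    # (b) ... otherwise fall back to the statistical MT engine
    return mt_engine_translate(cleaned)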
First, what are people using the MT system for? For each query, I made a "best guess" at the user's purpose in translating the query. In many cases, the purpose is quite obvious; in some cases, there is clearly ambiguity. With this caveat, I judge that in approximately 88% of cases the intended use is fairly clear-cut, and categorise these uses as follows:
- Looking up a single word or term: 38%
- Translating a formal text: 23%
- Internet chat session: 18%
- Homework: 9%
A surprising (if not alarming!) observation is that in such a large proportion of cases, users are using the translator to look up a single word or term. In fact, 30% of queries consisted of a single word. This finding is all the more surprising given that the site in question also offers a Spanish-English dictionary, and it suggests that users confuse the purpose of dictionaries and translators. Although not represented in the raw figures, there were also cases of consecutive searches where it appeared that a user was deliberately splitting up a sentence or phrase that would probably have been better translated if left together. Perhaps as a result of student over-drilling on dictionary usage, we see, for example, a query for cuarto para ("quarter to") followed immediately by a query for a number. There is clearly a need to educate students and users in general on the difference between an electronic dictionary and a machine translator[3]: in particular, that a dictionary will guide the user towards picking the right translation in context, but requires single-word or single-phrase lookups, whereas a translator generally works best on whole sentences and, given a single word or term, will simply report the statistically most common translation.
I estimate that in just under a quarter of cases, users are using the MT system for its "trained-for" purpose of translating or gisting a formal text (and so are entering a whole sentence, or at least a partial sentence, rather than an isolated noun phrase). Of course, it is impossible to know whether any of these translations were then intended for publication without further proofreading, which is definitely not the purpose of the system.
The use for translating formal texts is now almost rivalled by its use to translate informal on-line chat sessions-- a context for which MT systems are usually not trained. The on-line chat context poses particular difficulties for MT systems, since features such as non-standard spelling, lack of punctuation and the presence of colloquialisms not found in other written contexts are common. For chat sessions to be translated effectively would probably require a dedicated system trained on a more suitable (and possibly custom-built) corpus.