Saturday, August 05, 2006

Developing and using a translation quality index

Our new article "Developing and using a translation quality index" has just been published in the July/August 2006 issue of MultiLingual.
The article is not freely available online (though you can reach an excerpt by following the "current issue" link on MultiLingual's home page). However, if you are interested in reading it, please contact me.

13 comments:

Anonymous said...

Hi Riccardo,

Can I read it? I am doing my research and looking for methods to assess translation quality.

Regards,
Sugeng
sghariyanto@yahoo.com

AELM said...

Hi Mr. Schiaffino:

This is an excellent blog! I got here after a Google search for my master's thesis on translation quality assessment. I tried to access the MultiLingual site, but it is down. Where else can I read an abstract of your article about the quality index?

Best regards,
Anne

Riccardo said...

Hi Anne:

If you send me your e-mail address and mention this post I could send you a copy of the article.

To write me, use the following e-mail address:

Riccardo[underscore]Schiaffino[at]comcast.net

replacing [underscore] with the underscore sign (_), and [at] with the '@' symbol. Sorry for making things difficult, but I'm trying not to get too much spam.

rusloc said...

I was a bit disappointed to find this blog so very... dead. After I read the article in MultiLingual, I hoped to find people in a situation similar to mine.
Being subjected to real-life use of LQI, and getting NO PASS reviews because of the method's deficiencies, is the main reason I'd be happy to discuss it with the authors and other users.

In my opinion, the biggest problem with the index is its human interface. The task of sorting out errors and assigning a category to each error before counting them and generating the score is the weakest link in this concept. It is very subjective, and we know that 5 different reviewers will categorize at least half of the errors (not counting the obvious ones like typos) differently.

The article says that reviewers are professional translators who are specially trained for the job. That's fine and dandy, but I am not convinced. Who trained them? Who tested them? Who evaluated them BEFORE they were selected to be reviewers? What are the metrics for final grading of the graders?

In short, I think this whole concept is faulty because it leads to inaccurate conclusions. In my book, 50% accuracy in determining such an important component of localization products as translation quality is not enough. And from my experience that was approximately the percentage of "good" reviewer comments. The rest were either preferential, plain erroneous, miscategorized, counted several times with a different category, etc. As a result, we had to spend untold amounts of time replying to those comments.

As professionals you know how humiliating it could be... I felt hurt and burnt.
Using your own analogy with temperature measurements, the LQI in its current state is like a sensor with +/- 50 degrees tolerance. I would not use it to cook my rare-to-medium rare steak.
:)

Riccardo said...

Hi Vadim:

>>I was a bit disappointed to find this blog so very... dead.

My fault: I've devoted much more time to my other blog, and left this one languishing.


>>The task of sorting out errors and assigning a category to each error before counting them and generating the score is the weakest link in this concept.

More than the weakest, I would say the most difficult.

>>It is very subjective, and we know that 5 different reviewers will categorize at least half of the errors (not counting the obvious ones like typos) differently.

It is subjective when evaluators are not selected and trained properly. The purpose of the selection is to find people who are able to work with the system: people who are able to mark errors as objectively as possible (e.g., people who would not mark something as an error just because "they would never translate this like that"). For this reason, people who may be excellent translators or editors are not necessarily suited to do QA reviews.

>>Who trained them? Who tested them? Who evaluated them BEFORE they were selected to be reviewers?

This gets more into the specific details of the QA program, as implemented in Franco's company. I'm not sure whether he could disclose such details, but if so, he is the best person to explain this part of the system.

>>In my book, 50% accuracy in determining such an important component of localization products as translation quality is not enough.

Sorry, but I don't believe that our system is only 50% accurate (I agree that if it were so, it would not be useful at all).

>>And from my experience that was approximately the percentage of "good" reviewer comments.

Have you evaluated through our system (Franco's company), or was it in some other "similar" system? I'm sorry, but without more details, I could not comment on this.

>>The rest were either preferential, plain erroneous, miscategorized, counted several times with a different category, etc. As a result, we had to spend untold amounts of time replying to those comments.

In our system, "preferential changes" are NOT counted as errors, and they do not contribute to the final score. They were in fact added to the system to provide a mechanism for reviewers to indicate that they would translate something differently, without having to mark the difference as an error.

rusloc said...

Thank you for your answers, Riccardo.
"Have you evaluated through our system (Franco's company), or was it in some other "similar" system? I'm sorry, but without more details, I could not comment on this."
Yes, we are working with his organization, and due to the NDA we cannot discuss much. It's a pity of course.

When I said "Preferential" I meant they were preferential by their nature, but reviewers assigned other categories to them. Most ridiculous and subjective.

It was the quality of the reviewers' work that made the LQI system look bad and caused our mistrust. I am expressing not only my own opinion, but the general conclusion reached after two dozen people discussed it in our forum. Sorry, the discussion is not available to the general public for the same reason: the NDA.

I liked your point regarding a good translator not necessarily being a good reviewer. I would agree with you here.

Riccardo said...

Hi Vadim:

I'm going to forward your messages to Franco: if there have been problems with the system, he will like to know, and he is the best person to take any necessary corrective action.

Franco Zearo said...

I want to thank Rusloc for the very insightful comments. Here are my answers:
>>In my opinion, the biggest problem with the index is its human interface.

I agree: A system that relies on human assessments can be greatly affected by the subjectivity of the assessments. Because an automated tool to assess translations in a reliable way does not exist, we have designed a system that uses human evaluators. We try to mitigate the subjectivity of human evaluations in 3 ways: (1) standard qualifications for reviewers, (2) training, and (3) guidelines to help the reviewers classify errors and distinguish between errors and non-errors.

>>The task of sorting out errors and assigning a category to each error before counting them and generating the score is the weakest link in this concept.

I need to clarify: The index is not influenced by the error categories. There are 2 factors that influence the index: (1) the number of errors detected and (2) the severity of each error. The concept is based on the counting and classification of the severity of the errors. In our studies, this concept is better suited to obtaining measurements.
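
As a rough illustration of an index that depends only on the number of errors and the severity of each one, here is a minimal Python sketch. The severity names, the weights, the per-1,000-words normalization, and the function name `quality_index` are all assumptions made for illustration; they are not the formula from the article or from the LQI.

```python
from collections import Counter

# Hypothetical severity weights; the real system's values are not given here.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def quality_index(errors, word_count, weights=SEVERITY_WEIGHTS):
    """Return a 0-100 score from (category, severity) error records.

    Only the number of errors and their severities affect the score.
    The category is carried along in each record but ignored.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    severity_counts = Counter(severity for _category, severity in errors)
    penalty = sum(weights[sev] * n for sev, n in severity_counts.items())
    # Assumed normalization: penalty points per 1,000 words, floored at 0.
    return max(0.0, 100.0 - penalty * 1000.0 / word_count)


sample = [("terminology", "minor"), ("accuracy", "major")]
relabeled = [("style", "minor"), ("grammar", "major")]
print(quality_index(sample, word_count=1000))
print(quality_index(relabeled, word_count=1000))
```

Both calls give the same score, since relabeling the categories changes nothing when only count and severity enter the calculation.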

>>the LQI in its current state is like a sensor with +/- 50 degrees tolerance.

I believe that quality is 50% perception, and 50% tangible factors. So, I respect the poster's perception about the state of the LQI. As for the tangibles, we recently did a correlation study: Only 2.5% of the inspections had a subjective assessment from the reviewer that disagreed with the LQX score. This tells me that the system is not flawed and that, like any process, it can be improved.

As for the training, grading, and monitoring of the reviewers, I concur that we can do more in this area in order to increase the objectivity of the evaluations. Not all lawyers can be judges, and not all translators make good reviewers. The statistics make it possible to spot trends and to eliminate from the system those reviewers who consistently provide evaluations that are too biased.
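
One simple way such statistics could be used to spot a consistently biased reviewer is sketched below in Python. This is only an assumed approach, not a description of the actual monitoring process: the function name `flag_biased_reviewers`, the sample data, and the threshold are hypothetical. It flags reviewers whose average score sits far from the average of all reviewers' means.

```python
from statistics import mean, pstdev


def flag_biased_reviewers(scores_by_reviewer, threshold=1.5):
    """Return reviewers whose mean score deviates strongly from their peers.

    The deviation is measured in population standard deviations of the
    per-reviewer means. The threshold is an arbitrary illustrative value.
    """
    reviewer_means = {
        name: mean(scores)
        for name, scores in scores_by_reviewer.items()
        if scores
    }
    means = list(reviewer_means.values())
    if len(means) < 2:
        return []
    overall = mean(means)
    spread = pstdev(means)
    if spread == 0:
        return []  # all reviewers agree on average; nobody stands out
    return sorted(
        name
        for name, m in reviewer_means.items()
        if abs(m - overall) / spread > threshold
    )


# Made-up scores for five hypothetical reviewers.
scores = {
    "A": [90, 92],
    "B": [91, 89],
    "C": [90, 91],
    "D": [60, 62],
    "E": [90, 90],
}
print(flag_biased_reviewers(scores))
```

A real check would also need to account for reviewers being assigned translations of genuinely different quality, which this sketch ignores.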

rusloc said...

Thank you gentlemen. While I was reading the last post, I was thinking that the system would work just fine for an organization that REALLY cares for top quality. Not just statistics (to show a customer) but the real quality of the final product of which one can be proud...

"That company will have to make heavy investment in that "weakest / most difficult" link - human reviewer", thought I.
And then I saw exactly the same idea in Franco's comment.

I am glad that we are finally on the same page. I guess I could use the concept in creating a translator grading utility. As a young company, we need one. I hope I won't violate any copyrights by using the LQI concept.
:)

Kenny said...

Hi Mr. Schiaffino,

I am a master's student from Taiwan, studying at the Graduate Institute of Translation and Interpretation of National Taiwan Normal University. I am currently doing research and trying to build a standard translation process. One of the steps in the process is about translation quality assessment. I found the article titled "Developing and using a translation quality index" through Google, and I tried to link to translationquality.com but failed. I also tried MultiLingual but could not find the article there either.

Translation quality assessment here in Taiwan mostly relies on experienced or talented translators, but there aren’t enough of them. As a result, the quality of translation varies hugely across translation companies, translators, and reviewers. I am looking forward to learning from experts like you and building a model that may fit the needs of the industry and academia in Taiwan.

I appreciate your help no matter what. If you can give me some advice, that will be even better. Thanks in advance.

With best regards,

Kenny

Riccardo said...

Hi Kenny,

If you would like me to send you a copy of the article, please e-mail me at riccardo {underscore} schiaffino {at} comcast {dot} net.
Replace the words within brackets with the appropriate characters. Sorry about not posting my e-mail address directly, but I'm getting way too much spam already.

Anonymous said...

Dear Anne,
I need a method for my master's thesis in TQA. If it is possible for you, please guide me on this issue. Thank you in advance.

Anonymous said...

Who knows where to download XRumer 5.0 Palladium?
Help, please. All recommend this program to effectively advertise on the Internet, this is the best program!