
Metrics for Machine Translation Evaluation (MetricsMaTr)

NIST coordinates MetricsMaTr, a series of research challenge events for machine translation (MT) metrology that promotes the development of innovative, even revolutionary, MT metrics. MetricsMaTr focuses entirely on MT metrics rather than on MT systems.

NIST provides the evaluation infrastructure; the source files are MT system output. Participants develop MT metrics that assess the quality of these source files. NIST runs the metrics on the test set in two tracks: one using a single reference translation and one using multiple references.
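
As a rough illustration of what such a metric looks like, here is a minimal, hypothetical sketch (not an official MetricsMaTr metric): it scores a translation hypothesis against one or more reference translations using simple unigram overlap, which covers both the single-reference and multiple-reference tracks.

    # Hypothetical toy metric, for illustration only: score a hypothesis against
    # one or more reference translations using unigram precision, taking the best
    # match over all references.
    def score_segment(hypothesis, references):
        hyp_tokens = hypothesis.lower().split()
        if not hyp_tokens:
            return 0.0
        best = 0.0
        for ref in references:
            ref_tokens = set(ref.lower().split())
            matches = sum(1 for tok in hyp_tokens if tok in ref_tokens)
            best = max(best, matches / len(hyp_tokens))
        return best

    # Single-reference track: one reference per segment.
    print(score_segment("the cat sat on the mat", ["the cat is on the mat"]))
    # Multiple-reference track: several references per segment.
    print(score_segment("the cat sat on the mat",
                        ["a cat sat on a mat", "the cat was sitting on the mat"]))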

The goal is to create intuitively interpretable automatic metrics which correlate highly with human assessment of MT quality. Different types of human assessment are used.

There are several drawbacks to the current methods employed for the evaluation of MT technology:

  • Automatic metrics have not yet been shown to predict, with confidence, the usefulness and reliability of MT technologies in real applications.
  • Automatic metrics have not been demonstrated to be meaningful for target languages other than English.
  • Human assessments are expensive, slow, subjective, and difficult to standardize.

These problems, and the need to overcome them through the development of improved automatic (or even semi-automatic) metrics, have been a constant point of discussion at past NIST MT evaluation events.

MetricsMaTr aims to provide a platform to address these shortcomings. Specifically, the goals of MetricsMaTr are:

  • To inform other MT technology evaluation campaigns and conferences with regard to improved metrology.
  • To establish an infrastructure that encourages the development of innovative metrics.
  • To build a diverse community which will bring new perspectives to MT metrology research.
  • To provide a forum for MT metrology discussion and for establishing future directions of MT metrology.

The MetricsMaTr challenge is designed to appeal to a wide and varied audience, including researchers of MT technology and metrology, acquisition programs such as MFLTS, and commercial vendors. We welcome submissions from a wide range of disciplines, including computer science, statistics, mathematics, linguistics, and psychology. NIST encourages submissions from participants not currently active in the field of MT.

MetricsMaTr is held at regular intervals; the most recent challenge was MetricsMaTr10. Links to specific evaluation cycles are at the bottom of this page.

Summary of Results

The MetricsMaTr evaluation tests automatic metric scores for correlation with human assessments of machine translation quality across a variety of languages, data genres, and types of human assessment. This produces a large number of results. Archives of each year's complete release of results, including descriptions of the different components, are available for download:


Below, we provide a very high-level summary of these extensive results.

The table below presents Spearman's rho correlations of automatic metric scores with human assessments on target-language English data (drawn from the NIST OpenMT, DARPA GALE, and DARPA TRANSTAC test sets); a brief sketch of how such a correlation is computed follows this list. The results are limited to:

  • The highest-correlating new metric for each evaluation cycle
  • The highest-correlating baseline metric (out of a suite of metrics available to NIST prior to MetricsMaTr08)
  • Correlation with human assessments of semantic adequacy on a 7-point scale
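
As a quick, illustrative sketch of the statistic behind the table, the following Python snippet computes Spearman's rho between system-level automatic metric scores and mean human adequacy scores. The numbers are invented toy data, not values from the actual evaluations.

    # Illustrative only: compute Spearman's rho between automatic metric scores and
    # human adequacy assessments at the system level, using invented toy numbers.
    from scipy.stats import spearmanr

    metric_scores  = [0.31, 0.42, 0.27, 0.55, 0.48]  # one automatic score per MT system
    human_adequacy = [4.1, 5.0, 3.8, 6.2, 5.5]       # mean 7-point adequacy per system

    rho, p_value = spearmanr(metric_scores, human_adequacy)
    print("Spearman's rho = %.2f (p = %.3f)" % (rho, p_value))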

Highest correlation of automatic metrics with human assessments of semantic adequacy

1 reference translation

Evaluation      Segment level            Document level               System level
MetricsMaTr10   SVM_rank (rho=0.69)      METEOR-next-rank (rho=0.84)  METEOR-next-rank (rho=0.92)
MetricsMaTr08   TERp (rho=0.68)          METEOR-v0.7 (rho=0.84)       CDer (rho=0.90)
Baseline        METEOR-v0.6 (rho=0.68)   NIST (rho=0.81)              TER-v0.7.25 (rho=0.89)

4 reference translations

Evaluation      Segment level            Document level               System level
MetricsMaTr10   SVM_rank (rho=0.74)      i_letter_BLEU (rho=0.85)     SEPIA (rho=0.93)
MetricsMaTr08   SVM_RANK (rho=0.72)      CDer (rho=0.85)              ATEC3 (rho=0.93)
Baseline        METEOR-v0.6 (rho=0.72)   NIST (rho=0.84)              NIST (rho=0.93)

Contact

mt_poc@nist.gov

You can subscribe to the MetricsMaTr mailing list hosted by NIST, metricsmt_list@nist.gov, by sending an e-mail to listproc@nist.gov with "subscribe metricsmt_list" in the body.

[ 2008 ] [ 2010 ]