The data is split into training, development, and two test sets. The first test set is public and available with the data; the second will not be released. The ranking in the leaderboards below is based on results on the unreleased test set.
Instructions for running on the unreleased test set

To avoid overfitting and degrading the leaderboard held-out test set, we require two months or more between runs on the leaderboard test set. We will do our best to run within two weeks (usually we will run much faster). We will only post results on the leaderboard when an online description of the system is available. Testing on the leaderboard test set is meant to be the final step before publication. Under extreme circumstances, we reserve the right to limit running on the leaderboard test set to systems that are mature for publication. Your model should generate a prediction file in