Overview of Dolomites

  • 519 expert-authored, long-form task descriptions (see examples above)
  • Spanning 25 fields
  • Independent judgements of task validity and societal implications of using LMs as writing assistants for these tasks
  • 1,857 examples that instantiate the tasks in our collection with plausible inputs and outputs
  • Tasks are challenging and require domain expertise: browse and see for yourself!

Dolomites is a long-form benchmark for evaluating language models on realistic domain-specific writing tasks.

Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient to a teacher writing a lesson plan for students, these tasks are pervasive and require methodically generating structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total), which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models, highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences while drawing upon the given context as well as domain knowledge.
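
To make the typology above concrete, here is a minimal Python sketch of how a task (objective, procedure, input, output) and one of its instantiated examples might be represented and turned into a model prompt. The class names, field names, and the `build_prompt` helper are illustrative assumptions, not the released data schema or the evaluation setup used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MethodicalTask:
    """One task description following the four-part typology.

    All attribute names are hypothetical; the released data may use
    different keys.
    """
    field_name: str             # e.g. "Medicine" (one of the 25 fields)
    objective: str              # what the task aims to accomplish
    procedure: str              # how an expert carries the task out
    input_sections: List[str]   # what the expert is given
    output_sections: List[str]  # what the expert is expected to write


@dataclass
class TaskExample:
    """A concrete instantiation of a task with input and output text."""
    task: MethodicalTask
    input_text: str
    output_text: str


def build_prompt(example: TaskExample) -> str:
    """Assemble a plain-text prompt from the task description and input.

    This is one possible prompt layout, chosen only for illustration.
    """
    task = example.task
    lines = [
        f"Task objective: {task.objective}",
        f"Procedure: {task.procedure}",
        "Expected output sections: " + ", ".join(task.output_sections),
        "",
        "Input:",
        example.input_text,
    ]
    return "\n".join(lines)


# Invented placeholder content, loosely modeled on the clinician example.
task = MethodicalTask(
    field_name="Medicine",
    objective="Write a differential diagnosis for a patient",
    procedure="Review the presentation, list candidate diagnoses, justify each",
    input_sections=["Patient presentation"],
    output_sections=["Candidate diagnoses", "Rationale"],
)
example = TaskExample(task=task, input_text="(patient presentation here)", output_text="")
prompt = build_prompt(example)
```

Keeping the task description separate from its examples mirrors the benchmark's structure, in which each task specification is instantiated by several input and output examples.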

Subscribe to our list

Dolomites is an ongoing effort and we are actively expanding the set of tasks. For announcements, subscribe to our Google group.

Dolomites: Domain-Specific Long-Form Methodical Tasks

If you find our work useful, please cite us:

@article{malaviya2024dolomites,
  author = {Malaviya, Chaitanya and Agrawal, Priyanka and Ganchev, Kuzman and Srinivasan, Pranesh and Huot, Fantine and Berant, Jonathan and Yatskar, Mark and Das, Dipanjan and Lapata, Mirella and Alberti, Chris},
  title = {Dolomites: Domain-Specific Long-Form Methodical Tasks},
  journal = {Transactions of the Association for Computational Linguistics (TACL)},
  month = {September},
  year = {2024},
  url = "https://arxiv.org/abs/2405.05938"
}
          

Authors

Dolomites is a joint effort between researchers at the University of Pennsylvania & Google DeepMind, along with a group of expert annotators who helped create the data.