Researchers from Stony Brook University, the US Naval Academy, and the University of Texas at Austin have developed CAT-BENCH, a benchmark to assess language models' ability to predict the sequence of steps in cooking recipes. The research's main focus was on how language models comprehend plans by examining their understanding of the temporal sequencing of…
