Researchers have introduced CLADDER and CausalCoT, a new benchmark and prompting strategy for causal reasoning in language models. CLADDER is a dataset of more than 10,000 causal questions spanning all three rungs of the Ladder of Causation (association, intervention, and counterfactuals), designed to test formal causal reasoning in LLMs through symbolic questions with ground-truth answers. The researchers also generated ground-truth explanations with step-by-step reasoning and verbalized the questions and answers into natural-language stories. To break causal reasoning problems into simpler steps, they designed CausalCoT, a chain-of-thought prompting strategy built on GPT-4.
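To make the idea concrete, here is a minimal sketch of how a CausalCoT-style prompt might be assembled before being sent to a model. The step list, wording, and helper name (`build_causalcot_prompt`) are illustrative assumptions, not the paper's exact prompt:

```python
# A minimal sketch of a CausalCoT-style prompt builder.
# The step list below is an assumption based on the general idea of
# decomposing a causal question; it is not the paper's verbatim prompt.

CAUSALCOT_STEPS = [
    "Extract the causal graph implied by the story.",
    "Identify the query type (association, intervention, or counterfactual).",
    "Formalize the query in causal notation, e.g. P(Y | do(X)).",
    "Collect the available numerical data from the story.",
    "Derive the estimand and compute the answer step by step.",
]

def build_causalcot_prompt(question: str) -> str:
    """Wrap a causal question with step-by-step reasoning guidance."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(CAUSALCOT_STEPS, 1))
    return (
        "Answer the following causal question. Reason through these steps "
        "before giving a final yes/no answer:\n"
        f"{steps}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

if __name__ == "__main__":
    q = ("Smoking affects tar deposits, which affect cancer risk. "
         "If we intervene to prevent smoking, does cancer risk decrease?")
    print(build_causalcot_prompt(q))
```

The prompt text would then be passed to the LLM of choice; the structured steps encourage the model to commit to a causal graph and a formal query before computing an answer, rather than guessing directly.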
Evaluation results show that GPT-4 alone achieved 64.28% accuracy, while CausalCoT raised this to 66.64%. This result indicates that the prompting strategy improves reasoning across all three rungs, with the largest gains on anti-commonsensical and nonsensical variants of the data.
This research is a crucial step toward addressing the limitations of previous work and enhancing the causal reasoning capabilities of LLMs. Check out the paper and code for more details.