Causal AI brings causal reasoning into machine learning. Causal graphs, usually represented as directed acyclic graphs (DAGs), help distinguish causation from correlation and are a core part of the causal inference toolbox. They make causal assumptions explicit and help account for situations that standard machine learning does not handle, such as spurious correlations, confounders, mediators, and colliders.
Machine learning aims to classify or predict accurately from the training data, but it does not ensure that the features used are causally linked to the target. It can therefore ignore whether variables are connected through direct or indirect cause-and-effect relationships. Causal AI, by contrast, models those causal relationships explicitly.
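A small simulation can make the point about non-causal features concrete. The sketch below is an illustration under assumed numbers, not taken from any real dataset: a hypothetical `demand` variable drives both `marketing` spend and `signups`, and marketing has no effect on sign-ups at all. The two still correlate strongly until demand is adjusted for.

```python
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(xs, ys):
    """Part of ys that a straight-line fit on xs does not explain."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

# Assumed data-generating process: demand is a common cause (confounder).
rng = random.Random(42)
n = 5000
demand = [rng.gauss(100, 15) for _ in range(n)]
marketing = [2.0 * d + rng.gauss(0, 10) for d in demand]  # caused by demand
signups = [0.5 * d + rng.gauss(0, 5) for d in demand]     # caused by demand only

# Marketing looks predictive of sign-ups even though it has no causal effect.
raw_corr = pearson(marketing, signups)

# Removing the part explained by demand makes the association vanish.
adjusted_corr = pearson(residuals(demand, marketing), residuals(demand, signups))

print(f"raw correlation:            {raw_corr:.2f}")
print(f"after adjusting for demand: {adjusted_corr:.2f}")
```

A purely predictive model would happily use marketing spend as a feature here, which is fine for forecasting but misleading if the question is what happens when the marketing budget is changed.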
Causal graphs are graphs with nodes and directed edges, where each edge points from a cause to its effect. A causal graph can be built from expert domain knowledge or with causal discovery algorithms. For example, domain experts might determine that demand affects both how much we spend on marketing and how many new customers sign up.
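The demand example above can be written down as a small graph in code. This is a minimal sketch using only the Python standard library; the node names and the helper functions `build_parents` and `causal_order` are made up for illustration, and a real project would more likely use a graph library such as networkx.

```python
from graphlib import CycleError, TopologicalSorter

# Edges point from cause to effect, following the demand example.
edges = [
    ("demand", "marketing_spend"),
    ("demand", "new_customers"),
]

def build_parents(edge_list):
    """Map every node to the set of its direct causes (parents)."""
    parents = {}
    for cause, effect in edge_list:
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    return parents

def causal_order(parents):
    """Return the nodes causes-first, or raise ValueError if there is a cycle."""
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err

parents = build_parents(edges)
order = causal_order(parents)                                # causes before effects
roots = sorted(node for node, p in parents.items() if not p)  # nodes with no parents

print("causal order:", order)
print("root nodes:", roots)
```

Checking for cycles matters because a causal graph is expected to be acyclic, and the causes-first ordering is the order in which the nodes can later be modeled and sampled.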
In a Structural Causal Model (SCM), each non-root node is modeled as a function of its parents plus a noise term, and that function can be estimated with a machine learning algorithm. A fitted SCM can generate new samples of data and answer causal questions through counterfactuals and interventions. Counterfactuals use historically observed data to calculate what would have happened to a variable if another had been different, while interventions predict future outcomes under a change.
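To show how these pieces fit together, here is a minimal SCM sketch for a two-node graph, `demand` causing `new_customers`. Everything in it is an assumption made for illustration: the data is synthetic, the mechanism is a plain least-squares line standing in for the machine learning algorithm, and the noise terms are the fitted residuals. The helper names `fit_line`, `sample`, and `counterfactual` are hypothetical.

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic observations (assumed numbers): demand is the root node.
rng = random.Random(0)
demand = [rng.gauss(100, 15) for _ in range(2000)]
new_customers = [0.5 * d + 10 + rng.gauss(0, 5) for d in demand]

# Fit the mechanism of the non-root node and keep its noise terms (residuals).
slope, intercept = fit_line(demand, new_customers)
noise = [y - (slope * d + intercept) for d, y in zip(demand, new_customers)]

def sample(n, do_demand=None):
    """Draw new (demand, new_customers) pairs from the fitted SCM.

    Passing do_demand fixes demand for every sample, which is an intervention.
    """
    rows = []
    for _ in range(n):
        d = do_demand if do_demand is not None else rng.choice(demand)
        rows.append((d, slope * d + intercept + rng.choice(noise)))
    return rows

def counterfactual(i, new_demand):
    """new_customers for observed row i, had demand been new_demand.

    The row keeps its own noise term, so only the change in demand matters.
    """
    return slope * new_demand + intercept + noise[i]

# Intervention: average outcome if demand were set to 120 for everyone.
avg_under_do = fmean(y for _, y in sample(1000, do_demand=120))

# Counterfactual: first observed day, with demand 10 units higher.
cf_day0 = counterfactual(0, demand[0] + 10)

print(f"mean new_customers under do(demand=120): {avg_under_do:.1f}")
print(f"day 0 observed {new_customers[0]:.1f}, counterfactual {cf_day0:.1f}")
```

The difference between the two questions shows up in the noise term: the intervention draws fresh noise because it concerns new samples, while the counterfactual reuses the noise recovered from one observed row.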
For example, in a customer service setting, time-series data can be collected daily. The data science team uses expert domain knowledge to construct a causal graph in Python. Once the SCM is trained, the team can predict, for instance, how reducing call waiting time might affect the customer churn rate. To test the approach, a data-generating process is created from the causal graph so that the ground truth is known. Ridge regression is then used as a baseline, and its estimates are compared with those from the SCM and with the actual data.
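The comparison described above can be sketched on a toy data-generating process. The graph and coefficients below are assumptions chosen for illustration, not the ones from the original analysis: waiting time affects churn directly and also indirectly through a hypothetical `satisfaction` mediator. The ridge baseline uses both variables as features and asks what happens if only waiting time changes, while the graph-based estimate lets the change flow through satisfaction as well. Ridge is implemented by hand for two features to keep the example dependency-free; scikit-learn would be the usual choice in practice.

```python
import random
from statistics import fmean

def fit_ridge2(x1, x2, y, alpha):
    """Ridge fit of y on two features (intercept unpenalized); alpha=0 is OLS."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c = [v - my for v in y]
    s11 = sum(v * v for v in a) + alpha
    s22 = sum(v * v for v in b) + alpha
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * q for p, q in zip(a, c))
    s2y = sum(p * q for p, q in zip(b, c))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2, my - b1 * m1 - b2 * m2

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def simulate(n, wait_shift=0.0, seed=7):
    """Assumed ground truth: wait -> satisfaction -> churn, plus wait -> churn."""
    rng = random.Random(seed)  # same seed means same noise for every shift
    rows = []
    for _ in range(n):
        wait = rng.gauss(10, 3) + wait_shift
        satisfaction = 8 - 0.4 * wait + rng.gauss(0, 0.5)
        churn = 5 + 0.2 * wait - 1.0 * satisfaction + rng.gauss(0, 0.5)
        rows.append((wait, satisfaction, churn))
    return rows

shift = -2.0  # intervention: cut waiting time by two minutes
wait, sat, churn = (list(col) for col in zip(*simulate(5000)))

# Ground truth: rerun the data-generating process with the intervention applied.
truth_delta = fmean(r[2] for r in simulate(5000, wait_shift=shift)) - fmean(churn)

# Baseline: ridge on both features, changing wait while satisfaction stays fixed.
r_wait, r_sat, _ = fit_ridge2(wait, sat, churn, alpha=1.0)
ridge_delta = r_wait * shift

# Graph-based: fit each non-root node on its parents and propagate the change.
sat_slope, _ = fit_line(wait, sat)
c_wait, c_sat, _ = fit_ridge2(wait, sat, churn, alpha=0.0)
scm_delta = (c_wait + c_sat * sat_slope) * shift

print(f"ground truth change in churn: {truth_delta:.2f}")
print(f"ridge baseline estimate:      {ridge_delta:.2f}")
print(f"graph-based estimate:         {scm_delta:.2f}")
```

In this toy setup the gap comes from how the graph is used rather than from the ridge penalty itself: holding the mediator fixed captures only the direct effect of waiting time, whereas following the graph also captures the effect that runs through satisfaction.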
This comparison shows that ridge regression may underestimate the impact on churn, while the SCM built on the causal graph gives results closer to the ground truth. These methods have their own challenges, however. It is important to consider which assumptions may be violated, how non-linear relationships affect the results, and how to handle high-dimensional datasets. Future blog posts will cover these and other related topics in greater detail.