Retrieval-Augmented Generation (RAG) has recently been gaining attention because it lets users of large language models such as OpenAI's GPT-4 bring their own data to the model. The technique essentially involves adding one's own data (via a retrieval tool) to the prompt that is passed to a language model, which then generates an output.…
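As a rough illustration of the idea, the sketch below retrieves the most relevant snippets from a small document list and places them in the prompt. Everything here is hypothetical: the function names (`tokenize`, `retrieve`, `build_prompt`), the sample documents, and the word-overlap scoring, which is only a toy stand-in for a real retrieval tool. No language model is called; the resulting string is what would be passed to one.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Rank documents by word overlap with the question.

    Assumption: a real system would use a proper retrieval tool
    (for example a search index); overlap counting is a placeholder.
    """
    question_tokens = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents, top_k=2):
    """Add the retrieved snippets to the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical "own data" that the model would not otherwise know about.
docs = [
    "Support hours are 9am to 5pm on weekdays.",
    "Our refund policy allows returns within 30 days.",
    "Shipping takes three to five business days.",
]

prompt = build_prompt("What is the refund policy?", docs, top_k=1)
print(prompt)
```

The printed prompt is the text a language model would receive; swapping in a different retriever only changes `retrieve`, not the prompt assembly.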
Causal AI is the integration of causal reasoning into machine learning. Causal graphs, typically represented as directed acyclic graphs (DAGs), help to distinguish causation from correlation and are an essential part of the causal inference toolbox in causal AI. They can establish causal relationships and account for situations that machine learning cannot, such as spurious correlations, confounders, mediators,…
Redshift is a data warehouse developed by Amazon that uses its own SQL dialect, which can be challenging for new users accustomed to other SQL dialects. One powerful built-in feature in Redshift is PIVOT. It allows for the reshaping of data, transforming values in rows into columns, or values…
In data engineering, clear and consistent database column names are essential. An often overlooked yet highly effective tool for achieving this is the use of classwords. Over a career spanning more than two decades, I've found classwords to be indispensable in my data management practices. They are a critical communication tool that provides…
Time series forecasting is important in many sectors, including finance, weather, and health, as it enables predictions based on past patterns. While traditional methods like ARIMA and exponential smoothing are popular, they often fall short in complex and large-scale forecasting tasks. This is where natural language processing (NLP) comes in, and more specifically large language…
Statistical estimates can provide interesting insights about a population from a sample. The article explains how samples can be used to estimate the total size of a population. The example used to illustrate this principle is the estimation of the total number of lottery tickets, from which the probability of winning can then be calculated.
The author describes…
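The summary does not say which estimator the article uses. As one possible illustration, the sketch below applies the classic "German tank" estimator, which assumes tickets are numbered sequentially from 1 to N and that the observed serial numbers are a uniform random sample without replacement. The function names and the sample serial numbers are made up for this example.

```python
def estimate_population_size(serials):
    """Estimate N from observed serial numbers drawn from 1..N.

    Uses the German tank estimator m + m/k - 1, where m is the largest
    observed serial and k is the sample size. This is an assumed method,
    not necessarily the one used in the article.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1

def win_probability(serials, winning_tickets=1):
    """Chance that a single ticket wins, given the estimated total."""
    return winning_tickets / estimate_population_size(serials)

# Hypothetical observed ticket numbers.
observed = [19, 40, 42, 60]
total = estimate_population_size(observed)  # 60 + 60/4 - 1 = 74.0
print(f"Estimated tickets: {total:.0f}, win probability: {win_probability(observed):.4f}")
```

With these four made-up serial numbers the estimate is 74 tickets, so a single ticket would have roughly a 1-in-74 chance if exactly one ticket wins.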