Data Science

An introductory manual for developing a Retrieval Augmented Generation (RAG) application from the ground up | Authored by Bill Chambers.

Retrieval Augmented Generation (RAG) has been gaining attention because it lets large language models such as OpenAI's GPT-4 draw on a user's own data. The technique adds that data (via a retrieval tool) to the prompt passed to the language model, which then generates an output.…
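
The retrieve-then-prompt idea in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's implementation: the names `retrieve`, `build_prompt`, and `DOCS` are hypothetical, the "retrieval tool" is a toy word-overlap ranking rather than a real vector search, and no language model is actually called.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count token occurrences shared by the query and the document."""
    shared = Counter(tokenize(query)) & Counter(tokenize(document))
    return sum(shared.values())


def retrieve(query, documents, k=2):
    """Return the k documents with the highest overlap score (toy retriever)."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Put the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical "own data" used only for this illustration.
DOCS = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Shipping takes three to five business days.",
]

QUERY = "What is your refund policy for returns?"
prompt = build_prompt(QUERY, DOCS, k=1)
print(prompt)
```

The resulting `prompt` string is what would be passed to a language model; swapping the toy `score` function for an embedding-based similarity search would not change the overall shape.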

Responding to Causal Inquiries through Causal Diagrams | Authored by Ryan O’Sullivan | Jan, 2024

Causal AI brings causal reasoning into machine learning. Causal graphs, typically drawn as directed acyclic graphs (DAGs), help distinguish causation from correlation and are an essential part of the causal inference toolbox in causal AI. They can establish causal relationships and account for situations that machine learning alone cannot, such as spurious correlations, confounders, mediators,…

“Five Essential Redshift SQL Functions to Understand” | Authored by Madison Schott | Mar, 2024

Redshift is a data warehouse developed by Amazon that uses its own SQL dialect, which can be challenging for new users accustomed to other SQL dialects. One powerful built-in function in Redshift is the PIVOT function. This function reshapes data, transforming values in rows into columns, or values…
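
To make the row-to-column reshaping concrete, here is a small plain-Python analogy. It is not Redshift syntax and not taken from the article: the `pivot` helper, the sample rows, and the choice to sum values and fill missing cells with 0 are all assumptions made for this sketch.

```python
from collections import defaultdict


def pivot(rows, index, column, value):
    """Turn the distinct values of `column` into output columns.

    Values that land in the same cell are summed; empty cells are filled
    with 0 (a choice made for this sketch only).
    """
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[column]] += row[value]

    columns = sorted({row[column] for row in rows})
    return [
        {index: key, **{c: cells.get(c, 0) for c in columns}}
        for key, cells in sorted(table.items())
    ]


# Hypothetical long-format data: one row per (store, quarter) sale.
sales = [
    {"store": "A", "quarter": "Q1", "amount": 10},
    {"store": "A", "quarter": "Q2", "amount": 5},
    {"store": "B", "quarter": "Q1", "amount": 7},
    {"store": "A", "quarter": "Q1", "amount": 3},
]

wide = pivot(sales, index="store", column="quarter", value="amount")
for row in wide:
    print(row)
```

Each distinct `quarter` value becomes its own column, so the four input rows collapse into one row per store.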

Classwords – My Preferred Method of Naming Database Columns | Authored by Krzysztof K. Zdeb | Mar, 2024

In data engineering, clear and consistent database column names are essential. An often overlooked yet highly effective tool for achieving this is the use of classwords. With a career spanning over two decades, I've found classwords to be indispensable in my data management practices. They are a critical communication tool that provides…

Redesigning an LLM for Predictions on Time Series Data | Authored by Marco Peixeiro | 2024

Time series forecasting is important in many sectors, including finance, weather, and health, because it enables predictions based on past patterns. While traditional methods like ARIMA and exponential smoothing are popular, they often fall short in complex and large-scale forecasting tasks. This is where natural language processing (NLP) comes in, and more specifically large language…

The Issue with German Tanks: Forecasting Potential Victory Odds… | Authored by Dorian Drost | March, 2024

Statistical estimates can provide interesting insights about a population from a sample. The article explains how samples can be used to estimate the total size of a population. Here, the principle is illustrated by estimating the total number of lottery tickets and, from that, the probability of winning. The author describes…
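
As a rough illustration of estimating a population size from a sample, the sketch below uses the classic frequentist estimator for the German tank problem, N ≈ m + m/k − 1, where m is the largest serial number observed and k is the sample size. Whether the article uses exactly this formula is an assumption, as are the function name `estimate_population_size`, the sample numbers, and the premise that tickets are numbered 1..N and sampled without replacement.

```python
def estimate_population_size(serials):
    """Estimate N from serial numbers sampled without replacement from 1..N.

    Uses the classic estimator m + m/k - 1, where m is the sample maximum
    and k is the number of observed serials.
    """
    if not serials:
        raise ValueError("at least one observed serial number is required")
    if len(set(serials)) != len(serials) or min(serials) < 1:
        raise ValueError("serials must be distinct positive integers")

    k = len(serials)
    m = max(serials)
    return m + m / k - 1


# Hypothetical observed ticket numbers.
observed = [19, 40, 42, 60]
estimated_total = estimate_population_size(observed)
print(estimated_total)

# If exactly one ticket wins (an assumption), a single ticket's chance is roughly:
print(1 / estimated_total)
```

The estimate is always at least as large as the sample maximum, and the correction term m/k shrinks as more serial numbers are observed.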