The constant progression of natural language processing (NLP) has brought about an era of advanced, large language models (LLMs) that can accomplish complex tasks with a considerably high level of accuracy. However, these models are costly in terms of computational requirements and memory, limiting their application in environments with finite resources. Model quantization is a…
