DeepRAG: A Step-by-Step Approach to Retrieval-Augmented Reasoning for Large Language Models
A New Framework
Large language models (LLMs) have shown strong potential in reasoning and natural language understanding, yet they still suffer from factual hallucination due to limitations in the timeliness, accuracy, and coverage of their parametric knowledge. Retrieval-augmented generation (RAG) addresses this issue by integrating relevant information from external knowledge sources. However, combining reasoning with RAG remains challenging: ineffective task decomposition and redundant retrieval can introduce noise and degrade response quality.
Researchers have proposed DeepRAG, a new framework that models retrieval-augmented reasoning as a Markov Decision Process (MDP). This approach enables strategic and adaptive retrieval, allowing the model to dynamically determine whether to retrieve external knowledge or rely on its parametric knowledge at each reasoning step.
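To make the MDP framing concrete, here is a minimal sketch of how the state and action spaces could be represented. This is an illustration only, assuming Python dataclasses; the field names are not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One reasoning step: a subquery, how it was answered, and the answer."""
    subquery: str
    retrieved: bool   # atomic decision: external retrieval vs. parametric knowledge
    answer: str

@dataclass
class State:
    """MDP state: the original question plus the partial reasoning trace so far."""
    question: str
    trace: list[Step] = field(default_factory=list)

@dataclass
class Action:
    """MDP action: whether to terminate, and if not, the next subquery and its atomic decision."""
    terminate: bool               # True -> emit the final answer
    subquery: str | None = None
    retrieve: bool | None = None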
The idea behind DeepRAG is to decompose complex queries into a series of subqueries and then decide whether to retrieve external knowledge for each subquery. This decision is made by considering the current question, available information, and the model's confidence in its own knowledge. By carefully selecting which subqueries require external knowledge retrieval, DeepRAG improves retrieval efficiency and reduces the risk of introducing irrelevant or noisy information.
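The following sketch shows what this subquery-by-subquery loop might look like in code. The `llm.generate(prompt)` and `retriever.search(query)` interfaces, the prompt wording, and the `FINISH` marker are assumptions made for illustration, not the paper's actual implementation.

```python
def deeprag_answer(question: str, llm, retriever, max_steps: int = 8) -> str:
    """Step-by-step retrieval-augmented reasoning loop (illustrative sketch).

    Assumed interfaces: llm.generate(prompt) -> str, retriever.search(query) -> list[str].
    """
    trace: list[str] = []
    for _ in range(max_steps):
        history = "\n".join(trace)
        # Termination decision: propose the next subquery or finish.
        subquery = llm.generate(
            f"Question: {question}\n{history}\nNext subquery (or FINISH):"
        ).strip()
        if subquery == "FINISH":
            break
        # Atomic decision: retrieve external knowledge or rely on parametric knowledge.
        decision = llm.generate(
            f"Subquery: {subquery}\nAnswer directly or retrieve? (direct/retrieve):"
        ).strip()
        if decision == "retrieve":
            passages = retriever.search(subquery)
            answer = llm.generate(
                f"Passages: {passages}\nSubquery: {subquery}\nIntermediate answer:"
            )
        else:
            answer = llm.generate(f"Subquery: {subquery}\nIntermediate answer:")
        trace.append(f"Q: {subquery}\nA: {answer}")
    return llm.generate(f"Question: {question}\n" + "\n".join(trace) + "\nFinal answer:")
```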
The DeepRAG framework consists of three main steps: binary tree search, imitation learning, and chain of calibration. Binary tree search constructs a binary tree for each subquery, exploring paths based on either parametric knowledge or external knowledge retrieval. This allows the model to evaluate the impact of different retrieval choices on the final answer.
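A minimal sketch of such a search is shown below: both branches (parametric answer vs. retrieval) are expanded at every subquery, and the correct path with the fewest retrievals is kept. The `llm`/`retriever` interfaces and prompt strings are the same illustrative assumptions as above, not the paper's code.

```python
def binary_tree_search(question: str, gold_answer: str, llm, retriever,
                       max_depth: int = 4):
    """Explore both retrieval choices at every subquery; keep the correct
    path with the fewest retrievals. Illustrative sketch only."""
    best = {"cost": float("inf"), "path": None}

    def expand(trace: list, cost: int, depth: int) -> None:
        history = "\n".join(f"Q: {q}\nA: {a}" for q, _, a in trace)
        subquery = llm.generate(
            f"Question: {question}\n{history}\nNext subquery (or FINISH):"
        ).strip()
        if subquery == "FINISH" or depth == max_depth:
            final = llm.generate(f"Question: {question}\n{history}\nFinal answer:").strip()
            if final == gold_answer and cost < best["cost"]:
                best["cost"], best["path"] = cost, list(trace)
            return
        # Branch 1: rely on parametric knowledge (retrieval cost 0).
        ans = llm.generate(f"Subquery: {subquery}\nIntermediate answer:")
        expand(trace + [(subquery, False, ans)], cost, depth + 1)
        # Branch 2: retrieve external passages first (retrieval cost 1).
        passages = retriever.search(subquery)
        ans = llm.generate(f"Passages: {passages}\nSubquery: {subquery}\nIntermediate answer:")
        expand(trace + [(subquery, True, ans)], cost + 1, depth + 1)

    expand([], 0, 0)
    return best["path"]
```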
Imitation learning is used to train the model to generate effective retrieval narratives by imitating the reasoning process that leads to the correct final answer with minimal retrieval cost. This process involves generating subqueries, making atomic decisions about retrieval, and providing intermediate answers.
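Assuming the minimal-retrieval correct paths produced by the search above, imitation data could be assembled along these lines. The decision markers such as `[Retrieve]` and the prompt/target format are hypothetical choices for illustration, not the paper's exact templates.

```python
def path_to_sft_example(question: str, path: list[tuple[str, bool, str]],
                        final_answer: str) -> dict:
    """Turn one minimal-retrieval correct path into an imitation-learning example."""
    lines = []
    for subquery, retrieved, answer in path:
        decision = "[Retrieve]" if retrieved else "[Answer directly]"
        lines.append(f"Subquery: {subquery}\nDecision: {decision}\nIntermediate answer: {answer}")
    target = "\n".join(lines) + f"\nFinal answer: {final_answer}"
    return {"prompt": f"Question: {question}\n", "target": target}
```

The model is then fine-tuned with standard supervised learning on these prompt/target pairs so that it reproduces the minimal-retrieval reasoning narrative.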
Finally, chain of calibration refines the model's understanding of its own knowledge boundaries, enabling it to make more accurate atomic decisions about the necessity of retrieval. This is achieved by dynamically optimizing atomic decisions for each subquery and calibrating the model's internal knowledge.
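One plausible way to realize this is to turn each atomic decision on the optimal path into a preference pair, where the decision actually taken is preferred over its alternative, and then train with a preference-learning objective. The sketch below, including its field names and markers, is an illustration under that assumption.

```python
def calibration_pairs(question: str, path: list[tuple[str, bool, str]]) -> list[dict]:
    """Build preference pairs over atomic decisions for chain-of-calibration training.

    At each subquery, the decision taken on the optimal path is the "chosen"
    option and its alternative is "rejected". Illustrative sketch only.
    """
    pairs = []
    history = ""
    for subquery, retrieved, answer in path:
        chosen = "[Retrieve]" if retrieved else "[Answer directly]"
        rejected = "[Answer directly]" if retrieved else "[Retrieve]"
        pairs.append({
            "prompt": f"Question: {question}\n{history}Subquery: {subquery}\nDecision:",
            "chosen": chosen,
            "rejected": rejected,
        })
        history += f"Q: {subquery}\nA: {answer}\n"
    return pairs
```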
Experimental results on five open-domain QA datasets demonstrate that DeepRAG significantly outperforms existing methods, achieving 21.99% higher answer accuracy while improving retrieval efficiency. This improvement can be attributed to the structured retrieval narrative and reliable, on-demand atomic decisions enabled by the DeepRAG framework.
Further research directions for DeepRAG include:
Exploring different MDP formulations and reward functions to further optimize the retrieval-augmented reasoning process.
Investigating the application of DeepRAG to other NLP tasks beyond question answering.
Developing methods to improve the interpretability and explainability of DeepRAG's reasoning process.
Paper: DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (PDF)