Forecasting Asset Returns with Structured Text Factors and Dynamic Time Windows

Abstract
This study addresses two limitations of asset allocation models: insufficient use of unstructured information and the lack of dynamic modeling mechanisms. It proposes a regression algorithm for asset returns that integrates macro text factors with time window modeling. The method takes macroeconomic texts as input and extracts high-dimensional semantic vectors with a pre-trained semantic encoder. These vectors capture key signals such as policy direction, market expectations, and structural changes in the economy. In the factor representation stage, a residual fusion mechanism enhances the nonlinear expressiveness of the semantic features. A unified time window structure then models the dynamic influence of text signals on asset returns over consecutive periods. To improve prediction accuracy and responsiveness, a nonlinear regression layer is applied after sequence aggregation, enabling continuous forecasting of asset returns. Experiments are conducted on a public macro text dataset and several types of asset return data, with systematic sensitivity analyses across encoding size, asset class, text noise intensity, and corpus source heterogeneity. The results show that the proposed model outperforms several mainstream baseline methods in regression accuracy, stability, and cross-market adaptability. They support the effectiveness of combining macro text factors with temporal structures for asset return prediction and indicate that the model benefits from its structure-aware design.
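The pipeline summarized above (residual fusion of text embeddings, time window construction, sequence aggregation, nonlinear regression) can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the model evaluated in this paper: the embeddings are random stand-ins for the output of a pre-trained encoder, the fusion is assumed to take the form E + tanh(EW + b), aggregation is assumed to be mean pooling over the window, and all function names, parameter names, and dimensions are hypothetical.

```python
import numpy as np

def residual_fusion(E, W, b):
    """Assumed fusion form: add a nonlinear transform of E back onto E."""
    return E + np.tanh(E @ W + b)

def sliding_windows(F, window):
    """Stack overlapping windows: (T, d) -> (T - window + 1, window, d)."""
    T = F.shape[0]
    return np.stack([F[t:t + window] for t in range(T - window + 1)])

def init_params(d, hidden, rng):
    """Randomly initialised (untrained) parameters, for illustration only."""
    return {
        "W_f": rng.normal(scale=0.1, size=(d, d)),
        "b_f": np.zeros(d),
        "W_h": rng.normal(scale=0.1, size=(d, hidden)),
        "b_h": np.zeros(hidden),
        "w_out": rng.normal(scale=0.1, size=hidden),
        "b_out": 0.0,
    }

def predict_returns(E, params, window):
    """Map per-period text embeddings (T, d) to one forecast per window."""
    F = residual_fusion(E, params["W_f"], params["b_f"])
    X = sliding_windows(F, window)            # (N, window, d)
    pooled = X.mean(axis=1)                   # assumed aggregation: mean pooling
    hidden = np.tanh(pooled @ params["W_h"] + params["b_h"])
    return hidden @ params["w_out"] + params["b_out"]   # (N,)

# Hypothetical sizes: 12 periods, 8-dimensional embeddings, window of 5.
rng = np.random.default_rng(0)
E = rng.normal(size=(12, 8))                  # stand-in for encoder output
params = init_params(d=8, hidden=4, rng=rng)
preds = predict_returns(E, params, window=5)
print(preds.shape)                            # (8,)
```

With untrained parameters the forecasts carry no information. The sketch only shows how a sequence of T embeddings yields T - window + 1 return forecasts, one for each window of consecutive periods.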