Adaptive Privacy-Aware Federated Language Modeling for Collaborative Electronic Medical Record Analysis
Abstract
This study addresses three challenges in collaborative electronic medical record settings: data that cannot be shared, significant semantic variation across institutions, and strict privacy constraints. It proposes a federated language modeling framework for electronic medical records and introduces an adaptive privacy budget scheduling algorithm to improve model stability and applicability in real medical environments. At each institution, a local medical text encoding module converts raw records into continuous semantic representations, and semantic factorization separates the latent representations into generalizable causal semantic factors and sensitive factors that require protection. This makes the distinction between key semantic structures and private information explicit. During federated training, the framework constructs a unified semantic space through cross-institution semantic alignment and dynamically adjusts noise injection through the adaptive privacy budget mechanism to balance privacy protection and semantic usability. The evaluation comprises multiple comparative experiments and sensitivity analyses that examine performance, budget scheduling strategies, and variations in training conditions. The results show that the framework maintains strong semantic representation under strict privacy constraints and outperforms several baseline models across multiple metrics, which supports the value of semantically decomposable, privacy-adaptive federated language models for cross-institution electronic medical record tasks. Overall, the proposed method provides a feasible solution for high-quality medical text modeling under privacy-restricted conditions and shows strong potential for multi-center medical data collaboration.
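To make the idea of adaptive privacy budget scheduling concrete, the following is a minimal sketch and not the algorithm proposed in this study, whose details the abstract does not give. It assumes a fixed total budget split across rounds under basic sequential composition, a hypothetical rule that spends more budget (less noise) when the validation loss stalls and less when it is improving, and the classic Gaussian mechanism with the clipping norm used as the sensitivity. All function names, the `boost` and `damp` factors, and the loss-based trigger are illustrative assumptions.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (its analysis assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def schedule_budget(remaining, rounds_left, loss_delta, boost=1.5, damp=0.5):
    """Hypothetical rule: start from an even split of the remaining budget,
    spend more when the loss did not improve, less when it did."""
    base = remaining / rounds_left
    factor = boost if loss_delta >= 0 else damp
    return min(remaining, base * factor)  # never exceed what is left


def run_schedule(total_epsilon, loss_history):
    """Return the per-round epsilons chosen for a sequence of observed losses."""
    remaining = total_epsilon
    rounds = len(loss_history) - 1
    spent = []
    for t in range(1, len(loss_history)):
        loss_delta = loss_history[t] - loss_history[t - 1]
        eps = schedule_budget(remaining, rounds - (t - 1), loss_delta)
        spent.append(eps)
        remaining -= eps
    return spent


def privatize_update(update, epsilon, delta, clip_norm, rng):
    """Clip a local update and add Gaussian noise sized for this round's epsilon.
    Using clip_norm as the sensitivity is an assumption of this sketch."""
    clipped = clip(update, clip_norm)
    sigma = gaussian_sigma(epsilon, delta, clip_norm)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    losses = [2.0, 1.5, 1.5, 1.4, 1.6]  # made-up validation losses
    for round_eps in run_schedule(1.0, losses):
        noisy = privatize_update([3.0, 4.0], round_eps, 1e-5, 1.0, rng)
        print(f"epsilon={round_eps:.4f} noisy_update={noisy}")
```

In this sketch the per-round budgets always sum to at most the total, so the overall guarantee under sequential composition is bounded by the configured budget. A tighter accountant or a different trigger signal could replace these pieces without changing the overall structure.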