Abstract: With the advancement of human-computer interaction, surface electromyography (sEMG)-based gesture recognition has garnered increasing attention. However, effectively exploiting the spatio-temporal dependencies in sEMG signals and integrating multiple key features remain significant challenges for existing techniques. To address these challenges, we propose the Two-Stream Hybrid Spatio-Temporal Fusion Network (TS-HSTFNet). Specifically, we design a dynamic spatio-temporal graph convolution module that employs an adaptive dynamic adjacency matrix to fully explore the dynamic spatial patterns in sEMG signals. In addition, a spatio-temporal attention fusion module is designed to exploit the latent correlations among multiple features for the final fusion. Experimental results show that the proposed TS-HSTFNet achieves 84.96% and 88.08% accuracy on the Ninapro DB2 and Ninapro DB5 datasets, respectively, demonstrating high accuracy in gesture recognition. Our work highlights the importance of extracting spatio-temporal features for gesture recognition and provides a novel approach to multi-source information fusion.
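To make the "adaptive dynamic adjacency matrix" idea mentioned above concrete, the following is a minimal illustrative sketch of a graph convolution layer over sEMG channels in which the adjacency matrix is learned from node embeddings rather than fixed by the electrode layout. It is not the authors' implementation; the class name, embedding size, and parameter names are hypothetical assumptions for illustration only.

```python
# Illustrative sketch only: a graph convolution with a learnable ("adaptive")
# adjacency matrix over sEMG channels. Names and sizes are hypothetical and
# are not taken from the TS-HSTFNet paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveGraphConv(nn.Module):
    def __init__(self, num_channels: int, in_features: int, out_features: int):
        super().__init__()
        # Learnable node embeddings; their pairwise similarity defines a
        # data-driven adjacency matrix instead of a fixed electrode graph.
        self.node_emb = nn.Parameter(torch.randn(num_channels, 16))
        self.proj = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_channels, in_features), e.g. features per sEMG channel
        # Adaptive adjacency: softmax-normalized similarity of node embeddings.
        adj = F.softmax(F.relu(self.node_emb @ self.node_emb.t()), dim=-1)
        # Propagate features along the learned graph, then project.
        return F.relu(self.proj(adj @ x))


# Example usage: 12 sEMG channels, 64-dim features per channel.
layer = AdaptiveGraphConv(num_channels=12, in_features=64, out_features=128)
out = layer(torch.randn(8, 12, 64))  # -> (8, 12, 128)
```

Because the adjacency matrix is produced from trainable embeddings, the inter-channel graph can adapt to the spatial patterns present in the training data, which is the general motivation behind adaptive-adjacency graph convolutions; the specific design used in TS-HSTFNet is described in the paper itself.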