Abstract:
Person re-identification plays a central role in tracking and monitoring crowd movement in public places, and hence serves as an important means of providing public security in video surveillance applications. The problem has received significant attention in the past few years, and with the introduction of deep learning, several interesting approaches have been developed. In this paper, we propose an ensemble model called Temporal Motion Aware Network (T-MAN) for jointly handling the visual context and spatio-temporal information in input video sequences. Our methodology exploits long-range motion context together with recurrent information to establish correspondences across multiple cameras. The proposed T-MAN approach first extracts explicit frame-level feature descriptors from a given video sequence using three different sub-networks (FPAN, MPN, and LSTM), and then aggregates their outputs with an ensemble technique to perform re-identification. The method has been evaluated on three publicly available data sets, namely PRID-2011, iLIDS-VID, and MARS, on which re-identification accuracies of 83.0%, 73.5%, and 83.3%, respectively, have been obtained. Experimental results emphasize the effectiveness of our approach and its superiority over state-of-the-art techniques for video-based person re-identification. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
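The ensemble step mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the three sub-network outputs are stand-ins (random linear projections of frame descriptors), and the score-averaging fusion rule is an assumption, since the abstract does not specify how T-MAN combines the FPAN, MPN, and LSTM branches.

```python
import numpy as np

rng = np.random.default_rng(0)

def subnet_scores(seq, weights):
    # Placeholder for one sub-network (FPAN / MPN / LSTM): maps the mean
    # frame-level descriptor of a sequence to a similarity score for each
    # gallery identity. Real sub-networks are deep models, not linear maps.
    return seq.mean(axis=0) @ weights

n_frames, feat_dim, n_ids = 8, 16, 5
seq = rng.normal(size=(n_frames, feat_dim))   # frame-level descriptors

# One hypothetical projection per sub-network
w_fpan, w_mpn, w_lstm = (rng.normal(size=(feat_dim, n_ids)) for _ in range(3))

scores = [subnet_scores(seq, w) for w in (w_fpan, w_mpn, w_lstm)]
ensemble = np.mean(scores, axis=0)            # score-level fusion (assumed)
match = int(np.argmax(ensemble))              # re-identified gallery identity
```

Any score-level fusion (weighted averaging, rank aggregation, learned gating) could be substituted for the plain mean without changing the overall pipeline shape.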