C. Gu, C. Sun, D. A. Ross, C. Vondrick, C. Pantofaru et al., AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01764300

S. S. Rautaray and A. Agrawal, Vision based hand gesture recognition for human computer interaction: a survey, Artificial Intelligence Review, vol.43, issue.1, pp.1-54, 2012.

M. Morel, C. Achard, R. Kulpa, and S. Dubuisson, Automatic evaluation of sports motion: A generic computation of spatial and temporal errors, Image and Vision Computing, vol.64, pp.67-78, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01586401

A. Chan-Hon-Tong, C. Achard, and L. Lucat, Simultaneous segmentation and classification of human actions in video streams using deeply optimized Hough transform, Pattern Recognition, vol.47, issue.12, pp.3807-3818, 2014.
URL : https://hal.archives-ouvertes.fr/cea-01818435

S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei, End-to-End Learning of Action Detection from Frame Glimpses in Videos, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

T. Liu, H. Zhang, and F. Qi, A novel video key-frame-extraction algorithm based on perceived motion energy model, IEEE Transactions on Circuits and Systems for Video Technology, 2003.

S. K. Kuanar, R. Panda, and A. S. Chowdhury, Video key frame extraction through dynamic Delaunay clustering with a structural constraint, Journal of Visual Communication and Image Representation, vol.24, issue.7, pp.1212-1227, 2013.

N. Ejaz, I. Mehmood, and S. W. Baik, Efficient visual attention based framework for extracting key frames from videos, Signal Processing: Image Communication, vol.28, issue.1, pp.34-44, 2013.

K. Zhang, W. Chao, F. Sha, and K. Grauman, Video Summarization with Long Short-Term Memory, Computer Vision – ECCV 2016, pp.766-782, 2016.

I. Mademlis, A. Tefas, N. Nikolaidis, and I. Pitas, Movie shot selection preserving narrative properties, 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), 2016.

I. Mademlis, A. Tefas, and I. Pitas, A salient dictionary learning framework for activity video summarization via key-frame extraction, Information Sciences, vol.432, pp.319-331, 2018.

Z. Zhao, H. Ma, and S. You, Single Image Action Recognition Using Semantic Body Part Actions, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.

G. Guo and A. Lai, A survey on still image based human action recognition, Pattern Recognition, vol.47, issue.10, pp.3343-3361, 2014.

H. Alwassel, F. Caba Heilbron, and B. Ghanem, Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization, Computer Vision – ECCV 2018, pp.253-269, 2018.

S. Bhardwaj, M. Srinivasan, and M. M. Khapra, Efficient Video Classification Using Fewer Frames, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Z. Wu, C. Xiong, C. Ma, R. Socher, and L. S. Davis, AdaFrame: Adaptive Frame Selection for Fast Video Recognition, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

W. Wu, D. He, X. Tan, S. Chen, and S. Wen, Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

S. Ma, L. Sigal, and S. Sclaroff, Learning Activity Progression in LSTMs for Activity Detection and Early Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

M. Gao, M. Xu, L. S. Davis, R. Socher, and C. Xiong, StartNet: Online Detection of Action Start in Untrimmed Videos, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

S. Giancola, M. Amine, T. Dghaily, and B. Ghanem, SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018.

H. Perreault, G. Bilodeau, N. Saunier, and M. Heritier, SpotNet: Self-Attention Multi-Task Network for Object Detection, 2020 17th Conference on Computer and Robot Vision (CRV), 2020.

A. Bansal, S. Ma, D. Ramanan, and Y. Sheikh, Recycle-GAN: Unsupervised Video Retargeting, Computer Vision – ECCV 2018, pp.122-138, 2018.

L. Wolf, M. Guttmann, and D. Cohen-Or, Non-homogeneous Content-driven Video-retargeting, 2007 IEEE 11th International Conference on Computer Vision, 2007.

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2009.

F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, ActivityNet: A large-scale video benchmark for human activity understanding, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv preprint arXiv:1412.3555, 2014.

A. D. Laud, Theory and application of reward shaping in reinforcement learning, 2004.

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, 2018.

R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1998.

Z. Yuan, J. C. Stroud, T. Lu, and J. Deng, Temporal Action Localization by Structured Maximal Sums, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

Z. Shou, D. Wang, and S. Chang, Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Z. Shou, J. Chan, A. Zareian, K. Miyazawa, and S. Chang, CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

J. Gao, Z. Yang, C. Sun, K. Chen, and R. Nevatia, TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.

H. Xu, A. Das, and K. Saenko, R-C3D: Region Convolutional 3D Network for Temporal Activity Detection, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.

Y. Zhao, Y. Xiong, L. Wang, Z. Wu, X. Tang et al., Temporal Action Detection with Structured Segment Networks, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.

J. Gao, Z. Yang, and R. Nevatia, Cascaded Boundary Regression for Temporal Action Detection, Proceedings of the British Machine Vision Conference 2017, 2017.

T. Lin, X. Zhao, H. Su, C. Wang, and M. Yang, BSN: Boundary Sensitive Network for Temporal Action Proposal Generation, Computer Vision – ECCV 2018, pp.3-21, 2018.

B. Seybold, D. Ross, J. Deng, R. Sukthankar, S. Vijayanarasimhan et al., Rethinking the faster r-cnn architecture for temporal action localization, 2018.

Y. Huang, Q. Dai, and Y. Lu, Decoupling Localization and Classification in Single Shot Temporal Action Detection, 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019.

Y. Jiang, J. Liu, A. Zamir, G. Toderici, I. Laptev et al., THUMOS challenge: Action recognition with a large number of classes, 2014.

F. C. Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, ActivityNet: A large-scale video benchmark for human activity understanding, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.961-970, 2015.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Advances in Neural Information Processing Systems, vol.27, 2014.

J. Carreira and A. Zisserman, Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

S. Paul, S. Roy, and A. K. Roy-Chowdhury, W-TALC: Weakly-Supervised Temporal Activity Localization and Classification, Computer Vision – ECCV 2018, pp.588-607, 2018.

S. Narayan, H. Cholakkal, F. S. Khan, and L. Shao, 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

J. He, Y. Song, and H. Jiang, Bi-direction Feature Pyramid Temporal Action Detection Network, Lecture Notes in Computer Science, vol.02, pp.889-901, 2020.

T. Lin, X. Liu, X. Li, E. Ding, and S. Wen, BMN: Boundary-Matching Network for Temporal Action Proposal Generation, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
