Contents:
1- Introduction
1-1- Introduction
1-2- Applications
1-3- Challenges and Environment Characteristics
1-4- General Problem Definition
2- Review of Previous Work
2-1- Introduction
2-2- Single-Layer Methods
2-2-1- Overview of Space-Time Methods
2-2-2- Summary and Comparison of Space-Time Methods
2-2-3- Sequential Methods
2-2-4- Summary and Comparison of Sequential Methods
2-3- Multi-Layer (Hierarchical) Methods
2-3-1- Statistical Methods
2-3-2- Syntactic Methods
2-3-3- Description-Based Model
2-3-4- Summary and Comparison of Hierarchical Methods
3- Study of the Tools Used
3-1- Introduction
3-2- Tools Used for Feature Extraction
3-2-1- Histogram of Oriented Gradients
3-2-2- Optical Flow
3-3- Tools Used for Learning Higher-Level Features
3-3-1- General Pattern in Unsupervised Feature Learning
3-3-2- Common Methods in Unsupervised Feature Learning
3-3-3- Empirical Mode Decomposition
3-4- Tools Used for Classification
3-4-1- Hidden Markov Model
3-4-2- Support Vector Machine
4- Proposed Method
4-1- Introduction
4-2- Definition of the Main Framework
4-3- Processing Steps
4-3-1- Video Representation
4-3-2- Feature Extraction
4-3-3- Word Quantization and Dictionary Construction
4-3-4- Pooling
4-3-5- Classification
4-4- Proposed Frameworks
4-4-1- First Framework
4-4-2- Second Framework
4-4-3- Third Framework
4-4-4- Fourth Framework
4-4-5- Fifth Framework
5- Results
5-1- Available Datasets
5-2- Setting the Problem Parameters
5-3- Results
6- Discussion
6-1- Contributions and Their Advantages and Disadvantages
6-2- Comparison of the Proposed Frameworks
6-3- Suggested Future Work
6-4- Summary
7- References