<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Bohr. Iam.</journal-id>
<journal-title>BOHR International Journal of Internet of things, Artificial Intelligence and Machine Learning</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Bohr. Iam.</abbrev-journal-title>
<issn pub-type="epub">2583-5521</issn>
<publisher>
<publisher-name>BOHR</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.54646/bijiam.2024.19</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Yoga Pose Recognition (YPR) using ML-DL and an Android application</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ghosh</surname> <given-names>Partha</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Sardar</surname> <given-names>Sitam</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Mondal</surname> <given-names>Riya</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Jha</surname> <given-names>Ayush</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Sarkar</surname> <given-names>Aniruddha</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Computer Science and Engineering, Government College of Engineering and Ceramic Technology</institution>, <addr-line>Kolkata</addr-line>, <country>India</country></aff>
<author-notes>
<corresp id="c001">&#x002A;Correspondence: Partha Ghosh, <email>parth_ghos@rediffmail.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>12</day>
<month>12</month>
<year>2024</year>
</pub-date>
<volume>3</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>15</lpage>
<history>
<date date-type="received">
<day>14</day>
<month>03</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>05</day>
<month>09</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2024 Ghosh, Sardar, Mondal, Jha and Sarkar.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Ghosh, Sardar, Mondal, Jha and Sarkar</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>&#x00A9; The Author(s). 2024 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</p></license>
</permissions>
<abstract>
<p>The study aimed to create a Human Activity Recognition (HAR) model for Yoga Pose Recognition and Classification using datasets gathered through smart sensor technologies and imaging and filming devices to read various human actions, recognize various poses, analyze them, and then predict and classify the Yoga pose with minimum error. Pre-recorded data was fed to the model for the initial run, and thereafter the model would learn and re-learn new inputs and outputs by supervised learning methods. A collection of data from cameras present in smartphones and other smart devices was used to create a dynamic dataset of posture photos and videos to predict the most feasible output and add the mapping to the dataset to recognize particular Yoga poses. Yoga is a methodical way of attaining balance and harmony both inside oneself and outside the body. It has its roots in ancient India. Its history spans millennia, with the word &#x201C;yoga&#x201D; being first used in the Rig Veda, an ancient Indian scripture, which dates back to around 1500 BC. The Atharva Veda, which was written about 1200&#x2013;1000 BC, places a strong emphasis on breath regulation. Indus-Saraswati seals and fossils depicting yoga sadhana practitioners have also been discovered. These artifacts date back to 2700 BC (<xref ref-type="bibr" rid="B10">10</xref>). Nowadays, yoga is performed by millions of people worldwide. It provides mental and physical health advantages, such as lowering stress, anxiety, and depression, as well as physical benefits like better flexibility, strength, and posture. Yoga has grown popular as more individuals try to live healthier lives.</p>
<p>The study investigated various human postures and actions to predict the possible Yoga pose performed by a particular human through ML/DL (Machine Learning and Deep Learning) approaches. The proposed system or model learned and evolved by obtaining new data and through supervised learning. We used single-user pose recognition to create personalized datasets. Our aim was to provide a self-instruction system that allows people to learn and practice yoga correctly by themselves. This development laid the foundation for building such a system by discussing various ML and DL approaches to accurately classify Yoga poses on pre-recorded videos and photos.</p>
</abstract>
<kwd-group>
<kwd>Human Activity Recognition (HAR)</kwd>
<kwd>CNN</kwd>
<kwd>Transfer Learning</kwd>
<kwd>Yoga Pose Recognition</kwd>
</kwd-group>
<counts>
<fig-count count="25"/>
<table-count count="8"/>
<equation-count count="1"/>
<ref-count count="10"/>
<page-count count="15"/>
<word-count count="5338"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>1. Introduction</title>
<sec id="S1.SS1">
<title>1.1. Human Activity Recognition (HAR)</title>
<p>This wide-ranging topic of study uses machine learning and deep learning to determine a person&#x2019;s precise movement or activity from sensor data. It has greatly influenced many modern research avenues on humans, their surroundings, and the interactions between them. It is being heavily relied on in the modern healthcare sector for health informatics and predictions.</p>
<p>There are three types of HAR:</p>
<list list-type="simple">
<list-item>
<label>(1)</label>
<p>sensor-based single-user activity recognition.</p>
</list-item>
<list-item>
<label>(2)</label>
<p>sensor-based multi-user activity recognition.</p>
</list-item>
<list-item>
<label>(3)</label>
<p>sensor-based group activity recognition.</p>
</list-item>
</list>
</sec>
</sec>
<sec id="S2">
<title>2. Objective</title>
<p>In our work, we focused on sensor-based single-user pose recognition. As smartphones, handheld devices, and other wearable devices have become more common nowadays, the focus of HAR dataset collection has shifted to the sensors present in these devices.</p>
<p>Pose recognition was accomplished in our project using both probabilistic and logical reasoning. Logic-based methods record all reasonable and coherent explanations for the observed behaviors. Thus, every consistent and conceivable result needs to be taken into account. More recently, activity recognition has used statistical learning models and probability theory to reason about activities and probable consequences under ambiguity.</p>
<sec id="S2.SS1">
<title>2.1. Human pose recognition and estimation</title>
<p>We focused on Human Pose Recognition and Estimation as the only HAR component. One of the more difficult problems in computer vision is human pose estimation. In order to create a skeletal representation, it deals with the localization of human joints in an image or video. It is challenging to automatically identify a person&#x2019;s stance in an image, since it depends on a variety of factors, including the image&#x2019;s quality and scale, lighting, background clutter, clothes, surroundings, and how people interact with their environment.</p>
<p>An application of pose estimation that has attracted many researchers in this field is exercise and fitness. One form of exercise with intricate postures is Yoga, which is an age-old exercise that started in India but is now famous worldwide because of its many physical, mental, and spiritual benefits.</p>
</sec>
<sec id="S2.SS2">
<title>2.2. About yoga</title>
<p>But the thing with yoga is that, like any other kind of exercise, it requires proper technique; otherwise, a yoga session may be counterproductive and even harmful. This means that a teacher is required to oversee the session and adjust the student&#x2019;s posture. An artificial intelligence-based program may be used to recognize yoga postures and offer individualized feedback to help people improve their form, as not all users have access to an instructor.</p>
</sec>
<sec id="S2.SS3">
<title>2.3. Our focus</title>
<p>This work focused on exploring the different approaches for Yoga pose classification and sought to attain insight into the following: What is pose estimation? What is deep learning? How can deep learning be applied to Yoga pose classification in real time? This project used references from conference proceedings, published papers, technical reports, and journals.</p>
</sec>
</sec>
<sec id="S3">
<title>3. Literature review</title>
<p>To convert smartphone readings into different kinds of physical activity, Marcin Straczkiewicz et al. (<xref ref-type="bibr" rid="B1">1</xref>) have presented a number of Human Activity Recognition (HAR) systems. They extracted data on the sensors, body position of smartphones, kinds of physical activity that were researched, data processing methods, and classification systems applied to activity detection. Transitioning from data gathering to data analysis is the primary problem in this discipline. The methods utilized for feature extraction, activity categorization, data preprocessing, and data gathering were the main subjects of their investigation. They discussed the methods&#x2019; generalizability and repeatability, that is, their capacity to apply key components to broad and varied research participant groups. Finally, they outlined the obstacles that must be overcome to hasten the broader use of smartphone-based HAR in public health studies.</p>
<p>Binh Nguyen et al. (<xref ref-type="bibr" rid="B2">2</xref>) detail publicly accessible datasets that users can obtain. A framework covering the state-of-the-art research and new directions in HAR applications is proposed. Their research aimed to examine the current state of the art for HAR power usage and categorization. HAR&#x2019;s power needs are discussed in detail. To the best of the authors&#x2019; knowledge, theirs is the first review article discussing power use in HAR. <xref ref-type="table" rid="T1">Table 1</xref> shows the accuracy comparison of several models using the publicly available dataset.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Comparing the accuracy of different models on the public dataset.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Deep learning models</td>
<td valign="top" align="center">Accuracy rate (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">LSTM</td>
<td valign="top" align="center">90.47</td>
</tr>
<tr>
<td valign="top" align="left">CNN</td>
<td valign="top" align="center">91.53</td>
</tr>
<tr>
<td valign="top" align="left">S-LSTM</td>
<td valign="top" align="center">95.81</td>
</tr>
<tr>
<td valign="top" align="left">LSTM</td>
<td valign="top" align="center">85.83</td>
</tr>
<tr>
<td valign="top" align="left">BLSTM</td>
<td valign="top" align="center">84.54</td>
</tr>
<tr>
<td valign="top" align="left">CNN</td>
<td valign="top" align="center">85.40</td>
</tr>
<tr>
<td valign="top" align="left">BLSTM</td>
<td valign="top" align="center">95.70</td>
</tr>
<tr>
<td valign="top" align="left">DBLSTM</td>
<td valign="top" align="center">96.75</td>
</tr>
<tr>
<td valign="top" align="left">HDL</td>
<td valign="top" align="center">97.95</td>
</tr>
</tbody>
</table></table-wrap>
<p>According to Shavit et al. (<xref ref-type="bibr" rid="B3">3</xref>), the learning-based techniques currently used for activity recognition from inertial data rely on long short-term memory (LSTM) architectures or convolutional neural networks. For sequence analysis tasks, transformers have recently been demonstrated to perform better than these structures. Their study offers an enhanced and comprehensive framework for learning activity identification tasks: an activity recognition model based on Transformers. Across all investigated datasets and situations, the suggested method obtains consistently higher accuracy and improved generalization. The framework described above may be implemented in a codebase that can be found at (<xref ref-type="bibr" rid="B4">4</xref>). Relying on one or more of these sensors, HAR finds use in a wide range of applications, such as indoor navigation, gesture recognition, healthcare, and surveillance. The results obtained for the datasets SHAR, HAR, and SLR are presented in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>The SLR, HAR, and SHAR datasets&#x2019; respective results.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Experiment</td>
<td valign="top" align="center">Window size</td>
<td valign="top" align="center">IMU-CNN accuracy (%)</td>
<td valign="top" align="center">IMU-transformer accuracy (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SLR</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">96.4</td>
<td valign="top" align="center">97.3</td>
</tr>
<tr>
<td valign="top" align="left">HAR</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">86.3</td>
<td valign="top" align="center">89.7</td>
</tr>
<tr>
<td valign="top" align="left">SHAR</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">83.4</td>
<td valign="top" align="center">85.2</td>
</tr>
<tr>
<td valign="top" align="left">Overall</td>
<td valign="top" align="center">50</td>
<td valign="top" align="center">88.6</td>
<td valign="top" align="center">90.6</td>
</tr>
</tbody>
</table></table-wrap>
<p>Much work has been done in the past on building automated or semi-automated systems that help analyze exercise and sports activities such as swimming and basketball.</p>
<p>S. Patil et al. (<xref ref-type="bibr" rid="B5">5</xref>) proposed a system for identifying Yoga posture differences between an expert and a practitioner using Speeded Up Robust Features (SURF), which uses image contour information. However, describing and comparing the postures using only contour information is not sufficient. <xref ref-type="fig" rid="F1">Figure 1</xref> illustrates the Workflow Diagram.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Workflow Diagram.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g001.tif"/>
</fig>
<p>W. Wu et al. (<xref ref-type="bibr" rid="B6">6</xref>) have devised a system that uses tactors and inertial measurement units (IMUs) for yoga training. However, this may cause the user discomfort and interfere with the natural yoga stance.</p>
<p>E. Trejo et al. (<xref ref-type="bibr" rid="B7">7</xref>) presented a system for Yoga pose detection for six poses using Adaboost classifier and Kinect sensors and achieved an accuracy of 94.8%. However, they used a depth sensor-based camera that may not be always accessible to users. <xref ref-type="fig" rid="F2">Figure 2</xref> demonstrates the Workflow Diagram.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Workflow Diagram.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g002.tif"/>
</fig>
<p>Another system for Yoga pose correction using Kinect has been presented by H. Chen et al. (<xref ref-type="bibr" rid="B8">8</xref>), which takes into account three Yoga poses: warrior III, downward dog, and tree pose. However, their results are not very impressive, and their accuracy score is only 82.84%. The traditional method of skeletonization has now been replaced by deep learning-based methods. <xref ref-type="table" rid="T3">Table 3</xref> shows the Confusion Matrix of Asana recognition.</p>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Confusion matrix of Asana recognition.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Ground truth</td>
<td valign="top" align="center" colspan="3">Recognition<hr/></td>
</tr>
<tr>
<td valign="top" align="left"/><td valign="top" align="center">Tree</td>
<td valign="top" align="center">Warrior</td>
<td valign="top" align="center">Dog</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Tree</td>
<td valign="top" align="center">24</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">Warrior</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">25</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left">Dog</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">0</td>
<td valign="top" align="center">25</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="t3fns1"><p>&#x002A;Warrior III and Downward-facing dog are abbreviated as Warrior and Dog.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S4">
<title>4. Motivation and problem formulation</title>
<p>Deep Learning is a promising domain where a lot of research is being done, enabling us to analyze tremendous amounts of data in a scalable manner. Compared to traditional Machine Learning models, where feature extraction and engineering are a must, Deep Learning eliminates that necessity by understanding complex patterns in the data and extracting features on its own.</p>
<sec id="S4.SS1">
<title>4.1. About deep learning</title>
<p>Deep learning is frequently employed for image classification tasks, where a model receives an image as input and produces a prediction. In order to ascertain the relationship between the input and output, Deep Learning algorithms employ Neural Networks. In Pose Estimation tasks, an image containing the individual&#x2019;s pose is used as input, and a Deep Learning model is trained to properly identify the various poses in order to categorize the images with accuracy. This could be a computationally expensive task if the number of images is large. Also, as we want accurate results, we would not want to compromise on the quality of the images, as that could affect the features extracted by the model. Below are some basic deep learning models used for classification problems.</p>
<sec id="S4.SS1.SSS1">
<title>4.1.1. Multilayer Perceptron (MLP)</title>
<p>The MLP is a traditional neural network made up of one input layer and one output layer; the layers that lie between them are known as hidden layers, and there can be one or more of these. MLPs form a fully connected network, as every node in one layer has a connection to every node in the next layer. A fully connected network is a foundation for Deep Learning. MLP is popular for supervised classification, where the input data is assigned a label or class.</p>
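<p>As an illustrative sketch (ours, not the model trained in this study), the fully connected forward pass described above can be written directly in NumPy; the layer sizes below are arbitrary assumptions:</p>
<preformat>
```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass through a fully connected network (MLP).

    Every node in one layer connects to every node in the next,
    which amounts to one matrix multiplication per layer.
    """
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ w + b)          # hidden layers use ReLU
    logits = h @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)    # softmax class probabilities

# One input layer (4 features), one hidden layer (8 nodes), 3 output classes.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]
biases = [np.zeros(8), np.zeros(3)]
probs = mlp_forward(rng.normal(size=(2, 4)), weights, biases)
```
</preformat>
<p>Each row of <italic>probs</italic> is a probability distribution over the classes, so the predicted label is simply the class with the highest probability.</p>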
</sec>
<sec id="S4.SS1.SSS2">
<title>4.1.2. Recurrent Neural Network (RNN)</title>
<p>Recurrent Neural Networks (RNNs) are neural network architectures employed in sequence prediction applications. One to many, many to one, and many to many are possible scenarios in sequence prediction. RNNs handle sequential data better since they retain a neuron&#x2019;s past information. RNNs are most commonly used for Natural Language Processing (NLP) problems, where the input is naturally modeled as sequences.</p>
<p>In activity recognition or pose classification tasks too, there is a dependency between the previously performed action and the next action. In case of yoga as well, the context or information of initial or intermediary poses is important in predicting the final pose. Yoga can thus be thought of as a sequence of poses. This makes RNNs a suitable choice for Yoga pose recognition and classification.</p>
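<p>A minimal sketch of the recurrence at the heart of an RNN, applied to a hypothetical pose-feature sequence (randomly generated here for illustration, not our trained model):</p>
<preformat>
```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrence step: the new hidden state mixes the current
    input with the previous hidden state, so earlier poses in the
    sequence influence the prediction for the final pose."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(1)
seq = rng.normal(size=(10, 6))            # 10 time steps of 6 pose features
W_x = rng.normal(size=(6, 4)) * 0.1
W_h = rng.normal(size=(4, 4)) * 0.1
b = np.zeros(4)

h = np.zeros(4)
for x_t in seq:                           # unroll over the pose sequence
    h = rnn_step(x_t, h, W_x, W_h, b)
# h now summarizes the whole sequence and could feed a classifier
```
</preformat>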
</sec>
<sec id="S4.SS1.SSS3">
<title>4.1.3. Convolutional Neural Network (CNN)</title>
<p>The Convolutional Neural Network is a type of Neural Network widely used in the computer vision domain. It has proved so effective that it has become the go-to method for most image data. CNNs consist of a minimum of one convolutional layer, which is the first layer and is responsible for feature extraction from the images. The convolutional layer, through the use of convolutional filters, generates what is called a feature map. With the help of a pooling layer, the dimensionality is reduced, which reduces the training time and prevents overfitting. CNNs show great promise in pose classification tasks, making them a highly desirable choice.</p>
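<p>The convolution and pooling operations described above can be sketched in NumPy as follows (a toy example with an assumed 6x6 image and a hand-made 2x2 filter, not our trained network):</p>
<preformat>
```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide a convolutional filter over the image to build a feature map."""
    kh, kw = kernel.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2(fmap):
    """2x2 max pooling: halves each spatial dimension, reducing
    training time and helping to prevent overfitting."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.arange(36.0).reshape(6, 6)            # toy 6x6 grayscale image
edge = np.array([[1.0, -1.0], [1.0, -1.0]])    # simple vertical-edge filter
fmap = conv2d_valid(img, edge)                 # 5x5 feature map
pooled = max_pool2(fmap)                       # pooled down to 2x2
```
</preformat>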
</sec>
</sec>
<sec id="S4.SS2">
<title>4.2. Problem formulation</title>
<p>As we can see, there is yet to be a robust methodology to identify and classify Yoga poses (Human poses in general) with minimum error, time, and computational power. We are hoping to use Deep Learning Algorithms to try and fill that gap.</p>
</sec>
<sec id="S4.SS3">
<title>4.3. Proposed solution</title>
<p>Through ML/DL approaches, the study of various human postures and actions is conducted to predict the possible Yoga pose. The system or model learned and evolved as new data was obtained and supervised learning continued. We used single-user pose recognition to create personalized datasets. Our aim is to provide a self-instruction system that allows people to learn and practice yoga correctly by themselves. This project lays the foundation for building such a system by discussing various Machine Learning and Deep Learning approaches to accurately classify Yoga poses on prerecorded videos and images.</p>
</sec>
</sec>
<sec id="S5">
<title>5. Evaluation metrics</title>
<sec id="S5.SS1">
<title>5.1. Classification score</title>
<p>Classification score refers to what is usually meant by the accuracy of the model: the proportion of correct predictions to the total number of input samples. In the case of multiclass classification, this metric gives good results when the number of samples in each class is almost the same.</p>
<disp-formula id="S5.Ex1"><mml:math id="M1">
<mml:mrow>
<mml:mtext>Accuracy</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mtext>Number of correct predictions</mml:mtext>
<mml:mtext>Total number of predictions made</mml:mtext>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
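<p>For example, the accuracy formula can be computed directly; the pose labels below are hypothetical:</p>
<preformat>
```python
def accuracy(y_true, y_pred):
    """Proportion of correct predictions to total predictions made."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical pose labels for 8 samples: 6 of the 8 predictions are correct.
y_true = ["tree", "tree", "warrior", "dog", "dog", "tree", "warrior", "dog"]
y_pred = ["tree", "dog",  "warrior", "dog", "dog", "tree", "tree",    "dog"]
acc = accuracy(y_true, y_pred)   # 6 / 8 = 0.75
```
</preformat>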
</sec>
<sec id="S5.SS2">
<title>5.2. Confusion matrix</title>
<p>A confusion matrix is a matrix that completely describes the performance of the model. There are four important terms when it comes to measuring the performance of a model.</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>True Positive: Predicted value &#x0026; the actual output are both 1.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>True Negative: Predicted value &#x0026; the actual output are both 0.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>False Positive: Predicted value is 1 but the actual output is 0.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>False Negative: Predicted value is 0 but the actual output is 1.</p>
</list-item>
</list>
<p><xref ref-type="table" rid="T4">Table 4</xref> shows a basic confusion matrix for binary classification. The diagonal values represent the samples that are correctly classified, and thus we always want the diagonal of the matrix to contain the maximum values. In case of a multiclass classification, each class represents one row and column of the matrix.</p>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>2x2 Confusion matrix.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left" colspan="2">Predicted values</td>
<td valign="top" align="center" colspan="2">Actual values<hr/></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">Positive (1)</td>
<td valign="top" align="center">Negative (0)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="center">Positive (1)</td>
<td valign="top" align="center">TP</td>
<td valign="top" align="center">FP</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Negative (0)</td>
<td valign="top" align="center">FN</td>
<td valign="top" align="center">TN</td>
</tr>
</tbody>
</table></table-wrap>
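<p>The four terms can be counted directly from binary predictions; the labels below are hypothetical:</p>
<preformat>
```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)   # TP=3, TN=3, FP=1, FN=1
```
</preformat>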
</sec>
<sec id="S5.SS3">
<title>5.3. Model accuracy and model loss curves</title>
<p>These curves are also referred to as learning curves and are mostly used for models that learn incrementally over time, for example, Neural Networks. They represent the evaluation on the training and validation data, which gives us an idea of how well the model is learning and how well it is generalizing. The model loss curve represents a minimizing score (loss): a lower score means better model performance. The model accuracy curve represents a maximizing score (accuracy): a higher score denotes better performance. A good-fitting model loss curve is one in which the training and validation loss decrease, reach a point of stability, and have a minimal gap between the final loss values. Likewise, a good-fitting model accuracy curve is one in which the training and validation accuracy increase and become stable, with a minimal gap between the final accuracy values.</p>
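<p>A minimal sketch of how such curves may be plotted with Matplotlib, assuming a Keras-style history dictionary (the numbers below are synthetic, not results from this study):</p>
<preformat>
```python
import matplotlib
matplotlib.use("Agg")            # render off-screen, no display needed
import matplotlib.pyplot as plt

def plot_learning_curves(history, path="learning_curves.png"):
    """Plot model accuracy and model loss curves from a Keras-style
    history dict (keys assumed: accuracy, val_accuracy, loss, val_loss)."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(history["accuracy"], label="train")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set_title("Model accuracy")
    ax_acc.legend()
    ax_loss.plot(history["loss"], label="train")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set_title("Model loss")
    ax_loss.legend()
    fig.savefig(path)
    plt.close(fig)
    return path

# Synthetic well-fitting run: both losses fall and stabilize with a small gap.
history = {"accuracy": [0.6, 0.8, 0.9, 0.92],
           "val_accuracy": [0.55, 0.75, 0.87, 0.90],
           "loss": [1.2, 0.6, 0.3, 0.25],
           "val_loss": [1.3, 0.7, 0.38, 0.30]}
out = plot_learning_curves(history)
```
</preformat>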
</sec>
</sec>
<sec id="S6">
<title>6. Methodology</title>
<p>The proposed models are described below.</p>
<sec id="S6.SS1">
<title>6.1. DenseNet201</title>
<p><bold>Overview</bold></p>
<p>Through feed-forward connections, DenseNet links each layer to every other layer. These dense connections significantly lower the number of parameters, improve feature propagation, resolve the vanishing-gradient issue, and promote feature reuse. The concept behind DenseNet is that convolutional networks with shorter connections between input and output layers may be trained to be significantly deeper, more accurate, and more efficient.</p>
<p>DenseNet-201 is a CNN that is 201 layers deep. We chose DenseNet because it is a reasonably small model with low power consumption.</p>
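<p>The dense connectivity pattern can be sketched as follows (a simplified stand-in: a real dense block uses batch normalization and 3x3 convolutions, whereas this sketch uses a plain per-pixel dense map):</p>
<preformat>
```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Sketch of DenseNet connectivity: each layer receives the
    concatenation of ALL previous feature maps and contributes
    growth_rate new channels, so features are reused, not recomputed."""
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)   # link to every earlier layer
        w = rng.normal(size=(inp.shape[-1], growth_rate)) * 0.1
        new = np.tanh(inp @ w)                    # a few NEW channels per layer
        features.append(new)
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))                   # 16 input channels
out = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
# channels grow to 16 + 4 * 12 = 64, while each layer itself stays narrow
```
</preformat>
<p>This narrow-per-layer, reuse-everything design is what keeps the parameter count low despite the network's depth.</p>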
<p><bold>Results</bold></p>
<p>Training Accuracy: 0.8066</p>
<p>Validation Loss: 2.1614</p>
<p>Validation Accuracy: 0.5377</p>
<p><bold>Workflow Diagram</bold></p>
<p><xref ref-type="fig" rid="F3">Figure 3</xref> illustrates the Workflow Diagram of DenseNet201.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Workflow Diagram of DenseNet201.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g003.tif"/>
</fig>
</sec>
<sec id="S6.SS2">
<title>6.2. ResNet50</title>
<p><bold>Overview</bold></p>
<p>When we add more layers to our deep neural networks, performance becomes stagnant or starts to degrade. This happens due to the vanishing gradient problem: when gradients are backpropagated through the deep neural network and repeatedly multiplied, they become extremely small. ResNet solves the vanishing gradient problem by using identity shortcut connections (skip connections) that skip one or more layers. A shortcut connection connects the output of layer N to the input of layer N+Z.</p>
<p>ResNet-50 is a CNN that is 50 layers deep. It has 48 convolution layers along with 1 max-pool and 1 average-pool layer, and it performs 3.8 x 10^9 floating-point operations. It is a type of Artificial Neural Network (ANN) that forms networks by stacking residual blocks.</p>
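<p>The identity shortcut can be sketched as follows (a simplified stand-in using plain matrix multiplications instead of convolutions):</p>
<preformat>
```python
import numpy as np

def residual_block(x, w1, w2):
    """Sketch of a residual (skip) connection: the input x bypasses
    the weight layers and is added back to their output, giving the
    gradient a direct path and easing the vanishing-gradient problem."""
    out = np.maximum(0.0, x @ w1)    # first weight layer + ReLU
    out = out @ w2                   # second weight layer
    return np.maximum(0.0, out + x)  # identity shortcut, then ReLU

rng = np.random.default_rng(0)
w1 = rng.normal(size=(16, 16)) * 0.01
w2 = rng.normal(size=(16, 16)) * 0.01
x = rng.normal(size=(4, 16))
y = residual_block(x, w1, w2)
# With near-zero weights the block is close to the identity mapping,
# which is why stacking extra residual blocks does not degrade performance.
```
</preformat>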
<p>Results</p>
<p>Training Accuracy: 0.7555</p>
<p>Validation Loss: 1.6325</p>
<p>Validation Accuracy: 0.5590</p>
<p><bold>Workflow Diagram</bold></p>
<p><xref ref-type="fig" rid="F4">Figure 4</xref> shows Workflow Diagram of ResNet50.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Workflow Diagram of ResNet50.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g004.tif"/>
</fig>
</sec>
<sec id="S6.SS3">
<title>6.3. VGG16</title>
<p>Overview</p>
<p>VGG-16 is a convolutional neural network with 16 layers. It is considered one of the best computer vision models available today. The design relies on convolution layers with 3x3 filters, stride 1, and same padding, each group followed by a max-pool layer with a 2x2 filter and stride 2, and it maintains this arrangement of convolution and max-pool layers throughout the architecture. The final part consists of two fully connected (FC) layers followed by a softmax output. The 16 in VGG16 refers to its 16 weight layers. The network has approximately 138 million trainable parameters.</p>
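<p>The effect of this arrangement on spatial resolution can be traced with a small sketch (assuming the standard 224x224 input size): 3x3 convolutions with stride 1 and same padding leave the spatial size unchanged, and each stage-ending 2x2 max-pool with stride 2 halves it.</p>

```python
def vgg16_spatial_sizes(input_size=224):
    """Spatial resolution after each of VGG16's five conv stages.

    Same-padded 3x3/stride-1 convolutions preserve spatial size, so only
    the 2x2/stride-2 max-pool that closes each stage changes it, halving
    the resolution five times in total.
    """
    sizes = [input_size]
    for _ in range(5):  # five conv stages, each followed by a max-pool
        sizes.append(sizes[-1] // 2)
    return sizes
```

<p>For a 224x224 input this yields 224, 112, 56, 28, 14, and finally 7, the 7x7 feature map that feeds the fully connected layers.</p>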
<p>Results</p>
<p>Training Accuracy: 0.9545</p>
<p>Validation Loss: 0.9325</p>
<p>Validation Accuracy: 0.9245</p>
<p><bold>Workflow Diagram</bold></p>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> shows the Workflow Diagram of VGG16.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Workflow Diagram of VGG16.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g005.tif"/>
</fig>
<p><xref ref-type="fig" rid="F6">Figure 6</xref> illustrates Workflow of VGG16 model.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Workflow of VGG16 model.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g006.tif"/>
</fig>
</sec>
<sec id="S6.SS4">
<title>6.4. VGG19</title>
<p>Overview</p>
<p>VGG-19 is a convolutional neural network with 19 layers: sixteen convolution layers, five max-pool layers, three fully connected layers, and one softmax layer. It uses 3x3 kernels with a stride of 1 pixel. A forward pass through VGG19 requires about 19.6 billion FLOPs.</p>
<p>Spatial padding was applied to preserve the image&#x2019;s spatial resolution. Max pooling is performed over 2x2 pixel windows with a stride of 2. This is followed by the Rectified Linear Unit (ReLU), which introduces non-linearity to improve the model&#x2019;s classification performance and computational efficiency; previous models relied on sigmoid or tanh functions.</p>
<p>One big advantage of VGG19 is that pre-trained weights are readily available in frameworks such as Keras, so they can be inspected, fine-tuned, and reused as one wants.</p>
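<p>The layer arithmetic behind the name can be checked with a short sketch (the per-stage convolution counts below are the standard VGG-19 configuration; only layers with trainable weights count toward the 19, while max-pool and softmax layers do not).</p>

```python
# 3x3 convolution layers in each of VGG-19's five stages (standard config).
VGG19_CONVS_PER_STAGE = [2, 2, 4, 4, 4]

def vgg19_weight_layers():
    """The '19' in VGG-19 counts layers with trainable weights:
    16 convolution layers plus 3 fully connected layers."""
    conv_layers = sum(VGG19_CONVS_PER_STAGE)  # 16 convolutions
    fully_connected = 3
    return conv_layers + fully_connected
```
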
<p>Workflow Diagram</p>
<p><xref ref-type="fig" rid="F7">Figure 7</xref> describes Workflow Diagram of VGG19.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>Workflow Diagram of VGG19.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g007.tif"/>
</fig>
<p><xref ref-type="fig" rid="F8">Figure 8</xref> exemplifies Workflow of VGG19 model.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption><p>Workflow of VGG19 model.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g008.tif"/>
</fig>
<p>Results</p>
<p>Training Accuracy: 0.9896</p>
<p>Validation Loss: 0.0210</p>
<p>Validation Accuracy: 0.9891</p>
<p><xref ref-type="fig" rid="F9">Figure 9</xref> shows total params and trainable params.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption><p>Total params and trainable params.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g009.tif"/>
</fig>
<p><xref ref-type="fig" rid="F10">Figure 10</xref> demonstrates Accuracy.</p>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption><p>Accuracy.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g010.tif"/>
</fig>
<p><xref ref-type="fig" rid="F11">Figure 11</xref> explains Confusion Matrix: for 1. Down dog 2. Goddess 3. Plank 4. Tree 5. Warrior II and <xref ref-type="table" rid="T5">Table 5</xref> shows the Result Chart.</p>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption><p>Confusion Matrix: 1. Down dog 2. Goddess 3. Plank 4. Tree 5. Warrior II.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g011.tif"/>
</fig>
<table-wrap position="float" id="T5">
<label>TABLE 5</label>
<caption><p>Result Chart.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Class</td>
<td valign="top" align="center">n(truth)</td>
<td valign="top" align="center">n(classified)</td>
<td valign="top" align="center">Accuracy</td>
<td valign="top" align="center">Precision</td>
<td valign="top" align="center">Recall</td>
<td valign="top" align="center">F1 score</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">80</td>
<td valign="top" align="center">97.18%</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.94</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">76</td>
<td valign="top" align="center">96.89%</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.93</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">66</td>
<td valign="top" align="center">63</td>
<td valign="top" align="center">97.46%</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">0.93</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="center">61</td>
<td valign="top" align="center">60</td>
<td valign="top" align="center">96.89%</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.90</td>
<td valign="top" align="center">0.91</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">72</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">95.20%</td>
<td valign="top" align="center">0.87</td>
<td valign="top" align="center">0.90</td>
<td valign="top" align="center">0.88</td>
</tr>
</tbody>
</table></table-wrap>
<p><xref ref-type="fig" rid="F12">Figure 12</xref> shows Training Accuracy vs. Validation Accuracy &#x0026; Training Loss vs. Validation Loss.</p>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption><p>Training Accuracy vs. Validation Accuracy and Training Loss vs. Validation Loss.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g012.tif"/>
</fig>
</sec>
<sec id="S6.SS5">
<title>6.5. CNN (K-fold Cross Validation)</title>
<p>Overview</p>
<p>Cross-validation is a resampling technique used to evaluate machine learning models on a limited data sample. The procedure has a single parameter, k, the number of groups into which the data sample is divided; hence the common name k-fold cross-validation. When a specific value is chosen for k, it may be substituted into the name: for example, k = 10 becomes 10-fold cross-validation.</p>
<p>In applied machine learning, cross-validation is mainly used to estimate how well a model performs on unseen data, that is, to use a limited sample to estimate how the model is expected to perform when making predictions on data not used during training.</p>
<p>The general procedure is as follows:</p>
<list list-type="simple">
<list-item>
<label>(1)</label>
<p>Randomly shuffle the dataset.</p>
</list-item>
<list-item>
<label>(2)</label>
<p>Divide the collection into k groups.</p>
</list-item>
<list-item>
<label>(3)</label>
<p>For every distinct group:
<list list-type="simple">
<list-item>
<label>a.</label>
<p>Use the group as a test or holdout set of data.</p>
</list-item>
<list-item>
<label>b.</label>
<p>Create a training data set from the remaining groups.</p>
</list-item>
<list-item>
<label>c.</label>
<p>Use the training set to fit a model, then assess it using the test set.</p>
</list-item>
<list-item>
<label>d.</label>
<p>Save the assessment result and throw away the model.</p>
</list-item>
</list></p></list-item>
<list-item>
<label>(4)</label>
<p>Summarize the model&#x2019;s skill using the sample of model evaluation scores.</p>
</list-item>
</list>
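<p>The four steps above can be sketched in plain Python (a minimal illustration under our own assumptions; the <italic>evaluate</italic> callback is a placeholder standing in for fitting a model on the training indices and scoring it on the hold-out indices):</p>

```python
import random

def k_fold_split(n_samples, k, seed=0):
    """Steps (1)-(2): shuffle the indices, then deal them into k groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, k, evaluate, seed=0):
    """Steps (3)-(4): each fold serves as the hold-out set exactly once;
    each score is saved, the fitted model is discarded, and the mean
    score summarizes the model's skill."""
    folds = k_fold_split(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # All other folds form the training set for this round.
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

<p>Every sample appears in exactly one hold-out fold, so each data point is used for validation once and for training k - 1 times.</p>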
<p>Workflow Diagram</p>
<p><xref ref-type="fig" rid="F13">Figure 13</xref> illustrates Workflow Diagram of K-fold Cross Validation.</p>
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption><p>Workflow Diagram of K-fold Cross Validation.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g013.tif"/>
</fig>
<p>Results</p>
<p>Training Accuracy: 0.9675</p>
<p>Validation Loss: 0.1210</p>
<p>Validation Accuracy: 0.8620</p>
<p><xref ref-type="fig" rid="F14">Figure 14</xref> shows Confusion Matrix.</p>
<fig id="F14" position="float">
<label>FIGURE 14</label>
<caption><p>Confusion Matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g014.tif"/>
</fig>
<p><xref ref-type="fig" rid="F15">Figure 15</xref> shows Training Accuracy vs. Validation Accuracy vs. Training Loss vs. Validation Loss.</p>
<fig id="F15" position="float">
<label>FIGURE 15</label>
<caption><p>Training Accuracy vs. Validation Accuracy vs. Training Loss vs. Validation Loss.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g015.tif"/>
</fig>
</sec>
<sec id="S6.SS6">
<title>6.6. CNN (Transfer Learning)</title>
<p>Overview</p>
<p>Transfer learning is a technique, commonly applied to convolutional neural networks, in which a model developed for one task is reused as the foundation of a model for a second task.</p>
<p>Given the enormous compute and time resources required to train neural network models from scratch, and the significant skill gains they provide on related problems, pre-trained models are frequently used as the starting point for computer vision and natural language processing tasks in deep learning.</p>
<p>Transfer learning involves first training a base network on a base dataset and task, then repurposing the learned features to train a second target network on a target dataset and task. This procedure is more likely to succeed when the features are general, that is, suitable for both the base and target tasks, rather than specific to the base task.</p>
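<p>The base/target split described above can be sketched with a toy NumPy stand-in (our own minimal illustration, not the CNN used in this work): the base network's weights are frozen and reused as a feature extractor, and only a new head is trained on the target task.</p>

```python
import numpy as np

def base_features(x, w_base):
    """Frozen base network: weights learned on the base task are reused
    unchanged as a feature extractor for the target task."""
    return np.maximum(0.0, x @ w_base)

def train_head(x, y, w_base, lr=0.05, steps=300):
    """Train only the new head with plain least-squares gradient descent;
    the base weights w_base are never updated."""
    feats = base_features(x, w_base)
    w_head = np.zeros((feats.shape[1], 1))
    for _ in range(steps):
        residual = feats @ w_head - y
        w_head -= lr * feats.T @ residual / len(y)
    return w_head

def mse(x, y, w_base, w_head):
    """Mean squared error of the frozen-base + trained-head model."""
    return float(np.mean((base_features(x, w_base) @ w_head - y) ** 2))
```

<p>Only the small head is optimized, which mirrors why transfer learning needs far less data and compute than training the whole network from scratch.</p>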
<p>Workflow Diagram</p>
<p><xref ref-type="fig" rid="F16">Figure 16</xref> demonstrates the Workflow Diagram of Transfer Learning.</p>
<fig id="F16" position="float">
<label>FIGURE 16</label>
<caption><p>Workflow Diagram of Transfer Learning.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g016.tif"/>
</fig>
<p>Results</p>
<p>Training Accuracy: 0.9749</p>
<p>Validation Loss: 0.0907</p>
<p>Validation Accuracy: 0.9745</p>
<p>F1 Score: 0.974531880367235</p>
<p><xref ref-type="fig" rid="F17">Figure 17</xref> shows total params and trainable params.</p>
<fig id="F17" position="float">
<label>FIGURE 17</label>
<caption><p>Total params and trainable params.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g017.tif"/>
</fig>
<p><xref ref-type="fig" rid="F18">Figure 18</xref> depicts Confusion Matrix.</p>
<fig id="F18" position="float">
<label>FIGURE 18</label>
<caption><p>Confusion Matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g018.tif"/>
</fig>
<p><xref ref-type="table" rid="T6">Table 6</xref> describes the Result Chart.</p>
<table-wrap position="float" id="T6">
<label>TABLE 6</label>
<caption><p>Result chart.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Class</td>
<td valign="top" align="center">Accuracy</td>
<td valign="top" align="center">Precision</td>
<td valign="top" align="center">Recall</td>
<td valign="top" align="center">F1 score</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">98.60%</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.95</td>
<td valign="top" align="center">0.98</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">97.18%</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.97</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">97.92%</td>
<td valign="top" align="center">0.96</td>
<td valign="top" align="center">0.92</td>
<td valign="top" align="center">0.96</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="center">97.45%</td>
<td valign="top" align="center">0.94</td>
<td valign="top" align="center">0.90</td>
<td valign="top" align="center">0.97</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">96.30%</td>
<td valign="top" align="center">0.89</td>
<td valign="top" align="center">0.91</td>
<td valign="top" align="center">0.90</td>
</tr>
</tbody>
</table></table-wrap>
<p><xref ref-type="fig" rid="F19">Figure 19</xref> shows Training Accuracy vs. Validation Accuracy vs. Training Loss vs. Validation Loss.</p>
<fig id="F19" position="float">
<label>FIGURE 19</label>
<caption><p>Training Accuracy vs. Validation Accuracy vs. Training Loss vs. Validation Loss.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g019.tif"/>
</fig>
</sec>
</sec>
<sec id="S7">
<title>7. Dataset</title>
<p>The dataset used in the project is taken from Kaggle (<xref ref-type="bibr" rid="B9">9</xref>) and is publicly available. It consists of images of 5 yoga poses, namely Downdog (Adho Mukha Svanasana), Goddess (UtkataKonasana), Plank (Phalakasana), Tree (Vrikshsasana), and Warrior II (Virbhadrasana II).</p>
<p>The total number of images is 1081. The images were taken in both indoor and outdoor environments. <xref ref-type="table" rid="T7">Table 7</xref> describes the dataset.</p>
<table-wrap position="float" id="T7">
<label>TABLE 7</label>
<caption><p>Dataset.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Sl. No.</td>
<td valign="top" align="center">Yoga pose</td>
<td valign="top" align="center">Regional name</td>
<td valign="top" align="center">No. of images</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">01</td>
<td valign="top" align="center">Downdog</td>
<td valign="top" align="center">Adho Mukha Svanasana</td>
<td valign="top" align="center">223</td>
</tr>
<tr>
<td valign="top" align="left">02</td>
<td valign="top" align="center">Goddess</td>
<td valign="top" align="center">UtkataKonasana</td>
<td valign="top" align="center">180</td>
</tr>
<tr>
<td valign="top" align="left">03</td>
<td valign="top" align="center">Plank</td>
<td valign="top" align="center">Phalakasana</td>
<td valign="top" align="center">266</td>
</tr>
<tr>
<td valign="top" align="left">04</td>
<td valign="top" align="center">Tree</td>
<td valign="top" align="center">Vrikshsasana</td>
<td valign="top" align="center">160</td>
</tr>
<tr>
<td valign="top" align="left">05</td>
<td valign="top" align="center">Warrior II</td>
<td valign="top" align="center">Virbhadrasana II</td>
<td valign="top" align="center">252</td>
</tr>
<tr>
<td valign="top" align="left" colspan="3">Total Yoga Pose Images</td>
<td valign="top" align="center">1081</td>
</tr>
</tbody>
</table></table-wrap>
</sec>
<sec id="S8">
<title>8. Results or finding</title>
<p>Human Activity Recognition has been studied extensively over the past years, driving advances in machine learning and deep learning methodologies and giving rise to new techniques. Further development has taken place in the particular aspect we worked on, i.e., human pose recognition and estimation, which helps prevent injuries in sports and exercise and improves performance.</p>
<p>Our Yoga Pose Recognition and Classification system, which will lead to a Yoga self-instruction system, has the potential to make Yoga more accessible and approachable, popularizing it and helping ensure it is performed correctly.</p>
<p>Deep Learning methods are promising because of the vast research being done in this field. Applying the CNN models to the given dataset proves highly effective, classifying all 5 yoga poses with high accuracy. <xref ref-type="table" rid="T8">Table 8</xref> illustrates the Comparative study.</p>
<table-wrap position="float" id="T8">
<label>TABLE 8</label>
<caption><p>Comparative study.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Sl. No.</td>
<td valign="top" align="center">Model</td>
<td valign="top" align="center">Training Accuracy</td>
<td valign="top" align="center">Validation Loss</td>
<td valign="top" align="center">Validation Accuracy</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1</td>
<td valign="top" align="center">DenseNet201</td>
<td valign="top" align="center">0.8066</td>
<td valign="top" align="center">2.1614</td>
<td valign="top" align="center">0.5377</td>
</tr>
<tr>
<td valign="top" align="left">2</td>
<td valign="top" align="center">ResNet50</td>
<td valign="top" align="center">0.7555</td>
<td valign="top" align="center">1.6325</td>
<td valign="top" align="center">0.5590</td>
</tr>
<tr>
<td valign="top" align="left">3</td>
<td valign="top" align="center">VGG16</td>
<td valign="top" align="center">0.9545</td>
<td valign="top" align="center">0.9325</td>
<td valign="top" align="center">0.9245</td>
</tr>
<tr>
<td valign="top" align="left">4</td>
<td valign="top" align="center">VGG19</td>
<td valign="top" align="center">0.9896</td>
<td valign="top" align="center">0.0210</td>
<td valign="top" align="center">0.9891</td>
</tr>
<tr>
<td valign="top" align="left">5</td>
<td valign="top" align="center">CNN (K-Fold Cross Validation)</td>
<td valign="top" align="center">0.9675</td>
<td valign="top" align="center">0.1210</td>
<td valign="top" align="center">0.8620</td>
</tr>
<tr>
<td valign="top" align="left">6</td>
<td valign="top" align="center">CNN (Transfer Learning)</td>
<td valign="top" align="center">0.9749</td>
<td valign="top" align="center">0.0907</td>
<td valign="top" align="center">0.9745</td>
</tr>
</tbody>
</table></table-wrap>
</sec>
<sec id="S9">
<title>9. Discussions</title>
<p>According to <xref ref-type="table" rid="T8">Table 8</xref>, the Transfer Learning model shows one of the highest accuracies together with the lowest validation loss. We also see from <xref ref-type="table" rid="T6">Table 6</xref> that this model has the most consistent Accuracy, Precision, Recall, and F1-score across all of the observed Yoga poses, though the later classes show a minor drop in the values.</p>
<p>From <xref ref-type="fig" rid="F19">Figure 19</xref> we can see that our model shows a quick rise in both training and validation accuracy that stabilizes early at a certain maximum value. Unlike VGG19, there is no further dip in the value, and it remains consistent throughout. Conversely, the training and validation loss fall quickly and gradually stabilize at a certain minimum value; the loss is somewhat unstable at first but stabilizes eventually. Transfer Learning is slightly less accurate than VGG19, but it reaches stability faster and gives better Precision, Recall, and F1-score. It also produces a better Confusion Matrix than VGG19&#x2019;s, as shown in <xref ref-type="fig" rid="F18">Figure 18</xref>. It can therefore be trained faster, on a smaller optimized dataset, with high accuracy.</p>
<p>Thus, we conclude that Transfer Learning is the best model for Yoga Pose Recognition and Classification, achieving the minimum error we set out to reach.</p>
<fig id="F20" position="float">
<label>FIGURE 20</label>
<caption><p>Workflow Diagram of Android Application.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g020.tif"/>
</fig>
<fig id="F21" position="float">
<label>FIGURE 21</label>
<caption><p>The QR code.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g0021.tif"/>
</fig>
</sec>
<sec id="S10" sec-type="results">
<title>10. Results</title>
<p>Android Implementation</p>
<p>Android is a mobile operating system based on a modified version of the Linux kernel and other open-source software, designed primarily for touchscreen mobile devices.</p>
<p>Why have we chosen Android?</p>
<p>Nowadays, Android smartphones are the most popular across the globe; almost everyone has one. Through our app, Yoga Trainer, we can therefore deliver our product to a huge audience. In addition, because Android is an open-source platform, it is easy to access and work with the camera and sensors, and there are few restrictions during development, giving more freedom, control, and customization.</p>
<p>Our implementation so far</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>When our main screen opens (MainActivity() &#x2192; CameraFragment()), we open the camera to capture real-time video frames. To do so we use the CameraX library, a Jetpack library built to make camera app development easier.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Using CameraX, whatever image frames we receive in real time are converted into the required Bitmap, which is simply a rectangle of pixels.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Each image is then analyzed in real time by our trained TFLite model, with the help of CameraX&#x2019;s image analyzer and TFLite Android support.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>The result is then shown according to the analyzed data: if a pose matches, the name of the pose and its accuracy are displayed.</p>
</list-item>
</list>
<p>Workflow Diagram</p>
<p><xref ref-type="fig" rid="F20">Figure 20</xref> shows Workflow Diagram of Android Application.</p>
<p>QR Code to download the app</p>
<p><xref ref-type="fig" rid="F21">Figure 21</xref> illustrates the QR Code.</p>
<p>Download Link: <ext-link ext-link-type="uri" xlink:href="https://yogatrainer-finalyearproject.github.io/">https://yogatrainer-finalyearproject.github.io/</ext-link></p>
<p>Working (Shown through four images)</p>
<p><xref ref-type="fig" rid="F22">Figures 22</xref>&#x2013;<xref ref-type="fig" rid="F25">25</xref> show detection of different yoga poses.</p>
<fig id="F22" position="float">
<label>FIGURE 22</label>
<caption><p>Plank pose.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g0022.tif"/>
</fig>
<fig id="F23" position="float">
<label>FIGURE 23</label>
<caption><p>Tree pose.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g0023.tif"/>
</fig>
<fig id="F24" position="float">
<label>FIGURE 24</label>
<caption><p>Warrior pose.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g0024.tif"/>
</fig>
<fig id="F25" position="float">
<label>FIGURE 25</label>
<caption><p>Downdog pose.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijiam-2024-19-g0025.tif"/>
</fig>
<p>Technology Used</p>
<p>Framework</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>Android</p>
</list-item>
</list>
<p>Language</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>Kotlin</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>XML</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Python</p>
</list-item>
</list>
<p>Tools</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>TFLite</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Figma</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Android Studio</p>
</list-item>
</list>
</sec>
<sec id="S11" sec-type="conclusion">
<title>11. Conclusion</title>
<p>This work aims to develop a self-instruction system that lets individuals practice yoga using machine learning and deep learning approaches. It uses single-user pose recognition to create personalized datasets, allowing users to learn and practice yoga independently. The system can evolve over time through new data and supervised learning. The project discusses various ML and DL approaches for accurately classifying yoga poses from pre-recorded videos and photos.</p>
<p>The proposed model currently classifies only 5 Yoga asanas. There are a great number of Yoga asanas, so creating a pose recognition and classification model that succeeds for all of them is a challenging problem. The dataset can be expanded by adding more Yoga poses performed by individuals in outdoor as well as indoor settings. A portable device for self-training and real-time predictions could also be built on this system. This work demonstrates Human Activity Recognition in a practical application.</p>
<p>We would also like to extend the idea to a full-fledged real-time Yoga tutorial and guidance system for home use, with step-by-step tutorials, posture error detection, posture correction guidance, and other healthcare-related services such as a personalized Yoga regime, online real-time guidance, and personalized diet charts, all through a dedicated app.</p>
</sec>
<sec id="S12" sec-type="author-contributions">
<title>Author contributions</title>
<p>PG, SS, and AJ conceived of the presented idea. PG, SS, and AJ developed the theory and performed the computations. AJ and SS verified the analytical methods. PG, RM, and AS encouraged us to investigate and supervised the findings of this work. All authors discussed the results and contributed to the final manuscript.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Straczkiewicz</surname> <given-names>M</given-names></name> <name><surname>Peter</surname> <given-names>J</given-names></name> <name><surname>Jukka-Pekka</surname> <given-names>O</given-names></name></person-group>. <article-title>A systematic review of smartphone-based human activity recognition methods for health research.</article-title> <source><italic>NPJ Digit Med.</italic></source> (<year>2021</year>) <volume>4</volume>:<fpage>1</fpage>&#x2013;<lpage>15</lpage>.</citation></ref>
<ref id="B2"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nguyen</surname> <given-names>B</given-names></name> <name><surname>Yves</surname> <given-names>C</given-names></name> <name><surname>Teodiano</surname> <given-names>B</given-names></name> <name><surname>Sridhar</surname> <given-names>K</given-names></name></person-group>. <article-title>Trends in human activity recognition with focus on machine learning and power requirements.</article-title> <source><italic>Mach Learn Applic.</italic></source> (<year>2021</year>) <volume>5</volume>:<issue>100072</issue>.</citation></ref>
<ref id="B3"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shavit</surname> <given-names>Y</given-names></name> <name><surname>Itzik</surname> <given-names>K</given-names></name></person-group>. <article-title>Boosting inertial-based human activity recognition with transformers.</article-title> <source><italic>IEEE Access.</italic></source> (<year>2021</year>) <volume>9</volume>:<fpage>53540</fpage>&#x2013;<lpage>7</lpage>.</citation></ref>
<ref id="B4"><label>4.</label><citation citation-type="journal"><collab>GitHub, Inc.</collab> <source><italic>yolish/har-with-imu-transformer.</italic></source> (<year>2023</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/yolish/har-with-imu-transformer">https://github.com/yolish/har-with-imu-transformer</ext-link></citation></ref>
<ref id="B5"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patil</surname> <given-names>S</given-names></name> <name><surname>Amey</surname> <given-names>P</given-names></name> <name><surname>Aditya</surname> <given-names>P</given-names></name> <name><surname>Aamir</surname> <given-names>NA</given-names></name> <name><surname>Arundhati</surname> <given-names>N</given-names></name></person-group>. <article-title>Yoga tutor visualization and analysis using SURF algorithm.</article-title> <source><italic>2011 IEEE control and system graduate research colloquium.</italic></source> <publisher-name>IEEE</publisher-name> (<year>2011</year>).</citation></ref>
<ref id="B6"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>W</given-names></name> <name><surname>Yin</surname> <given-names>W</given-names></name> <name><surname>Guo</surname> <given-names>F</given-names></name></person-group>. <article-title>Learning and self-instruction expert system for Yoga.</article-title> <source><italic>2010 2nd international workshop on intelligent systems and applications.</italic></source> <publisher-name>IEEE</publisher-name> (<year>2010</year>).</citation></ref>
<ref id="B7"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trejo</surname> <given-names>EW</given-names></name> <name><surname>Yuan</surname> <given-names>P</given-names></name></person-group>. <article-title>Recognition of Yoga poses through an interactive system with Kinect device.</article-title> <source><italic>2018 2nd International Conference on Robotics and Automation Sciences (ICRAS).</italic></source> <publisher-name>IEEE</publisher-name> (<year>2018</year>).</citation></ref>
<ref id="B8"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>H</given-names></name> <name><surname>He</surname> <given-names>Y</given-names></name> <name><surname>Chou</surname> <given-names>C</given-names></name> <name><surname>Lee</surname> <given-names>S</given-names></name> <name><surname>Lin</surname> <given-names>BP</given-names></name> <name><surname>Yu</surname> <given-names>J</given-names></name></person-group>. <article-title>Computer-assisted self-training system for sports exercise using kinects.</article-title> <source><italic>2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).</italic></source> <publisher-name>IEEE</publisher-name> (<year>2013</year>).</citation></ref>
<ref id="B9"><label>9.</label><citation citation-type="journal"><collab>Kaggle.</collab> <source><italic>Yoga Poses Dataset.</italic></source> (<year>2023</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/datasets/niharika41298/yoga-poses-dataset">https://www.kaggle.com/datasets/niharika41298/yoga-poses-dataset</ext-link></citation></ref>
<ref id="B10"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jayasuriya</surname> <given-names>KS</given-names></name></person-group>. <article-title>Discuss evidence of the Yoga practices in the Pre-Vedic Indus-Saraswati Valley.</article-title> (<year>2012</year>).</citation></ref>
</ref-list>
</back>
</article>
