<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Bohr. Cs.</journal-id>
<journal-title>BOHR International Journal of Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Bohr. Cs.</abbrev-journal-title>
<issn pub-type="epub">2583-455X</issn>
<publisher>
<publisher-name>BOHR</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.54646/bijcs.2022.10</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Disease prediction using machine learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Sena</surname> <given-names>G. Vasu</given-names></name>
</contrib>
<contrib contrib-type="author">
<name><surname>Rajinikanth</surname> <given-names>K.</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Faizan</surname> <given-names>Mohammed Khaja</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Rajan</surname> <given-names>D. Rohit</given-names></name>
</contrib>
</contrib-group>
<aff><institution>Student, Computer Science and Engineering, Gokaraju Rangaraju Institute of Engineering and Technology</institution>, <addr-line>Hyderabad</addr-line>, <country>India</country></aff>
<author-notes>
<corresp id="c001">&#x002A;Correspondence: Mohammed Khaja Faizan, <email>fabulousfaizan1234@gmail.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>07</month>
<year>2022</year>
</pub-date>
<volume>1</volume>
<issue>1</issue>
<fpage>57</fpage>
<lpage>60</lpage>
<history>
<date date-type="received">
<day>12</day>
<month>05</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>01</day>
<month>07</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Sena, Rajinikanth, Faizan and Rajan.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Sena, Rajinikanth, Faizan and Rajan</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Predicting disease at an early stage is critical, and the most difficult challenge is predicting it correctly. The prediction is based on the symptoms of an individual. The model presented can work like a digital doctor for disease prediction, which supports timely diagnosis and lets the person take immediate measures. The work was tested with four machine learning algorithms (decision tree, random forest, K-nearest neighbor, and Naive Bayes), and the best accuracy was obtained with random forest.</p>
</abstract>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>random forest</kwd>
<kwd>disease prediction</kwd>
<kwd>Naive Bayes</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="0"/>
<equation-count count="2"/>
<ref-count count="5"/>
<page-count count="4"/>
<word-count count="1482"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>The main goal of our project is to predict the name of a disease from the symptoms supplied by the user or patient. Because so many services are now available on the internet, we set out to predict the disease from symptoms that the user enters online. It is an interactive system: the user has to provide a minimum of two symptoms that they are suffering from.</p>
<p>The system responds through a graphical user interface (GUI) so that the exchange feels like a live interaction. This type of disease prediction system can be built using machine learning algorithms, as well as artificial intelligence algorithms, to query, identify, and respond to the user.</p>
<sec id="S1.SS1">
<title>Random forest algorithm</title>
<list list-type="simple">
<list-item>
<label>(1)</label>
<p>Random forest selects k records at random from a dataset of m records.</p>
</list-item>
<list-item>
<label>(2)</label>
<p>A separate decision tree is created for each sample.</p>
</list-item>
<list-item>
<label>(3)</label>
<p>Output is produced from every decision tree.</p>
</list-item>
<list-item>
<label>(4)</label>
<p>The final result is obtained by majority voting for classification and by averaging for regression. Random forest is considered one of the most effective algorithms for classification.</p>
</list-item>
</list>
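<p>As a minimal sketch of the steps above, the following Python code draws k records with replacement, fits one tree per sample, and combines the tree outputs by majority vote. It is an illustration under stated assumptions, not the implementation used in this work: each tree is simplified to a one-level stump on a randomly chosen binary symptom, and the names train_stump, train_forest, and predict are hypothetical.</p>
<preformat>
```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, rng):
    """Fit a one-level tree on one randomly chosen 0/1 feature (a simplification)."""
    feature = rng.randrange(len(rows[0]))
    default = majority(labels)
    leaf = {}
    for value in (0, 1):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        # Fall back to the overall majority if this value never occurs in the sample.
        leaf[value] = majority(subset) if subset else default
    return feature, leaf


def train_forest(rows, labels, n_trees=25, k=None, seed=0):
    """Steps 1 and 2: draw k of the m records per tree and fit one tree per sample."""
    rng = random.Random(seed)
    m = len(rows)
    k = k or m
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(k)]  # sampling with replacement
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(train_stump(sample_rows, sample_labels, rng))
    return forest


def predict(forest, row):
    """Steps 3 and 4: collect one output per tree, then take the majority vote."""
    votes = [leaf[row[feature]] for feature, leaf in forest]
    return majority(votes)


if __name__ == "__main__":
    # Hypothetical toy data: one binary symptom column, two placeholder classes.
    rows = [[1], [1], [0], [0]]
    labels = ["disease_a", "disease_a", "disease_b", "disease_b"]
    forest = train_forest(rows, labels)
    print(predict(forest, [1]))
```
</preformat>
<p>A full random forest would grow deeper trees and consider a random subset of features at every split; the stump is used here only to keep the sampling and voting steps visible.</p>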
</sec>
<sec id="S1.SS2">
<title>Decision tree algorithm</title>
<p>Decision trees are commonly employed for classification. A decision tree is a classifier with a tree structure in which internal nodes represent features, branches represent decision rules, and leaf nodes represent outcomes. The tree therefore has two types of nodes: decision nodes, which test a feature, and leaf nodes, which hold the result. Each test is made on the basis of the features of the dataset.</p>
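<p>To make the tree structure concrete, the sketch below walks a small hand-built tree from the root to a result. The tree, the symptom names, and the class names are hypothetical placeholders chosen for illustration; they are not taken from the dataset or from a trained model.</p>
<preformat>
```python
# Hypothetical toy tree: internal nodes test one symptom, leaves hold a class name.
TREE = {
    "feature": "symptom_1",
    "branches": {
        True: {
            "feature": "symptom_2",
            "branches": {
                True: {"leaf": "disease_a"},
                False: {"leaf": "disease_b"},
            },
        },
        False: {"leaf": "disease_c"},
    },
}


def classify(node, symptoms):
    """Follow the decision rules from the root until a leaf is reached."""
    while "leaf" not in node:
        # A symptom that was not reported is treated as absent.
        value = bool(symptoms.get(node["feature"], False))
        node = node["branches"][value]
    return node["leaf"]


if __name__ == "__main__":
    print(classify(TREE, {"symptom_1": True, "symptom_2": False}))
```
</preformat>
<p>In practice the tree is learned from the training data rather than written by hand; the traversal at prediction time works in the same way.</p>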
</sec>
<sec id="S1.SS3">
<title>Naive Bayes algorithm</title>
<p>The Naive Bayes algorithm is used for both binary and multiclass classification. It is simple and easy to understand, and it gives good results on a wide range of problems. It is based on Bayes&#x2019; rule: P(class | data) = (P(data | class) &#x00D7; P(class)) / P(data). With the help of the Naive Bayes algorithm, we can calculate the probability of a piece of data belonging to a given class.</p>
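<p>The probability calculation described above can be sketched in a few lines of Python. This is an illustrative example under assumptions that are not stated in the original work: symptoms are encoded as 0/1 values, the features are treated as conditionally independent given the class, and add-one smoothing is applied. The function names train_nb and posterior are hypothetical.</p>
<preformat>
```python
from collections import Counter


def train_nb(rows, labels):
    """Estimate P(class) and P(feature = 1 | class) with add-one smoothing."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    ones = {c: [0] * n_features for c in class_counts}
    for row, lab in zip(rows, labels):
        for j, value in enumerate(row):
            ones[lab][j] += value
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {
        c: [(ones[c][j] + 1) / (class_counts[c] + 2) for j in range(n_features)]
        for c in class_counts
    }
    return priors, likelihoods


def posterior(priors, likelihoods, row):
    """Return P(class | data) for every class using Bayes' rule."""
    joint = {}
    for c in priors:
        p = priors[c]  # P(class)
        for j, value in enumerate(row):
            # Multiply by P(feature_j | class) under the independence assumption.
            p *= likelihoods[c][j] if value else 1 - likelihoods[c][j]
        joint[c] = p  # P(data | class) * P(class)
    evidence = sum(joint.values())  # P(data)
    return {c: p / evidence for c, p in joint.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two binary symptoms, two placeholder classes.
    rows = [[1, 0], [1, 0], [0, 1], [0, 1]]
    labels = ["disease_a", "disease_a", "disease_b", "disease_b"]
    priors, likelihoods = train_nb(rows, labels)
    print(posterior(priors, likelihoods, [1, 0]))
```
</preformat>
<p>The predicted class is the one with the largest posterior probability; because P(data) is the same for every class, it only rescales the values so that they sum to one.</p>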
</sec>
<sec id="S1.SS4">
<title>K-nearest neighbor (KNN)</title>
<p>KNN links data to results by comparing a new record with the labeled records that are most similar to it. It involves the following steps:</p>
<list list-type="simple">
<list-item>
<label>(1)</label>
<p>Load the required data.</p>
</list-item>
<list-item>
<label>(2)</label>
<p>Calculate the distance between the new point and every training point, typically the Euclidean distance.</p>
</list-item>
<list-item>
<label>(3)</label>
<p>Select the k points with the smallest distances and assign the class that occurs most often among them.</p>
</list-item>
</list>
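<p>The steps above can be sketched as follows. This is a minimal illustration, not the code used in this work: the function name knn_predict and the toy data are hypothetical, the distance is Euclidean, and the class is chosen by majority vote among the k nearest training points.</p>
<preformat>
```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Step 2: Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    # Step 3: keep the k smallest distances and vote on their labels.
    distances.sort(key=lambda pair: pair[0])
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]


if __name__ == "__main__":
    # Step 1: load the data (hypothetical toy points with placeholder classes).
    train_rows = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    train_labels = ["disease_a"] * 3 + ["disease_b"] * 3
    print(knn_predict(train_rows, train_labels, [0.2, 0.1]))
```
</preformat>
<p>The value of k is a tuning choice; an odd k avoids tied votes when there are two classes.</p>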
<p>Python was chosen for this work for several reasons. It is one of the most well-known programming languages and one of the easiest to learn, with a simple syntax that reads much like plain English. Python is also a high-level language with a built-in garbage collector that frees the memory held by objects the code no longer uses.</p>
</sec>
<sec id="S1.SS5">
<title>Bits and pieces together</title>
<p>This approach can build on existing work by using it as a starting point, so that the information from completed work can be combined.</p>
</sec>
</sec>
<sec id="S2">
<title>Proposed system</title>
<p>In this model, a GUI takes the symptoms from the user and predicts the disease the user is suffering from. The interface responds within a fraction of a second with its prediction.</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>The user fills in details such as their name.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>The user enters at least two symptoms that they are suffering from.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>The system stores data such as the user&#x2019;s name and the predicted disease so that the next treatment can be faster and easier.</p>
</list-item>
</list>
</sec>
<sec id="S3">
<title>Methodology</title>
<p>A methodology is a representation of a system&#x2019;s structure, behavior, and other features. A system architecture is made up of system components and subsystems that interact to form the whole. An architecture diagram abstracts the overall structure of a software system and defines the constraints, linkages, and boundaries between components. The methodology of the work is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Work flow.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2022-10-g001.tif"/>
</fig>
<p>Python has many applications. Some of them are the following:</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>Web development: Many web development projects use Python because it offers many frameworks that make the work easier and simpler.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Data science: Data science involves many stages, such as data mining, data sorting, and data processing. Python provides built-in functions that make these stages easier and simpler to work with.</p>
</list-item>
</list>
</sec>
<sec id="S4" sec-type="results">
<title>Results</title>
<p>The dataset was obtained from Kaggle. The training set, shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, has 5,000 rows of data, which helps in training the models efficiently.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Training set.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2022-10-g002.tif"/>
</fig>
<p>The test set, shown in <xref ref-type="fig" rid="F3">Figure 3</xref>, has about 45 rows and is used to calculate accuracy.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Test set.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2022-10-g003.tif"/>
</fig>
<p><xref ref-type="fig" rid="F4">Figure 4</xref> shows the interface of the disease prediction scenario, and <xref ref-type="fig" rid="F5">Figure 5</xref> shows the final result achieved after providing symptoms in the interface.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Interface for prediction of disease.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2022-10-g004.tif"/>
</fig>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Final result.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2022-10-g005.tif"/>
</fig>
<p>The work was carried out with four machine learning algorithms: decision tree, random forest, KNN, and Naive Bayes. The best result was achieved with the random forest algorithm. The comparison of all classifiers is shown in <xref ref-type="fig" rid="F6">Figure 6</xref>.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Comparison with different machine learning models.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2022-10-g006.tif"/>
</fig>
</sec>
<sec id="S5" sec-type="conclusion">
<title>Conclusion</title>
<p>After completing the work, we can conclude that random forest predicts the disease with the highest accuracy, followed by the decision tree. We have created a system that can reduce the rush at hospitals and clinics and lighten the workload of the medical staff, so it benefits both patients and the medical field. Systems of this kind can also save patients the time and money spent on tests or scans to find out what they are suffering from. On average, our system achieved an accuracy of 93% in predicting diseases from the symptoms given by the user with the random forest algorithm. The system also stores the data entered by the user in a database, which can be used in the future to build a better version of such a system.</p>
</sec>
<sec id="S6" sec-type="author-contributions">
<title>Author contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.</p>
</sec>
</body>
<back>
</back>
</article>
