<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Bohr. Cs.</journal-id>
<journal-title>BOHR International Journal of Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Bohr. Cs.</abbrev-journal-title>
<issn pub-type="epub">2583-455X</issn>
<publisher>
<publisher-name>BOHR</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.54646/bijcs.2023.15</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Associating fundamental features with technical indicators for analyzing quarterly stock market trends using machine learning algorithms</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Moore</surname> <given-names>Nicholas</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Bagui</surname> <given-names>Sikha</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
</contrib>
</contrib-group>
<aff><institution>Department of Computer Science, University of West Florida</institution>, <addr-line>Pensacola, FL</addr-line>, <country>United States</country></aff>
<author-notes>
<corresp id="c001">&#x002A;Correspondence: Sikha Bagui, <email>bagui@uwf.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>27</day>
<month>08</month>
<year>2023</year>
</pub-date>
<volume>2</volume>
<issue>1</issue>
<fpage>46</fpage>
<lpage>61</lpage>
<history>
<date date-type="received">
<day>04</day>
<month>08</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>08</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Moore and Bagui.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Moore and Bagui</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The stock market is the primary entity driving every major economy across the globe, with each investment designed to capitalize on profit while decreasing its associated risks. As a result of the stock market&#x2019;s importance, there have been enumerable studies conducted with the goal of predicting the stock market through data analysis techniques including machine learning, neural networks, and time series analysis. This paper uses machine learning algorithms to perform stock market index classification using fundamental data while classifying the indices using technical indicators. The data were derived from Yahoo Finance on the top 100 indices in the NASDAQ stock market from January 2000 to December 2020.</p>
</abstract>
<kwd-group>
<kwd>stock market</kwd>
<kwd>machine learning</kwd>
<kwd>technical indicators</kwd>
<kwd>fundamental analysis</kwd>
<kwd>NASDAQ</kwd>
</kwd-group>
<counts>
<fig-count count="18"/>
<table-count count="12"/>
<equation-count count="0"/>
<ref-count count="22"/>
<page-count count="16"/>
<word-count count="7064"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title>1. Introduction and related works</title>
<p>The health of every economy in the world, both major and growing, hinges on its market&#x2019;s stock prices, and predicting these stock prices is a growing area of interest for world governments, professional investors, and private citizens. Despite efforts to develop new techniques and strategies toward this goal, market volatility, along with the non-linearity and high heteroscedasticity of market data, makes forecasting models problematic (<xref ref-type="bibr" rid="B1">1</xref>). There are three main approaches to analyzing the stock market: technical, fundamental, and sentimental. Technical analysis attempts to determine future price change patterns using technical indicators, which include the opening price (open), daily highest price (high), daily lowest price (low), closing price (close), adjusted closing price (adjusted close), and the total volume (volume). Technical indicators are detailed in daily stock market reports and represent data efficiently for time series analysis (<xref ref-type="bibr" rid="B2">2</xref>).</p>
<p>Fundamental analysis uses the economic standing of a firm&#x2019;s yearly or quarterly reports to predict future stock value (Nti et al. (<xref ref-type="bibr" rid="B3">3</xref>)). Fundamental analysis is the focus of this paper. Fundamental company reports vary depending on the nature of the business. Examples of fundamental features include total revenue, gross profit, total assets, total debt, operating cash flow, and capital expenditure (<xref ref-type="bibr" rid="B3">3</xref>).</p>
<p>Sentimental analysis relates to the public&#x2019;s general feeling or attitude toward a specific stock and its success or failure within a given market (<xref ref-type="bibr" rid="B4">4</xref>). The goal of each of these methods is to predict market trends, giving investors the information necessary to place their money where it will increase their overall investment (<xref ref-type="bibr" rid="B5">5</xref>).</p>
<p>In a survey of the types of analysis performed in over 300 samples, 66% of papers focused on technical analysis, 23% on fundamental analysis, and 11% on some combination of the two or some form of sentimental analysis (<xref ref-type="bibr" rid="B2">2</xref>). Given the vast amount of research available, this paper will serve as a foundation for applying machine learning techniques to fundamental data analysis.</p>
<p>The uniqueness of this paper lies in its combination of methods: fundamental data are classified, based on the high and low technical indicators, into three distinct classes (buy, sell, or hold). Along with this classification system, a broad array of algorithms is applied in their most basic form to provide a benchmark performance for each classifier, offering useful insights into how these algorithms could be modified and expanded in future work. This paper presents the results of collecting fundamental data on the top one hundred stocks in the NASDAQ stock market and applying eight different machine learning algorithms to predict whether a stock should be bought, sold, or held in any given quarter over the 20 years from 2000 to 2020.</p>
<p>Most research focusing on technical analysis deals with, at its smallest, minute-to-minute prediction models (Lamouirie and Achchab (<xref ref-type="bibr" rid="B6">6</xref>)), and at its largest, day-to-day models (<xref ref-type="bibr" rid="B5">5</xref>). While useful, we intended to explore longer-term investment options that would be more useful to private citizens and long-term investors who wish to avoid the risk associated with day trading.</p>
<p>Given a large amount of research done on technical analysis and the generally positive results gained from that research, we began by drawing inspiration from Wang et al. (<xref ref-type="bibr" rid="B7">7</xref>), who attempted to train deep learning networks to analyze the Singapore Stock Exchange, straying from conventional trend studies to have their algorithms produce trading decisions directly. Their algorithms provided a buy, sell, or hold decision on a stock based on indicators gathered from a random forest algorithm. In 2017, Thakur et al. (<xref ref-type="bibr" rid="B8">8</xref>) repeated this method, expanding on using random forest algorithms to determine the rules used to classify each index as a buy/sell/hold index.</p>
<p>The purpose of this research is to give non-investors a platform to study and enter the market, streamlining the results directly into a decision stating whether a stock index should be bought, sold, or held. Condensing the large number of fundamental features into a smaller set is a secondary focus of this study.</p>
<p>Hence, this study focuses on using fundamental values to produce decisions grounded in the technical indicators. By associating the fundamental features with a decision based on the technical indicators, we combine two methods of study, the technical and the fundamental: we study the fundamentals to predict classes based on the technicals.</p>
<p>Given those articles and their influence on the work performed, it is prudent to note how this work differs from them. While many of the studies mentioned used machine learning algorithms (<xref ref-type="bibr" rid="B9">9</xref>&#x2013;<xref ref-type="bibr" rid="B11">11</xref>), none used them on fundamental data to predict long-term results, which for the purposes of this paper are defined as results in increments of greater than 30 days. This project attempts to forecast the decision in 90-day increments, four times a year, over a 20-year period, allowing personal, non-day-trading investors to use this information to invest responsibly and reliably in a volatile market environment.</p>
<p>By collecting quarterly data from 100 different stocks over a 20-year period, this work relates fundamental data to a predicted classification of the rise and fall of technical indicators and then produces a decision for the user to buy, sell, or hold a stock. The report also explores which fundamental features gathered from the quarterly report correlate most with the decision classification, examining two different methods of correlation and producing two sets of features to be utilized in each of the machine learning algorithms. This project also serves the purpose of forming a benchmark foundation for continued studies in this field.</p>
<p>The remainder of this paper is organized as follows: Section &#x201C;2. Data and Preprocessing&#x201D; provides insight on the datasets utilized and the pre-processing performed to establish the final dataframe; Section &#x201C;3. Algorithms&#x201D; gives a high-level summary of the algorithms studied along with the parameters used in the experimentation; Section &#x201C;4. Results&#x201D; summarizes each algorithm&#x2019;s best parameters along with their results; and finally, Section &#x201C;5. Conclusion and Future Works&#x201D; presents the conclusions and posits future work to consider.</p>
</sec>
<sec id="S2">
<title>2. Data and preprocessing</title>
<p>The data for this project consist of all the data on the companies in the NASDAQ-100 stock market from January 1, 1999, to January 1, 2020, located in the Yahoo Finance database. The fundamental data are a collection of three separate reports pulled from the database. These dataframes (the more technical term for the files) and their feature counts were as follows: quarterly balance sheet (92), quarterly cash flow (72), and quarterly financials (52). A fourth dataframe on the historical daily values of each stock (technical indicators) was also pulled: historical price indices (7). The combined original feature count was 223.</p>
<p>The data were manually collected from Yahoo Finance. To prepare the data, a series of pre-processing steps was taken. First, extraneous features were removed from the dataframes: many features were not reported continuously across the dataframes of every stock. Yahoo Finance also organizes features into individual sections and subsections, allowing several features to be generalized, and many of the subsections contained no values. Once the original dataframes were feature filtered, they were combined into a single quarterly report, with the data aligned to the dates marking the ends of the fiscal quarters across all 20 years of data. This left us with a combined dataframe of the fundamental and technical values. The pre-processed dataframe consisted of 62 features and 8,498 indices of reported stock figures.</p>
<p>Next, the data were categorized. Each quarterly report was assigned one of three classes: buy, sell, or hold (when neither buy nor sell applies). The exact class criteria were as follows:</p>
<list list-type="simple">
<list-item>
<label>(1)</label>
<p>Sell &#x2013; High and low decrease by 5% or more in the next quarter.</p>
</list-item>
<list-item>
<label>(2)</label>
<p>Buy &#x2013; High and low increase by 5% or more in the next quarter.</p>
</list-item>
<list-item>
<label>(3)</label>
<p>Hold &#x2013; Neither the buy nor the sell condition is met.</p>
</list-item>
</list>
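<p>A minimal Python sketch of these labeling rules (illustrative only; the function and argument names are hypothetical, and the percent changes are expressed as decimal fractions):</p>

```python
def classify_quarter(high_change: float, low_change: float) -> str:
    """Label a quarter from the next quarter's percent change in high and low.

    Both the high and the low must move by 5% or more in the same
    direction to trigger a buy or a sell; otherwise the label is hold.
    """
    if high_change <= -0.05 and low_change <= -0.05:
        return "sell"
    if high_change >= 0.05 and low_change >= 0.05:
        return "buy"
    return "hold"
```

<p>For example, a quarter whose next-quarter high rises 6% but whose low falls 2% is labeled hold, since both indicators must move together to trigger a buy or a sell.</p>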
<p>Each index represents a quarterly report and the high and low values associated with that quarter. These indices were classified based on the stated rules. Once the classifications were added to each company&#x2019;s quarterly reports, the rest of the data could be transformed as follows. This methodology is demonstrated in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Buy, sell, or hold classification process.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g001.tif"/>
</fig>
<p>A categorization was decided upon. The values in the reports were altered to measure the percent change from the previous quarterly report (QR) to the current QR so that the new classifications could be added. A feature was added to the dataframe for each existing feature, measuring the change in that feature from one quarter to the next. For example, say that in the previous quarter, a company was valued at &#x0024;1,000, and in the next quarter, it was valued at &#x0024;1,100, an increase of 10%. The new feature replaced the price value of 1,100 with 10% for the current quarterly report. This process was repeated for every quarterly report, excluding the first, as no previous data existed with which to modify it. This left every quarterly report with a percent change for each fundamental value.</p>
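<p>In pandas, the quarter-over-quarter percent change described above can be computed in a single step (a sketch; the column name <italic>total_value</italic> is hypothetical):</p>

```python
import pandas as pd

# Hypothetical quarterly valuations for one company.
qr = pd.DataFrame({"total_value": [1000, 1100, 990]})

# Percent change from the previous quarterly report; the first row has
# no predecessor and is therefore NaN, mirroring the excluded first QR.
qr["total_value_pct"] = qr["total_value"].pct_change()
```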
<p>In summary:</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>All the data files were collected into a single dataframe for the purposes of preprocessing and exploring the data.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>To train and test the classifier, the price-related features were separated into two different dataframes with a similar index value. This was done to prevent data leakage during the training and testing of the model.</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>The date indices were replaced with simple numeric indices, as dates were no longer needed.</p>
</list-item>
</list>
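<p>These steps might be sketched as follows (illustrative only; the function name and the <italic>price_cols</italic> argument are hypothetical stand-ins for the actual price-related features):</p>

```python
import pandas as pd

def split_for_training(df: pd.DataFrame, price_cols: list) -> tuple:
    """Separate the price-related features from the fundamentals to prevent
    data leakage, and replace the date index with simple integer indices."""
    prices = df[price_cols].reset_index(drop=True)
    fundamentals = df.drop(columns=price_cols).reset_index(drop=True)
    return fundamentals, prices
```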
<sec id="S2.SS1">
<title>2.1. Feature selection using correlation values and tree classifiers</title>
<p>The fundamental data were collected, preprocessed, and reformatted, and the final files were exported. The data exploration consisted of two separate but similar steps. Given the large number of features (59, after removing the name, symbol, and date columns), we wanted to condense the features into a smaller, more manageable set. The original features are presented in <xref ref-type="table" rid="T13">Appendix Table A1</xref>.</p>
<p>To perform this feature selection, we used two methodologies: correlation values and tree classifiers. For the first method, a correlation matrix was created, and the 10 features most strongly correlated with our decision classifications were identified. Once these top 10 correlation features were located, the preprocessed dataframe was sliced to include only those features along with the decision classifications, and the resulting dataframe was exported for use in our models. Next, using scikit-learn, a decision tree was implemented to determine a second set of top ten features to study. Once identified, they too were sliced out of the preprocessed dataframe and moved into a new dataframe along with the corresponding classification labels. <xref ref-type="table" rid="T1">Table 1</xref> shows the features selected by both methods and used for the remainder of this experiment.</p>
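<p>Both selection methods can be sketched with pandas and scikit-learn (a minimal illustration; we assume the class labels have already been numerically encoded, and the function names are hypothetical):</p>

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def top_k_by_correlation(X: pd.DataFrame, y: pd.Series, k: int = 10) -> list:
    # Rank features by absolute correlation with the encoded class label.
    corr = X.corrwith(y).abs().sort_values(ascending=False)
    return corr.head(k).index.tolist()

def top_k_by_tree(X: pd.DataFrame, y: pd.Series, k: int = 10) -> list:
    # Rank features by impurity-based importances from a fitted tree.
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    importances = pd.Series(tree.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False).head(k).index.tolist()
```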
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Top 10 features selected from correlation and decision tree methodologies.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Correlation Method</td>
<td valign="top" align="left">Decision Tree Method</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Basic average shares</td>
<td valign="top" align="left">Capital expenditure</td>
</tr>
<tr>
<td valign="top" align="left">Diluted average shares</td>
<td valign="top" align="left">Total assets</td>
</tr>
<tr>
<td valign="top" align="left">Tax effect of unusual items</td>
<td valign="top" align="left">End cash position</td>
</tr>
<tr>
<td valign="top" align="left">Other income expenses</td>
<td valign="top" align="left">Ordinary shares number</td>
</tr>
<tr>
<td valign="top" align="left">Total liabilities net minority interest</td>
<td valign="top" align="left">Total liabilities net minority interest</td>
</tr>
<tr>
<td valign="top" align="left">Total unusual items</td>
<td valign="top" align="left">Total expenses</td>
</tr>
<tr>
<td valign="top" align="left">Total unusual items excluding goodwill</td>
<td valign="top" align="left">Reconciled depreciation</td>
</tr>
<tr>
<td valign="top" align="left">Working capital</td>
<td valign="top" align="left">Gross profit</td>
</tr>
<tr>
<td valign="top" align="left">Total revenue</td>
<td valign="top" align="left">Cost of revenue</td>
</tr>
<tr>
<td valign="top" align="left">Operating expenses</td>
<td valign="top" align="left">Operating expenses</td>
</tr>
</tbody>
</table></table-wrap>
<p>Using two different methods to determine which features to use will allow us to compare how the feature selection affects the accuracy of the models.</p>
<p>With the data cleaned and formatted and the selected features extracted, we finally began classifying the stocks using their quarterly reports. To do this, multiple machine learning classifiers were used. The data were tested using eight different classifiers along with a dummy model, which provides a benchmark to compare our results against.</p>
<list list-type="simple">
<list-item>
<label>&#x2022;</label>
<p>Ada Boost</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Decision Tree</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Extreme Gradient Boost</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>K Nearest Neighbor</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Logistic Regression</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Naive Bayes</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Random Forest</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Support Vector Machine Model</p>
</list-item>
<list-item>
<label>&#x2022;</label>
<p>Dummy (Benchmark)</p>
</list-item>
</list>
<p>Each of these models was trained and fitted to the dataset to determine the best-performing model. Along with running the default algorithms, we also performed a grid search over several different parameters to identify the best results for each algorithm. <xref ref-type="fig" rid="F2">Figure 2</xref> lays out this entire process.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Data processing and analysis framework.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g002.tif"/>
</fig>
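<p>The grid-search step can be illustrated with scikit-learn as follows (a sketch on synthetic stand-in data; in the study, X held the selected fundamental features and y the buy/sell/hold labels, and the same pattern was applied to each of the eight classifiers):</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class stand-in for the buy/sell/hold dataset.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Exhaustive search over a small parameter grid (decision tree shown).
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"criterion": ["gini", "entropy"],
                "max_depth": [None, 2, 3, 4, 5, 6]},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```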
<p>Results are presented in terms of four different statistical metrics: accuracy, precision, recall, and F1-score. Our pre-processing methodology produced a slightly imbalanced dataset; hence, this study places more importance on precision, since it accounts for the number of false positives. When studying stocks, we have chosen a conservative investing policy: focusing on precision helps us avoid investing in the wrong stock, whereas focusing on recall would emphasize not missing an opportunity. Each of the metrics has its importance, but because we do not want to invest in a stock that is a sell, we focus on precision. The confusion matrix for each of the algorithms is also included to illustrate how each model&#x2019;s predictions compare against the actual classification of each index. The complete results and diagrams for each set of features can be found in <xref ref-type="table" rid="T10">Tables 10</xref>, <xref ref-type="table" rid="T11">11</xref> and <xref ref-type="fig" rid="F3">Figures 3</xref>&#x2013;<xref ref-type="fig" rid="F18">18</xref>.</p>
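<p>All four metrics and the confusion matrix are available directly from scikit-learn (a sketch on a small hypothetical label sample; macro averaging is shown for the three-class setting):</p>

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical true and predicted buy/sell/hold labels.
y_true = ["buy", "hold", "sell", "buy", "hold", "hold"]
y_pred = ["buy", "hold", "buy", "buy", "sell", "hold"]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro", zero_division=0)
rec = recall_score(y_true, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=["buy", "hold", "sell"])
```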
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Ada boost confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Decision tree confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g004.tif"/>
</fig>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Extreme gradient boost confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g005.tif"/>
</fig>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>K nearest neighbor confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g006.tif"/>
</fig>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>Logistic regression confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g007.tif"/>
</fig>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption><p>Naive Bayes confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g008.tif"/>
</fig>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption><p>Random forest confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g009.tif"/>
</fig>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption><p>Support vector machine confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g010.tif"/>
</fig>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption><p>AdaBoost confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g011.tif"/>
</fig>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption><p>Decision tree confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g012.tif"/>
</fig>
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption><p>Extreme gradient boost confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g013.tif"/>
</fig>
<fig id="F14" position="float">
<label>FIGURE 14</label>
<caption><p>K nearest neighbor confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g014.tif"/>
</fig>
<fig id="F15" position="float">
<label>FIGURE 15</label>
<caption><p>Logistic regression confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g015.tif"/>
</fig>
<fig id="F16" position="float">
<label>FIGURE 16</label>
<caption><p>Naive Bayes confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g016.tif"/>
</fig>
<fig id="F17" position="float">
<label>FIGURE 17</label>
<caption><p>Random forest confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g017.tif"/>
</fig>
<fig id="F18" position="float">
<label>FIGURE 18</label>
<caption><p>Support vector machine confusion matrix.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="bijcs-2023-15-g018.tif"/>
</fig>
</sec>
</sec>
<sec id="S3">
<title>3. Algorithms</title>
<p><xref ref-type="table" rid="T2">Tables 2</xref>&#x2013;<xref ref-type="table" rid="T9">9</xref> give a short overview of the algorithms used in this study, their advantages and disadvantages, as well as any parameter sets for each model. A total of eight algorithms were chosen based on their use in previous studies on stock market data.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Ada boost algorithm synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Ada Boost</italic></bold> - an estimator that initially fits on the original dataframe and then fits additional copies on the same dataframe,</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">with the weights of incorrectly classified instances adjusted so that more difficult instances become the focus.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Parameters</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; n_estimators - The maximum number of estimators at which boosting is terminated. In case of a perfect fit, the learning</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">procedure is stopped early.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 50, 100, 200, 500</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; learning_rate - Weight applied to each classifier at each boosting iteration. A higher learning rate increases the</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">contribution of each classifier. There is a trade-off between the learning rate and estimator parameters.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 1, 0.1, 0.01</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>Advantages</italic></bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Less prone to overfitting data</td>
<td valign="top" align="left">&#x2013; Requires quality dataset void of noisy and outlier data.</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Input parameters are not jointly optimized.</td>
<td valign="top" align="left">&#x2013; Statistically slower compared to other algorithms</td>
</tr>
</tbody>
</table></table-wrap>
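<p>As an illustration, the AdaBoost grid from Table 2 could be searched as follows (a sketch on synthetic stand-in data, not the study&#x2019;s actual dataset):</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic three-class stand-in for the buy/sell/hold dataset.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# Parameter values mirror those listed in Table 2.
grid = GridSearchCV(
    AdaBoostClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200, 500],
                "learning_rate": [1, 0.1, 0.01]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```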
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Decision tree algorithm synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Decision Tree</italic></bold> - Uses a tree data structure to predict the results of a particular classification. Highly useful classification model.</td>
</tr>
<tr>
<td valign="top" align="center" colspan="2"><hr/></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Parameters</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; <italic>criterion -</italic> defines the function used to measure the quality of a split.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x2018;gini&#x2019; and &#x2018;entropy&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; <italic>max_depth</italic> - defines the max depth of the tree. If &#x2018;none&#x2019; nodes are expanded pure</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; None, 2, 3, 4, 5, 6</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; <italic>min_samples_split</italic> - defines the min number of samples required to split a node</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 2, 5, 10</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; <italic>min_samples_leaf</italic> - defies the min number of samples required at a leaf node to split it.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 1,2,3,4,5,6</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>Advantages</italic></bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Easy to understand and implement</td>
<td valign="top" align="left">&#x2003;&#x2013; Multiclassification problems increase error rates</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Insensitive to missing values</td>
<td valign="top" align="left">&#x2003;&#x2013; Underperforms when multiple features are highly correlated</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Uncorrelated features can be processed with positive results</td>
<td valign="top" align="left"></td>
</tr>
</tbody>
</table></table-wrap>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Gradient boost algorithm synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Extreme Gradient Boost -</italic></bold> In each stage, n-class regression trees are fit to the negative gradient of a multinomial deviance loss function, which allows for the optimization of arbitrary differentiable loss functions. Essentially, each model is trained on the failures of the previous model.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Parameters</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; Booster &#x2013; which booster to use.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x201C;gbtree&#x201D;, &#x201C;gblinear&#x201D;, &#x201C;dart&#x201D;</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; Eta &#x2013; step size shrink value used to prevent overfitting.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 0.1,0.5,0.9</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; Gamma &#x2013; Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the gamma is, the more conservative the algorithm will be.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 0, 1, 3</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; n_estimators &#x2013; number of trees in the forest</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 50, 100, 200</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; max_depth &#x2013; The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 1, 3, 6</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>Advantages</italic></bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Efficient classification model</td>
<td valign="top" align="left">&#x2013; Sensitive to outliers due to the carry through of errors in previous iterations</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Historically more accurate then random forest</td>
<td valign="top" align="left">&#x2013; Difficult to upscale because of its reliance on previous iterations.</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Can handle mixed feature types.</td>
<td valign="top" align="left"></td>
</tr>
</tbody>
</table></table-wrap>
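<p>A minimal sketch of the boosted-tree grid search described above, using scikit-learn&#x2019;s GridSearchCV. Since XGBoost may not be installed everywhere, scikit-learn&#x2019;s GradientBoostingClassifier stands in here (its learning_rate plays the role of eta); the synthetic three-class data is an illustrative assumption, not the study&#x2019;s stock features.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Toy three-class data standing in for the quarterly stock features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Grid mirroring the table: eta -> learning_rate, plus n_estimators and max_depth.
param_grid = {
    "learning_rate": [0.1, 0.5, 0.9],
    "n_estimators": [50, 100],
    "max_depth": [1, 3],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
best_params = search.best_params_   # best combination found by the grid search
best_score = search.best_score_     # mean cross-validated accuracy of that combination
```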
<table-wrap position="float" id="T5">
<label>TABLE 5</label>
<caption><p>K nearest neighbor algorithm synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic><underline>K Nearest Neighbor</underline></italic></bold> &#x2013; A supervised machine learning algorithm that finds the distances between all the examples in the data by selecting K closest examples. Chosen due to the high relation between two close data points in our data set.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic><underline>Parameters</underline></italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">n_neighbors &#x2013; number of neighbors to use.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 50, 100, 200</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">weights &#x2013; the weight function used in prediction.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x2018;uniform&#x2019; &#x2013; all points in each neighborhood are equally weighted.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x2018;distance&#x2019; &#x2013; closer neighbors on a query point will have more influence.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">p &#x2013; the parameter for the Minkowski metric</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 1 &#x2013; equivalent to Manhattan distance</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 2 &#x2013; uses the Euclidean metric</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>Advantages</italic></bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Versatile algorithm that can be used for classification, regression, and search</td>
<td valign="top" align="left">&#x2013; Speed in directly related to the size of the data making this classifier hard to size up.</td>
</tr>
</tbody>
</table></table-wrap>
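<p>The K nearest neighbor search over the parameter values listed above can be sketched as follows; the synthetic data set is an illustrative assumption chosen large enough that n_neighbors = 200 remains valid within each cross-validation fold.</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Toy three-class data; 600 samples so each 2/3 training fold can supply 200 neighbors.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Grid over the parameter values listed in the table.
param_grid = {
    "n_neighbors": [50, 100, 200],
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # 1 = Manhattan distance, 2 = Euclidean distance
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
```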
<table-wrap position="float" id="T6">
<label>TABLE 6</label>
<caption><p>Logistic regression algorithm synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Logistic Regression</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; Used to assign observations to a discrete set of classes using a predictive analysis algorithm based on probability calculated using a sigmoid cost function.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Parameters</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; penalty &#x2013; specify the norm of the penalty</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; l1 - add a l1 penalty term</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; l2 - add an l2 penalty term</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; fit_intercept &#x2013; specifies if a constant should be added to the decision function</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; True, False</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; intercept_scaling &#x2013; used when using liblinear parameter and True self.fit interceptor it lessens the effect of regular synthetic weights.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 1, 10, 50</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; Solver &#x2013; chose the algorithm used in the optimization problem</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x2018;liblinear&#x2019; &#x2013; one vs rest schema</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x2018;saga&#x2019; &#x2013; used for larger dataframes to handle multinomial loss</td>
</tr>
<tr>
<td valign="top" align="left"><bold>Advantages</bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Performs well with continuous or categorical data.</td>
<td valign="top" align="left">&#x2013; Data intensive</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Easy to use a interpret the results</td>
<td valign="top" align="left">&#x2013; Sensitive to multi-collinearity</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Feature scaling not needed</td>
<td valign="top" align="left">&#x2013; Performs poorly with non-linear data</td>
</tr>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="left">&#x2013; Prone to overfitting the data</td>
</tr>
</tbody>
</table></table-wrap>
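<p>The logistic regression search can be sketched in the same way. Because the &#x2018;liblinear&#x2019; solver is limited to a one-vs-rest scheme, the toy data here is binary; the data and grid are illustrative assumptions.</p>

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Binary toy data (liblinear handles two classes directly).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# liblinear supports both l1 and l2 penalties.
param_grid = {
    "penalty": ["l1", "l2"],
    "fit_intercept": [True, False],
    "intercept_scaling": [1, 10, 50],
}
search = GridSearchCV(LogisticRegression(solver="liblinear"), param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
```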
<table-wrap position="float" id="T7">
<label>TABLE 7</label>
<caption><p>Naive Bayes algorithm synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Naive Bayes</italic></bold> &#x2013; A supervised learning algorithm used for classification by features assuming each feature is independent of each other with no correlation.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Parameters</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; var_smoothing &#x2013; Portion of the largest variance of all features that is added to variances for calculation stability.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 1.5&#x002A;&#x002A;-i for i in range (&#x2013;20, 20, 2)</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>Advantages</italic></bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Fast paced algorithm that can be used in real time</td>
<td valign="top" align="left">&#x2013; Assumes each feature make an equal contribution,</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Scalable to larger datasets weighs each feature equally.</td>
<td valign="top" align="left">&#x2013; Requires each classification to be well represented.</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Good performance with high dimensional data</td>
<td valign="top" align="left"></td>
</tr>
</tbody>
</table></table-wrap>
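<p>The var_smoothing grid from the table translates directly into a GridSearchCV call over GaussianNB; the synthetic data is an illustrative assumption.</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Toy three-class data standing in for the stock features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# The var_smoothing grid from the table: 1.5**-i for i in range(-20, 20, 2).
param_grid = {"var_smoothing": [1.5 ** -i for i in range(-20, 20, 2)]}
search = GridSearchCV(GaussianNB(), param_grid, cv=3)
search.fit(X, y)
best_smoothing = search.best_params_["var_smoothing"]
```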
<table-wrap position="float" id="T8">
<label>TABLE 8</label>
<caption><p>Random forest table synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Random Forest -</italic></bold> Using many individual decision trees, each of which returns a class prediction and the class with the most returns becomes the model&#x2019;s prediction.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Parameters</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; n_estimators &#x2013; number of trees in the forest</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 10,50,100,200</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; criterion &#x2013; the function to measure the quality of a split. Supported criteria are &#x201C;gini&#x201D; for the Gini impurity and &#x201C;entropy&#x201D;</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">for the information gain.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x2018;gini&#x2019; and &#x2018;entropy&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; max_depth &#x2013; The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; None, 2, 5, 10</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; min_samples_split &#x2013; The minimum number of samples required to split an internal node</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 5, 10</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2013; min_samples_leaf &#x2013; The minimum number of samples required to be at a leaf node to be considered to continue splitting.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; 1, 2, 5</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>Advantages</italic></bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Works well with unbalanced data.</td>
<td valign="top" align="left">&#x2013; Smaller data frames and low dimension data are prone to in accurate classifications.</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Excellent non&#x2013;linear classifier.</td>
<td valign="top" align="left"></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Maintains high accuracy when used with data that has missing values.</td>
<td valign="top" align="left">&#x2013; Setting parameters is difficult and sometimes randomized.</td>
</tr>
</tbody>
</table></table-wrap>
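<p>The random forest parameters above can be searched the same way; this sketch uses a reduced grid and synthetic data (illustrative assumptions) to keep the run small.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy three-class data standing in for the stock features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Reduced grid over the parameters listed in the table.
param_grid = {
    "n_estimators": [10, 50],
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 5],
    "min_samples_split": [5, 10],
    "min_samples_leaf": [1, 2],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
```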
<table-wrap position="float" id="T9">
<label>TABLE 9</label>
<caption><p>Support vector machine model algorithm synopsis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left" colspan="2"><italic>Support Vector Machine Model</italic> - An extension of the maximal margin classifier modified for general use cases especially non-linear features.</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="2"><bold><italic>Parameters</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">- C - Regularization parameter. The strength of the regularization is inversely proportional to C.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; [0.01,0.1,1],</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">- kernel - specifies the kernel type to be used in the algorithm.</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">&#x2003;&#x25CB; &#x2018;rbf&#x2019;, &#x2018;sigmoid&#x2019;, &#x2018;linear&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left"><bold><italic>Advantages</italic></bold></td>
<td valign="top" align="left"><bold><italic>Disadvantages</italic></bold></td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Works well in high dimensional spaces where the dimensions is greater than the data frames. </td>
<td valign="top" align="left">&#x2013; Better suited for binary classifications.</td>
</tr>
<tr>
<td valign="top" align="left"></td>
<td valign="top" align="left">&#x2013; Performs slower on larger datasets.</td>
</tr>
<tr>
<td valign="top" align="left">&#x2013; Avoids overfitting the data due to outliers.</td>
<td valign="top" align="left">&#x2013; Selecting the right kernel function is difficult and can be random.</td>
</tr>
</tbody>
</table></table-wrap>
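<p>The support vector machine grid over C and the kernel type can be sketched as follows; SVC handles the three classes internally with a one-vs-one scheme. The synthetic data is an illustrative assumption.</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy three-class data standing in for the stock features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Grid over the values listed in the table.
param_grid = {"C": [0.01, 0.1, 1], "kernel": ["rbf", "sigmoid", "linear"]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
```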
<table-wrap position="float" id="T10">
<label>TABLE 10</label>
<caption><p>Results of top features based on a tree classifier.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="center" colspan="7">Decision Tree Algorithms Results<hr/></td>
</tr>
<tr>
<td valign="top" align="left">Algorithms</td>
<td valign="top" align="center">Class</td>
<td valign="top" align="center">Precision</td>
<td valign="top" align="center">Recall</td>
<td valign="top" align="center">F1-score</td>
<td valign="top" align="center">Accuracy</td>
<td valign="top" align="left">Parameters Used</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">AdaBoost</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.28</td>
<td valign="top" align="center">0.10</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="left">learning_rate = 1</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.44</td>
<td/>
<td valign="top" align="left">n_estimators = 500</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.56</td>
<td valign="top" align="center">0.50</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td valign="top" align="left">Decision</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.21</td>
<td valign="top" align="center">0.23</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.38</td>
<td valign="top" align="left">criterion = &#x2018;entropy&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Tree</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.41</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.40</td>
<td/>
<td valign="top" align="left">max_depth = None</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.44</td>
<td/>
<td valign="top" align="left">min_samples_leaf&#x2019; = 1<break/> min_samples_split = 5</td>
</tr>
<tr>
<td valign="top" align="left">Extreme</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.31</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="left">Booster = gbtree</td>
</tr>
<tr>
<td valign="top" align="left">Gradient</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.47</td>
<td valign="top" align="center">0.43</td>
<td/>
<td valign="top" align="left">eta = 0.1</td>
</tr>
<tr>
<td valign="top" align="left">Boost</td>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="center">0.56</td>
<td valign="top" align="center">0.50</td>
<td/>
<td valign="top" align="left">gamma = 0 grow_policy = depthwise max_depth = 6</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">n_estimators = 50</td>
</tr>
<tr>
<td valign="top" align="left">K Nearest</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.29</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="left">n_neighbors = 50</td>
</tr>
<tr>
<td valign="top" align="left">Neighbor</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.44</td>
<td/>
<td valign="top" align="left">p = 2</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.62</td>
<td valign="top" align="center">0.52</td>
<td/>
<td valign="top" align="left">weights = &#x2018;distance&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Logistic</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="left">C = 1.0</td>
</tr>
<tr>
<td valign="top" align="left">Regression</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.33</td>
<td valign="top" align="center">0.37</td>
<td/>
<td valign="top" align="left">fit_intercept = False</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.54</td>
<td/>
<td valign="top" align="left">intercept_scaling = 1 penalty = &#x2018;l2&#x2019; solver = liblinear</td>
</tr>
<tr>
<td valign="top" align="left">Naive</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.24</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">0.10</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="left">var_smoothing</td>
</tr>
<tr>
<td valign="top" align="left">Bayes</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.35</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.04</td>
<td/>
<td valign="top" align="left">= 0.001522438</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.93</td>
<td valign="top" align="center">0.59</td>
<td/>
<td valign="top" align="left">8403474447</td>
</tr>
<tr>
<td valign="top" align="left">Random</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.34</td>
<td valign="top" align="center">0.08</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">0.47</td>
<td valign="top" align="left">criterion = &#x2018;gini&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Forest</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.48</td>
<td valign="top" align="center">0.45</td>
<td/>
<td valign="top" align="left">max_depth = None</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.54</td>
<td valign="top" align="center">0.50</td>
<td/>
<td valign="top" align="left">min_samples_leaf = 1<break/> min_samples_split = 5</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">n_estimators = 10</td>
</tr>
<tr>
<td valign="top" align="left">Support</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.21</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.20</td>
<td valign="top" align="center">0.39</td>
<td valign="top" align="left">C = 1</td>
</tr>
<tr>
<td valign="top" align="left">Vector</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">0.40</td>
<td/>
<td valign="top" align="left">Kernel = sigmoid</td>
</tr>
<tr>
<td valign="top" align="left">Machine</td>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.48</td>
<td valign="top" align="center">0.45</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td valign="top" align="left">Benchmark</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.34</td>
<td valign="top" align="left">N/A</td>
</tr>
<tr>
<td valign="top" align="left">Model</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.27</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.27</td>
<td valign="top" align="center">0.24</td>
<td valign="top" align="center">0.29</td>
<td/>
<td valign="top" align="left"/></tr>
</tbody>
</table></table-wrap>
<table-wrap position="float" id="T11">
<label>TABLE 11</label>
<caption><p>Results of top features based on a correlation model.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="center" colspan="7">Correlation Method Algorithms Results<hr/></td>
</tr>
<tr>
<td valign="top" align="left">Algorithms</td>
<td valign="top" align="center">Class</td>
<td valign="top" align="center">Precision</td>
<td valign="top" align="center">Recall</td>
<td valign="top" align="center">F1-score</td>
<td valign="top" align="center">Accuracy</td>
<td valign="top" align="left">Parameters Used</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">AdaBoost</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.27</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">0.10</td>
<td valign="top" align="center">0.41</td>
<td valign="top" align="left">learning_rate = 1</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.43</td>
<td/>
<td valign="top" align="left">n_estimators = 500</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.55</td>
<td valign="top" align="center">0.48</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td valign="top" align="left">Decision</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.16</td>
<td valign="top" align="center">0.17</td>
<td valign="top" align="center">0.16</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="left">Criterion = entropy</td>
</tr>
<tr>
<td valign="top" align="left">Tree</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.41</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.42</td>
<td/>
<td valign="top" align="left">max_depth = None</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.43</td>
<td/>
<td valign="top" align="left">min_samples_leaf<break/> = 6 min_samples_split = 5</td>
</tr>
<tr>
<td valign="top" align="left">Extreme</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.28</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.41</td>
<td valign="top" align="left">Booster = &#x2018;gbtree</td>
</tr>
<tr>
<td valign="top" align="left">Gradient</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.44</td>
<td/>
<td valign="top" align="left">eta = 0.1</td>
</tr>
<tr>
<td valign="top" align="left">Boost</td>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="center">0.51</td>
<td valign="top" align="center">0.46</td>
<td/>
<td valign="top" align="left">gamma = 0 grow_policy = depthwise max_depth = 6,</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="left">n_estimators = 200</td>
</tr>
<tr>
<td valign="top" align="left">K Nearest</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.32</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="left">n_neighbors = 50</td>
</tr>
<tr>
<td valign="top" align="left">Neighbor</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.44</td>
<td/>
<td valign="top" align="left">p = 2</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.57</td>
<td valign="top" align="center">0.48</td>
<td/>
<td valign="top" align="left">weights =<break/> &#x2018;distance&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Logistic</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.22</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="left">C = 11.390625</td>
</tr>
<tr>
<td valign="top" align="left">Regression</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.33</td>
<td valign="top" align="center">0.37</td>
<td/>
<td valign="top" align="left">fit_intercept =<break/> False</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.70</td>
<td valign="top" align="center">0.54</td>
<td/>
<td valign="top" align="left">intercept_scaling =<break/> 1<break/> penalty = &#x2018;l1&#x2019; solver = &#x2018;liblinear &#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Naive</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.08</td>
<td valign="top" align="center">0.00</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="left">var_smoothing</td>
</tr>
<tr>
<td valign="top" align="left">Bayes</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="center">0.97</td>
<td valign="top" align="center">0.57</td>
<td/>
<td valign="top" align="left">= 0.0077073466292<break/> 589396</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.52</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">0.05</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td valign="top" align="left">Random</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.27</td>
<td valign="top" align="center">0.10</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="left">Criterion =<break/> &#x2018;entropy&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Forest</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.45</td>
<td/>
<td valign="top" align="left">max_depth = None</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.52</td>
<td valign="top" align="center">0.46</td>
<td/>
<td valign="top" align="left">min_samples_leaf = 2 min_samples_split = 5 n_estimators = 10</td>
</tr>
<tr>
<td valign="top" align="left">Support</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.21</td>
<td valign="top" align="center">0.10</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">0.40</td>
<td valign="top" align="left">C = 1</td>
</tr>
<tr>
<td valign="top" align="left">Vector</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">0.40</td>
<td/>
<td valign="top" align="left">Kernel = &#x2018;sigmoid&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Machine</td>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.41</td>
<td valign="top" align="center">0.57</td>
<td valign="top" align="center">0.48</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td valign="top" align="left">Benchmark</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.34</td>
<td valign="top" align="left">N/A</td>
</tr>
<tr>
<td valign="top" align="left">Model</td>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.27</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.27</td>
<td valign="top" align="center">0.24</td>
<td valign="top" align="center">0.29</td>
<td/>
<td valign="top" align="left"/></tr>
</tbody>
</table></table-wrap>
<p>Each algorithm was tested several times, using all of the default settings as well as altering specific parameters using a grid search technique. This was done to ensure that we located the optimal settings for each model, so that we could in turn find the optimal model. Grid search increases processing time, since each model must be run for every combination of parameters, but it ultimately produces better results.</p>
<p>The parameters tested are listed along with each algorithm&#x2019;s synopsis and a short description of what each parameter affects. Each of the two discrete sets of top features was tested in this manner, choosing the best set of parameters, which in turn identified the best algorithm for each feature set. Where more than one value is listed for a parameter, every listed value was tested, and the combination of parameters that produced the best results across all attempts was retained for each model. The definitions for each parameter were taken from Pedregosa et al. (<xref ref-type="bibr" rid="B12">12</xref>) and the SciKit learn documentation. The dummy algorithm was run using Scikit Learn&#x2019;s default classifier.</p>
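<p>The dummy benchmark referred to above can be sketched with scikit-learn&#x2019;s DummyClassifier, whose default &#x2018;prior&#x2019; strategy ignores the features and always predicts the most frequent class; any real model should beat this floor. The synthetic data is an illustrative assumption.</p>

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Toy three-class data standing in for the stock features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Default strategy: predict the most frequent class, ignoring X entirely.
baseline = DummyClassifier()
scores = cross_val_score(baseline, X, y, cv=3)
baseline_accuracy = scores.mean()  # accuracy floor for comparison
```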
</sec>
<sec id="S4" sec-type="results">
<title>4. Results</title>
<p>The results displayed in <xref ref-type="table" rid="T10">Tables 10</xref>, <xref ref-type="table" rid="T11">11</xref> show what each algorithm returned using the best tuned parameters found through the grid search phase of the experiment. The results are reported on four different values: precision, recall, the f1-score for each classification, and the overall accuracy of each model. Each metric is reported for each of the three classifications (buy, sell, and hold). Due to the imbalanced nature of the classifications, we prioritize precision over accuracy when determining the efficacy of the classifiers. Along with the results in both tables, the confusion matrix for each combination of model and discretization method is presented in <xref ref-type="fig" rid="F3">Figures 3</xref>&#x2013;<xref ref-type="fig" rid="F18">18</xref>. The confusion matrix allows us to visualize in depth how each model performed by displaying the number of indices that were misclassified for each of the three classifications. By breaking down each classification into true classifications and false classifications, and by labeling how each false classification was mislabeled, we can gain a deeper understanding of how the algorithms performed and form a foundation for improvement.</p>
<p>Viewing the confusion matrices for each discretization method and focusing on the top-performing algorithms, we can see that for both methodologies the models were best at predicting true holds, then true buys, and rarely predicted true sells correctly. This indicates that the weight was placed on being more conservative, leaning toward holding over buying and selling. While these decisions are not straightforward, the most important aspect was for the algorithms to classify buy and hold indices correctly rather than misclassify them as sells, as buying and holding are more directly related to money lost (i.e., buying a stock that is going to lose value or holding a stock that will lose value both cost you money you already have, whereas selling a stock that would have gained value costs you only potential income you have not yet earned).</p>
<sec id="S4.SS1">
<title>4.1. Top 10 features based on a decision tree classifier</title>
<p>The top 10 features are presented in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
</sec>
<sec id="S4.SS2">
<title>4.2. Summary of top performing algorithms from each methodology</title>
<p><xref ref-type="table" rid="T12">Table 12</xref> displays the best results from both discretization methods. As can be seen, the random forest model shows an increase in accuracy of 13% over the benchmark, while the K nearest neighbor shows an increase of 10%. Along with the dramatic increase in accuracy, there is also a dramatic increase in precision across all of the classifications: the sell precision nearly doubles, and the buy and hold precisions each improve by more than 15%.</p>
<table-wrap position="float" id="T12">
<label>TABLE 12</label>
<caption><p>Results from the most optimal runs of both discretization methods.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="center" colspan="7">Best Performing Algorithms from Both Methodologies<hr/></td>
</tr>
<tr>
<td valign="top" align="left">Algorithms</td>
<td valign="top" align="center">Class</td>
<td valign="top" align="center">Precision</td>
<td valign="top" align="center">Recall</td>
<td valign="top" align="center">F1-score</td>
<td valign="top" align="center">Accuracy</td>
<td valign="top" align="left">Parameters Used</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">K Nearest Neighbor (Correlation Method)</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.32</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="left">n_neighbors = 50</td></tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.43</td>
<td valign="top" align="center">0.45</td>
<td valign="top" align="center">0.44</td>
<td/>
<td valign="top" align="left">p = 2</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.57</td>
<td valign="top" align="center">0.48</td>
<td/>
<td valign="top" align="left">weights = &#x2018;distance&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left">Random Forest (Tree Method)</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.34</td>
<td valign="top" align="center">0.08</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">0.47</td>
<td valign="top" align="left">criterion = &#x2018;gini&#x2019;</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.42</td>
<td valign="top" align="center">0.48</td>
<td valign="top" align="center">0.45</td>
<td/>
<td valign="top" align="left">max_depth = None</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.46</td>
<td valign="top" align="center">0.54</td>
<td valign="top" align="center">0.50</td>
<td/>
<td valign="top" align="left">min_samples_leaf = 1<break/> min_samples_split = 5<break/> n_estimators = 10</td>
</tr>
<tr>
<td valign="top" align="left">Benchmark Model</td>
<td valign="top" align="center">Sell</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.15</td>
<td valign="top" align="center">0.34</td>
<td valign="top" align="left">N/A</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Buy</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.26</td>
<td valign="top" align="center">0.27</td>
<td/>
<td valign="top" align="left"/></tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center">Hold</td>
<td valign="top" align="center">0.27</td>
<td valign="top" align="center">0.24</td>
<td valign="top" align="center">0.29</td>
<td/>
<td valign="top" align="left"/></tr>
</tbody>
</table></table-wrap>
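<p>For concreteness, the two tuned configurations reported in Table 12 correspond to the following scikit-learn estimators; this is a sketch only, with a toy dataset standing in for the study&#x2019;s feature sets and the training/test split omitted.</p>

```python
# The two top configurations from Table 12, expressed as scikit-learn
# estimators. X and y are a toy stand-in dataset, not the study data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# K nearest neighbor, correlation-method feature set
knn = KNeighborsClassifier(n_neighbors=50, p=2, weights="distance")

# Random forest, tree-method feature set
rf = RandomForestClassifier(criterion="gini", max_depth=None,
                            min_samples_leaf=1, min_samples_split=5,
                            n_estimators=10, random_state=0)

knn.fit(X, y)
rf.fit(X, y)
print(knn.score(X, y), rf.score(X, y))
```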
</sec>
</sec>
<sec id="S5">
<title>5. Conclusion and future works</title>
<p>By collecting quarterly data from 100 different stocks over a 20-year period, this work relates fundamental data to the predicted rise and fall of technical indicators and produces a decision for the user to buy, sell, or hold a stock. It also explores which fundamental features gathered from the quarterly report correlate most strongly with the decision classification, applying two different methods of correlation and producing two sets of features to be used in each of the machine learning algorithms. This project also serves the purpose of forming a benchmark foundation for continued studies in this field.</p>
<p>Based on this work, it can be concluded that, when continuing this line of study, efforts should be focused on the K-nearest neighbor and random forest algorithms, as they showed the best improvement over the benchmark model. It should also be noted that, while the percentages could be considered low, given the nature of our study, the ability of our classifiers to reach a highest reported precision of 46% and accuracy of 47% should be considered a significant improvement. Given the unforgiving nature of the study due to the volatile and unpredictable data, more work needs to be done in this area, but this study shows that fundamental analysis at this stage forms a foundation for future studies. It can also be noted that, on average, the decision tree results were better than the correlation-based results, and that the precision of the sell class was lower than the precision of the buy or hold classes.</p>
<p>For future work, we plan on: (i) first and foremost, expanding our dataset to all the available stock indices in the Yahoo Finance database and forming a data pipeline that would potentially allow our data to be used indefinitely as new data is produced and posted; (ii) combining the best performing algorithms to increase the performance of our models; and finally, (iii) exploring the effect of modifying the features by creating interaction features using domain knowledge.</p>
</sec>
<sec id="S6" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="S7" sec-type="author-contributions">
<title>Author contributions</title>
<p>NM was responsible for the initial research and study, including the collecting of related works, performing the study of machine learning algorithms, and the initial draft of the manuscript. He also contributed to the final submission. SB was responsible for guiding NM as he conducted his research and advising on the research topic and formation. She also coauthored and edited the final submission. Both authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ayala</surname> <given-names>J</given-names></name> <name><surname>Garda-Torres</surname> <given-names>M</given-names></name> <name><surname>Noguera</surname> <given-names>JLV</given-names></name> <name><surname>Gomez-Vela</surname> <given-names>F</given-names></name> <name><surname>Divina</surname> <given-names>F</given-names></name></person-group>. <article-title>Technical analysis strategy optimization using a machine learning approach in stock market indices.</article-title> <source><italic>Knowl Based Syst.</italic></source> (<year>2021</year>) <volume>225</volume>:<issue>107119</issue>. <pub-id pub-id-type="doi">10.1016/j.knosys.2021.107119</pub-id></citation></ref>
<ref id="B2"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen</surname> <given-names>G</given-names></name> <name><surname>Kudryavtsev</surname> <given-names>A</given-names></name> <name><surname>Hon-Snir</surname> <given-names>S</given-names></name></person-group>. <article-title>Stock market analysis in practice: is it technical or fundamental?</article-title> <source><italic>J Appl Finan Bank.</italic></source> (<year>2011</year>) <volume>1</volume>:<fpage>125</fpage>&#x2013;<lpage>38</lpage>.</citation></ref>
<ref id="B3"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nti</surname> <given-names>IK</given-names></name> <name><surname>Adekoya</surname> <given-names>AF</given-names></name> <name><surname>Weyori</surname> <given-names>BA</given-names></name></person-group>. <article-title>A systematic review of fundamental and technical analysis of stock market predictions.</article-title> <source><italic>Artif Intell Rev.</italic></source> (<year>2019</year>) <volume>53</volume>:<fpage>3007</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1007/s10462-019-09754-z</pub-id></citation></ref>
<ref id="B4"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mizuno</surname> <given-names>T</given-names></name> <name><surname>Ohnishi</surname> <given-names>T</given-names></name> <name><surname>Watanabe</surname> <given-names>T</given-names></name></person-group>. <article-title>Novel and topical business news and their impact on stock market activity.</article-title> <source><italic>EPJ Data Sci.</italic></source> (<year>2017</year>) <volume>6</volume>:<fpage>1</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1140/epjds/s13688-017-0123-7</pub-id></citation></ref>
<ref id="B5"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chong</surname> <given-names>E</given-names></name> <name><surname>Han</surname> <given-names>C</given-names></name> <name><surname>Park</surname> <given-names>FC</given-names></name></person-group>. <article-title>Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies.</article-title> <source><italic>Expert Syst Applic.</italic></source> (<year>2017</year>) <volume>83</volume>:<fpage>187</fpage>&#x2013;<lpage>205</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2017.04.030</pub-id></citation></ref>
<ref id="B6"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lanbouri</surname> <given-names>Z</given-names></name> <name><surname>Achchab</surname> <given-names>S</given-names></name></person-group>. <article-title>Stock market prediction on high frequency data using long-short term memory.</article-title> <source><italic>Procedia Comp Sci.</italic></source> (<year>2020</year>) <volume>175</volume>:<fpage>603</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2020.07.087</pub-id></citation></ref>
<ref id="B7"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Q</given-names></name> <name><surname>Li</surname> <given-names>J</given-names></name> <name><surname>Qin</surname> <given-names>Q</given-names></name> <name><surname>Sam Ge</surname> <given-names>S</given-names></name></person-group>. <article-title>Linear, adaptive and nonlinear trading models for Singapore stock market with random forests.</article-title> <source><italic>Proceedings of the 2011 9th IEEE International Conference on Control and Automation (ICCA).</italic></source> <publisher-loc>Paris</publisher-loc> (<year>2011</year>).</citation></ref>
<ref id="B8"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thakur</surname> <given-names>M</given-names></name> <name><surname>Kumar</surname> <given-names>D</given-names></name></person-group>. <article-title>A hybrid financial trading support system using multi-category classifiers and random forest.</article-title> <source><italic>Appl Soft Comput.</italic></source> (<year>2018</year>) <volume>67</volume>:<fpage>337</fpage>&#x2013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2018.03.006</pub-id></citation></ref>
<ref id="B9"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>J</given-names></name> <name><surname>Shah</surname> <given-names>S</given-names></name> <name><surname>Thakkar</surname> <given-names>P</given-names></name> <name><surname>Kotecha</surname> <given-names>K</given-names></name></person-group>. <article-title>Predicting stock market index using fusion of machine learning techniques.</article-title> <source><italic>Expert Syst Applic.</italic></source> (<year>2015</year>) <volume>42</volume>:<fpage>2162</fpage>&#x2013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2014.10.031</pub-id></citation></ref>
<ref id="B10"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Basak</surname> <given-names>S</given-names></name> <name><surname>Kar</surname> <given-names>S</given-names></name> <name><surname>Saha</surname> <given-names>S</given-names></name> <name><surname>Khaidem</surname> <given-names>L</given-names></name> <name><surname>Dey</surname> <given-names>SR</given-names></name></person-group>. <article-title>Predicting the direction of stock market prices using tree-based classifiers.</article-title> <source><italic>North Am J Econ Finan.</italic></source> (<year>2019</year>) <volume>47</volume>:<fpage>552</fpage>&#x2013;<lpage>67</lpage>. <pub-id pub-id-type="doi">10.1016/j.najef.2018.06.013</pub-id></citation></ref>
<ref id="B11"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vijh</surname> <given-names>M</given-names></name> <name><surname>Chandola</surname> <given-names>D</given-names></name> <name><surname>Tikkiwal</surname> <given-names>VA</given-names></name> <name><surname>Kumar</surname> <given-names>A</given-names></name></person-group>. <article-title>Stock closing price prediction using machine learning techniques.</article-title> <source><italic>Proced Comp Sci.</italic></source> (<year>2020</year>) <volume>167</volume>:<fpage>599</fpage>&#x2013;<lpage>606</lpage>.</citation></ref>
<ref id="B12"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pedregosa</surname> <given-names>F</given-names></name> <name><surname>Duchesnay</surname> <given-names>E</given-names></name> <name><surname>Perrot</surname> <given-names>M</given-names></name> <name><surname>Brucher</surname> <given-names>M</given-names></name> <name><surname>Cournapeau</surname> <given-names>D</given-names></name> <name><surname>Passos</surname> <given-names>A</given-names></name><etal/></person-group> <article-title>Scikit-learn: machine learning in Python.</article-title> <source><italic>J Mach Learn Res.</italic></source> (<year>2011</year>) <volume>12</volume>:<fpage>2825</fpage>&#x2013;<lpage>30</lpage>.</citation></ref>
<ref id="B13"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Picasso</surname> <given-names>A</given-names></name> <name><surname>Merello</surname> <given-names>S</given-names></name> <name><surname>Ma</surname> <given-names>Y</given-names></name> <name><surname>Oneto</surname> <given-names>L</given-names></name> <name><surname>Cambria</surname> <given-names>E</given-names></name></person-group>. <article-title>Technical analysis and sentiment embeddings for market trend prediction.</article-title> <source><italic>Expert Syst Applic.</italic></source> (<year>2019</year>) <volume>135</volume>:<fpage>60</fpage>&#x2013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2019.06.014</pub-id></citation></ref>
<ref id="B14"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dai</surname> <given-names>Y</given-names></name> <name><surname>Shang</surname> <given-names>Y.</given-names></name></person-group> <source><italic>Machine learning in stock price trend forecasting.</italic></source> <publisher-loc>Stanford, CA</publisher-loc>: <publisher-name>Stanford University</publisher-name> (<year>2013</year>).</citation></ref>
<ref id="B15"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kulshreshtha</surname> <given-names>S</given-names></name> <name><surname>Vijayalakshmi</surname> <given-names>A</given-names></name></person-group>. <article-title>An ARIMA- LSTM hybrid model for stock market prediction using live data.</article-title> <source><italic>J Eng Sci Technol Rev.</italic></source> (<year>2020</year>) <volume>13</volume>:<fpage>117</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.25103/jestr.134.11</pub-id></citation></ref>
<ref id="B16"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jahufer</surname> <given-names>A</given-names></name></person-group>. <article-title>Choosing the best performing garch model for Sri Lanka stock market by non-parametric specification test.</article-title> <source><italic>J Data Sci.</italic></source> (<year>2021</year>) <volume>13</volume>:<fpage>457</fpage>&#x2013;<lpage>72</lpage>.</citation></ref>
<ref id="B17"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jeric</surname> <given-names>SV</given-names></name></person-group>. <article-title>Rule extraction from random forest for intra-day trading using crobex data.</article-title> <source><italic>Proc FEB Zagreb Int Odyssey Conf Econ Bus.</italic></source> (<year>2020</year>) <volume>2</volume>:<fpage>411</fpage>&#x2013;<lpage>9</lpage>.</citation></ref>
<ref id="B18"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>LJ</given-names></name> <name><surname>Shen</surname> <given-names>W-K</given-names></name> <name><surname>Zhu</surname> <given-names>J-M</given-names></name></person-group>. <article-title>Research on risk identification system based on random forest algorithm-high-order moment model.</article-title> <source><italic>Complexity</italic></source> (<year>2021</year>) <volume>2021</volume>:<fpage>1</fpage>&#x2013;<lpage>10</lpage>.</citation></ref>
<ref id="B19"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>M</given-names></name> <name><surname>Zhang</surname> <given-names>Z</given-names></name> <name><surname>Shen</surname> <given-names>J</given-names></name> <name><surname>Deng</surname> <given-names>Z</given-names></name> <name><surname>He</surname> <given-names>J</given-names></name> <name><surname>Huang</surname> <given-names>S</given-names></name></person-group>. <article-title>A quantitative investment model based on random forest and sentiment analysis.</article-title> <source><italic>J. Phys Conf Ser.</italic></source> (<year>2020</year>) <volume>1575</volume>:<issue>12083</issue>. <pub-id pub-id-type="doi">10.1088/1742-6596/1575/1/012083</pub-id></citation></ref>
<ref id="B20"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ciner</surname> <given-names>C</given-names></name></person-group>. <article-title>Do industry returns predict the stock market? A reprise using the random forest.</article-title> <source><italic>Q Rev Econ Finan.</italic></source> (<year>2019</year>) <volume>72</volume>:<fpage>152</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1016/j.qref.2018.11.001</pub-id></citation></ref>
<ref id="B21"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Patel</surname> <given-names>J</given-names></name> <name><surname>Shah</surname> <given-names>S</given-names></name> <name><surname>Thakkar</surname> <given-names>P</given-names></name> <name><surname>Kotecha</surname> <given-names>K</given-names></name></person-group>. <article-title>Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques.</article-title> <source><italic>Expert Syst Applic.</italic></source> (<year>2015</year>) <volume>42</volume>:<fpage>259</fpage>&#x2013;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1016/j.eswa.2014.07.040</pub-id></citation></ref>
<ref id="B22"><label>22.</label><citation citation-type="journal"><collab>Yahoo.</collab> <source><italic>Yahoo Finance - Stock Market Live, quotes, Business &#x0026; Finance News. Yahoo! Finance.</italic></source> <publisher-loc>Sunnyvale, CA</publisher-loc>: <publisher-name>Yahoo</publisher-name> (<year>2022</year>).</citation></ref>
</ref-list>
<app-group>
<app id="A1">
<title>Appendix</title>
<table-wrap position="float" id="T13">
<label>TABLE A1</label>
<caption><p>Original features.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Features</td>
<td valign="top" align="left">Description</td>
<td valign="top" align="left">Range of Values</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Date</td>
<td valign="top" align="left">Date that each value</td>
<td valign="top" align="left">Dates from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">was reported</td>
<td valign="top" align="left">12/31/1999 to 01/01/2020</td>
</tr>
<tr>
<td valign="top" align="left">Name</td>
<td valign="top" align="left">Name of each stock</td>
<td valign="top" align="left">String values of</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">as reported in NASDAQ 100</td>
<td valign="top" align="left">varying lengths</td>
</tr>
<tr>
<td valign="top" align="left">Symbol</td>
<td valign="top" align="left">Symbol used to</td>
<td valign="top" align="left">Strings values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">associate each stock</td>
<td valign="top" align="left">three to four</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">to its name within stock market</td>
<td valign="top" align="left">characters.</td>
</tr>
<tr>
<td valign="top" align="left"><bold>DilutedAverage</bold></td>
<td valign="top" align="left"><bold>Shares outstanding after</bold></td>
<td valign="top" align="left"><bold>Dollar values from</bold></td>
</tr>
<tr>
<td valign="top" align="left">Shares<xref ref-type="table-fn" rid="t13fns1">&#x002A;</xref></td>
<td valign="top" align="left">all conversion</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">possibilities are</td>
<td valign="top" align="left">9,999,999,1013</td>
</tr>
<tr>
<td valign="top" align="left">TotalOperating</td>
<td valign="top" align="left">implemented Sum total of profit after</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left">IncomeAsReported</td>
<td valign="top" align="left">subtracting regular,</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">recurring costs and</td>
<td valign="top" align="left">9,999,999,1014</td>
</tr>
<tr>
<td valign="top" align="left">TotalExpenses<sup>+</sup></td>
<td valign="top" align="left">expenses Sum of cost of sales and</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">operating expenses</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">NetIncomeFrom</td>
<td valign="top" align="left">After-tax earnings</td>
<td valign="top" align="left">9,999,999,1015 Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left">Continuing and</td>
<td valign="top" align="left">generated</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">DiscontinuedOper-</td>
<td valign="top" align="left"/>
<td valign="top" align="left">9,999,999,1016</td>
</tr>
<tr>
<td valign="top" align="left">ation NormalizedIncome</td>
<td valign="top" align="left">Clearing impact of</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">non-recurring items</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">InterestIncome</td>
<td valign="top" align="left">Taxable income</td>
<td valign="top" align="left">9,999,999,1017 Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left">InterestExpense</td>
<td valign="top" align="left">Cost of borrowing</td>
<td valign="top" align="left">9,999,999,999 - 9,999,999,1018 Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">money from banks, bond</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">investors, and other</td>
<td valign="top" align="left">9,999,999,1019</td>
</tr>
<tr>
<td valign="top" align="left">NetInterestIncome</td>
<td valign="top" align="left">sources Difference between</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">revenue from</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">interest-bearing assets</td>
<td valign="top" align="left">9,999,999,1020</td>
</tr>
<tr>
<td valign="top" align="left">EBIT</td>
<td valign="top" align="left">and expenses on interest-bearing liabilities Earnings before interest</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">and taxes</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">EBITDA</td>
<td valign="top" align="left">Earnings before interest,</td>
<td valign="top" align="left">9,999,999,1021 Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">taxes, depreciation, and</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">amortization</td>
<td valign="top" align="left">9,999,999,1022</td>
</tr>
<tr>
<td valign="top" align="left">ReconciledCost</td>
<td valign="top" align="left">Act of reconciling all</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left">OfRevenue</td>
<td valign="top" align="left">sales</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">Reconciled</td>
<td valign="top" align="left">Fixed asset reconciliation</td>
<td valign="top" align="left">9,999,999,1023 Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left">Depreciation +</td>
<td valign="top" align="left">statement</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">NetIncomeFrom</td>
<td valign="top" align="left">Net income obtained</td>
<td valign="top" align="left">9,999,999,1024 Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left">Continuing</td>
<td valign="top" align="left">from net of minority</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">Operation</td>
<td valign="top" align="left">share-holders</td>
<td valign="top" align="left">9,999,999,1025</td>
</tr>
<tr>
<td valign="top" align="left">NetMinority Interest TotalUnusual Items</td>
<td valign="top" align="left">Non-recurring gain or</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left">Excluding</td>
<td valign="top" align="left">loss not considered part</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left">Goodwill<xref ref-type="table-fn" rid="t13fns1">&#x002A;</xref></td>
<td valign="top" align="left">of normal business</td>
<td valign="top" align="left">9,999,999,1026</td>
</tr>
<tr>
<td valign="top" align="left">TotalUnusualItems<xref ref-type="table-fn" rid="t13fns1">&#x002A;</xref></td>
<td valign="top" align="left">Non-recurring gains or</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">losses not considered</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">part of normal business</td>
<td valign="top" align="left">9,999,999,1027</td>
</tr>
<tr>
<td valign="top" align="left">NormalizedEBITDA</td>
<td valign="top" align="left">Net income from</td>
<td valign="top" align="left">Dollar values from</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">continuing operations</td>
<td valign="top" align="left">9,999,999,999 -</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">before interest, income</td>
<td valign="top" align="left">9,999,999,1028</td>
</tr>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="left">taxes, depreciation and amortization, excluding any non-recurring items and/or non-cash equity compensation expense</td>
<td valign="top" align="left"/></tr>
<tr>
<td valign="top" align="left">TotalRevenue<xref ref-type="table-fn" rid="t13fns1">&#x002A;</xref></td>
<td valign="top" align="left">Sum of both operating and non-operating revenues of company as reported for any given quarter</td>
<td valign="top" align="left">Dollar values from 9,999,999,999 - 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">CostOfRevenue +</td>
<td valign="top" align="left">Cost of manufacturing and delivering product or service</td>
<td valign="top" align="left">Dollar values from - 9.999.999.1000</td>
</tr>
<tr>
<td valign="top" align="left">GrossProfit<sup>+</sup></td>
<td valign="top" align="left">Profit after deducting costs associated with making and selling products and/or providing services</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">OperatingExpense<sup>&#x002A;+</sup></td>
<td valign="top" align="left">Expense business incurs through its normal business operations</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">OperatingIncome</td>
<td valign="top" align="left">Profit realized from operations after deducting operating expenses</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">NetNonOperatingInterestIncomeExpense</td>
<td valign="top" align="left">Expense unrelated to core operations; interest charged on loss of an asset; does not include day-to-day expenses</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">OtherIncomeExpense<xref ref-type="table-fn" rid="t13fns1">&#x002A;</xref></td>
<td valign="top" align="left">Income that does not relate directly to business operations</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">PretaxIncome</td>
<td valign="top" align="left">Net sales minus cost of goods sold minus operating expenses</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TaxProvision</td>
<td valign="top" align="left">Estimated income tax company is legally expected to pay</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">NetIncomeCommonStockholders</td>
<td valign="top" align="left">Bottom line profit belonging to common stockholders</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">DilutedNIAvailtoComStockholders</td>
<td valign="top" align="left">Diluted net income; net income adjusted for not paying out any interest expense or preferred dividends</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">BasicEPS</td>
<td valign="top" align="left">Net income minus preferred dividends, divided by weighted average of common shares outstanding</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">DilutedEPS</td>
<td valign="top" align="left">Value used to gauge quality of earnings per share of stock</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">BasicAverageShares&#x002A;</td>
<td valign="top" align="left">Average number of shares investors held at any point in period</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TaxRateForCalcs</td>
<td valign="top" align="left">Effective federal tax rate</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TaxEffectOfUnusualItems<xref ref-type="table-fn" rid="t13fns1">&#x002A;</xref></td>
<td valign="top" align="left">Net value of taxable unusual items</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TotalAssets<sup>+</sup></td>
<td valign="top" align="left">Combined value of the total liabilities and shareholders&#x2019; equity</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TotalLiabilitiesNetMinorityInterest<sup>&#x002A;+</sup></td>
<td valign="top" align="left">Share of equity ownership not owned or controlled by parent corporation</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TotalEquityGrossMinorityInterest</td>
<td valign="top" align="left">Minority interests divided by the total equity</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TotalCapitalization</td>
<td valign="top" align="left">Sum of the long-term debt and all other equities including common stock and preferred stock</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">CommonStockEquity</td>
<td valign="top" align="left">Stock held by founders and employees not included in stock owned by parent company</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">NetTangibleAssets</td>
<td valign="top" align="left">Total assets of company minus any intangible assets</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">WorkingCapital&#x002A;</td>
<td valign="top" align="left">Capital used in day-to-day trading operations</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">InvestedCapital</td>
<td valign="top" align="left">Money raised by issuing securities, including equity to shareholders and debt to bondholders</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TangibleBookValue</td>
<td valign="top" align="left">Book value of the company excluding intangible assets</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">TotalDebt</td>
<td valign="top" align="left">Sum of short- and long-term debt</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">ShareIssued</td>
<td valign="top" align="left">Authorized shares sold to and held by shareholders of company</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">OrdinarySharesNumber<sup>+</sup></td>
<td valign="top" align="left">Stocks sold on a public exchange</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">OperatingCashFlow</td>
<td valign="top" align="left">Cash generated by normal business operation</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">InvestingCashFlow</td>
<td valign="top" align="left">Cash generated (or spent) on non-current assets</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">FinancingCashFlow</td>
<td valign="top" align="left">Cash flow generated from financing activities, such as paying back loans</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">EndCashPosition<sup>+</sup></td>
<td valign="top" align="left">Cash on books at specific point in time</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">CapitalExpenditure<sup>+</sup></td>
<td valign="top" align="left">Funds used to undertake new projects or investments</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">IssuanceOfCapitalStock</td>
<td valign="top" align="left">Amount of money generated when company initially sold its common stock on open market</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">RepaymentOfDebt</td>
<td valign="top" align="left">After all long-term debt instrument obligations are repaid, balance sheet will reflect a canceling of principal and liability expenses for total amount of interest</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">RepurchaseOfCapitalStock</td>
<td valign="top" align="left">When a company buys back its shares from the marketplace</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">FreeCashFlow</td>
<td valign="top" align="left">Cash generated after accounting for cash outflows</td>
<td valign="top" align="left">Dollar values from -9,999,999,999 to 9,999,999,999</td>
</tr>
<tr>
<td valign="top" align="left">Open</td>
<td valign="top" align="left">Price at which financial security opens in market</td>
<td valign="top" align="left">Value from 0 to 100</td>
</tr>
<tr>
<td valign="top" align="left">High</td>
<td valign="top" align="left">Price at which financial security is highest on market</td>
<td valign="top" align="left">Value from 0 to 101</td>
</tr>
<tr>
<td valign="top" align="left">Low</td>
<td valign="top" align="left">Lowest price of financial security</td>
<td valign="top" align="left">Value from 0 to 102</td>
</tr>
<tr>
<td valign="top" align="left">Close</td>
<td valign="top" align="left">Closing price of financial security</td>
<td valign="top" align="left">Value from 0 to 103</td>
</tr>
<tr>
<td valign="top" align="left">Adj Close</td>
<td valign="top" align="left">Amends stock&#x2019;s closing price</td>
<td valign="top" align="left">Value from 0 to 104</td>
</tr>
<tr>
<td valign="top" align="left">Volume</td>
<td valign="top" align="left">Amount of asset or security that changes hands</td>
<td valign="top" align="left">Value from 0 to 999,999,999</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="t13fns1"><p>&#x002A; = Feature derived from Correlation Method. + = Feature derived from Decision Tree Method. &#x002A;+ = Feature derived from both methodologies. Please note that definitions of each feature were taken from Investopedia or Yahoo Finance.</p></fn>
</table-wrap-foot>
</table-wrap>
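<p>Several of the income-statement and cash-flow features in the table above are linked by simple accounting identities: per the definitions, GrossProfit is TotalRevenue minus CostOfRevenue, and OperatingIncome is GrossProfit minus OperatingExpense; FreeCashFlow is commonly computed as OperatingCashFlow minus CapitalExpenditure. A minimal sketch of a consistency check built on these identities is given below; the dollar figures are illustrative placeholders, not values from any real quarterly filing.</p>

```python
def check_identities(q: dict, tol: float = 1.0) -> list:
    """Return (feature, reported, implied) triples where a reported
    feature value disagrees with the value implied by the identities."""
    implied = {
        # GrossProfit: profit after deducting cost of revenue (per the table)
        "GrossProfit": q["TotalRevenue"] - q["CostOfRevenue"],
        # OperatingIncome: profit from operations after operating expenses
        "OperatingIncome": q["GrossProfit"] - q["OperatingExpense"],
        # FreeCashFlow: a common definition, operating cash flow less capex
        "FreeCashFlow": q["OperatingCashFlow"] - q["CapitalExpenditure"],
    }
    return [(name, q[name], value)
            for name, value in implied.items()
            if abs(q[name] - value) > tol]

# Illustrative placeholder values for a single quarter
quarter = {
    "TotalRevenue": 90_000_000.0,
    "CostOfRevenue": 50_000_000.0,
    "GrossProfit": 40_000_000.0,
    "OperatingExpense": 25_000_000.0,
    "OperatingIncome": 15_000_000.0,
    "OperatingCashFlow": 20_000_000.0,
    "CapitalExpenditure": 5_000_000.0,
    "FreeCashFlow": 15_000_000.0,
}

print(check_identities(quarter))  # prints [] — every identity holds
```

Such a check can flag ingestion errors (e.g., a sign dropped during extraction) before the features are fed to the machine learning pipeline.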
</app>
</app-group>
</back>
</article>