
Authors

Sikha Bagui
Timothy Bennett

Abstract

The Random Forest (RF) algorithm, originally proposed by Breiman (1), is a widely used machine learning
algorithm valued for its fast training and high classification accuracy. Despite this widespread use, the
mechanisms at work in Breiman's RF are not yet fully understood, and research continues on several aspects of
optimizing the algorithm, especially in big data environments. To optimize RF, this work builds new ensembles
that use genetic algorithms to optimize the random components of the RF algorithm, yielding Random Genetic
Forests (RGF), Negatively Correlated RGF (NC-RGF), and Preemptive RGF (PFS-RGF). These ensembles are compared
with Breiman's classic RF algorithm in Hadoop's big data framework, using Spark, on UNSW-NB15, a large,
high-dimensional network intrusion dataset.
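The abstract does not say how the genetic search is configured. As a rough illustration only, the sketch below assumes that one of RF's random components, the random subset of k features a tree would normally draw uniformly, is instead chosen by a small genetic algorithm (truncation selection, union-based crossover, swap mutation) driven by a caller-supplied fitness function, for example the out-of-bag accuracy of a tree trained on that subset. The function name `evolve_feature_subset`, its parameters, and the toy relevance scores are hypothetical; this is not the authors' RGF, NC-RGF, or PFS-RGF implementation.

```python
import random


def evolve_feature_subset(n_features, k, fitness, pop_size=20, generations=30,
                          mutation_rate=0.1, seed=0):
    """Hypothetical sketch (not the article's RGF): evolve a set of k feature
    indices that maximizes `fitness`, instead of drawing k features uniformly
    at random as a classic RF tree would."""
    rng = random.Random(seed)

    def random_individual():
        # An individual is a frozenset of k distinct feature indices.
        return frozenset(rng.sample(range(n_features), k))

    def crossover(a, b):
        # The child draws k distinct features from the union of both parents;
        # the union always holds at least k features.
        return frozenset(rng.sample(sorted(a | b), k))

    def mutate(ind):
        # With probability mutation_rate, swap one chosen feature for an unused one.
        if rng.random() < mutation_rate:
            unused = [f for f in range(n_features) if f not in ind]
            if unused:
                out_f = rng.choice(sorted(ind))
                in_f = rng.choice(unused)
                return (ind - {out_f}) | {in_f}
        return ind

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # Truncation selection: the fitter half survives unchanged (elitism).
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children
    return max(population, key=fitness)


# Toy usage with made-up relevance scores: features 1, 4 and 7 are "informative".
scores = [0.9 if i in (1, 4, 7) else 0.1 for i in range(10)]
best = evolve_feature_subset(10, 3, lambda s: sum(scores[i] for i in s))
print(sorted(best))
```

Because the fitter half of each generation is carried forward unchanged, the best fitness never decreases across generations; a real ensemble would call such a search once per tree with a more expensive fitness, which is where a distributed framework such as Spark becomes relevant.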


Article Details

Section
Original Research