April 28, 2024

Machine learning research may aid industry

Doctoral researcher Dana Bani-Hani created an "oracle" that can make accurate predictions related to spam emails, bank fraud, workers quitting their jobs and more. Image Credit: Jonathan Cohen.

Spam emails, bank fraud, diabetes, workers quitting their jobs. What do these topics have in common? The answer can be found in machine learning research at Binghamton University.

Dana Bani-Hani, a doctoral student studying industrial and systems engineering, has spent the past few years teaching machines how to read data sets in any industry. The system she coded, called a Recursive General Regression Neural Network Oracle (R-GRNN Oracle), takes data inputs and creates prediction outputs.

Classification models are not new in data science and analytics, but what Bani-Hani created goes beyond the basics. A typical system uses algorithms, called classifiers, that run through a data set of many different variables to create a prediction. Oracles are created to run multiple sets of these classifiers to see which algorithm creates the most accurate prediction.

For example, a classifier can examine a large set of emails and factor in word usage, word count and several other variables to determine whether each email is spam. An oracle looks at the outputs of the different classifiers and determines which one identified the spam emails most accurately.
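To make the spam example concrete, here is a minimal Python sketch of that selection step. The emails, the three rule-based classifiers and the `pick_best` helper are all invented for illustration and are not taken from Bani-Hani's system; a real oracle would work with trained models and a much larger data set.

```python
# Toy labeled emails: (text, label), where 1 = spam and 0 = not spam.
# These examples are invented for illustration only.
EMAILS = [
    ("Win a FREE prize now", 1),
    ("Meeting moved to 3pm", 0),
    ("free money, click here", 1),
    ("Lunch tomorrow?", 0),
    ("Claim your prize today", 1),
    ("Feel free to call me after the meeting", 0),
]

# Three hypothetical classifiers, each based on a single simple rule.
def has_free(text):
    return 1 if "free" in text.lower() else 0

def has_prize_or_click(text):
    lowered = text.lower()
    return 1 if ("prize" in lowered or "click" in lowered) else 0

def is_short(text):
    return 1 if len(text.split()) <= 3 else 0

CLASSIFIERS = {
    "has_free": has_free,
    "has_prize_or_click": has_prize_or_click,
    "is_short": is_short,
}

def accuracy(classifier, emails):
    """Fraction of emails the classifier labels correctly."""
    correct = sum(classifier(text) == label for text, label in emails)
    return correct / len(emails)

def pick_best(classifiers, emails):
    """Score every classifier and return the name of the most accurate one."""
    scores = {name: accuracy(fn, emails) for name, fn in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores

best_name, scores = pick_best(CLASSIFIERS, EMAILS)
print(best_name, scores)
```

On this toy data the word-based rules outperform the length-based one, which is the kind of comparison an oracle automates.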

What sets the R-GRNN Oracle apart from other oracles is its capability to take classifier outputs and rank them based on their accuracy. Based on the ranking, classifiers are given weights and are combined to produce a prediction superior to any one classifier on its own.
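The idea of weighting classifiers by accuracy can be sketched in a few lines of Python. This is a simplified stand-in, assuming a plain accuracy-weighted vote rather than the GRNN-based combination the R-GRNN Oracle uses; the predictions, the labels and the `weighted_vote` helper are made up for illustration.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def weighted_vote(classifier_preds, weights):
    """Combine 0/1 predictions from several classifiers using per-classifier weights."""
    total = sum(weights)
    combined = []
    for votes in zip(*classifier_preds):
        # Weighted share of classifiers voting "1" for this sample.
        score = sum(w * v for w, v in zip(weights, votes)) / total
        combined.append(1 if score >= 0.5 else 0)
    return combined

# Invented true labels and predictions from three hypothetical classifiers.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
preds = {
    "a": [1, 0, 1, 0, 0, 0, 1, 0],  # 7 of 8 correct
    "b": [1, 1, 1, 1, 0, 1, 1, 0],  # 6 of 8 correct
    "c": [0, 0, 1, 1, 1, 0, 1, 0],  # 6 of 8 correct
}

# Assumption: each classifier's weight is simply its accuracy.
weights = [accuracy(p, labels) for p in preds.values()]
combined = weighted_vote(list(preds.values()), weights)

print(weights)                       # [0.875, 0.75, 0.75]
print(accuracy(combined, labels))    # 1.0
```

Because the three classifiers make their mistakes on different samples, the weighted vote corrects all of them here. For brevity the weights are computed on the same samples used to score the result; in practice they would come from separate held-out data.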

Think of this process like an orchestra. Each instrument has its own strengths, just like different classifiers, so it is useful to include them all. The conductor, like the R-GRNN Oracle, directs the different instruments to play more loudly or more softly based on how each one affects the sound of the final symphony.

At this point, the system would be called a General Regression Neural Network (GRNN), which has been created before at Binghamton University. The real crux of Bani-Hani’s work lies in the first letter, R, which stands for recursion.

The R-GRNN Oracle takes the original GRNN output and uses that entire system as an input for another GRNN prediction. That input is combined with the outputs of the most successful of the original classifiers.
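The two-stage idea can be sketched in Python as follows. This is an illustration under stated assumptions, not Bani-Hani's implementation: `grnn_predict` is the standard GRNN estimate (a Gaussian kernel-weighted average of training targets), while the `recursive_oracle_predict` function, the leave-one-out step, the choice of `k`, the smoothing value `sigma` and all of the data are hypothetical.

```python
import math

def grnn_predict(train_x, train_y, query, sigma=0.5):
    """Standard GRNN estimate: Gaussian kernel-weighted average of training targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2 * sigma ** 2))
        for x in train_x
    ]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

def column_accuracy(outputs, labels, j):
    """Accuracy of classifier j when its score is thresholded at 0.5."""
    hits = sum((row[j] >= 0.5) == bool(y) for row, y in zip(outputs, labels))
    return hits / len(labels)

def recursive_oracle_predict(outputs, labels, query, k=2, sigma=0.5):
    """Hypothetical two-stage sketch: a GRNN whose inputs include another GRNN's output."""
    # Stage 1: a GRNN over the outputs of all classifiers.
    stage1_query = grnn_predict(outputs, labels, query, sigma)
    # Assumption: leave-one-out, so a training row never sees its own label.
    stage1_train = [
        grnn_predict(outputs[:i] + outputs[i + 1:], labels[:i] + labels[i + 1:], row, sigma)
        for i, row in enumerate(outputs)
    ]
    # Keep the k most accurate of the original classifiers.
    ranked = sorted(range(len(query)),
                    key=lambda j: column_accuracy(outputs, labels, j),
                    reverse=True)
    best = ranked[:k]
    # Stage 2: a second GRNN over [stage-1 output] + [top-k classifier outputs].
    stage2_train = [[s] + [row[j] for j in best] for s, row in zip(stage1_train, outputs)]
    stage2_query = [stage1_query] + [query[j] for j in best]
    return grnn_predict(stage2_train, labels, stage2_query, sigma)

# Invented spam scores from three hypothetical classifiers (one row per email).
outputs = [
    [0.9, 0.8, 0.6], [0.8, 0.7, 0.3], [0.7, 0.9, 0.7],  # spam
    [0.2, 0.1, 0.6], [0.1, 0.3, 0.4], [0.3, 0.2, 0.2],  # not spam
]
labels = [1, 1, 1, 0, 0, 0]

spam_score = recursive_oracle_predict(outputs, labels, [0.85, 0.8, 0.5])
ham_score = recursive_oracle_predict(outputs, labels, [0.15, 0.2, 0.5])
print(spam_score, ham_score)
```

The second stage receives the first stage's prediction as just another input column, which is the sense in which one oracle sits inside another.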

So, back to the orchestra: The original symphony is recorded, and then played back again later. This time, along with the recording, a few instruments play again to further fine-tune the important sounds of the orchestra.

“Because of the way [the GRNN] works, I was able to create the recursive model,” Bani-Hani says. “The concept of recursion is not widely used in machine learning, so I decided to put an oracle inside of an oracle.”

Read more about her work in Discover-e.
