June 23, 2026

Harnessing the data that surrounds us

Computer science students help ecologists make sense of data gathered in the field

Asking what makes a computer scientist head outdoors sounds like the start of a bad joke. But in this instance, Associate Professor Kenneth Chiu has been bringing the outdoors into his computer science classroom. He has been connecting students to research projects that use their programming and data analysis skills to help ecologists gain a deeper understanding of the data they’ve collected. This type of hands-on learning is also showing students the practical applications of computer science outside of the classroom.

Icy Lakes

Junior Colin Fiutak and sophomore Kerry O’Neill were approached by Chiu after their team won first place at the annual HackBU hackathon in 2018.

“He gave us a list of options for projects we could choose from, and we thought this one sounded cool,” says Fiutak.

The computer science majors started work on a project that uses satellite imagery to identify when lakes became covered with ice.

“We spend five or six hours per week on research, usually on the weekends. We have to use geographical information systems, like satellite imaging, which aren’t always compatible with the Python programming that we have, so we had to figure out how to adjust it,” says Fiutak.

O’Neill explains that getting all of the lake data set up in their system could prove helpful for studying the effects of climate change over time. In April 2018, Fiutak and O’Neill joined Chiu at the Global Lake Ecological Observatory Network Conference.

“We gave a one-minute lightning talk on what we were planning on doing and networked with people who specialized in studying lakes. One person at the event suggested the data set that we now work with,” says Fiutak. The next step for O’Neill and Fiutak is to start expanding the data they’re entering into their system.

Their current data set is limited. “It’s about 200 lakes in the Northeast, and our model currently is 99.9 percent accurate in detecting whether there is ice cover on a lake,” explains Fiutak. “The next step is expanding the scope of this outside the Northeast. We want to make sure that the data set involves all lakes in the United States or at least has a good, diverse group of lakes.”

An App to Measure Water Quality

What if you could use your smartphone camera to tell the quality of the water in front of you? That’s the project that computer science graduate student Shehtab Zaman and sophomore Kate Baumstein are working on.

Zaman says that normally, scientists lower a Secchi disk into the water until they can no longer see the disk to figure out the turbidity of the water. But this isn’t an exact science; it’s based on the best judgment of the scientists.

Instead, Zaman and Baumstein were asked to create a system that could more accurately identify the transparency of water to understand its cleanliness.

“The idea was to take pictures of lake water and be able to say how much chlorophyll and colored dissolved organic matter, or CDOM, is there,” Zaman says. “Any smartphone can do it — well, any smartphone after the iPhone 6, since the cameras changed after that.”

Baumstein spent last summer collecting pictures and testing water to build the data needed to make their system accurate, and she’ll continue adding data this summer.

Identifying Algae

Zaman, who earned his bachelor’s degree from Binghamton University in physics with a minor in computer science, is also working with Chiu on another ecological project.

A researcher at another university had classified 300,000 pictures of algae and wanted to see if a computer scientist could use that data to make algal identification easier.

So, Zaman is running these photos through a machine-learning application that will pick up the visual cues in the photographs to help classify the types of algae.

“It takes a long time,” he says. “You have to train it over and over again because you have to get the system familiar with each picture, but also every possible orientation of that picture. There have been times when I just let the computer run for a week or so to see what would happen.”

Having 300,000 pictures of algae that had already been classified by hand is a good start, but Zaman says incorporating all the various rotations of those photos has given him more than 7 million images to run through the machine.

“It should make the system more accurate, but sometimes you’ll be training it and start to see the accuracy going backward. It’s all part of the process though,” he says.
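The rotation step Zaman describes is a common form of data augmentation: each labeled photo is copied at several orientations so the classifier learns that orientation does not change the species. The article does not say which angles or tools his pipeline uses, so the following is only a minimal sketch that assumes quarter-turn rotations of a small grid of pixel values. The names rotate90, rotations and augment are hypothetical.

```python
def rotate90(image):
    """Rotate a 2-D grid (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotations(image):
    """Return copies of the image at 0, 90, 180 and 270 degrees."""
    result = [[list(row) for row in image]]
    for _ in range(3):
        result.append(rotate90(result[-1]))
    return result


def augment(labeled_images):
    """Expand (image, label) pairs so each orientation keeps its label.

    Assumption: quarter turns only; a real pipeline may use finer angles.
    """
    return [
        (rotated, label)
        for image, label in labeled_images
        for rotated in rotations(image)
    ]


# Toy example: one 2x2 "photo" with a made-up label.
toy_dataset = [([[1, 2], [3, 4]], "example_alga")]
augmented = augment(toy_dataset)
print(len(augmented))  # four orientations of the single toy image
```

With only four orientations per photo the data set grows fourfold. Finer angle steps multiply it further, which is how a few hundred thousand hand-classified photos can become millions of training images.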

Discovering the Vector

Scientists know that the Culex mosquito carries the West Nile virus. For some diseases, however, scientists are unsure of the carrier, which they call the vector.

Undergraduate students Gabriel Steinberg, Yan Man and Hayden Brown have started work on developing a system that can identify the vector of a disease based on the string of characters in the genome of an RNA virus.

“The genome of an RNA virus is represented by a string of characters made up of A, C, T or G,” explains Steinberg. “There are people who have documented these genomes manually, and we’re using that data to find any significant patterns that could identify the vector.”

Brown says that some viruses are better documented than others due to the prevalence of the virus or the amount of research that has gone into it. For instance, the dengue virus has more than 5,000 genomes recorded, while the Usutu virus has only 143 on record.

The students use a program to count how many times a k-mer appears in a genome. A k-mer is a specific sequence of k letters. For instance, the k-mer AAAA appears more than 1,300 times in the dengue virus, but AAACG appears fewer than 100 times.
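Counting k-mers amounts to sliding a window of length k along the genome string and tallying each substring. The students' actual program is not described, so this is just one way it might look in Python. The function name count_kmers and the toy genome are made up for illustration.

```python
from collections import Counter


def count_kmers(genome, k):
    """Count every overlapping substring of length k in a genome string."""
    if k <= 0:
        raise ValueError("k must be positive")
    genome = genome.upper()
    # A genome of length n contains n - k + 1 overlapping windows of length k.
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


# Toy sequence, not a real viral genome.
toy_genome = "AAAAACGAAAA"
counts = count_kmers(toy_genome, 4)
print(counts["AAAA"])  # 3: windows starting at positions 0, 1 and 7
```

Overlapping windows are counted here, so a run of five A's contains two copies of AAAA. Whether the students count overlaps the same way is an assumption.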

“We’re looking for instances where a k-mer appears frequently in a virus known to come from a Culex mosquito but infrequently from a virus known to come from an Aedes aegypti mosquito,” Brown says. “That difference means that the k-mer is significant in telling us the vector of the virus.”
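The comparison Brown describes can be sketched as follows: compute how often each k-mer appears in genomes from each vector, then rank the k-mers by how much those frequencies differ. The article does not say what statistic the students use, so the simple difference of mean relative frequencies below is an assumption, as are the function names and the toy sequences.

```python
from collections import Counter


def mean_kmer_frequencies(genomes, k):
    """Average relative frequency of each k-mer across a list of genomes."""
    totals = Counter()
    for genome in genomes:
        counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
        windows = sum(counts.values())
        for kmer, count in counts.items():
            # Normalize by genome size so long genomes do not dominate.
            totals[kmer] += count / windows / len(genomes)
    return totals


def rank_kmers(group_a, group_b, k):
    """Rank k-mers by the gap between their mean frequencies in two groups.

    Positive scores mean the k-mer is more common in group_a.
    Assumption: a plain difference stands in for a real significance test.
    """
    freq_a = mean_kmer_frequencies(group_a, k)
    freq_b = mean_kmer_frequencies(group_b, k)
    scores = {
        kmer: freq_a[kmer] - freq_b[kmer]
        for kmer in set(freq_a) | set(freq_b)
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up sequences standing in for genomes grouped by known vector.
culex_like = ["AAAAAA", "AAAACA"]
aedes_like = ["CCCCCC", "CCCACC"]
for kmer, score in rank_kmers(culex_like, aedes_like, 2):
    print(kmer, round(score, 2))
```

K-mers with scores far from zero are the candidates for telling the two vectors apart. With real data, unevenly documented viruses would call for a proper statistical test rather than a raw difference.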

Knowing the vector of the virus can help people better prepare for a disease. For instance, the Aedes aegypti mosquito is commonly found in Florida, Arizona, Texas, Louisiana and Mississippi. So, if that mosquito is the vector, health officials know where to focus resources to fight the disease.

DNA Recombination

A few other researchers are looking at DNA to understand more about how your parents’ DNA recombined to create your DNA. John Wolters ’12, PhD ’18, is currently a postdoctoral research associate at the University of Wisconsin–Madison, but he started working on DNA research as a biology PhD student at Binghamton University.

“The DNA that you inherit isn’t exactly the same as your parents,” explains Wolters. “You have unique combinations of DNA that did not exist in your parents. We want to develop bioinformatics tools to analyze that DNA sequencing data to determine where the shuffling takes place and how often.”

Wolters is working with computer science senior Alison Gim and doctoral student Shawn Bailey ’15, MS ’18.

“For [Alison and me], when we look at DNA, we see it as really long string sequences, and we deal with strings all the time in computer science,” Bailey says. “We have lots of techniques we can use to look for the features that John might be interested in.”

The team has been working with millions of pieces of DNA that were collected by mating a large number of different yeast cells together.

“By analyzing the recombination events across these cells, we can model these processes in fine detail. Using methods from computer science is absolutely essential for processing such big data,” Wolters says.

The collaboration between biology and computer science is a perfect example of why Chiu proposed these projects in the first place.

As Wolters explains, “This is really helpful for people in the biology world. We want to work on these biological questions, but we don’t always have the computational expertise, which is why collaborations like this are especially fruitful.”