Data Salon

Data Salon Schedule

The need to address racism and social justice issues touches all aspects of life, and data science is no exception. To examine how data science has enabled social injustice in the past and how data scientists can instead work to advance the social good, the Data Science Transdisciplinary Area of Excellence will bring in speakers under its Equitable Data Science umbrella. These speakers will look into issues such as racism, fringe groups on the internet, the fairness of AI and other topics that will help move conversations in the data science realm forward and make them more inclusive and equitable. All talks in this special series on Equitable Data Science will be identified with an asterisk (*) next to the title of the talk.

Custom Crowdsourcing App for Health in Premodern Mexico

Noon-1 p.m. Friday, Sept. 29, 2023, in AA-340, with lunch

In this presentation, we demo and explain the digital infrastructure for a new historical crowdsourcing application of geo-located health events for Mexico before the twentieth century. The crowdsourced database aims to facilitate international collaboration on a data-intensive health-related project and is an important teaching resource for undergraduate classrooms. It uses an Angular frontend with Spring Boot microservices connected to a PostgreSQL database, which in turn feeds a website for data visualization in maps and graphics. In the future, we aim to make the database freely available to researchers for analysis, download, and output of custom cartography and other visualizations. The project is in its initial development phase, and we continue to question the underlying data design and web implementation. We hope to open a conversation with data scientists about possible pathways forward.

About the speakers: Bradley Skopyk is an associate professor in the Department of History. Skopyk’s research interests include Colonial Latin America and environmental history. David Mixter is a research assistant professor in the Environmental Studies Program and the Department of Anthropology. Mixter’s research interests include the archaeology of the Ancient Maya, comparative studies of societal collapse and recovery, frontiers and early complexity, social dynamics of urban landscapes, collective memory, and GIS and spatial analysis.

While in-person attendance is strongly encouraged, those who cannot make it in person may request a Zoom link by emailing Xingye Qiao.

Polarization Games over Social Networks

Noon-1 p.m. Friday, April 28, in AA-340, with lunch

In this talk, we will provide a quantitative analysis of a game over a social network of agents, some of which are controlled by one of two players whose objectives are, respectively, to maximize and to minimize polarization over the entire network. Agents' opinions evolve according to the Friedkin-Johnsen model, and each player can change only the innate opinion of an agent of their choosing. Polarization is measured as the sample variance of the agents' steady-state opinions. A practically motivated disjointness constraint on the players' choices of agents turns this simple zero-sum game into a compelling and largely unexplored research problem. We analyze the properties of the Nash equilibria for different variations of this game and simulate a variation of the well-known fictitious play algorithm, in which the disjointness constraint imposes a minor modification, to obtain the equilibrium in synthetic and real data networks. Finally, we will present several intriguing aspects of polarization games uncovered by our analysis and simulation results. All of our code and datasets are publicly available for research purposes. This talk is based on joint work with Xilin Zhang and Dr. Zeynep Ertem of the SSIE Department at Binghamton University.
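As a rough illustration of the quantities named above, the Python sketch below computes steady-state opinions and their variance. It assumes the common Laplacian form of the Friedkin-Johnsen equilibrium, z* = (I + L)^(-1) s, unit edge weights, a 1/n variance normalization, and candidate innate opinions restricted to the extremes -1 and 1. The names `steady_state`, `polarization` and `best_single_change` are hypothetical, and the last one is a single brute-force best response, not the speakers' modified fictitious play algorithm.

```python
import numpy as np

def steady_state(adjacency, innate):
    """Friedkin-Johnsen equilibrium z* = (I + L)^(-1) s, with L the graph Laplacian (assumed variant)."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.linalg.solve(np.eye(len(innate)) + laplacian, innate)

def polarization(opinions):
    """Variance of steady-state opinions around their mean (1/n normalization assumed)."""
    return float(np.mean((opinions - opinions.mean()) ** 2))

def best_single_change(adjacency, innate, candidates=(-1.0, 1.0), forbidden=()):
    """Hypothetical brute-force move for the maximizing player.

    Tries setting each allowed agent's innate opinion to each candidate value;
    `forbidden` holds agents already claimed by the other player (disjointness).
    """
    best_agent, best_value, best_score = None, None, -np.inf
    for agent in range(len(innate)):
        if agent in forbidden:
            continue
        for value in candidates:
            trial = np.array(innate, dtype=float)
            trial[agent] = value
            score = polarization(steady_state(adjacency, trial))
            if score > best_score:
                best_agent, best_value, best_score = agent, value, score
    return best_agent, best_value, best_score

# Usage: two connected agents with opposite innate opinions.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
z = steady_state(adj, np.array([1.0, -1.0]))  # approximately [0.333, -0.333]
print(polarization(z))                         # approximately 0.111
```

On this two-agent network the steady-state opinions are 1/3 and -1/3, so the polarization is 1/9. The `forbidden` set is just one simple way to encode the disjointness constraint in a sketch.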

About the speaker: Emrah Akyol is an assistant professor of electrical and computer engineering at Binghamton University, where he has been since 2017. He was a postdoctoral scholar at UIUC from 2014 to 2017 and at USC from 2013 to 2014. From 2006 to 2007, he worked at HP Labs and NTT Docomo Labs, both in Palo Alto, Calif. He is a senior member of the IEEE and a 2020 NSF CAREER Award recipient. His research interests are broadly in networked systems theory and algorithms, at the intersection of control, optimization, game theory, communications and information theory.

While in-person attendance is strongly encouraged, those who cannot make it in person may request a Zoom link by emailing Xingye Qiao.



Modeling Heterogeneity in Cognitive Trajectories in the Framingham Heart Study

Noon-1 p.m. Friday, March 24, in AA-340, with lunch


This talk showcases a project in which unsupervised statistical learning techniques were applied to explore subgroups of cognitive decline patterns among older adults, using complex, repeatedly measured data from a large community-based cohort. The prevalence of cognitive impairment in the population is growing, and maintaining good cognitive health is central to successful aging, independence and well-being. While older age is generally associated with worse cognitive function, there is substantial heterogeneity in the rate of cognitive decline across cognitive domains and in the age at clinical onset of dementia. Recently, harmonized factor scores measuring cognitive function in the memory, executive function and language domains have been created for the community-based Framingham Heart Study (FHS) cohorts. In this work, we identified FHS participants who had two or more repeated factor scores after age 60 and were dementia-free at their baseline visits (n=2339; 57% female; 17% APOEε4 carriers; 64% attended college). We fitted latent class mixed models (LCMMs) to cluster cognitive trajectories in all three domains using the harmonized factor scores. Nonlinear trends with age were modeled via piecewise linear LCMMs, followed by stepwise selection of cluster-specific change points. Across domains, we identified distinct latent classes of participants, some characterized by early cognitive decline before age 70, in contrast to late decliners. Using 10-fold cross-validation, we also showed that the resulting subgroupings of participants are stable. Our findings indicate class-related differences in patterns of cognitive aging in the FHS. Future studies of associations between the identified subclasses, cognitive outcomes, and physical or biological markers may advance knowledge of the multiple pathways of cognitive aging.
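The change-point idea behind piecewise linear trajectory models can be sketched for a single group of trajectories. This is not an LCMM: it has no latent classes or random effects, and the hinge parameterization, the reference age of 60 and the grid search over candidate change points are assumptions standing in for the stepwise selection described above. The helper names `hinge_design` and `select_change_point` are hypothetical.

```python
import numpy as np

def hinge_design(age, change_point, age0=60.0):
    """Design matrix: intercept, slope from age0, and extra slope after change_point."""
    age = np.asarray(age, dtype=float)
    return np.column_stack([
        np.ones_like(age),
        age - age0,
        np.maximum(age - change_point, 0.0),
    ])

def select_change_point(age, score, grid):
    """Grid search: keep the change point whose least-squares fit has the smallest RSS."""
    best_cp, best_rss, best_coef = None, np.inf, None
    for cp in grid:
        X = hinge_design(age, cp)
        coef, *_ = np.linalg.lstsq(X, score, rcond=None)
        rss = float(np.sum((score - X @ coef) ** 2))
        if rss < best_rss:
            best_cp, best_rss, best_coef = cp, rss, coef
    return best_cp, best_coef

# Usage: a noise-free trajectory that steepens at age 70.
age = np.arange(60, 91, dtype=float)
score = 1.0 - 0.01 * (age - 60) - 0.05 * np.maximum(age - 70, 0)
cp, coef = select_change_point(age, score, [65.0, 70.0, 75.0, 80.0])
```

In a latent class setting, a fit like this would be repeated per class with class-membership weights; the sketch only shows how a change point enters the trajectory.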

About the speaker: Yuan Fang joined the Department of Pharmaceutical Sciences at Binghamton University as an assistant professor in 2023. She received her PhD in mathematical sciences, with a focus on statistics, from Binghamton University under the guidance of Sanjeena Dang. Her PhD research focused mainly on developing novel unsupervised learning algorithms for non-standard data types commonly encountered in biomedical research. Following her graduate studies, Fang joined the group of Kathryn Lunetta and Joanne Murabito at Boston University School of Public Health as a postdoctoral associate, where her research investigated the association of circulating immune cell phenotypes in the pro-inflammatory and regulatory pathways with cognitive decline, dementia and Alzheimer’s disease. She has also been working on quantifying the heterogeneity in cognitive decline trajectories among Framingham Heart Study participants. Fang’s current research focuses on studying lipid profiles for ceramide pathways in boys with Duchenne muscular dystrophy using multi-omics statistical and bioinformatics approaches. She is also interested in extending existing models and statistical approaches for clustering longitudinal data.

Panel - Machine Learning and AI 4 Sciences, Arts and Engineering

Noon-1:30 p.m. Wednesday, March 15, in AA-340

Are you curious about the cutting-edge advancements of machine learning and AI in sciences, arts and engineering? Join us for a scientific panel discussion featuring experts in the field.

Recent breakthroughs in machine learning and AI, such as the success of ChatGPT and other models, have revolutionized the way we approach scientific research, artistic expression and engineering design. From predicting disease outbreaks to creating lifelike virtual worlds, machine learning and AI have incredible potential to transform the world around us.

Our expert panelists will share their insights and experiences on how machine learning and AI are being applied in diverse fields, including physics, biomedicine, art, urban planning and more.

Join us for an exciting and informative discussion on the role of machine learning and AI in shaping our future.

Kenneth Chiu, computer science
Alexey Kolmogorov, physics
Adnan Siraj Rakin, computer science
Christopher Swift, art and design
Yingxue Zhang, computer science

Carl Lipo, anthropology and Harpur Dean's Office







Custom Crowdsourcing App for Health in Premodern Mexico

CANCELED and will be rescheduled at a future date

Abstract: In this presentation, we demo and explain the digital infrastructure for a new historical crowdsourcing application of geo-located health events for Mexico before the 20th century. The crowdsourced database facilitates international collaboration on a data-intensive health-related project and is an important teaching resource for undergraduate classrooms. It uses an Angular frontend with Spring Boot microservices connected to a PostgreSQL database, which in turn feeds a website for data visualization in maps and graphics. The same web portal makes the database freely available to researchers for analysis, download, and output of custom cartography and other visualizations.

About the speakers: The speakers are both from Binghamton University. Brad Skopyk is an associate professor of history. He earned his bachelor's and master's degrees from the University of Saskatchewan, Canada, and his PhD from York University, Canada. His research focuses on Colonial Latin America and environmental history. David Mixter received his PhD from Washington University in St. Louis. He is a research assistant professor of anthropology and environmental studies interested in societal collapse and regeneration, social change, collective memory, relationality, archaeology of communities, the Ancient Maya, geographic information systems, activity area studies, microartifacts and soil chemistry.

Integrated Dynamic State Estimation in Power Systems

Noon-1 p.m. Friday, Dec. 2, in AA-340


Abstract: To make well-informed decisions, power system operators need accurate and timely estimates of the operational conditions of the power grid. To date, conventional static state estimators have been widely deployed in utility control centers to improve estimation accuracy and expand monitoring areas. However, these estimators are no longer sufficient for monitoring the modern power grid, which is experiencing increasing uncertainty and variation driven by the high penetration of intermittent renewable energy sources (mainly solar and wind). In fact, conventional static state estimation methods often fail to provide any useful information during transmission-line tripping and cascading grid failures, precisely when the power system is changing rapidly and state estimates are most needed.

In this presentation, conventional state estimation is reviewed. A dynamic state estimation (DSE) approach is then proposed that can not only estimate current operational conditions but also predict their future trends and quantify their uncertainty. To minimize the financial cost of measurement devices while achieving observability of important system states, observability and detectability studies are carried out to guide measurement placement and model selection. These studies show that many dynamic states in power systems are marginally observable (virtually unobservable). If an observer model can be chosen so that the eigenvalues corresponding to these states are stable, the DSE can still converge to the true values of the states.
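Observability of a linearized model can be checked numerically. The sketch below assumes a generic linear state-space pair (A, C) rather than any specific power system model. It stacks the observability matrix and reports its numerical rank and smallest singular value; a full rank paired with a tiny singular value is one way to see a state that is observable in principle but only marginally observable in practice. The function names and the tolerance are illustrative assumptions.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, CA^2, ..., CA^(n-1) for an n-state linear model (A, C)."""
    blocks = [C]
    for _ in range(A.shape[0] - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def observability_report(A, C, tol=1e-9):
    """Numerical rank and smallest singular value of the observability matrix."""
    sv = np.linalg.svd(observability_matrix(A, C), compute_uv=False)
    return int(np.sum(sv > tol)), float(sv.min())

# Usage: two modes with almost identical dynamics, both seen through one sensor.
A = np.diag([1.0, 1.0 + 1e-6])
C = np.array([[1.0, 1.0]])
rank, smallest = observability_report(A, C)
```

Here the pair is technically observable (rank 2), but the smallest singular value is only about 5e-7, so the two modes are nearly indistinguishable from the measurements.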

About the speaker: Ning Zhou is an associate professor of electrical and computer engineering at Binghamton University. He received his PhD in electrical engineering with a minor in statistics from the University of Wyoming in 2005. From 2005 to 2013, he worked as a power system engineer at the Pacific Northwest National Laboratory. His research interests include power system dynamics and statistical signal processing. Zhou is a senior member of the IEEE Power and Energy Society (PES) and has been an associate editor for IET Generation, Transmission and Distribution since 2016. He is the lead author of the 2009 Technical Committee Prize Paper from the IEEE PES Power System Dynamic Performance Committee. Since 2016, he has been co-chair of the IEEE PES Working Group on Data Access and secretary of the IEEE PES Task Force on Oscillation Source Location. He received the 2009 Outstanding Engineer of the Year Award from the IEEE PES Richland Chapter and the IEEE PES Outstanding Branch Counselor Award in 2017. He is the PI of a 2019 NSF CAREER award titled “Integrated Dynamic State Estimation for Monitoring Power Systems under High Uncertainty and Variation.”







Spatial-Temporal Generative Adversarial Learning

Noon-1 p.m. Friday, Nov. 4, in AD-148 (Couper Administration Building)

Abstract: With the development of sensing and communication technologies, spatial-temporal big data is now widely generated and used in urban life, helping to solve many problems related to smart cities, public safety and human behavior analysis. However, spatial-temporal big data analytics problems (e.g., urban traffic estimation) are challenging, because the data contain complex spatial-temporal dependencies and are closely related to many other complicated factors. This talk gives an overview of work that addresses spatial-temporal big data analytics problems from a deep generative adversarial perspective. Examples of spatial-temporal generative adversarial learning applied to both urban traffic estimation and human behavior analysis will be presented. These works combine generative adversarial networks with components designed for the unique challenges of spatial-temporal big data analytics. Future research on decision making, human behavior analysis and spatial-temporal data mining will also be discussed.

About the speaker: Yingxue Zhang is an assistant professor in the Computer Science Department at Binghamton University. She received her PhD in data science from Worcester Polytechnic Institute in 2022. Her research interests include (1) designing novel data mining, machine learning and AI techniques to solve spatial-temporal big data analytics problems related to smart cities and public safety and (2) human behavior analysis and decision making.

Machine and Human Roles for Mitigation of Misinformation Harms during Crises: An Activity Theory Conceptualization and Validation

Noon-1 p.m. Friday, Oct. 14, 2022, in AA-340

Abstract: During crises, a large amount of information is needed in a short period. This need creates fertile ground for misinformation to spread within and outside the affected community, which may result in misinformation harms with serious short-term or long-term consequences. In such situations, a joint human-machine effort is needed to mitigate misinformation. Although the management of AI has received research attention in the recent past, little work has examined situations in which machines and humans interact to mitigate misinformation. To systematically analyze misinformation and suggest mitigation mechanisms, we draw on Activity Theory to conceptualize a suitable framework. This framework enables the investigation of human-machine interactions through loops of “misinformation generation” and “misinformation mitigation” activities for mitigating misinformation harms. The paper also validates the framework with three different target audiences: undergraduates, graduates and professionals.

About the speaker: Thi Tran is an assistant professor of management information systems in the School of Management at Binghamton University. He holds a PhD in information technology, specializing in cybersecurity research, from the University of Texas at San Antonio. His research interest lies in social cybersecurity issues. His current research streams examine misinformation harms during crises and system designs to tackle crisis misinformation flows. He plans to extend these streams to other cybersecurity issues, including but not limited to phishing emails, fake social media accounts, data breaches and digital privacy. He has experience with various behavioral and technical research methodologies, including but not limited to behavioral surveys, longitudinal studies, experiments, the Delphi technique, psychometric analyses, measurement development, machine learning and design science research. He has published in several journals and in domestic and international conference proceedings, and has manuscripts under review at high-impact journals. His work has been supported by a National Science Foundation (NSF) research grant for disaster management and various institutional grants.

A novel sparse model-based algorithm to cluster categorical data in health screening for interpersonal violence

1-2 p.m. Wednesday, April 27, 2022, via Zoom

Abstract: Technological advances in diagnostic imaging, smart sensing and health information systems have created a big data environment in health care. It is now possible to track every piece of information related to a patient’s care cycle, including screening, diagnosis, prognosis, treatment, care delivery and continuous monitoring. However, the size and complexity of healthcare big data often overwhelm the modeling capability of existing statistical methods. This talk introduces a novel sparse model-based clustering algorithm to facilitate health screening for interpersonal violence. Existing studies have reported large variation in interpersonal violence screening rates across U.S. healthcare settings, ranging from 10% to 90%. It is therefore critical to delineate the heterogeneity in violence screening and promote the uptake of routine screening to mitigate the consequences of violence and improve women’s health. However, existing research is empirical and qualitative. The recently developed Health Care Provider (HCP) study is the first of its kind to collect quantitative data measuring various aspects of health screening for interpersonal violence. Motivated by the HCP data, this study develops a sparse categorical Factor Mixture Model (sc-FMM) that aims to stratify healthcare providers into relatively homogeneous clusters. The sc-FMM simultaneously handles mixed nominal and ordinal categorical variables for clustering and incorporates an L2,1-norm penalty for variable selection. An expectation-maximization (EM) framework integrated with Gauss-Hermite approximation was developed for model estimation. Simulation studies show significantly better performance of sc-FMM than competing methods. sc-FMM was then applied to identify clusters of healthcare providers in the HCP data.
The findings reveal how providers’ screening rates for interpersonal violence are associated with multi-source factors, which can inform policy formation and intervention development, eventually leading to improved health screening and public health.
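Gauss-Hermite quadrature, mentioned above for model estimation, replaces an integral over a normally distributed latent factor with a weighted sum over fixed nodes. A minimal sketch, assuming a single standard-normal latent factor and a logistic link for one binary item; the actual sc-FMM handles nominal and ordinal items and may use different links, and `expect_standard_normal` and `marginal_prob` are hypothetical names.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def expect_standard_normal(f, n_nodes=20):
    """Approximate E[f(Z)] for Z ~ N(0, 1) by Gauss-Hermite quadrature.

    hermgauss targets integrals weighted by exp(-x^2), so nodes are scaled by
    sqrt(2) and the weighted sum is divided by sqrt(pi).
    """
    nodes, weights = hermgauss(n_nodes)
    return float(np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi))

def marginal_prob(intercept, loading, n_nodes=20):
    """P(Y = 1) when logit P(Y = 1 | z) = intercept + loading * z and z ~ N(0, 1)."""
    def conditional(z):
        return 1.0 / (1.0 + np.exp(-(intercept + loading * z)))
    return expect_standard_normal(conditional, n_nodes)

# Usage: integrating out the latent factor pulls the probability toward 0.5
# compared with the conditional probability at z = 0.
print(marginal_prob(1.0, 2.0))
```

Inside an EM algorithm, the same nodes and weights would typically be reused in the E-step to approximate each observation's posterior expectations over the latent factor.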

About the speaker: Bing Si, an assistant professor of systems science and industrial engineering at Binghamton University, received her BS in mathematics from the University of Science and Technology of China in 2012 and her MS and PhD in industrial engineering from Arizona State University in 2014 and 2018, respectively. Si's research focuses on developing data analytics and statistical learning methodologies to support healthcare decisions in screening, diagnosis, prognosis, treatment and care delivery. Her research has been applied to a number of healthcare domains, including Alzheimer’s disease, migraine, traumatic brain injury and interpersonal violence screening. She has received multiple awards and scholarships, including the Dean’s Dissertation Award from the Ira A. Fulton Schools of Engineering at ASU, Outstanding Emerging Fulton Student Organization Leader (for serving as VP of the INFORMS Student Chapter) and the Grace Hopper Faculty Scholarship. She is a member of IISE, INFORMS and IEEE. Her research lab is supported by both industry and federal sponsors, including AHRQ, IBM and SUNY.

The Map Collaboratory at Binghamton University: Collaborative Geospatial Data and Open Maps for Advanced Researchers

3:30-4:30 p.m. Friday, March 25, 2022, in AA-340

Abstract: Participants will be introduced to new GIS infrastructure on campus and given a demonstration of it. The session will highlight its many new capabilities and recommend its use for advanced researchers who are collaborating with others to build GIS datasets or who want to unlock new possibilities for their GIS data, such as directly updating published online map services, hosting map services for use in open-source web mapping applications, or dynamically aggregating datasets. These features will be briefly introduced and demonstrated, with ample time for questions and answers, especially to discuss how well this infrastructure fits your projects.

About the speakers: Bradley Skopyk is an associate professor of history at Binghamton University. His research interests include Colonial Latin America and environmental history. Kevin Heard is the associate director of the GIS Campus Core Facility. His research interests include GIS, demographics, internet mapping and GIS story maps.

Critical Learning Periods in Federated Learning: Existence, Explanation, and Efficiency

1-2 p.m. Wednesday, March 2, 2022
Via Zoom

Abstract: Federated learning (FL) is a popular technique for training machine learning (ML) models on decentralized data. Extensive work has studied the performance of the global model; however, it is still unclear how the training process affects the final test accuracy. The problem is exacerbated by the fact that FL differs significantly from traditional ML: data characteristics are heterogeneous across clients, and training involves more hyperparameters. In this talk, we first show that the final test accuracy of FL is dramatically affected by the early phase of the training process, i.e., FL exhibits critical learning periods, in which small gradient errors can have an irrecoverable impact on the final test accuracy. To explain this phenomenon, we generalize the trace of the Fisher Information Matrix (FIM) to FL and define a new notion called FedFIM, a quantity reflecting the local curvature of each client from the beginning of FL training. Our findings suggest that the initial learning phase plays a critical role in understanding FL performance. This contrasts with many existing works, which generally do not connect the final accuracy of FL to early-phase training. Finally, we show that exploiting critical learning periods in FL is of independent interest and could be useful for other problems, such as choosing hyperparameters like the number of clients selected per round and the batch size, so as to improve the performance of FL training and testing.
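The trace of the Fisher information has a closed form for simple models, which makes the idea easy to illustrate. The sketch below assumes logistic regression, where the score for one example is (y - p)x, so the trace at weights w is the average of p(1 - p)||x||² over the data. The cross-client aggregate `weighted_fim_trace` (a sample-size-weighted mean) is a hypothetical stand-in and is not the FedFIM definition from the talk.

```python
import numpy as np

def fim_trace_logistic(w, X):
    """Trace of the logistic-regression Fisher information at weights w.

    With labels drawn from the model, the score is (y - p) x, so the trace is
    the average over rows of X of p (1 - p) ||x||^2.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return float(np.mean(p * (1.0 - p) * np.sum(X ** 2, axis=1)))

def weighted_fim_trace(w, client_data):
    """Hypothetical cross-client aggregate: per-client traces weighted by sample count."""
    traces = np.array([fim_trace_logistic(w, X) for X in client_data])
    sizes = np.array([len(X) for X in client_data], dtype=float)
    return float(np.dot(sizes, traces) / sizes.sum())

# Usage: evaluate the trace at initialization for two clients with different data.
w0 = np.zeros(2)
clients = [np.array([[1.0, 0.0], [0.0, 2.0]]), np.array([[2.0, 0.0]])]
print(weighted_fim_trace(w0, clients))  # 0.75
```

Tracking a quantity like this over the first few rounds of training is one simple way to look for an early high-curvature phase of the kind the talk associates with critical learning periods.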

About the speaker: Jian Li is an assistant professor of computer engineering with the Department of Electrical and Computer Engineering at Binghamton University, State University of New York (SUNY). He was a postdoc with the College of Information and Computer Sciences at the University of Massachusetts Amherst from January 2017 to Aug. 2019. He received a PhD degree in computer engineering from Texas A&M University in December 2016, and a BE degree from Shanghai Jiao Tong University in June 2012. His current research interests lie in the areas of reinforcement learning, distributed learning, network optimization, online algorithms and their applications in emerging/NextG networked systems.

Research Development at the Motion Analysis Research Laboratory

1-2 p.m. Wednesday, Feb. 23, 2022
Via Zoom

Abstract: Vipul Lugade is the director of the newly established Motion Analysis Research Laboratory, which is being designed as a high-tech facility that creates immersive and dynamic learning opportunities for students in the physical therapy and occupational therapy programs. As part of the laboratory’s teaching and research initiatives, patients from the community will be thoroughly evaluated for gait, posture and strength, as well as cognitive and balance performance. A holistic evaluation will include data collected from a 10-camera motion analysis system, floor-embedded force plates, a 16-sensor electromyography system, an immersive virtual reality computerized posturography system and an isokinetic dynamometer. To serve the community, the laboratory will provide objective evaluations of human movement, investigate the effects of morbidity, and provide individualized assessments and interventions. Of specific concern are the evaluation of fall risk in older adults and the assessment of single- and dual-task balance performance in athletes following a concussion. To date, smartphone tools have been developed that allow home-based evaluation of standing posture, gait and cognitive performance in both young and older adults. Furthermore, preliminary results indicate the feasibility of using smartphones to deliver standardized interventions, improve recovery and support accelerated return-to-play decision making. Ongoing research uses heterogeneous data sets obtained from clinical assessments, laboratory-based evaluations and home-based tests to identify fall risk and explore the potential for individualized treatment options that reduce the risk of further injury. Past results and proposed research studies within the Motion Analysis Research Laboratory will be discussed.

About the speaker: Lugade received his BS in Engineering from Harvey Mudd College in 2002. In 2007, he received his MS, and in 2011, his PhD, in biomechanics from the University of Oregon. He was a postdoctoral fellow at the Mayo Clinic until 2013 and a Whitaker Postdoctoral Scholar at Chiang Mai University until 2015. He then founded an engineering consulting company, and in 2021 he joined the Division of Physical Therapy at Binghamton University.

Multiscale Mechanics of Brain Folding: Brain Wrinkles and Folds Matter

1-2 p.m. Wednesday, Feb. 2, 2022
Via Zoom

Abstract: Cortical folding is one of the most complex processes that occur during the development of the human brain. During brain development, the cerebral cortex undergoes a marked expansion in volume and surface area, accompanied by extensive tissue folding. Many studies attest that knowledge of cortical folding is key to interpreting the normal development of the human brain during the early stages of growth. Cognitive or physiological impairments, e.g., epilepsy, autism and schizophrenia, are believed to be consequences of abnormal cortical folding during the early stages of brain development. In this talk, I will present some of our latest findings and conclusions on the role of mechanics in the growth and folding of the human brain, including the possible origin and formation mechanism of variable and regular folding patterns. I will also discuss how combining brain imaging, computational mechanical modeling and machine learning simulation could help unravel the mystery of brain folding. Understanding the mechanism of normal human brain development will provide a foundation for uncovering the origin of abnormal folding patterns in developmental brain disorders.

About the speaker: Mir Jalil Razavi is an assistant professor of mechanical engineering at Binghamton University. He joined the Mechanical Engineering Department in fall 2018. Before joining Binghamton University, he received his MS degree in solid mechanics from the University of Tabriz in 2009, and his PhD degree from the University of Georgia in 2018. His research interests include solid mechanics, biomechanics and the mechanics of soft/bio materials. His research group, Mechanics of Soft/Bio Materials Lab, develops analytical and cutting-edge computational models to study the mechanical behavior of biological tissues/organs such as brain and skin.

Long-term prediction for nonstationary dynamic intermittency: a case study on disease onset prognosis to support smart monitoring for telehealth

1-2 p.m. Wednesday, Jan. 26, 2022
Via Zoom

Abstract: Real-world streaming sensor data are noisy, highly nonstationary and gathered in unstructured form. Often, the observations (data) are assumed to be generated by an underlying but unknown dynamic system. Predicting the dynamic behavior of real-world systems is critical for many scientific disciplines, such as forecasting in finance, forewarning of pathological symptoms in health care and prognostic maintenance in industrial applications. In this talk, a long-term prediction approach is introduced to analyze the intermittent process, one of the most common nonstationary dynamic systems in the real world. The presented approach treats intermittent dynamics as recurrent transitions between localized stationary segments/attractors in the reconstructed phase portrait. An advanced deep learning approach incorporating Gaussian process Bayesian filtering is then developed to capture the recurrent nature of the complex dynamics. Further investigations, including the analysis of a Lorenz-like dynamic system and a case study on forecasting onsets of pathological symptoms via monitored Electrocardiogram (EEG) signals, are then presented. The case study results suggest that the presented approach improves forecasting outcomes by extending the prediction time horizon by orders of magnitude while maintaining high accuracy in the forecast estimates. Implementing this long-term prediction approach substantially shifts the current scheme from online monitoring and after-the-fact mitigation to prognosis and timely prevention, supporting expert-supervised decision making for telehealth.
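Phase-portrait reconstruction is commonly done with time-delay embedding. The sketch below assumes a scalar series with a chosen embedding dimension `dim` and lag `tau`, and labels each embedded point by its nearest center as a crude stand-in for identifying localized attractors. The centers are assumed to be given, and neither helper (`delay_embed`, `label_regimes`, both hypothetical names) is the speaker's deep learning or Gaussian process method.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - (dim - 1) * tau
    if n_rows <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.column_stack([x[i * tau : i * tau + n_rows] for i in range(dim)])

def label_regimes(points, centers):
    """Index of the nearest center for each embedded point (hypothetical regime labels)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Usage: embed a short series that jumps between two levels, then label the points.
series = np.array([0.0, 0.1, 0.0, 5.0, 5.1, 5.0])
points = delay_embed(series, dim=2, tau=1)
labels = label_regimes(points, np.array([[0.0, 0.0], [5.0, 5.0]]))
```

A sequence of regime labels like this is one way to represent intermittency as transitions between segments, which a forecasting model could then try to predict.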

About the speaker: Zimo Wang is an assistant professor in the Department of Systems Science and Industrial Engineering at Binghamton University. He has conducted research projects spanning manufacturing and data analytics, with strengths in sensors and AI for additive manufacturing and precision manufacturing processes. His research bridges sensor techniques, manufacturing processes and data science to create smart sensing approaches, develop machine learning approaches and integrate them into cyber-physical platforms, enabling in-process characterization of materials and diagnosis/prognosis of processes to realize smart manufacturing processes and autonomous systems. His other current research interests include smart logistics on the shop floor using unmanned aerial vehicles (UAVs), sustainable material characterization and manufacturing, and statistical modeling for analyzing chaotic dynamic systems. Wang is a member of IISE, ASME and INFORMS.


Data Analytics for Quality Assurance in Additive Manufacturing

Noon-1 p.m. Wednesday, Nov. 10, 2021
Join the Zoom meeting at

Abstract: Additive manufacturing (AM) offers unprecedented opportunities for creating customized, complex, and nonparametric geometry. Modern technologies have made AM accessible to practitioners in advanced manufacturing and to individuals outside traditional production environments. However, geometric quality assurance is required before the full potential of AM can be reached. Traditional geometry inspection methods depend on coordinate measuring machines and GD&T standards. These methods, however, are slow and expensive to operate, and they are poorly suited to AM because of the freeform nature of 3D-printed objects. 3D scanners provide fast, high-density measurements, but current analysis techniques based on point cloud data face major challenges in feature-based inspection, alignment, and deviation quantification. In this seminar, advanced data analytics approaches will be introduced for both offline and online geometric quality assurance in AM processes. Other ongoing research on data analytics in advanced manufacturing will be discussed as well.

About the speaker: Yu (Chelsea) Jin received her BS degree in 2014 from the Department of Information Science and Technology at Jinan University in Guangzhou, China. In 2015, she received her master’s degree in manufacturing engineering from the University of Michigan at Ann Arbor. She earned her PhD in industrial engineering at the University of Arkansas - Fayetteville in May 2020, and she joined Binghamton University as an assistant professor of systems science and industrial engineering in fall 2020. Her research focuses on sensing and analytics for advanced manufacturing and service applications. She received the IISE Gilbreth Memorial Fellowship in both 2018 and 2019; the Kuroda Graduate Fellowship in Engineering, Graduate Research Award in 2019; and the Outstanding Graduate Student Award in 2020 from the University of Arkansas. Her work has been published in IISE Transactions and ASME Journal of Manufacturing Science and Engineering. She is an active member of IISE and INFORMS.







Machine-Learning Assisted Investigation of Quantum Phase Transition in the Ising Model

Noon-1 p.m. Wednesday, Nov. 10, 2021
Join the Zoom meeting at

Meeting passcode is 13902

Abstract: A quantum phase transition is a phase transition at zero temperature induced by external parameters such as a magnetic field, an electric field or pressure. Because the system is at zero temperature and consequently no thermal fluctuations are present, the quantum phase transition results from the change of the ground-state wavefunction due to the external parameters. Theoretical investigation of quantum phase transitions has been an important but challenging subject because the related quantum models usually contain interactions that make solving the Schrödinger equation very difficult.

In this talk, we discuss an approach to investigating quantum phase transitions with the help of machine learning (ML) methods. Our approach is based on the fact that, because the ground-state wavefunction is a solution to the Schrödinger equation, it behaves in certain correlated ways subject to the physical constraints imposed by the governing equations. In other words, there almost certainly exists some lower-dimensional representation of the ground-state wavefunction that is controlled by a small number of parameters. Our hypothesis is that integrating human knowledge of possible forms of lower-dimensional representations into the design of the ML architecture is crucial for building ML models that solve theoretically challenging problems with higher accuracy and generalizability. We test our approach on the Ising model, in which the quantum phase transition is driven by applying magnetic fields of different orientations and strengths. We adopt the matrix product state, a well-known form of quantum state capable of capturing the entanglement of the ground-state wavefunction, as the lower-dimensional representation and design the ML model to learn it. We find that the quantum phase transition can be accurately reproduced by our ML model, and that our approach could be generalized to more challenging models, such as the quantum XY and Heisenberg models. We will discuss how our physics-informed ML architecture can be applied to other theoretical physics problems.

About the speakers: Wei-Cheng Lee is an associate professor in the Department of Physics, Applied Physics and Astronomy at Binghamton University. His major field is condensed matter theory, and he currently focuses on unconventional superconductors, Mott systems and topological phases. His research is partially supported by the NSF and AFOSR.

Kenneth Chiu is an associate professor in the Department of Computer Science at Binghamton University, and his research interests are centered on cyberinfrastructure and systems, including cyberinfrastructure for instruments and sensors, web services performance and middleware for scientific computing. His research is partially supported by the NSF.

K-mer identification of allopolyploid subgenomes and chromosomal subcompartments

11 a.m.-noon Wednesday, Oct. 20, 2021
Join the Zoom meeting at

Abstract: Allopolyploids are species that have recently undergone a whole-genome duplication through interspecies hybridization. Despite having a duplicated genome, these species are genetically diploid and have unique characteristics compared to unduplicated species and autopolyploids (species whose genome duplicated within a single species). Previously, allopolyploidy was confirmed by comparison to living diploid outgroups that contribute one of the subgenomes of the allopolyploid. Recently, it has been demonstrated through the identification of differentially retained transposable elements. This signal is intrinsic to the polyploid and does not require comparison to diploid outgroups, allowing cheaper identification of subgenomes in polyploids. I will describe our algorithm for subgenome identification through the mapping of k-mer density along assembled chromosomes, and describe how we use hidden Markov modeling to identify post-hybridization rearrangements between the subgenomes. In addition to identifying polyploid subgenomes, we have recently started using our k-mer method to identify sub-chromosomal compartments with unique transposon signatures.
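The abstract describes identifying subgenomes by mapping k-mer density along assembled chromosomes. Below is a minimal sketch of the density-mapping step only. It assumes a set of candidate k-mers (for example, k-mers drawn from a subgenome-specific transposable element family) has already been chosen and uses non-overlapping windows. The function `kmer_density` and its parameters are hypothetical, and the hidden Markov modeling step is not shown.

```python
def kmer_density(sequence, kmers, window):
    """Fraction of k-mer start positions per window that match the given k-mers.

    Windows are consecutive and non-overlapping; k-mers that straddle a window
    boundary are not counted, and a trailing fragment shorter than k is dropped.
    """
    targets = {km.upper() for km in kmers}
    if not targets:
        raise ValueError("at least one k-mer is required")
    k = len(next(iter(targets)))
    if any(len(km) != k for km in targets):
        raise ValueError("all k-mers must have the same length")
    if window < k:
        raise ValueError("window must be at least k")
    seq = sequence.upper()
    densities = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        positions = len(chunk) - k + 1
        if positions <= 0:
            break  # trailing fragment too short to hold a k-mer
        hits = sum(1 for i in range(positions) if chunk[i:i + k] in targets)
        densities.append(hits / positions)
    return densities
```

Plotting these per-window densities along each chromosome gives a profile in which windows from the two subgenomes could separate if the chosen k-mers are differentially retained. Segmenting such a profile is where a hidden Markov model would come in.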

About the speaker: Adam Session, assistant professor of biological sciences at Binghamton University, started his research career in behavioral genetics at Rutgers University, but during internships at Duke and UC Berkeley, he became inspired to study genome evolution. His PhD focused on the genome of Xenopus laevis, a tetraploid frog commonly used in developmental biology. He was able to prove the allotetraploid hypothesis of X. laevis using data intrinsic to the genome, as well as detail the asymmetric evolution that followed. For his postdoc, he focused on plant genome duplications, as polyploidy is more common in plants. He helped sequence the Brachypodium hybridum, Miscanthus sinensis, and Panicum virgatum genomes and develop new methods for identifying allopolyploids and timing their hybridization by studying transposon evolution. Looking forward, he hopes to continue studying polyploids in plants, animals, and fungi, with the goals of unifying the techniques used to study these genomes across lineages and using allopolyploids to further our understanding of speciation.

Epidemic Disease Modelling/Forecasting and Social Network Clustering

11 a.m.-noon Wednesday, Oct. 6, 2021
Join the Zoom meeting at

Abstract: The COVID-19 pandemic has become a crucial global public health problem that has disrupted the lives of millions in many countries, including the United States. In this study, we present a decision-analytic approach that serves as an efficient tool to assess the effectiveness of early social distancing measures in communities with different population characteristics. This study shows that decision-analytic tools can help policymakers simulate different social distancing scenarios at the early stages of a global outbreak. In a related study, we explored how much traditional and hybrid in-person schooling modes contribute to community incidence of SARS-CoV-2 infections relative to fully remote schooling, a question that had remained unanswered. We conducted a retrospective nationwide event study evaluating the effect of school mode on SARS-CoV-2 cases during the 12 weeks following school opening (July-September 2020, before the Delta variant was predominant), stratified by U.S. Census region.

In the second part of the talk, I will describe a new optimization-based forecasting approach applied to public health challenges. Healthcare systems are one of the largest cost items for many countries. For example, the U.S. spends 17% of its GDP on healthcare, yet there is abundant evidence that the system has many inefficiencies. I will discuss how an operations research-based approach that leverages electronic medical health records can help improve efficiency in public health. Specifically, I will describe a new forecasting algorithm that uses multiple data sources to predict the U.S. flu epidemic in a timely and accurate manner. This hierarchical framework uses multilinear regression to combine forecasts from multiple data sources, and greedy optimization with forward selection to sequentially choose the most predictive combinations of data sources. We show that the systematic integration of complementary data sources can substantially improve forecast accuracy over single data sources. Using multiple data sources, this method is 15% more accurate than a baseline model that uses only one data source when forecasting the Centers for Disease Control and Prevention (CDC) influenza-like-illness reports. Furthermore, using this framework, I show that, out of more than 600 flu-related data sources, some of the best predictions come from electronic health records. Lastly, I will cover a novel clique relaxation formulation, based on clustering coefficients, for finding cohesive subgroups in networks.
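The framework above combines data sources with multilinear regression and chooses them by greedy forward selection. Below is a minimal pure-Python sketch of that combination under stated assumptions: each data source is a list of forecasts aligned with the target series, the combination is an ordinary least-squares fit with an intercept, and candidates are scored by in-sample mean squared error (MSE). The function names are hypothetical, and this is not the speaker's hierarchical framework.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear sources)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mse(sources, target):
    """Least-squares fit of target on an intercept plus the given sources; return MSE."""
    rows = [[1.0] + [s[t] for s in sources] for t in range(len(target))]
    p = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    w = solve_linear(xtx, xty)
    residuals = [y - sum(wi * xi for wi, xi in zip(w, r)) for r, y in zip(rows, target)]
    return sum(e * e for e in residuals) / len(target)


def forward_select(sources, target, max_sources):
    """Greedy forward selection over a dict of named data sources.

    Starts from an intercept-only model and repeatedly adds the source whose
    inclusion gives the lowest MSE, stopping when no candidate improves the fit
    or `max_sources` sources have been chosen.
    """
    chosen = []
    best = fit_mse([], target)
    while len(chosen) < max_sources:
        scores = {}
        for name in sources:
            if name in chosen:
                continue
            try:
                scores[name] = fit_mse([sources[c] for c in chosen + [name]], target)
            except ValueError:  # candidate is collinear with chosen sources
                scores[name] = float("inf")
        if not scores:
            break
        name = min(scores, key=scores.get)
        if scores[name] >= best - 1e-12:
            break  # no candidate improves the fit
        chosen.append(name)
        best = scores[name]
    return chosen, best
```

Because in-sample error never increases when a source is added to a least-squares fit, the `max_sources` cap does most of the stopping work here. A more faithful version would score candidates on held-out weeks of the CDC influenza-like-illness reports instead.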

About the speaker: Zeynep Ertem received her PhD in industrial and systems engineering from Texas A&M University. She held positions at the USC Marshall School of Business and UT Austin before joining Binghamton in fall 2020. Her research interests include mathematical modeling and optimization in healthcare systems, including preparedness and response for infectious diseases. Her papers have been published in high-impact journals including Nature Medicine, Decision Support Systems and PLOS.

Who does what (and why) to protect the public in the pandemic? Dataset on the Institutional Origins of COVID-19 Public Health Policies

2-3 p.m. Wednesday, April 28, 2021
Join the Zoom meeting at

Abstract: This talk introduces the original dataset on the governmental origins of COVID-19 public health policies worldwide. The dataset contains daily observations from the start of the pandemic through July-December 2020, depending on the country, coded at the level of ISO-2 units. It is publicly available through the website and, more fully, upon request.

We present the data and the research already published based on them, and discuss analyses that could be done but are beyond our own analytic capacity.

We also introduce our planned next “big step” based on the dataset: a project to create a public-facing ‘living history’ dashboard of the COVID-19 pandemic. The multi-tabbed dashboard will offer interactive interfaces linking the core policy dataset with existing, publicly available datasets (e.g., biomedical data on rates of infection, mortality, and vaccination, or socio-economic datasets to explore links between policy and social inequality) and curated crowd-sourced inputs. We hope that by linking politics, health, and society in a single, multi-tabbed COVID-19 dashboard, our ‘living history’ dashboard will serve as a resource for teachers, students, and the public. It could also serve as a reality check against opportunistic narratives of pandemic management.

About the speakers: Bradley Skopyk is an associate professor of history at Binghamton University, Olga Shvetsova is a professor of political science at Binghamton University and Andrei Zhirnov is an associate lecturer in quantitative methods in social science at the University of Exeter, UK.

Data Science Modeling for Shape and Genetic Data

2-3 p.m. Friday, April 16, 2021
Join the Zoom meeting at

Abstract: Inter-individual variation in biological shape is one of the most noticeable phenomena in nature and is central to areas such as leaf identification, ecology and evolution, and human facial recognition. Because shape has been reported to have high heritability, unraveling the genetic mysteries of shape has attracted a lot of attention in various disciplines. Cutting-edge genotyping technologies have dramatically revolutionized the landscape of contemporary quantitative shape mapping research. With the number of single nucleotide polymorphisms (SNPs) increasing from thousands to millions, unprecedented challenges are raised, in particular by the “double big data” that arise from both high-dimensional morphological and high-dimensional genomic variables. In this talk, we will introduce a data science modeling strategy, driven by real-world datasets, to identify influential genetic variables associated with high-dimensional shape traits at increasing complexity and scale. Shape is input as an image and then described as a tensor or a high-dimensional curve. Prevailing approaches either describe shape by a sparse set of landmark points or model each genetic variable in isolation from the others, and hence greatly limit the potential for new findings. The presented strategy extends traditional statistical methods so that they scale to the dynamic traits that today’s technologies require analyzing. Finally, we will discuss possible future extensions to the 3D facial expression data with shape variations that we are preparing.

About the speakers:

Guifang Fu is an assistant professor of mathematical sciences. She is interested in developing advanced statistical models and computational methodologies to unravel the genetic and environmental mechanisms that regulate complex biological traits, including morphology/shape, biomedical problems and disease. She is particularly interested in high-dimensional, “big data” modeling, and functional data analysis. Her research has been funded by the NSF among others.

Lijun Yin is a professor of computer science. His research focuses on computer vision, graphics, HCI, and multimedia, specifically on face and gesture modeling, analysis, recognition, animation, and expression understanding. His research has been funded by the NSF, AFRL/AFOSR, NYSTAR, and SUNY Health Network of Excellence. He received the prestigious James Watson Investigator Award of NYSTAR (2006) and SUNY Chancellor Award for Excellence in Scholarship and Creative Activities (2014).

Using Secondary Data to Tell a New Story: A Cautionary Tale in Health Information Technology Research

2-3 p.m. Friday, April 9, 2021
Join the Zoom meeting at

Abstract: Through the growth of big data and open data, new opportunities have presented themselves for information systems (IS) researchers who want to investigate phenomena they cannot easily study using primary data. As a result, many scholars have “retooled” their skills to leverage the large amount of readily available secondary data for analysis. In this confessional account, we share the story about how the first and second authors faced challenges when using secondary data for a research project in the health information technology domain. Through additional analysis of studies on health information technology that have used secondary data, we identified several themes of potential pitfalls that can occur when collecting, appropriating, and analyzing secondary data for a research project. We share these themes and relevant exemplars to help IS researchers avoid mistakes when using secondary data.

About the speaker: Sumantra Sarkar is an associate professor in the School of Management at Binghamton University.

Optimal Data-driven Policies for Disease Screening

2-3 p.m. Friday, March 5, 2021
Join the Zoom meeting at

Abstract: Public health screening, which involves testing a large population for diseases using disease-related biomarkers, is an essential tool in a wide variety of settings, including newborn screening for genetic diseases. However, noisy information on biomarker levels, caused by external or subject-specific factors, introduces significant challenges to this problem. We design optimal data-driven biomarker screening policies that minimize subject misclassification errors under noisy and uncertain biomarker measurements. Our case study on newborn screening for cystic fibrosis, based on a five-year dataset from the North Carolina State Laboratory of Public Health, indicates that the proposed optimization-based models can achieve a substantial reduction in classification errors over current practices.

About the speaker: Saloumeh Sadeghzadeh is an assistant professor in the School of Management at Binghamton University. Her research interests lie at the intersection of stochastic modeling, optimization and data analytics methodologies, with a particular focus on data-driven decision-making in healthcare and public policy domains.

Teaching "Racism and Big Data"*

2-3 p.m. Friday, Nov. 20, 2020
Join the Zoom meeting at

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: Big Data is the fodder for data mining, data analytics, and predictive programs. In reflecting the social world of its collected data, the exciting realm of Big Data is the new frontier of racism and sexism. Although data are presented as value-free, and algorithms are portrayed as neutral computational bits and mathematical processes, they not only conceal but also automate racist and gender biases and racial and gender inequalities. The inherent biases in data mining have resulted in data analytic programs, predictive risk models, and facial recognition tools that are wreaking havoc in the lives of African Americans, Native Americans, people of color, and many other socially and economically marginalized communities. This talk highlights the damaging real-life consequences of racially compromised Big Data and considers ways that racial justice literacy can responsibly combat algorithmic oppression.

About the speaker: Nkiru Nzegwu is a distinguished professor of Africana studies at Binghamton University. Her accomplishments are far-reaching. They range from writing the book Family Matters: Feminist Concepts in African Philosophy and editing a number of books to authoring dozens of articles and book chapters. She has curated 11 art exhibitions and produced seven exhibition catalogs. Her paintings and poetry have been featured in numerous publications. She has also received fellowships from the Smithsonian Institution, the Getty Foundation, Cornell University’s A.D. White Society for the Humanities, the Canada Council and the UCLA Institute for the Study of Gender in Africa.

She has also been named Professor Extraordinarius in the School of Transdisciplinary Research and Graduate Studies at the University of South Africa.

Her research interests are in feminist philosophy, African philosophy, African women studies, and African and African Diaspora art and aesthetics.



Understanding and Predicting Chronic Absenteeism from School in Autism Spectrum Disorder: A Data-Driven Approach*

2-3 p.m. Friday, Nov. 13, 2020
Join the Zoom meeting at

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: In 2020, the Centers for Disease Control and Prevention (CDC) reported that approximately 1 in 54 children in the U.S. is diagnosed with autism spectrum disorder (ASD), and this prevalence has been accompanied by a plethora of services and supports to treat its symptoms. Along with technical advances, a considerable number of data-driven approaches have shown promising results in diagnosing ASD from clinical data. However, treatment progression and the impact of education, as reflected in behavioral data, remain under-investigated in ASD. Absenteeism from school is a serious public health issue for educators, mental health professionals, and families. Chronic absenteeism, defined as missing 10% or more of school days for any reason, makes it hard for a student to keep pace with school. However, school absenteeism among students with ASD has received less attention, and the small number of existing studies have examined only older children with ASD and higher cognitive abilities. In this talk, we introduce data-driven approaches based on machine learning techniques that provide insightful information to special education educators of children with ASD within a predictive performance framework. Mainly, we develop an individualized school attendance forecasting model based on a deep recurrent neural network architecture, long short-term memory (LSTM), to provide daily forecasts of future attendance. In addition, we present another prediction model, based on a standard machine learning architecture, that provides a longer forecasting horizon (e.g., a month). Lastly, we extend this approach to predicting problem behaviors from nonstationary and nonlinear data. Through these data-driven approaches, including artificial intelligence, this work revolutionizes knowledge of how students with disabilities attend school and how attendance relates to in-school activities, and it introduces predictive modeling to provide early intervention that improves attendance and educational outcomes.
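The chronic absenteeism definition quoted in the abstract (missing 10% or more of school days for any reason) is simple enough to state as code. Below is a minimal sketch, assuming per-student daily attendance records stored in a hypothetical dictionary format. It only computes the label that attendance forecasts could be compared against and does not reproduce the speakers' LSTM or other models.

```python
def is_chronically_absent(days_absent: int, days_enrolled: int) -> bool:
    """Apply the "missing 10% or more of school days" definition.

    Integer arithmetic (days_absent * 10 >= days_enrolled) avoids floating-point
    rounding exactly at the 10% boundary.
    """
    if days_enrolled <= 0:
        raise ValueError("days_enrolled must be positive")
    if not 0 <= days_absent <= days_enrolled:
        raise ValueError("days_absent must be between 0 and days_enrolled")
    return days_absent * 10 >= days_enrolled


def chronic_absence_flags(attendance):
    """Map each student ID to a chronic-absence flag.

    `attendance` is a hypothetical record format: student ID -> list of daily
    booleans, where True means the student was present that day.
    """
    flags = {}
    for student, days in attendance.items():
        absent = sum(1 for present in days if not present)
        flags[student] = is_chronically_absent(absent, len(days))
    return flags
```

For example, a student absent 18 of 180 enrolled days sits exactly at the 10% threshold and is flagged, while 17 absences are not.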

About the speakers: Daehan Won is an assistant professor of systems science and industrial engineering at Binghamton University. He received his bachelor's and master's degrees from the Korea Advanced Institute of Science and Technology (KAIST) and his PhD in industrial and systems engineering from the University of Washington. His research interests lie in large-scale mathematical programming and data analytics/mining for various healthcare and manufacturing fields. He is currently working on designing new platforms for smart electronics manufacturing systems to cope with advances in Industry 4.0, as well as on healthcare informatics in biomedical engineering and psychology to address the complexity of human-related data.

Jennifer Gillis is a professor of psychology at Binghamton University. She is a three-time Binghamton University alumna, having earned her bachelor's, master's and doctoral degrees there. Her research interests include assessment and treatment issues for individuals with autism spectrum disorders (ASD) across the lifespan, as well as service provider training and development. The goal of her research is to develop and/or adapt treatment manuals and policies, primarily to improve the social development, health, and learning of individuals with ASD. Gillis is especially interested in the effectiveness of interventions delivered in naturalistic settings, in factors that might account for differences in the delivery of treatment in the community as opposed to laboratory settings, and in factors that prevent access to educational and intervention settings.

Segmentation of Stimulated Raman Scattering Microscopy Images using Deep Neural Networks

2-3 p.m. Friday, Nov. 6, 2020
Join the Zoom meeting at

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: Medical procedures are often limited by the precision and accuracy of modern equipment. Acquiring dyed images of cancer cells using traditional histological staining can require upwards of half an hour in a lab. During brain surgery, every minute spent analyzing cancer tissue is time lost. A promising alternative, stimulated Raman scattering (SRS) microscopy, uses microscopic lipid and protein vibrations to provide rapid cell nuclei imaging without stains or other processing (label-free).

When identifying cancer cells, however, it is critical for doctors to be able to visualize the density and distribution of cell nuclei. Basic thresholding techniques are commonly used for counting the nuclei of H&E-stained cells. Unfortunately, the low signal-to-noise ratio of SRS images hampers these techniques: the contrast between cell nuclei and their cell membrane is far lower in SRS images than with traditional staining techniques.
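The paragraph above notes that basic thresholding is commonly used to count nuclei in H&E-stained images. Below is a minimal pure-Python sketch of that baseline, assuming a grayscale image given as a list of rows, a single fixed intensity threshold, and 4-connectivity. It illustrates the thresholding baseline only, not the U-Net or Mask R-CNN models the speakers use; `count_nuclei` is a hypothetical name.

```python
from collections import deque


def count_nuclei(image, threshold):
    """Return (number of regions, list of region areas in pixels).

    Pixels with intensity >= threshold are treated as nucleus (foreground);
    regions are grown with 4-connectivity (up, down, left, right).
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    mask = [[pixel >= threshold for pixel in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected foreground region.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return len(areas), areas
```

With one global threshold, touching nuclei merge into a single region, and low contrast between nuclei and membrane blurs the foreground mask. Both weaknesses are consistent with the abstract's motivation for learned segmentation on SRS images.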

Our research focuses on applying image segmentation techniques to highlight and count cell nuclei in SRS microscopy images. We use the U-Net and Mask R-CNN image segmentation architectures to segment cell nuclei from their surrounding membrane. From these segmentations, we accurately compute critical statistics such as nuclei counts and areas. We also generate images that mimic the contrast of H&E-stained cells.

Our machine learning techniques allow SRS microscopy to match the accuracy of histological staining in much less time. This research demonstrates the advantage of applying machine learning techniques in a surgical setting. 

About the speakers: Frank Lu and Kenneth Chiu, with Adiel Felsen

Frank Lu is an assistant professor of biomedical engineering at Binghamton University. He received his bachelor's and master's degrees from Zhejiang University, and his PhD from the National University of Singapore. His research interests include multiphoton microscopy, label-free digital histopathology, stimulated Raman scattering (SRS) microscopy, live-cell imaging, dynamic imaging of the tumor microenvironment, optical bioimaging for neurosurgical guidance and neuro-oncologic studies, and computer vision.

Kenneth Chiu joined the Department of Computer Science as an assistant professor in September 2004. He is currently an associate professor there. He received his undergraduate degree from Princeton University, and his PhD from Indiana University. He is currently a PI or co-PI on three cyberinfrastructure-related NSF grants. His research interests are in high-performance computing, big data and bioinformatics.

Adiel Felsen is an undergraduate student studying computer science and mathematics at Binghamton University.

Creating a Voter Sentiment Index by Reading Tweets

3-4 p.m. Friday, Oct. 23, 2020
Join the Zoom meeting at

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: We use natural language processing to read hundreds of thousands of tweets every day and conduct sentiment analysis on them, hoping to uncover voters’ opinions of the two presidential candidates. Preliminary results show that the indices clearly respond to news and correlate with changes in average national polls. In September, sentiment was slightly more pro-Biden. Verified influencers were more pro-Biden, while unverified influencers were much more pro-Trump.

About the speakers: Wei Xiao is a professor of economics and director of undergraduate studies in the Department of Economics at Binghamton University. He joined the faculty at Binghamton in 2006, after receiving his bachelor's degree from Shandong University, China; his master's degree at Peking University, China; and his PhD at the University of Pittsburgh. Prior to Binghamton, he was at the University of New Orleans. He teaches macroeconomics and monetary policy and his research area is behavior and bounded rationality in macroeconomics.

Christo Tarazi is a PhD candidate in economics at Binghamton University. His areas of specialization are macroeconomics and econometrics. His current research focuses on using artificial intelligence to answer economic questions.

A Data-Driven Approach to Understanding Socio-Technical Issues*

2-3 p.m. Friday, Oct. 16, 2020
Join the Zoom meeting at

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: Social media has changed the world in dramatic fashion. The Web's democratization of information, backed up by advances in networking, big data processing, and scalable distributed systems, has had unforeseen, oftentimes negative consequences for society. While the world has become increasingly smaller and better connected thanks to technology, fringe, extremist communities have also benefited. In this talk, I will present a body of work that approaches this problem domain to provide a large-scale, quantitative understanding of these pivotal socio-technical issues. I will begin by showing how small, relatively unknown fringe Web communities can have outsized effects on much larger social media platforms. I will also present our work on understanding image memes, a modern form of information enabled by the Web, through the lens of data science. In particular, I will discuss how our data-driven approach reveals the extent to which anti-Semitic ideology has been integrated into the meme ecosystem. While understanding the behavior of hateful fringe communities is a vital first step toward addressing dangerous societal concerns, another important issue is understanding how these fringe communities evolve in the first place. To this end, I will show that quantitative methods can be used to map out how the Manosphere, ironically born from a male, feminist ideologue, has fractured into increasingly extreme misogynistic subcommunities. Finally, I will discuss potential mitigation strategies for the issues at hand. More specifically, I will show how it is possible to predict whether or not content will be the target of coordinated hate attacks. Additionally, I will present preliminary work that quantitatively reasons about the effects of community-level interventions from a multi-platform perspective. The scale, dynamism, and depth of the Web result in huge barriers to addressing the societal damage it enables. However, the work presented in this talk evidences the power of data-driven research for advanced modeling and analysis in understanding and mitigating these increasingly vital problems.

About the speaker: Jeremy Blackburn joined Binghamton University in fall 2019. He is broadly interested in data science, with a focus on large-scale measurements and modeling. His largest line of work is in understanding jerks on the Internet. His research into understanding toxic behavior, hate speech, and fringe and extremist Web communities has been covered in the press by The Washington Post, The New York Times, The Atlantic, The Wall Street Journal, the BBC and New Scientist, among others.

Prior to his appointment at Binghamton, Blackburn was an assistant professor in the Department of Computer Science at the University of Alabama at Birmingham. Prior to that, he was an associate researcher at Telefónica Research in Barcelona, Spain.

He received his bachelor's, master's and doctoral degrees from the University of South Florida. 

"Mining" interpreting corpora data: What has been done and how to move forward

Noon Wednesday, March 25, 2020
Offered via Zoom. Join at

Abstract: Interpreting is an activity that has existed since ancient times and occurs in different aspects of everyday life. There is a large amount of interpreting-related data that is worthy of scientific exploration. Nevertheless, the systematic collection and analysis of such data were not possible until the recent advancement of the subdiscipline of corpus-based interpreting studies.

This talk will sketch the field of corpus-based interpreting studies, including its recent developments and challenges, and then focus on the Chinese/English Political Interpreting Corpus (CEPIC, hk/cepic/), a major project that the presenter has worked on over the past five years. The CEPIC, the result of a digital scholarship project, is a 6.5-million-word-token corpus that includes transcripts of politicians' talks and their translation and interpreting during the past 21 years. The corpus is part-of-speech (POS) tagged and annotated with rich spoken language features. The talk will introduce the process of CEPIC data collection and the ways the transcribed and annotated data can be used to research political interpreting and translation, and then illustrate possible further development and expansion of the corpus. The presenter will also briefly discuss her ongoing work on two other interpreting corpora: one on lexical cohesion among student interpreters and translators, and the other on the perceived role of interpreters.

It is hoped that the talk will provide some thoughts and inspirations for future interdisciplinary work on data science and translation/interpreting studies, and help to identify potential synergies between the two. 

About the speaker: Jun Pan is an associate professor in the Department of Translation, Interpreting and Intercultural Studies at Hong Kong Baptist University and currently visiting faculty in the Translation Research and Instruction Program at Binghamton University. She is managing editor of Bandung: Journal of the Global South (Brill) and review editor of The Interpreter and Translator Trainer (Taylor & Francis). She is also honorary secretary of the Hong Kong Translators Society and chair of International Relations of the Hong Kong Association of University Women (an NGO affiliated with Graduate Women International [GWI]). Her research interests include digital humanities and interpreting/translation studies, corpus-based interpreting/translation studies, interpreting/translation and political discourse, learner factors in interpreter training, professionalism in interpreting, and bibliometric studies. Her articles have appeared in journals including Target: International Journal of Translation Studies, The Interpreter and Translator Trainer, Perspectives: Studies in Translatology, inTRAlinea and others. She has also published with Routledge, Springer, Peter Lang and others.

Pan recently completed a digital scholarship project, "The Chinese/English Political Interpreting Corpus," and relevant publications can be accessed online.

Tracking students to understand career outcomes

1 p.m. Tuesday, April 30, 2019, in the Benet Alumni Lounge, Old O'Connor Hall

Abstract: Binghamton University spends significant resources on understanding and addressing student career outcomes as one of the key indicators of success and campus accountability (SP2). The office of Student Affairs Assessment and Strategic Initiatives (SAASI) surveys graduating seniors to collect information about post-graduation plans. How can we optimize the use of this information to better understand career pathways and identify the key factors behind our graduating seniors' career success?

We combined the senior survey responses with a diverse set of indicators of students' characteristics (e.g. gender, race, GPA, and financial aid) and activities (e.g. internships, career counseling) to understand the factors affecting placement. Using descriptive and inferential statistics, we examined the differences between placed and unplaced students, which enabled us to identify student success factors. This work is one part of a broader effort to use analytics to help Binghamton University not only improve retention and graduation rates but also ensure student employability. With SAS data management and reporting capabilities, we can use data to help students engage in the activities that ensure career readiness.
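The abstract does not say which inferential tests were used. As a hedged illustration only, one common way to compare placed and unplaced students on a yes/no indicator (for example, internship participation) is a two-proportion z-test. The function name and the counts in the usage line are hypothetical, not results from this study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Compare proportions x1/n1 and x2/n2 (e.g. share of placed vs. unplaced
    students who held an internship). Returns (z, two-sided p-value) under
    H0: p1 == p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # All-yes or all-no in both groups: the proportions are identical.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 120 of 200 placed vs. 45 of 120 unplaced students interned.
z, p = two_proportion_z_test(120, 200, 45, 120)
print(f"z = {z:.2f}, p = {p:.5f}")
```

With many indicators tested at once, a multiple-comparison correction or a single multivariable model (such as logistic regression on placement) would usually be preferred over a series of separate tests.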

About the speaker: Manar Sabry is senior assistant director of strategic analysis at the Office of Student Affairs Assessment and Strategic Analysis at Binghamton University.

Using Big Data to analyze workforce needs: Applications for improving equity for students with disabilities

1 p.m. Tuesday, April 9, 2019, in the Benet Alumni Lounge, Old O'Connor Hall

Abstract: The shortage of individuals fully prepared as special education teachers has been described as persistent, chronic and uneven. To date, researchers have taken advantage of large, nationally representative surveys to demonstrate that the shortage unevenly impacts students with disabilities in high-poverty schools and those who attend schools meant to provide specialized services. The emergence of statewide longitudinal data systems that track teachers and students provides new opportunities for refining investigations, forecasting shortages and analyzing the effectiveness of policies. In the first half, I'll provide an overview of my research on the special education teacher workforce using restricted datasets from the Institute of Education Sciences within the U.S. Department of Education. In the second half, I'll share planned next steps and potential ideas for collaboration related to existing and developing statewide datasets.

About the speaker: Lucky Mason-Williams is an associate professor in the Department of Teaching, Learning and Educational Leadership in the College of Community and Public Affairs. Her research interests revolve around the preparation and qualifications of special education teachers. She is a member of the Data Science TAE Steering Committee. 

Combining machine learning methods to good effect

1 p.m. Tuesday, March 5, 2019, in the Benet Alumni Lounge, Old O'Connor Hall

Abstract: Many machine pattern discovery datasets take the form of an array with a row for each observation and a column for each measurement (feature, variable). We might call these static machine learning tasks, since there is only one feature vector for each observation (case, subject). To apply supervised machine pattern discovery methods, one must also know some outcome for each case (a correct diagnosis, a true class, ...). These outcomes are often a yes/no (i.e. binary) class, but multiclass methods are also available. The outcome may also be a continuous numerical value that we wish to predict.

In this talk, Schaffer will present such an example task: the detection of Alzheimer's disease from a sample of a subject's speech. He will describe a hybrid approach to this pattern discovery task involving a genetic algorithm for feature subset selection combined with a support vector machine that computes an accuracy "fitness" for each feature subset. He may show an animation of this algorithm in action. The approach employs two levels of cross-validation to combat overfitting and uses an extension of the concept of area under the ROC curve. Finally, several candidate feature subsets are combined with an ensemble method called the GRNN oracle, a maximum-likelihood, minimum-variance, unbiased estimator. The final results for the Alzheimer's speech data will be shown.
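The wrapper idea described above can be sketched in a few lines. A genetic algorithm searches over feature subsets, represented as boolean masks, and a classifier's cross-validated accuracy serves as each subset's fitness. To stay dependency-free, this sketch makes several substitutions. It swaps in a simple nearest-centroid classifier where the talk uses a support vector machine, and it uses plain accuracy instead of the ROC-based extension. It also omits the outer cross-validation loop and the GRNN oracle ensemble. All function names and GA settings are illustrative assumptions.

```python
import random


def centroid_cv_accuracy(X, y, mask, k=5):
    """Stand-in fitness: k-fold cross-validated accuracy of a nearest-centroid
    classifier that only looks at features j where mask[j] is True.
    (The talk uses a support vector machine here instead.)"""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for fold in range(k):
        test = set(range(fold, len(X), k))
        train = [i for i in range(len(X)) if i not in test]
        # Class centroids are computed on the training fold only.
        centroids = {}
        for label in {y[i] for i in train}:
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        for i in test:
            point = [X[i][j] for j in cols]
            pred = min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
            )
            correct += pred == y[i]
    return correct / len(X)


def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            # Tiny per-feature penalty breaks ties in favour of smaller subsets.
            cache[key] = centroid_cv_accuracy(X, y, mask) - 0.001 * sum(mask)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: carry the two best masks forward
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random masks, done twice.
            parent_a = max(rng.sample(ranked, 3), key=fitness)
            parent_b = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)
```

In the setup the talk describes, an outer cross-validation loop would wrap this whole search, so that the chosen subset is scored on data the genetic algorithm never saw during selection.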

About the speaker: J. David Schaffer is a visiting research professor in the College of Community and Public Affairs at Binghamton University. He is affiliated with the school's Institute for Justice and Well-Being, a research institute that advances global health, progressive education and well-being for marginalized populations by implementing cutting-edge, interdisciplinary research and educational opportunities with communities and people across the lifespan and the globe. Its researchers span professions and disciplines including counseling, education, engineering, human development, medicine, nursing, pharmacy, psychology and social work.

The Role of Data Science in Evolving the Future of Binghamton and Beyond

1 p.m. Tuesday, Feb. 12, 2019, in the Anderson Center Reception Room

Abstract: The idea that public policy formulation and implementation must be a managed process of cultural evolution is gaining traction around the world. Binghamton can be a hub of activity and has made a good start in some respects, although not yet in a way that involves much sophisticated data science. The first half of my talk will provide an overview of the "Evolving the Future" movement, both locally and globally. The second half will be devoted to brainstorming about how data scientists at Binghamton can contribute to the movement.

About the speaker: David Sloan Wilson is SUNY Distinguished Professor of Biological Sciences and Anthropology at Binghamton University, where he founded EvoS, Binghamton University's campus-wide evolutionary studies program. He is also president of the Evolution Institute, a nonprofit devoted to policy formulation and implementation from an evolutionary perspective. This places him at the center of the "Evolving the Future" movement, both worldwide and locally. His newest book, This View of Life: Completing the Darwinian Revolution, communicates this vision to a general audience.

Data and the Future of the Humanities: Cases from Digital Art History

Noon Monday, Nov. 19, 2018, in LN-1302C, the Zurack High-Technology Collaboration Center

Abstract: Researchers in the humanities are actively exploring new technologies and innovative computational methods to reinvigorate long-standing approaches to the study of literature, art, culture and society. Brought together under the umbrella of the digital humanities, these initiatives have opened up a new set of analytical pathways for scholars across history, art history, literature, philosophy and other disciplines. They have also brought humanities scholars into direct communication with those in otherwise distant disciplines, such as computer science, geography and mathematics. In this talk, we will explore the goals and methods of the digital humanities, using cases from art history as springboards. We will also highlight the ways in which the core practices of the digital humanities intersect with, but also diverge from, the goals and outlook of other domains in data science.

About the speaker: Nancy Um received her MA and PhD in Islamic art and architectural history from UCLA. She joined the art history faculty at Binghamton University in 2001.

Her research examines the visual culture and built environments of trading communities around the western Indian Ocean rim in the early modern period. She has authored two books, The Merchant Houses of Mocha: Trade and Architecture in an Indian Ocean Port (Seattle: University of Washington Press, 2009), and Shipped but Not Sold: Material Culture and the Social Protocols of Trade during Yemen's Age of Coffee (Honolulu: University of Hawai'i Press, 2017).

She has conducted field and archival research in Turkey, Yemen, the Netherlands, and England. She is the recipient of a number of research fellowships, including from the Fulbright program, the National Endowment for the Humanities, the Getty Foundation and the American Institute for Yemeni Studies.

She currently serves as reviews editor of The Art Bulletin.

Robust tracking and behavioral modeling of movements of biological collectives from ordinary video recordings

11:45 a.m. Monday, Nov. 5, 2018, in AD-148

Abstract: We developed a computational method to extract information about interactions among individuals with different behavioral states in a biological collective from ordinary video recordings. Assuming that individuals are acting as finite state machines, our method first detects discrete behavioral states of those individuals and then constructs a model of their state transitions, taking into account the positions and states of other individuals in the vicinity. We tested the proposed method through applications to two real-world biological collectives: termites in an experimental setting and human pedestrians on a university campus. For each application, a robust tracking system was developed in-house, utilizing interactive human intervention (for termite tracking) or online agent-based simulation (for pedestrian tracking). In both cases, significant interactions were detected between nearby individuals with different states, demonstrating the effectiveness of the proposed method.
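The second step described above is building a state-transition model that accounts for nearby individuals in other states. It can be illustrated with a simple counting estimator. This is a hedged sketch, not the authors' method. It assumes tracking and state detection have already produced, for each frame, every individual's discrete state and (x, y) position. It also reduces "the vicinity" to a single hypothetical distance threshold, `radius`.

```python
import math
from collections import Counter, defaultdict


def estimate_transitions(frames, radius):
    """frames: list of dicts {agent_id: (state, (x, y))}, one dict per time step.
    Returns {(state, other_state_nearby): {next_state: probability}}, i.e. an
    empirical finite-state transition table conditioned on the neighbourhood."""
    counts = defaultdict(Counter)
    for now, nxt in zip(frames, frames[1:]):
        for aid, (state, pos) in now.items():
            if aid not in nxt:
                continue  # individual lost by the tracker; skip this step
            # Is any *other* individual in a *different* state within radius?
            nearby = any(
                other_state != state and math.dist(pos, other_pos) <= radius
                for oid, (other_state, other_pos) in now.items()
                if oid != aid
            )
            counts[(state, nearby)][nxt[aid][0]] += 1
    return {
        context: {s: c / sum(cnt.values()) for s, c in cnt.items()}
        for context, cnt in counts.items()
    }


# Toy example: "a" rests until a moving individual "b" comes close.
frames = [
    {"a": ("rest", (0, 0)), "b": ("move", (10, 10))},
    {"a": ("rest", (0, 0)), "b": ("move", (1, 0))},
    {"a": ("move", (0, 0)), "b": ("move", (2, 0))},
]
print(estimate_transitions(frames, radius=1.5))
```

Comparing rows such as `("rest", True)` and `("rest", False)` is one simple way to see whether a nearby individual in a different state changes the transition probabilities. A real analysis would also need a significance test on those differences.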

About the speaker: Hiroki Sayama is a Binghamton University professor and researcher who has extensive experience in teaching and research on complex systems science and engineering, network science, computational social science, mathematical modeling and simulation, artificial life/chemistry, and computer and information sciences. He earned his bachelor's, master's and doctoral degrees in information science from the University of Tokyo, Japan, in 1994, 1996 and 1999, respectively. He is founder and director of the Center for Collective Dynamics of Complex Systems (CoCo) at Binghamton.

Advancing brain functional imaging in early diagnosis and treatment therapy

Noon Monday, Sept. 24, 2018, in AD-148

Abstract: Imaging provides the primary means for assessing brain structure and function in humans in vivo. In many clinical situations, brain imaging biomarkers have the potential to provide greater sensitivity and specificity than clinical indices for the differential diagnosis and management of brain disorders. While structural changes have been shown to provide valuable information for disease diagnosis, changes in regional brain function may be more dynamic and provide even greater sensitivity to early diagnosis, disease progression or response to therapy. However, the potential of functional imaging has not yet been realized because of technical hurdles and a lack of effective image processing tools for detecting disease-sensitive imaging biomarkers. The talk will introduce the status quo of brain image processing and the challenges faced in detecting imaging biomarkers. Medical image processing needs new computational methods, and applications of those methods, that can identify sensitive biomarkers. Recent advances in computational methods for diabetes and bipolar disorder will be presented.

About the speaker: Weiying Dai received a BS in mathematics from Peking University and a PhD in computer science from the University of Pittsburgh. Before joining Binghamton University in 2015, she was an instructor at Beth Israel Deaconess Medical Center and Harvard Medical School. Her research interests include brain mapping, neuroimaging, blood flow imaging, biomedical image processing, pattern recognition, computer vision and information retrieval. She, together with her collaborators, invented and advanced a Magnetic Resonance Imaging (MRI) technique that has been implemented on GE MRI scanners and has become a popular clinical imaging tool for quantitatively measuring blood flow as it moves through the body. Dai is a selected Junior Fellow of the International Society of Magnetic Resonance in Medicine (ISMRM).

Healthcare Data Analysis: Challenges and Opportunities

10:30 a.m. Thursday, April 26, 2018, in SW-114.

Abstract: The expansion of Binghamton University's portfolio in pharmacy, pharmaceutical sciences, and allied health sciences occurs simultaneously with new emphases in data science and analytics. The convergence of these two major themes (healthcare and data science) represents very exciting opportunities for new collaborative scholarly activities. However, healthcare data also come with unique challenges for their acquisition and management. As part of the Data Science Salon series, we will discuss some of the challenges and opportunities of using healthcare data in research activities. Examples utilizing public use (i.e. free) datasets as well as federal (Medicare) data will be presented. The data salon will also highlight current national research priorities and analytical trends in healthcare research.

About the speaker: Leon Cosler is associate professor and founding chair of the Department of Health Outcomes and Administrative Sciences in Binghamton University's School of Pharmacy and Pharmaceutical Sciences. Cosler obtained his PhD in health systems administration from Union College (Schenectady, N.Y.). Before joining academia, he was the director of research for the NYS Medicaid Program in Albany, N.Y. His research interests include healthcare database analyses and economic modeling applied to therapeutic areas such as HIV/AIDS, substance abuse, oncology, long-term care and patterns of prescription drug use.

March 29, 2018, 10:30 a.m., SW-114.

Cognitively-inspired Tools for Machine Learning

As part of the Data Science Salon series, we present a perspective from the field of cognitive science. Our goal in this line of work is to take advances from our laboratory in the psychological explanation of human concept learning and apply them to challenges in machine learning. We provide an overview of our theoretical approach and report on progress in application areas including classification and collaborative filtering.

March 1, 2018, 10:30 a.m., SW-114.

The Data Science of Gerrymandering

In early October 2017, the Supreme Court of the United States heard oral arguments in the case of Gill v. Whitford. The case calls into question the constitutionality of partisan gerrymandering, the practice of drawing district boundaries in such a way that one political party receives an unfair advantage. The plaintiffs in the case argued that data scientists brought to bear a set of tools that allowed Wisconsin's legislature and governor to draw and enact maps that favored the Republican party. By every estimate, the Republicans' strategy was extremely effective. For example, in the 2012 elections, Republican candidates received just 48 percent of the vote while managing to carry 60 percent of the districts in the state.

Just as data science can be used to build an unfair advantage, it can be used to identify and remedy unfair redistricting practices in Wisconsin and elsewhere. With a seed grant from the Binghamton University Data Science Transdisciplinary Working Group, the team will implement, on a high-performance computing cluster, an algorithm that PI Magleby developed with Daniel Mosesson. In a forthcoming paper, the team shows that the algorithm produces maps without any indication of bias. Moreover, the proposed method is vastly more efficient than the alternatives. Access to a cluster will allow the team to use the algorithm to draw hundreds of millions of hypothetical maps. That large number of counterfactual maps will allow the team to make inferences about the impact of certain redistricting criteria that mapmakers have used to defend partisan outcomes. In particular, it will help the team understand how considerations such as race, communities of interest and other political jurisdictions interact with the biased, partisan outcomes that analysts have observed in recent decades.
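The inference step can be sketched as an outlier test. Suppose the enacted map gives a party more seats than nearly all of a large ensemble of unbiased counterfactual maps. Then the enacted map is unlikely to be the product of neutral criteria alone. This sketch assumes the ensemble's seat counts have already been computed, and it does not implement the Magleby and Mosesson map-drawing algorithm. The function names and the toy numbers are hypothetical.

```python
def seats_won(district_vote_shares):
    """Seats a party wins when each district goes to whoever tops 50% of the
    two-party vote in that district."""
    return sum(1 for share in district_vote_shares if share > 0.5)


def ensemble_tail_share(simulated_seats, enacted_seats):
    """Fraction of counterfactual maps giving the party at least as many seats
    as the enacted map. A very small value flags the enacted map as an outlier."""
    if not simulated_seats:
        raise ValueError("the ensemble must contain at least one simulated map")
    return sum(1 for s in simulated_seats if s >= enacted_seats) / len(simulated_seats)


# Hypothetical toy ensemble of seat counts (a real ensemble would be far larger).
ensemble = [47, 49, 50, 50, 51, 52, 52, 53, 54, 60]
print(ensemble_tail_share(ensemble, enacted_seats=60))
```

With hundreds of millions of maps, the same comparison can be repeated while holding fixed criteria such as race or communities of interest. This shows how much of a partisan outcome those criteria can explain on their own.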

November 29, 2017

The ecological genetics of species divergence in the era of big data.

In recent years, the development of next-generation sequencing technology has provided an unprecedented ability to incorporate genomic techniques into the study of non-model organisms. For example, the study of ecological genetics of adaptive trait variation was previously limited to small numbers of molecular markers that required laborious methods to develop. However, recent developments have put transcriptomic and genomic datasets within reach for nearly any organism, producing both opportunities and challenges. Similarly, advances in remote sensing and geographic information systems have produced global-scale climatological datasets, presenting new opportunities to examine the interaction between ecological divergence and abiotic sources of selection. In this data salon, I will provide a brief introduction to the big data issues faced in the field of ecological genetics with non-model organisms. I will use an example study system in my lab, which includes a pair of recently diverged species of western North American wildflowers. My goal will be to illustrate how big datasets are impacting my ability to approach current and classic questions in the field, and to identify some emerging questions that arise at the intersection of biology and data science.

September 19, 2017

Big Geospatial Data: Challenges and opportunities in Geographic Information Science.

Geospatial data describe the locations of spatial features, the basic and distinctive element of Geographic Information Science (GIS). It is important to convert geospatial data into knowledge and understanding of human activities, and to apply that knowledge to support policymaking. With the proliferation of remote sensing devices and human sensors, it is now possible to obtain big geospatial data characterizing every aspect of our physical and human environment. However, it remains challenging to process, analyze and integrate such big geospatial data for applied practices (such as urban sustainability), which require effective interdisciplinary analytical methods. In this talk, spatial big data analytics and discovery will be introduced, with examples mostly from my research. Challenges and opportunities of big geospatial data will also be discussed.

August 9, 2017

It's all about regression.

Regression models are among the most powerful tools in statistical modeling. However, the vast variety of available models, each tailored to challenges arising from particular data patterns, can be confusing, if not intimidating. I intend to give a brief, nontechnical tour of this core branch of statistics for data scientists and, I hope, to provoke some ideas through discussion.

July 12, 2017

Statistical Machine Learning and Data-intensive Research

Statistical machine learning merges statistics with the computational sciences, namely computer science, systems science and optimization. Many of the methods and much of the theory in statistical machine learning are driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamic and heterogeneous, and where mathematical and algorithmic creativity are required. Fields such as bioinformatics, artificial intelligence, signal processing, networking, finance, game theory and control theory are all being heavily influenced by developments in statistical machine learning. In this exploratory talk, I will discuss some potential research agendas related to applications of statistical machine learning that are currently being pursued by top-notch researchers on and off campus.

June 14, 2017

Big Data and the Urban Commons: Finding Our Niche

Terms such as "Smart Cities", "Responsive Cities", "Urban Commons", and "Co-production" signal a revolution in the governance of cities with the help of big data, with major cities such as Boston leading the way. An example is 311, a number that can be called to report any dysfunction, which turns the residents of a city into part of a distributed system for detecting problems and making improvements. A Data Science TAE can help the Binghamton area become part of this movement. In addition, we can occupy a distinctive niche within the movement in a way that involves all of the academic disciplines, including the social sciences and humanities.