Data Salon

Data Salon Schedule

The need to address racism and social justice issues touches all aspects of life. Data science is no exception. In looking at how data science has enabled social injustice in the past and how data scientists can instead work to improve the social good, the Data Science Transdisciplinary Area of Excellence will bring in speakers under its Equitable Data Science umbrella. These speakers will look into issues such as racism, fringe groups on the internet, the fairness of AI and other topics that will move conversations in the data science realm forward to be inclusive and equitable. All talks in this special series on Equitable Data Science will be identified with an asterisk (*) next to the title of the talk.

Who does what (and why) to protect the public in the pandemic? Dataset on the Institutional Origins of COVID-19 Public Health Policies

2-3 p.m. Wednesday, April 28, 2021
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09

Abstract: This talk introduces an original dataset on the governmental origins of COVID-19 public health policies worldwide. The dataset contains daily observations from the start of the pandemic through July-December of 2020, depending on the country, coded at the level of ISO-2 units. It is publicly available through the website and, more fully, upon request.

We present the data and the research already published based on this data, and discuss things that could be done but are beyond our own analytic capacity.

We also introduce our planned next “big step” based on the dataset: a project to create a public-facing ‘living history’ dashboard of the COVID-19 pandemic. The multi-tabbed dashboard will offer interactive interfaces to link the core policy dataset with existing, publicly available datasets (e.g. biomedical data on rates of infection, mortality, and vaccination; or socio-economic datasets to explore links between policy and social inequality) and curated crowd-sourced inputs. We hope that by linking politics, health, and society in a single, multi-tabbed COVID-19 dashboard, our ‘living history’ dashboard will serve as a resource for teachers, students, and the public. It could also become a reality check to combat opportunistic narratives of pandemic management.

About the speakers: Bradley Skopyk is an associate professor of history at Binghamton University; Olga Shvetsova is a professor of political science at Binghamton University; and Andrei Zhirnov is an associate lecturer in quantitative methods in social science at the University of Exeter, UK.

Data Science Modeling for Shape and Genetic Data

2-3 p.m. Friday, April 16, 2021
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09

Abstract: Inter-individual variation in biological shape is one of the most noticeable phenomena in nature, with relevance to leaf identification, ecology and evolution, human facial recognition and more. As shape has been reported to have high heritability, unraveling the genetic mysteries of shape has attracted a lot of attention in various disciplines. The newest genotyping technologies have dramatically changed the landscape of contemporary quantitative shape mapping research. With the number of single nucleotide polymorphisms (SNPs) increasing from thousands to millions, unprecedented challenges arise, in particular for the “double big data” that come from both high-dimensional morphological and genomic variables. In this talk, we will introduce a data science modeling strategy, driven by real-world datasets, to identify influential genetic variables associated with the high-dimensional shape trait at increasing complexity and scale. Shape is input as an image and then described as a tensor or a high-dimensional curve. Prevailing approaches either describe shape by a limited number of landmark points or model each genetic variable individually, in isolation from the others, which greatly limits the potential for new findings. The presented data science modeling strategy expands on traditional statistical methods to scale to the dynamic traits that must be analyzed to meet today’s technical needs. Finally, we will discuss a possible extension in future work, based on our preparation of 3D facial expression data with shape variations.

About the speakers:

Guifang Fu is an assistant professor of mathematical sciences. She is interested in developing advanced statistical models and computational methodologies to unravel the genetic and environmental mechanisms that regulate complex biological traits, including morphology/shape, biomedical problems and disease. She is particularly interested in high-dimensional, “big data” modeling, and functional data analysis. Her research has been funded by the NSF among others.

Lijun Yin is a professor of computer science. His research focuses on computer vision, graphics, HCI, and multimedia, specifically on face and gesture modeling, analysis, recognition, animation, and expression understanding. His research has been funded by the NSF, AFRL/AFOSR, NYSTAR, and SUNY Health Network of Excellence. He received the prestigious James Watson Investigator Award of NYSTAR (2006) and SUNY Chancellor Award for Excellence in Scholarship and Creative Activities (2014).

Using Secondary Data to Tell a New Story: A Cautionary Tale in Health Information Technology Research

2-3 p.m. Friday, April 9, 2021
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09

Abstract: Through the growth of big data and open data, new opportunities have presented themselves for information systems (IS) researchers who want to investigate phenomena they cannot easily study using primary data. As a result, many scholars have “retooled” their skills to leverage the large amount of readily available secondary data for analysis. In this confessional account, we share the story about how the first and second authors faced challenges when using secondary data for a research project in the health information technology domain. Through additional analysis of studies on health information technology that have used secondary data, we identified several themes of potential pitfalls that can occur when collecting, appropriating, and analyzing secondary data for a research project. We share these themes and relevant exemplars to help IS researchers avoid mistakes when using secondary data.

About the speaker: Sumantra Sarkar is an associate professor in the School of Management at Binghamton University.

Optimal Data-driven Policies for Disease Screening

2-3 p.m. Friday, March 5, 2021
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09


Abstract: Public health screening, which involves testing a large population for diseases using disease-related biomarkers, is an essential tool in a wide variety of settings, including newborn screening for genetic diseases. However, noisy information on the biomarker level, caused by external or subject-specific factors, introduces significant challenges to this problem. We design optimal data-driven biomarker screening policies to minimize subject misclassification errors under noisy and uncertain biomarker measurements. Our case study on newborn screening for cystic fibrosis, which is based on a five-year data set from the North Carolina State Laboratory of Public Health, indicates that a substantial reduction in classification errors can be achieved using the proposed optimization-based models, compared with current practices.

About the speaker: Saloumeh Sadeghzadeh is an assistant professor in the School of Management at Binghamton University. Her research interests lie at the intersection of stochastic modeling, optimization and data analytics methodologies, with a particular focus on data-driven decision-making in healthcare and public policy domains.

Teaching "Racism and Big Data"*

2-3 p.m. Friday, Nov. 20, 2020
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09


Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: Big Data is the fodder for data mining, data analytics, and predictive programs. In reflecting the social world of its collected data, the exciting realm of Big Data is the new frontier of racism and sexism. Although data are presented as value-free, and algorithms are portrayed as neutral computational bits and mathematical processes, they not only conceal but also automate racist and gender biases and racial and gender inequalities. The inherent biases in data mining have resulted in data analytic programs, predictive risk models, and facial recognition tools that are wreaking havoc in the lives of African Americans, Native Americans, people of color, and many other socially and economically marginalized communities. This talk highlights the damaging real-life consequences of racially compromised Big Data and considers ways that racial justice literacy can responsibly combat algorithmic oppression.

About the speaker: Nkiru Nzegwu is a distinguished professor of Africana studies at Binghamton University. Her accomplishments are far-reaching. They range from writing the book Family Matters: Feminist Concepts in African Philosophy and editing a number of books to authoring dozens of articles and book chapters. She has curated 11 art exhibitions and produced seven exhibition catalogs. Her paintings and poetry have been featured in numerous publications. She has also received fellowships from the Smithsonian Institution, the Getty Foundation, Cornell University’s A.D. White Society for the Humanities, the Canada Council and the UCLA Institute for the Study of Gender in Africa.

She has also been named Professor Extraordinarius in the School of Transdisciplinary Research and Graduate Studies at the University of South Africa.

Her research interests are in feminist philosophy, African philosophy, African women studies, and African and African Diaspora art and aesthetics.

Read more about her online.

 

Understanding and Predicting Chronic Absenteeism from School in Autism Spectrum Disorder: A Data-Driven Approach*

2-3 p.m. Friday, Nov. 13, 2020
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: In 2020, the Centers for Disease Control and Prevention (CDC) reported that approximately 1 in 54 children in the U.S. is diagnosed with an autism spectrum disorder (ASD), a prevalence that has been accompanied by a plethora of services and supports to treat its symptoms. Along with technical advances, a considerable number of data-driven approaches have shown promising results in diagnosing ASD from clinical data. However, there is still a lack of investigation into treatment progression and the impact of education based on behavioral data in ASD. Absenteeism from school is a serious public health issue for educators, mental health professionals, and families. Chronic absenteeism, which is defined as missing 10% or more of school days due to absence for any reason, makes it hard for a student to keep pace with school. However, school absenteeism among students with ASD has received less attention, with only a small number of studies, which examine only older children with higher cognitive abilities and ASD. In this talk, we introduce data-driven approaches based on machine learning techniques to provide insightful information for educators in special education for children with ASD, based on a predictive performance framework. Mainly, we develop an individualized school attendance forecasting model that predicts future attendance using a deep recurrent neural network architecture, namely long short-term memory (LSTM), to provide daily forecasting. In addition, we present another prediction model that provides a longer forecasting horizon (e.g., a month) based on a standard machine learning architecture. Lastly, we extend this approach to the prediction of problematic behavior, which is based on non-stationary and nonlinear data. Through data-driven approaches, including artificial intelligence, this work revolutionizes knowledge of how students with disabilities attend school and how attendance relates to in-school activities, and introduces predictive modeling to provide early intervention to improve attendance and educational outcomes.

About the speakers: Daehan Won is an assistant professor of systems science and industrial engineering at Binghamton University. He received his bachelor's and master's degrees from the Korea Advanced Institute of Science and Technology (KAIST) and his PhD in industrial and systems engineering from the University of Washington. His research interests lie in mathematical programming in large-scale programming and data analytics/mining for various healthcare and manufacturing fields. He has recently been working on designing new platforms for smart electronics manufacturing systems to cope with advances in Industry 4.0, as well as on healthcare informatics in biomedical engineering and psychology to address complexity in human-related data.

Jennifer Gillis is a professor of psychology at Binghamton University. She is a three-time Binghamton University alumna, having earned her bachelor's, master's and doctoral degrees there. Her research interests include assessment and treatment issues for individuals with autism spectrum disorders (ASD) across the lifespan as well as service provider training and development. The goal of her research is to develop and/or adapt treatment manuals and policies primarily to improve the social development, health, and learning of individuals with ASD. Gillis is especially interested in the effectiveness of interventions delivered in naturalistic settings and in addressing factors that might account for differences in the delivery of treatment in the community as opposed to laboratory settings, as well as factors that prevent access to educational and intervention settings.

Segmentation of Stimulated Raman Scattering Microscopy Images using Deep Neural Networks

2-3 p.m. Friday, Nov. 6, 2020
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09


Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: Medical procedures are often limited by the precision and accuracy of modern equipment. Acquiring dyed images of cancer cells can often require upwards of half an hour in a lab using traditional histological staining. During brain surgery procedures, every minute spent analyzing cancer tissue is wasted. A promising alternative, stimulated Raman scattering (SRS) microscopy, uses microscopic lipid and protein vibrations to provide rapid cell nuclei imaging without the use of stains or other processes (label-free).

However, when identifying cancer cells, it is critical for doctors to be able to visualize the density and distribution of cell nuclei. Basic thresholding techniques are commonly used for counting the nuclei of H&E-stained cells. Unfortunately, the low signal-to-noise contrast of SRS images hampers these techniques. The contrast between cell nuclei and their cell membrane is far lower in SRS images than in images produced by traditional staining techniques.

Our research focuses on applying image segmentation techniques to highlight and count cell nuclei from SRS microscopy images. We utilize the U-Net and Mask R-CNN image segmentation architectures to segment cell nuclei from their surrounding membrane. With this result, we accurately produce many critical statistics such as nuclei counts and area. We also generate images that mimic the contrast of H&E-stained cells.

Our machine learning techniques allow SRS microscopy to match the accuracy of histological staining in much less time. This research demonstrates the advantage of applying machine learning techniques in a surgical setting. 

About the speakers: Frank Lu and Kenneth Chiu, with Adiel Felsen

Frank Lu is an assistant professor of biomedical engineering at Binghamton University. He received his bachelor's and master's degrees from Zhejiang University, and his PhD from the National University of Singapore. His research interests include multiphoton microscopy, label-free digital histopathology, Stimulated Raman Scattering (SRS) microscopy, live-cell imaging, dynamic imaging into the tumor microenvironment, optical bioimaging for neurosurgical guidance and neuro-oncologic studies and computer-visi.

Kenneth Chiu joined the Department of Computer Science as an assistant professor in September 2004. He is currently an associate professor there. He received his undergraduate degree from Princeton University, and his PhD from Indiana University. He is currently a PI or co-PI on three cyberinfrastructure-related NSF grants. His research interests are in high-performance computing, big data and bioinformatics.

Adiel Felsen is an undergraduate student studying computer science and mathematics at Binghamton University.

Creating a Voter Sentiment Index by Reading Tweets

3-4 p.m. Friday, Oct. 23, 2020
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: We use natural language processing to read hundreds of thousands of tweets every day, and conduct sentiment analysis. We hope to uncover voters’ opinions towards the two presidential candidates. Preliminary results show that the indices clearly respond to news, and correlate with changes in average national polls. In September, the sentiment was slightly more pro-Biden. Verified influencers were more pro-Biden, while unverified influencers were much more pro-Trump.

About the speakers: Wei Xiao is a professor of economics and director of undergraduate studies in the Department of Economics at Binghamton University. He joined the faculty at Binghamton in 2006, after receiving his bachelor's degree from Shandong University, China; his master's degree at Peking University, China; and his PhD at the University of Pittsburgh. Prior to Binghamton, he was at the University of New Orleans. He teaches macroeconomics and monetary policy and his research area is behavior and bounded rationality in macroeconomics.

Christo Tarazi is a PhD candidate in economics at Binghamton University. His areas of specialization are macroeconomics and econometrics. His current research focuses on using artificial intelligence to answer economic questions.

A Data-Driven Approach to Understanding Socio-Technical Issues*

2-3 p.m. Friday, Oct. 16, 2020
Join the Zoom meeting at https://binghamton.zoom.us/j/92622837791?pwd=RWFReXU4bDVUaFFGWmlDWDU4NXd2UT09

Meeting ID: 926 2283 7791
Passcode: 5-digit zip code for Binghamton University

Abstract: Social media has changed the world in dramatic fashion. The Web's democratization of information, backed up by advances in networking, big data processing, and scalable distributed systems, has had unforeseen, oftentimes negative consequences on society. While the world has become increasingly smaller and better connected due to technology, fringe, extremist communities have also benefited. In this talk, I will present a body of work that approaches this problem domain to provide large-scale, quantitative understanding of these pivotal socio-technical issues. I will begin by showing how small, relatively unknown fringe Web communities can have outsized effects on much larger social media platforms. I will also present our work on understanding how image memes, a modern form of information enabled by the Web, can be understood through the lens of data science. In particular, I will discuss how our data-driven approach reveals the extent to which anti-Semitic ideology has been integrated into the meme ecosystem. While understanding the behavior of hateful fringe communities is a vital first step to addressing dangerous societal concerns, another important issue is understanding how these fringe communities evolve in the first place. To this end, I will show that quantitative methods can be used to map out how the Manosphere, ironically born from a male, feminist ideologue, has fractured into increasingly extreme misogynistic subcommunities. Finally, I will discuss potential mitigation strategies for the issues at hand. More specifically, I will show how it is possible to predict whether or not content will be the target of coordinated hate attacks. Additionally, I will present preliminary work that quantitatively reasons about the effects of community-level interventions from a multi-platform perspective. The scale, dynamism, and depth of the Web result in huge barriers to addressing the societal damage it enables. However, the work presented in this talk evidences the power of data-driven research for advanced modeling and analysis in understanding and mitigating these increasingly vital problems.

About the speaker: Jeremy Blackburn joined Binghamton University in fall 2019. He is broadly interested in data science, with a focus on large-scale measurements and modeling. His largest line of work is in understanding jerks on the Internet. His research into understanding toxic behavior, hate speech, and fringe and extremist Web communities has been covered in the press by The Washington Post, The New York Times, The Atlantic, The Wall Street Journal, the BBC and New Scientist, among others.

Prior to his appointment at Binghamton, Blackburn was an assistant professor in the Department of Computer Science at the University of Alabama at Birmingham. Prior to that, he was an associate researcher at Telefonica Research in Barcelona, Spain.

He received his bachelor's, master's and doctoral degrees from the University of South Florida. 

"Mining" interpreting corpora data: What has been done and how to move forward

Noon Wednesday, March 25, 2020
Offered via Zoom. Join at https://binghamton.zoom.us/j/973587781


Abstract: Interpreting is an activity that has existed since ancient times and occurs in different aspects of everyday life. There is a large amount of interpreting-related data that is worthy of scientific exploration. Nevertheless, the systematic collection and analysis of such data were not possible until the recent advancement of the subdiscipline of corpus-based interpreting studies.

This talk will sketch the field of corpus-based interpreting studies, including its recent developments and challenges, and then zoom in on the Chinese/English Political Interpreting Corpus (CEPIC, https://digital.lib.hkbu.edu.hk/cepic/), a major project that the presenter has worked on over the past five years. The CEPIC, the result of a digital scholarship project, is a 6.5-million-word-token corpus that includes transcripts of politician talks, and their translation and interpreting, during the past 21 years. The corpus is Part-of-Speech (POS) tagged and annotated with rich spoken language features. The talk will introduce the process of the CEPIC data collection and the ways to use the transcribed and annotated data for research on political interpreting and translation, and then illustrate the possible further development and expansion of the corpus. The presenter will also briefly discuss her ongoing work on two other interpreting corpora: one on lexical cohesion of student interpreters and translators, and the other on the perceived role of interpreters.

It is hoped that the talk will provide some thoughts and inspirations for future interdisciplinary work on data science and translation/interpreting studies, and help to identify potential synergies between the two. 

About the speaker: Jun Pan is associate professor in the Department of Translation, Interpreting and Intercultural Studies at Hong Kong Baptist University and now visiting faculty at the Translation Research and Instruction Program at Binghamton University. She works as managing editor of Bandung: Journal of the Global South (Brill) and review editor of The Interpreter and Translator Trainer (Taylor & Francis). She is also honorary secretary of the Hong Kong Translators Society and chair – International Relations of the Hong Kong Association of University Women (an NGO affiliated to the Graduate Women International [GWI]). Her research interests include digital humanities and interpreting/translation studies, corpus-based interpreting/translation studies, interpreting/translation and political discourse, learner factors in interpreter training, professionalism in interpreting and bibliometric studies. Her articles have appeared in journals including Target: International Journal of Translation Studies, The Interpreter and Translator Trainer, Perspectives: Studies in Translatology, inTRAlinea, and others. She has also published with Routledge, Springer, Peter Lang, etc.

Pan recently completed a digital scholarship project, "The Chinese/English Political Interpreting Corpus." The project and relevant publications can be accessed online.

Tracking students to understand career outcomes

1 p.m. Tuesday, April 30, 2019, in the Benet Alumni Lounge, Old O'Connor Hall


Abstract: Binghamton University spends significant resources on understanding and addressing student career outcomes as one of the key indicators of success and campus accountability (SP2). The office of Student Affairs Assessment and Strategic Initiatives (SAASI) surveys graduating seniors to collect information about post-graduation plans. How can we optimize the use of this information to better understand career pathways and identify the key factors behind our graduating seniors' career success?

We combined the responses of the senior survey with a diverse set of indicators available on students' characteristics (e.g. gender, race, GPA, and financial aid) and activities (e.g. internship, career counseling) to understand factors affecting placement. Using descriptive and inferential statistics, we examined the differences between placed and unplaced students, which enabled us to identify student success factors. This work is just one part of a broader effort to use analytics to help Binghamton University not only to improve retention and graduation rates but also to ensure student employability. With SAS data management and reporting capabilities, we can use data to help students engage in the activities that ensure career readiness.

About the speaker: Manar Sabry is senior assistant director of strategic analysis at the Office of Student Affairs Assessment and Strategic Initiatives at Binghamton University.

Using Big Data to analyze workforce needs: Applications for Improving equity for students with disabilities

1 p.m. Tuesday, April 9, 2019, in the Benet Alumni Lounge, Old O'Connor Hall

Abstract: The shortage of individuals fully prepared as special education teachers has been described as persistent, chronic and uneven. To date, researchers have taken advantage of large, nationally-representative surveys to demonstrate that the shortage unevenly impacts students with disabilities in high-poverty schools and those who attend schools meant to provide specialized services. The emergence of statewide longitudinal data systems that track teachers and students provides new opportunities for refining investigations, forecasting shortages and analyzing the effectiveness of policies. In the first half, I'll provide an overview of my research on the special education teacher workforce using restricted datasets from the Institute of Education Sciences within the U.S. Department of Education. In the second half, I'll share planned next steps and potential ideas for collaboration related to existing and developing statewide datasets.

About the speaker: Lucky Mason-Williams is an associate professor in the Department of Teaching, Learning and Educational Leadership in the College of Community and Public Affairs. Her research interests revolve around the preparation and qualifications of special education teachers. She is a member of the Data Science TAE Steering Committee. 

Combining machine learning methods to good effect

1 p.m. Tuesday, March 5, 2019, in the Benet Alumni Lounge, Old O'Connor Hall

Abstract: Many machine pattern discovery datasets are in the form of an array with a row for each observation and a column for each measurement (feature, variable). We might call these static machine learning tasks since there is only one feature vector for each observation (case, subject). To apply supervised machine pattern discovery methods, one also must know some outcome for each case (a correct diagnosis, a true class, ...). These outcomes are often a yes/no (i.e. binary) class, but multiclass methods are also available. In addition, the outcome may be a continuous numerical value we desire to predict.

In this talk, Schaffer will present such an example task: the detection of Alzheimer's disease from a sample of a subject's speech. He will describe a hybrid approach to this pattern discovery task involving a genetic algorithm for feature subset selection combined with a support vector machine that computes an accuracy "fitness" for each feature subset. He may show an animation of this algorithm in action. The approach employs two levels of cross-validation to combat overfitting. The method also employs an extension of the concept of area under the ROC curve. At the end, several candidate feature subsets are then combined with an ensemble method called the GRNN oracle, a maximum likelihood, minimum variance, unbiased estimator. The final results will be shown for the Alzheimer's speech data.

About the speaker: J. David Schaffer is a visiting research professor in the College of Community and Public Affairs at Binghamton University. He is affiliated with the school's Institute for Justice and Well-Being, a research institute that advances global health, progressive education and well-being for marginalized populations by implementing cutting-edge, interdisciplinary research and educational opportunities with communities and people across the lifespan and the globe. Its researchers span professions and disciplines including counseling, education, engineering, human development, medicine, nursing, pharmacy, psychology and social work.

The Role of Data Science in Evolving the Future of Binghamton and Beyond

1 p.m. Tuesday, Feb. 12, 2019, in the Anderson Center Reception Room

Abstract: The idea that public policy formulation and implementation must be a managed process of cultural evolution is gaining traction around the world. Binghamton can be a hub of activity and has made a good start in some respects, although not yet in a way that involves much sophisticated data science. The first half of my talk will provide an overview of the "Evolving the Future" movement, both locally and globally. The second half will be devoted to brainstorming about how data scientists at Binghamton can contribute to the movement.

About the speaker: David Sloan Wilson is SUNY Distinguished Professor of Biological Sciences and Anthropology at Binghamton University, where he founded EvoS, Binghamton University's campus-wide evolutionary studies program. He is also president of the Evolution Institute (https://evolution-institute.org), a nonprofit devoted to policy formulation and implementation from an evolutionary perspective. This places him at the center of the "Evolving the Future" movement, both worldwide and locally. His newest book, This View of Life: Completing the Darwinian Revolution, communicates this vision to a general audience.

Data and the Future of the Humanities: Cases from Digital Art History

Noon Monday, Nov. 19, 2018, in LN-1302C, the Zurack High-Technology Collaboration Center

Abstract: Researchers in the humanities are actively exploring new technologies and innovative computational methods to reinvigorate long-standing approaches to the study of literature, art, culture and society. Brought together under the umbrella of the digital humanities, these initiatives have opened up a new set of analytical pathways for scholars across history, art history, literature, philosophy and other disciplines. They have also brought humanities scholars into direct communication with those in otherwise distant disciplines, such as computer science, geography and mathematics. In this talk, we will explore the goals and methods of the digital humanities, using cases from art history as springboards. We will also highlight the ways in which the core practices of the digital humanities intersect with, but also diverge from, the goals and outlook of other domains in data science.

About the speaker: Nancy Um received her MA and PhD in Islamic art and architectural history from UCLA. She joined the art history faculty at Binghamton University in 2001.

Her research examines the visual culture and built environments of trading communities around the western Indian Ocean rim in the early modern period. She has authored two books, The Merchant Houses of Mocha: Trade and Architecture in an Indian Ocean Port (Seattle: University of Washington Press, 2009), and Shipped but Not Sold: Material Culture and the Social Protocols of Trade during Yemen's Age of Coffee (Honolulu: University of Hawai'i Press, 2017).

She has conducted field and archival research in Turkey, Yemen, the Netherlands, and England. She is the recipient of a number of research fellowships, including from the Fulbright program, the National Endowment for the Humanities, the Getty Foundation and the American Institute for Yemeni Studies.

She currently serves as reviews editor of The Art Bulletin.

Robust tracking and behavioral modeling of movements of biological collectives from ordinary video recordings

11:45 a.m. Monday, Nov. 5, 2018, in AD-148

Abstract: We developed a computational method to extract information about interactions among individuals with different behavioral states in a biological collective from ordinary video recordings. Assuming that individuals are acting as finite state machines, our method first detects discrete behavioral states of those individuals and then constructs a model of their state transitions, taking into account the positions and states of other individuals in the vicinity. We tested the proposed method through applications to two real-world biological collectives: termites in an experimental setting and human pedestrians on a university campus. For each application, a robust tracking system was developed in-house, utilizing interactive human intervention (for termite tracking) or online agent-based simulation (for pedestrian tracking). In both cases, significant interactions were detected between nearby individuals with different states, demonstrating the effectiveness of the proposed method.
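
As a rough illustration of the general idea in this abstract (not the speaker's actual method), the Python sketch below counts how often an individual moves from one discrete state to another, split by whether a neighbor in a different state was nearby. The input format, the `radius` parameter, the function names and the "walk"/"stop" state labels are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import hypot


def transition_counts(tracks, radius):
    """Count state transitions between consecutive frames.

    tracks: dict mapping an individual's id to a list of (x, y, state)
            tuples, one per frame, all lists the same length
            (a hypothetical input format chosen for this sketch).
    radius: distance within which another individual counts as "nearby".

    Returns a dict keyed by (current_state, near_other), where near_other
    is True when some other individual in a different state was within
    `radius` at the current frame. Each value is a Counter of next states.
    """
    counts = defaultdict(Counter)
    ids = list(tracks)
    n_frames = len(tracks[ids[0]]) if ids else 0
    for t in range(n_frames - 1):
        for i in ids:
            x, y, state = tracks[i][t]
            next_state = tracks[i][t + 1][2]
            near_other = any(
                j != i
                and tracks[j][t][2] != state
                and hypot(tracks[j][t][0] - x, tracks[j][t][1] - y) <= radius
                for j in ids
            )
            counts[(state, near_other)][next_state] += 1
    return counts


def transition_probabilities(counts):
    """Normalize the counts for each context into empirical probabilities."""
    probs = {}
    for context, next_states in counts.items():
        total = sum(next_states.values())
        probs[context] = {s: c / total for s, c in next_states.items()}
    return probs


# Toy data: two individuals tracked over three frames (made-up values).
tracks = {
    "a": [(0.0, 0.0, "walk"), (1.0, 0.0, "walk"), (2.0, 0.0, "stop")],
    "b": [(5.0, 0.0, "stop"), (3.0, 0.0, "stop"), (2.5, 0.0, "stop")],
}
probs = transition_probabilities(transition_counts(tracks, radius=2.0))
```

Comparing the estimated probabilities for the same state with and without a nearby individual in a different state gives a simple, informal indication of whether neighbors influence transitions. A real analysis would need far more data and a proper statistical test.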

About the speaker: Hiroki Sayama is a Binghamton University professor and researcher who has extensive experience in teaching and research in complex systems science and engineering, network science, computational social science, mathematical modeling and simulation, artificial life/chemistry, and computer and information sciences. He earned his bachelor's, master's and doctoral degrees in information science from the University of Tokyo, Japan, in 1994, 1996 and 1999, respectively. He is founder and director of the Center for Collective Dynamics of Complex Systems (CoCo) at Binghamton.

Advancing brain functional imaging in early diagnosis and treatment therapy

Noon Monday, Sept. 24, 2018, AD-148

Abstract: Imaging provides the primary means for assessing brain structure and function in humans in vivo. For many clinical situations, brain imaging biomarkers have the potential to provide greater sensitivity and specificity than clinical indices for differential diagnosis and management of brain disorders. While structural changes have been shown to provide valuable information for disease diagnosis, changes in regional brain function may be more dynamic and provide even greater sensitivity to early diagnosis, disease progression or response to therapy. However, the potential of functional imaging has not yet been realized because of technical hurdles and a lack of effective image processing tools to detect disease-sensitive imaging biomarkers. The talk will introduce the status quo of brain image processing and the challenges faced in detecting imaging biomarkers. New computational methods, and their application to detecting sensitive biomarkers, are needed in medical image processing. Recent advances in computational methods applied to diabetes and bipolar disorder will be presented.

About the speaker: Weiying Dai received a BS in mathematics from Peking University and a PhD in computer science from the University of Pittsburgh. Before joining Binghamton University in 2015, she was an instructor at Beth Israel Deaconess Medical Center and Harvard Medical School. Her research interests include brain mapping, neuroimaging, blood flow imaging, biomedical image processing, pattern recognition, computer vision and information retrieval. She, together with her collaborators, invented and advanced a Magnetic Resonance Imaging (MRI) technique that has been implemented on GE MRI scanners and has become a popular clinical imaging tool for quantitatively measuring blood flow as it moves through the body. Dai was selected as a Junior Fellow of the International Society for Magnetic Resonance in Medicine (ISMRM).

Healthcare Data Analysis: Challenges and Opportunities

10:30 a.m. Thursday, April 26, 2018, in SW-114

Abstract: The expansion of Binghamton University's portfolio in pharmacy, pharmaceutical sciences, and allied health sciences occurs simultaneously with new emphases in data science and analytics. The convergence of these two major themes (healthcare and data science) represents very exciting opportunities for new collaborative scholarly activities. However, healthcare data also come with unique challenges for their acquisition and management. As part of the Data Science Salon series, we will discuss some of the challenges and opportunities for using healthcare data in research activities. Examples utilizing public use (i.e. free) datasets as well as federal (Medicare) data will be presented. The data salon will also highlight current national research priorities and analytical trends in healthcare research.

About the speaker: Leon Cosler is associate professor and founding chair of the Department of Health Outcomes and Administrative Sciences in Binghamton University's School of Pharmacy and Pharmaceutical Sciences. Cosler obtained his PhD from Union College (Schenectady, N.Y.) in health systems administration. Before joining academia, he was the director of research for the NYS Medicaid Program in Albany, N.Y. His research interests include healthcare database analyses and economic modeling applied to therapeutic areas such as HIV/AIDS, substance abuse, oncology, long-term care and patterns of prescription drug use.

March 29, 2018, 10:30 a.m., SW-114.

Cognitively-inspired Tools for Machine Learning

As part of the Data Science Salon series, we present a perspective from the field of cognitive science. Our goal in this line of work is to take advances from our laboratory in the psychological explanation of human concept learning and apply them to challenges in machine learning. We provide an overview of our theoretical approach and report on progress in application areas including classification and collaborative filtering.

March 1, 2018, 10:30 a.m., SW-114.

The Data Science of Gerrymandering

In early October 2017, the Supreme Court of the United States heard oral arguments in the case of Gill v. Whitford. The case calls into question the constitutionality of partisan gerrymandering – the practice of drawing boundaries of districts in such a way that one political party receives an unfair advantage. The plaintiffs in the case argued that data scientists brought to bear a set of tools that allowed Wisconsin's legislature and governor to draw and enact maps that favored the Republican party. By every estimation, the Republicans' strategy was extremely effective. For example, in the 2012 elections, Republican candidates received just 48 percent of the vote while managing to carry 60 percent of the districts in the state.

Just as data science can be used to build an unfair advantage, data science can be used to identify and remedy these unfair redistricting practices in Wisconsin and elsewhere. With the seed grant from the Binghamton University Data Science Transdisciplinary Working Group, the team will implement, on a high-performance computing cluster, an algorithm that PI Magleby developed with Daniel Mosesson. In a forthcoming paper, the team shows that the algorithm produces maps without any indication of bias. Moreover, the method the team proposes is vastly more efficient than alternatives. Access to a cluster will allow the team to use the algorithm to draw hundreds of millions of hypothetical maps. That large number of counterfactual maps will allow the team to make inferences about the impact of certain redistricting criteria that mapmakers have used as a defense of partisan outcomes. In particular, it will allow the team to understand the ways that considerations like race, communities of interest, and other political jurisdictions interact with the biased, partisan outcomes that analysts have observed in recent decades.

November 29, 2017

The ecological genetics of species divergence in the era of big data.

In recent years, the development of next-generation sequencing technology has provided an unprecedented ability to incorporate genomic techniques into the study of non-model organisms. For example, the study of ecological genetics of adaptive trait variation was previously limited to small numbers of molecular markers that required laborious methods to develop. However, recent developments put transcriptomic and genomic datasets within reach for nearly any organism, producing both opportunities and challenges. Similarly, advances in remote sensing and geographic information systems have produced global-scale datasets of climatological data, presenting new opportunities to examine the interaction between ecological divergence and abiotic sources of selection. In this data salon, I will provide a brief introduction to the big data issues faced in the field of ecological genetics with non-model organisms. I will use an example study system in my lab, which includes a pair of recently diverged species of western North American wildflowers. My goal will be to illustrate how big data sets are impacting my ability to approach current and classic questions in the field, and to identify some emerging questions that arise at the intersection of biology and data science.

September 19, 2017

Big Geospatial Data: Challenges and opportunities in Geographic Information Science.

Geospatial data describes the locations of spatial features, which is the basic but unique element in Geographic Information Science (GIS). It is important to convert geospatial data into knowledge and understandings of human activities, and to apply them to support policy making. With the explosion of various remote sensing devices and human sensors, it is now possible to obtain big geospatial data to characterize every aspect of our physical and human environment. However, it is still challenging to process, analyze and integrate such big geospatial data for applied practices (such as urban sustainability), which requires effective interdisciplinary analytical methods. In this talk, spatial big data analytics and discovery will be introduced, with examples mostly from my research. Challenges and opportunities of big geospatial data will also be discussed.

August 9, 2017

It's all about regression.

Regression models have been one of the most powerful tools in statistical modeling. However, the vast variety of available models, tailor-made for specific challenges arising from different data patterns, can be confusing, if not intimidating. I intend to give a brief nontechnical tour of this core branch of statistics for data scientists and hopefully to provoke some ideas through discussion.

July 12, 2017

Statistical Machine Learning and Data-intensive Research

Statistical machine learning merges statistics with the computational sciences, namely, computer science, systems science and optimization. Many of the methods and much of the theory in statistical machine learning are driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamical and heterogeneous, and where mathematical and algorithmic creativity is required. Fields such as bioinformatics, artificial intelligence, signal processing, networking, finance, game theory and control theory are all being heavily influenced by developments in statistical machine learning. In this exploratory talk, I will discuss some potential research agendas related to applications of statistical machine learning that are currently being pursued by top-notch researchers on and off campus.

June 14, 2017

Big Data and the Urban Commons: Finding Our Niche

Terms such as "Smart Cities", "Responsive Cities", "Urban Commons", and "Co-production" signal a revolution in the governance of cities with the help of big data, with major cities such as Boston leading the way. An example is 311, a number that can be called to report any dysfunction, which turns the residents of a city into part of a distributed system for detecting problems and making improvements. A Data Science TAE can help the Binghamton area become part of this movement. In addition, we can occupy a distinctive niche within the movement in a way that involves all of the academic disciplines, including the social sciences and humanities.