Seed Grant Program

Seed grants are awarded with funding provided by the Binghamton University Road Map through the Provost's Office and the Division of Research.

The goal of these seed grants is to encourage faculty to develop collaborative projects that stimulate the advancement of new ideas that can build Binghamton University's expertise toward a national reputation in the broad area of data science. This competitive, peer-reviewed program provides initial support for proposed long-term programs of collaborative research that have strong potential to attract external funding.

The call for proposals for seed grant funding for the 2020–2021 academic year, including an overview, an explanation of the process and eligibility, a proposal cover page and a proposal budget page, is available on this website. The Data Science TAE provides the following supplemental information for those interested in applying for seed grant funding:

Data Science TAE Supplementary Guidelines to the Seed Grant Program (2020–2021)


Any project that proposes to do one or more of the following is eligible for the Seed Grant Program:

  • study the foundations of data science by developing theories; novel computational, mathematical and statistical methods; algorithms; software; or tools for data curation, analysis, visualization and mining;
  • apply these theories, methods, algorithms, software and tools to address questions or generate hypotheses in the natural sciences, health sciences, social sciences, management science, the humanities or engineering disciplines; or
  • explore the implications of big data, data science or artificial intelligence, including their societal and ethical impacts.

Broader Impacts

The mission of the Data Science TAE is to advance the frontiers of data science by fostering collaborative and innovative research; provide transdisciplinary learning opportunities in data science at all levels; engage with local and global partners to develop solutions for real-world problems; and establish a progressive and scalable infrastructure to support the dynamic needs of all data science activities. For this reason, we encourage proposals, when applicable, to include the following information in the project narrative:

  • A plan for educational activities, such as student research supervision and cross-disciplinary training, and a description of the project’s potential for curricular development.
  • A description of the project’s potential to engage with local, regional or global communities, industries and research institutes.

Capacity Building

In addition to regular research projects (see details in the university-wide RFP), our Seed Grant Program will also consider projects that build the capacity of individuals and groups to carry out high-quality research, strengthen communities of researchers on campus and broaden the pool of researchers who can conduct fundamental research in data-related or data-enabled fields. We seek to support projects that build competencies to carry out research in these fields, particularly in areas where a shift to quantitative, computational and analytical approaches can greatly broaden avenues of research and enrich funding opportunities. These projects may propose activities such as organizing workshops and training events, developing courses, developing mentoring and coaching initiatives, and fostering community networks.

In addition, teams may use a capacity building grant through our Seed Grant Program to plan applications for large center or institute grants (such as national centers or NYS Centers of Excellence). The funds may be used to develop ideas, facilitate team formation and foster stakeholder community networks. As a result of these planning activities, teams should be better equipped to carry out center-scale research with large societal impact.

Proposals must clearly identify the community or constituents that the proposed capacity building project will serve. If the identified community or constituents largely overlap with one or a few departments or units, letters of support from the relevant department chairs or associate deans for research are preferred.

Funding Opportunities

All proposals to our Seed Grant Program, including regular research proposals and capacity building proposals, should include a plan for submitting proposals to external funding agencies and demonstrate the ability to attract future federal, state, philanthropic or private funding.

In particular, capacity building proposals should explain how the planned activities will expand external funding opportunities.

Staff in the Division of Research and the Steering Committee of the Data Science TAE can assist teams in identifying external funding opportunities if needed.

Speaking to the Data Science TAE Community

Funded teams are obligated to speak at an event organized by the Data Science TAE to report the outcomes of the funded project.

For the 2018–2019 academic year, the following seed grant was awarded:

Using Data Science to Decipher Processing-Structure-Property-Performance Relationships of Additively Manufactured Metals

Lead scientist: Congrui Jin, mechanical engineering 

Over the last decade, various additive manufacturing techniques have been developed for processing complex metallic components. However, our understanding of the Processing-Structure-Property-Performance (PSPP) relationships of additively manufactured metals has not kept pace with the proliferation of these components in service. For additively manufactured high-temperature components in particular, accurately predicting mechanical properties such as creep rupture and fatigue strength is a fundamental challenge. The overarching goal of the proposed research is to explore data science techniques that decipher the PSPP relationships of additively manufactured metals, and especially to predict the creep rupture and fatigue strength of additively manufactured high-temperature components from processing parameters and material microstructures. The proposed project will be the first application of data science techniques to the study of additively manufactured materials. Successful completion of this research will establish highly reliable causal linkages among processing parameters, material microstructures and mechanical properties, which can then be used to identify multiple optimal solutions for a specific application. This interdisciplinary effort couples the expertise of Congrui Jin and Pu Zhang in additive manufacturing with the expertise of Sanjeena Dang in data science. This work will provide the preliminary results needed to aggressively seek external grants.

For the 2017–2018 academic year, the following seed grants were awarded:

Adaptive Network Modeling of Real-World Temporal Social Networks

Lead scientist: Hiroki Sayama, systems science and industrial engineering

The objective of this proposal is to develop algorithms and software that can overcome the challenges identified in existing temporal network analysis methods and effectively produce mechanistic, dynamical models from real-world temporal social network data. Such data can involve temporally varying network size and state-topology coevolution, which existing analytical methods cannot capture.

Modeling and analysis of temporal social networks have attracted considerable attention across disciplines. A number of methods have been proposed for temporal network analysis, but they are limited in capturing certain temporal dynamics, such as the addition or removal of nodes, changes in node states, transitions of mesoscopic structures and state-topology coevolution. An illustrative example is a customer network: new customers may join, some existing customers may leave, their preferences may change through social influence, and their social ties may also change based on their preferences. These temporal social network dynamics are essential to understanding customers' behaviors, but existing methods do not fully capture them. What is currently missing is a modeling and analysis tool that can generate more detailed, more mechanistic dynamical models describing these nontrivial temporal social network dynamics in a uniform, tractable way.

To meet the aforementioned need, the PIs have adopted a unique, unconventional approach to model temporal network dynamics as a "computational" process, represented by repeated extraction and replacement of subgraphs. Prototype versions of algorithms and software have demonstrated promising results for small-scale, simulated network data, yet there are still algorithmic challenges: How can one handle a high volume of noise and temporal sparseness of real-world temporal social network data, and how can one automatically discover nontrivial dynamical models beyond user-provided ones and generalize them to unobserved situations? The proposed project aims to address these challenges.

Automated Generation of Urban Land Use Data by Integrating Remote Sensing and Social Sensing

Lead scientist: Chengbin Deng, geography

Land use and land cover (LULC) data provide invaluable spatially explicit and functional information about urban land transformed by human activity. A heterogeneous urban environment contains a large number of detailed land use types, including single-family, multi-family, commercial, industrial, transportation and civic land. Such information is helpful to city administrators, scholars and researchers, public health officials and, especially, urban planners for a variety of purposes. Detailed land use data have served as an important input in socioeconomic studies and planning practices. Currently, detailed urban land use information relies heavily on manual digitizing, local knowledge from field surveys and other data sources (e.g., building permit records, appraisal materials, census information). Rapid urban expansion requires frequent updates of urban land use data, which are time-consuming and labor-intensive. Public information such as tax payments and tax status is also updated and included in the latest databases. It remains very difficult, if not impossible, to automate urban land use updates. Therefore, generating accurate and timely urban land use products within a more manageable time frame can provide a more intelligent approach for a variety of applied practices and urban studies.

In this proposal, we propose a new method to address the major gaps in traditional urban land use acquisition, using state-of-the-art statistical learning methods, including random forests, to integrate and analyze geospatial and social big data. On the one hand, remote sensing data provide environmental information about the physical urban environment. On the other hand, social media data provide ample information about human activities. Our long-term goal is to automatically generate and update land use products by integrating such geospatial open datasets. This will significantly improve the efficiency of LULC mapping to support sustainable urban planning and other practices.

Development of an Intelligent Mental Disease Prediction System Prototype based on Dietary Pattern Analysis: a Pilot Study

Lead scientist: Lina Begdache, health and wellness studies

Nutrition and mental health research is an emerging interdisciplinary field, and nutrition is one of the modifiable risk factors for mental health. Traditionally, studies on the association between diet and mental distress have focused on single nutrients; however, current nutritional epidemiology research is moving toward assessing dietary patterns in relation to comorbidities. This approach accounts for the complexity of nutrient interactions and the daily variation in diet. Because the human brain changes continuously during development and with age, dietary needs may also change with age. Our lab has established a prototype that describes the relationship among a healthy diet, exercise, healthy practices and mental wellbeing. Eating healthy may promote healthy habits and mental wellbeing by elevating dopamine levels in the brain. Mental wellbeing then acts as positive reinforcement for a healthy diet, healthy practices and exercise, further improving health. This loop can become a virtuous cycle that optimizes mental health. When a healthy diet, exercise or healthy practices are absent, lower dopamine levels depress mood, which in turn reduces healthy eating, exercise and healthy practices, resulting in a vicious cycle; this reflects the multidimensional nature of mental distress.

In addition, individuals differ genetically, so the "one size fits all" approach is losing ground, as medications often do not work effectively. Personalized therapy is at the forefront of precision medicine, an emerging approach to disease treatment. The significance of this research is that it will support the development of targeted nutritional interventions to improve mood, which will increase the precision of other therapies.

Multitask Transfer Learning Enhanced Rare Event Detection using Sensing Data

Lead scientist: Changqing Cheng, systems science and industrial engineering

Rare events occur at low frequency but can have catastrophic consequences; examples include seismic activity, stock market flash crashes and terrorist attacks. While most such events are not preventable, accurate and timely detection enables prompt actions that can significantly reduce their severity and associated costs. Recently, the widespread adoption of wireless sensors and smart devices has offered an unprecedented opportunity to monitor various complex systems, from manufacturing to healthcare. Remarkably, time series sensing data contain considerable causal information about the underlying dynamics, enabling us to harness fundamental patterns for diagnosis, prognosis and decision making. Thus, the objective of this study is to design an integrated platform for process monitoring, particularly rare event detection, using sensing data. However, the inherent nonlinearity and nonstationarity of sensing data have become a persistent challenge for sensing-driven process monitoring. Therefore, we propose a multitask transfer learning approach that fuses information from multiple sensing sources to enhance monitoring resolution.

In particular, abrupt changes in ultra-precision machining exemplify an immense challenge faced by modern advanced manufacturing. As shown in the figure, the machining process experiences a scratch on the workpiece surface at time index 10,000. Offline measurement indicates that the surface roughness deteriorates from 35 nm before the scratch to 82 nm. Timely detection of such events from in situ vibration signals will enable corrective actions that avoid escalating costs.

The (Data) Science of Gerrymandering

Lead scientist: Daniel B. Magleby, political science

In early October 2017, the Supreme Court of the United States heard oral arguments in the case of Gill v. Whitford. The case calls into question the constitutionality of partisan gerrymandering, the practice of drawing district boundaries so that one political party receives an unfair advantage. The plaintiffs in the case argued that data scientists brought to bear a set of tools that allowed Wisconsin's legislature and governor to draw and enact maps that favored the Republican party. By all estimates, the Republicans' strategy was extremely effective. For example, in the 2012 elections, Republican candidates received just 48% of the vote while carrying 60% of the districts in the state.

Just as data science can be used to build an unfair advantage, it can also be used to identify and remedy unfair redistricting practices in Wisconsin and elsewhere. With the seed grant from the Binghamton University Data Science Transdisciplinary Working Group, the team will implement, on a high-performance computing cluster, an algorithm that PI Magleby developed with Daniel Mosesson. In a forthcoming paper, the team shows that the algorithm produces maps without any indication of bias. Moreover, the proposed method is vastly more efficient than alternatives. Access to a cluster will allow the team to use the algorithm to draw hundreds of millions of hypothetical maps. This large number of counterfactual maps will allow the team to make inferences about the impact of certain redistricting criteria that mapmakers have used to defend partisan outcomes. In particular, it will help the team understand how considerations such as race, communities of interest and other political jurisdictions interact with the biased, partisan outcomes that analysts have observed in recent decades.