Lazy data, faster responses
Can data-centric laziness make data-intensive software more effective? A new study is exploring whether laziness is the key to improving big data systems.
Associate Professor Yu “David” Liu from the Department of Computer Science wants us to embrace laziness – data-centric laziness, that is.
He recently received a nearly $450,000 grant from the National Science Foundation to prove the merits of laziness as a solution to several challenges inherent in data-intensive software development.
“From Facebook to Twitter, from telescopes to microscopes, from MRIs to brain simulators, from the Internet of Things to smartphones, we live in an era when a staggering amount of data — both in size and in complexity — are produced and processed. That’s why data-intensive applications are taking center stage in the world of modern computing,” explained Liu. “Our study will look at making those applications effective by strengthening the support for data dynamism, structural complexity, program-data integration and language-algorithm-system integration.”
These four areas of improvement are critical to what we currently need from data-intensive applications:
Data dynamism
“During a breaking news event, a graph of information shared on Twitter may change very quickly. Not only do we need programs that can handle a massive amount of information, but the programs should also be update-intensive and query-intensive,” said Liu.
Structural complexity
As for structural complexity, Liu pointed out that data is not “big” only because of its quantity; it is often considered “big” because of its complexity and its relation to other pieces of data.
Program-data integration
Program-data integration means that instead of treating pieces of data as separate items analyzed by the program, it may be more efficient to make that data a top priority within the program’s runtime.
Language-algorithm-system integration
While program-data integration is important, Liu said that to be most effective, the approach must also bridge algorithm development and system building.
His solution to these four challenges involves embracing what may seem like a counter-intuitive idea: laziness.
Data-centric laziness refers to delaying the evaluation of data. Liu has developed a new programming language design that builds that delay into the program. A delay in data evaluation frees up the program to respond faster.
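To make the idea of delayed evaluation concrete, here is a minimal Python sketch. It is not Liu’s language design, which the article does not describe in detail; the class `LazySortedBag` and its methods are hypothetical names chosen for illustration. It assumes a collection that receives many updates and occasional queries: each insert returns immediately, and the work of ordering the data is put off until a query actually needs it.

```python
import bisect


class LazySortedBag:
    """Hypothetical illustration of delaying the evaluation of data.

    Inserts are buffered without sorting; the sorted order is restored
    in one batch the first time a query needs it.
    """

    def __init__(self):
        self._sorted = []   # items already in sorted order
        self._pending = []  # inserts whose evaluation has been delayed

    def insert(self, value):
        # Respond immediately: no sorting happens here.
        self._pending.append(value)

    def pending_count(self):
        # Number of updates that have not been evaluated yet.
        return len(self._pending)

    def _force(self):
        # Evaluate all delayed inserts in a single batch.
        if self._pending:
            self._sorted = sorted(self._sorted + self._pending)
            self._pending = []

    def count_at_most(self, limit):
        # A query forces evaluation, then answers with a binary search.
        self._force()
        return bisect.bisect_right(self._sorted, limit)


bag = LazySortedBag()
for v in [5, 1, 4, 2, 3]:
    bag.insert(v)

deferred = bag.pending_count()   # all five inserts are still unevaluated
answer = bag.count_at_most(3)    # the query triggers the delayed work
remaining = bag.pending_count()  # nothing is left pending afterwards
```

In this sketch the updates stay cheap because the sorting cost is paid once, at query time, rather than on every insert. Whether that trade improves overall performance depends on the mix of updates and queries.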
However, more work is needed to ensure that the program’s increased responsiveness also comes with higher end-to-end performance. Liu’s system will identify the moments when a data-intensive application is operating in a suboptimal state and intelligently delay data processing until a time when it can be done more productively.
Liu explained this idea by saying, “Imagine Alice and Bob are both asked to analyze a huge amount of data at a busy and stressful time for them. Alice decides to take a day off and, with her renewed energy, completes the data analysis in the second day. Bob instead takes on the task immediately but drudges along for five days due to fatigue. Who is the better data analyst?”
The study is titled “SHF: Small: Lazy Data Structures for Data-Intensive Applications” and is scheduled to be completed in September 2021.