
David Bader

Distinguished Professor, Data Science | Director of the Institute for Data Science

Newark, NJ, UNITED STATES

Interests lie at the intersection of data science & high-performance computing, with applications in cybersecurity

Spotlight

Media

Publications

  • Massive Graph Analytics (Chapman & Hall/CRC Data Science Series)
  • Petascale Computing: Algorithms and Applications (Chapman & Hall/CRC Computational Science Series)
  • Scientific Computing with Multicore and Accelerators (Chapman & Hall/CRC Computational Science)
  • Graph Partitioning and Graph Clustering (Contemporary Mathematics)

Documents

Photos

Audio/Podcasts

  • Large-Scale Data Analytics For Cybersecurity And Solving Real-World Grand Challenges | Redefining CyberSecurity With Professor David Bader (Podchaser)

Video

  • Predictive Analysis from Massive Knowledge Graphs on Neo4j – David Bader (Vimeo)
  • Interview: David Bader on Real World Challenges for Big Data Analytics (Vimeo)
  • 5-Minute Interview: Dave Bader, Professor at Georgia Tech College of Computing (Vimeo)

Social

Biography

David A. Bader is a Distinguished Professor, founder of the Department of Data Science, and inaugural Director of the Institute for Data Science at the New Jersey Institute of Technology.

Dr. Bader is a Fellow of the IEEE, ACM, AAAS, and SIAM; a recipient of the IEEE Sidney Fernbach Award; and the 2022 Innovation Hall of Fame inductee of the University of Maryland’s A. James Clark School of Engineering. He advises the White House, most recently on the National Strategic Computing Initiative (NSCI) and Future Advanced Computing Ecosystem (FACE).

Bader is a leading expert in solving global grand challenges in science, engineering, computing, and data science. His interests lie at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics. He has co-authored over 300 scholarly papers and received best paper awards from ISC, IEEE HPEC, and IEEE/ACM SC. Dr. Bader has served as a lead scientist in several DARPA programs, including High Productivity Computing Systems (HPCS) with IBM, Ubiquitous High Performance Computing (UHPC) with NVIDIA, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), Hierarchical Identify Verify Exploit (HIVE), and Software-Defined Hardware (SDH).

Dr. Bader is Editor-in-Chief of the ACM Transactions on Parallel Computing and previously served as Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems. He serves on the leadership team of the Northeast Big Data Innovation Hub as the inaugural chair of its Seed Fund Steering Committee. ROI-NJ recognized Bader as a technology influencer on its inaugural 2021 list and again in 2022.

In 2012, Bader was the inaugural recipient of University of Maryland’s Electrical and Computer Engineering Distinguished Alumni Award. In 2014, Bader received the Outstanding Senior Faculty Research Award from Georgia Tech. Bader has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor and Director of an NVIDIA GPU Center of Excellence.

In 1998, Bader built the first Linux supercomputer, which led to a high-performance computing (HPC) revolution; Hyperion Research estimates that the total economic value of Linux supercomputing pioneered by Bader has been over $100 trillion over the past 25 years.

Areas of Expertise

  • Graph Analytics
  • Massive-Scale Analytics
  • High-Performance Computing
  • Data Science
  • Applications in Cybersecurity
  • Computational Genomics

Accomplishments

Inductee into the University of Maryland's A. James Clark School of Engineering Innovation Hall of Fame

2022

NVIDIA AI Lab (NVAIL) Award

2019

Invited attendee to the White House’s National Strategic Computing Initiative (NSCI) Anniversary Workshop.

2019

Facebook AI System Hardware/Software Co-Design Research Award

2019

Named a member of "People to Watch" by HPCwire

2014

Inaugural recipient of the University of Maryland's Electrical and Computer Engineering Distinguished Alumni Award

2012

Named a member of "People to Watch" by HPCwire

2012

Selected by Sony, Toshiba, and IBM to direct the first Center of Competence for the Cell Processor

2006

Education

University of Maryland

Ph.D., Electrical and Computer Engineering

1996

Lehigh University

M.S., Electrical Engineering

1991

Lehigh University

B.S., Computer Engineering

1990

Affiliations

  • AAAS Fellow
  • IEEE Fellow
  • SIAM Fellow
  • ACM Fellow

Media Appearances

Academic Data Science Alliance Picks Up Steam

Datanami  online

2022-11-22

Universities looking for resources to build their data science curriculums and degree programs have a new resource at their disposal in the form of the Academic Data Science Alliance. Founded just prior to the pandemic, the ADSA survived COVID and now it’s working to foster a community of data science leaders at universities across North America and Europe...


‘Weaponised app’: Is Egypt spying on COP27 delegates’ phones?

Al Jazeera  online

2022-11-12

Cybersecurity concerns have been raised at the United Nations’ COP27 climate talks over an official smartphone app that reportedly has carte blanche to monitor locations, private conversations and photographs. About 35,000 people are expected to attend the two-week climate conference in Egypt, and the app has been downloaded more than 10,000 times on Google Play, including by officials from France, Germany and Canada...


Your Hard Drive May One Day Use Diamonds for Storage

Lifewire  online

2022-05-03

Diamonds could one day be used to store vast amounts of information. Researchers are trying to use the strange effects of quantum mechanics to hold information. However, experts say don’t expect a quantum hard drive in your PC anytime soon.


Big Data Career Notes: July 2019 Edition

Datanami  online

2019-07-16

The New Jersey Institute of Technology has announced that it will establish a new Institute for Data Science, directed by Distinguished Professor David Bader. Bader recently joined NJIT’s Ying Wu College of Computing from Georgia Tech, where he was chair of the School of Computational Science and Engineering within the College of Computing. Bader was recognized as one of HPCwire’s People to Watch in 2014.


David Bader to Lead New Institute for Data Science at NJIT

Inside HPC  online

2019-07-10

Professor David Bader will lead the new Institute for Data Science at the New Jersey Institute of Technology. Focused on cutting-edge interdisciplinary research and development in all areas pertinent to digital data, the institute will bring existing research centers in big data, medical informatics and cybersecurity together to conduct both basic and applied research.


Event Appearances

Massive-scale Analytics

13th International Conference on Parallel Processing and Applied Mathematics (PPAM)  Białystok, Poland

2019-09-09

Predictive Analytics from Massive Streaming Data

44th Annual GOMACTech Conference: Artificial Intelligence & Cyber Security: Challenges and Opportunities for the Government  Albuquerque, NM

2019-03-26

Massive-Scale Analytics Applied to Real-World Problems

2018 Platform for Advanced Scientific Computing (PASC) Conference  Basel, Switzerland

2018-07-04

Research Focus

NVIDIA AI Lab (NVAIL) for Scalable Graph Algorithms

2019-08-05

Graph algorithms represent some of the most challenging known problems in computer science for modern processors. These algorithms contain far more memory accesses per unit of computation than traditional scientific computing. Access patterns are not known until execution time and are heavily dependent on the input data set. Graph algorithms vary widely in the amount of spatial and temporal locality that is usable by modern architectures. In today’s rapidly evolving world, graph algorithms are used to make sense of large volumes of data from news reports, distributed sensors, and lab test equipment, among other sources connected to worldwide networks. As data is created and collected, dynamic graph algorithms make it possible to compute highly specialized and complex relationship metrics over the entire web of data in near-real time, reducing the latency between data collection and the capability to take action. Through this partnership with NVIDIA, we collaborate on the design and implementation of scalable graph algorithms and graph primitives that will bring new capabilities to the broader community of data scientists. Leveraging existing open frameworks, this effort will improve the experience of graph data analysis on GPUs by improving tools for analyzing graph data, speeding up graph traversal using optimized data structures, and accelerating computations with better runtime support for dynamic work stealing and load balancing.
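The advantage of dynamic graph algorithms described above is that results stay current without recomputing from scratch after every update. A minimal illustrative sketch (not the project's actual code, and far simpler than a GPU implementation) is incremental triangle counting: each edge insertion or deletion adjusts the global count by only the triangles it opens or closes.

```python
from collections import defaultdict

class DynamicGraph:
    """Toy dynamic graph that keeps a global triangle count
    up to date as edges stream in, instead of recomputing
    from scratch after every batch of updates."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.triangles = 0

    def add_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbor of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        if v not in self.adj[u]:
            return
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.triangles -= len(self.adj[u] & self.adj[v])

g = DynamicGraph()
for e in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    g.add_edge(*e)
print(g.triangles)  # 1
```

Each update touches only the two endpoints' neighbor sets, which is the property that keeps latency between data collection and action low.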


Facebook AI Systems Hardware/Software Co-Design research award on Scalable Graph Learning Algorithms

2019-05-10

Deep learning has boosted the machine learning field at large and created significant increases in the performance of tasks including speech recognition, image classification, object detection, and recommendation. It has opened the door to complex tasks, such as self-driving and super-human image recognition. However, the important techniques used in deep learning, e.g., convolutional neural networks, are designed for Euclidean data types and do not directly apply to graphs. One solution is to embed graphs into a lower-dimensional Euclidean space, generating a regular structure. There is also prior work on applying convolutions directly on graphs and using sampling to choose neighbor elements. Systems that use this technique are called graph convolutional networks (GCNs). GCNs have proven successful at graph learning tasks like link prediction and graph classification. Recent work has pushed the scale of GCNs to billions of edges, but significant work remains to extend learned graph systems beyond recommendation systems with specific structure and to support big data models such as streaming graphs. This project will focus on developing scalable graph learning algorithms and implementations that open the door for learned graph models on massive graphs. We plan to approach this problem in two ways. First, we will develop a scalable, high-performance graph learning system based on existing GCN algorithms, like GraphSAGE, by improving the workflow on shared-memory NUMA machines, balancing computation between threads, optimizing data movement, and improving memory locality. Second, we will investigate graph-learning-specific decompositions and develop new strategies for graph learning that can inherently scale well while maintaining high accuracy. This includes traditional partitioning, but more generally we consider breaking the problem into smaller pieces whose solutions combine into a solution to the larger problem. We will explore decomposition results from graph theory, for example, forbidden graphs and the Embedding Lemma, and determine how to apply such results to the field of graph learning. We will also investigate whether these decompositions could assist in a dynamic graph setting.
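The neighbor-sampling idea behind GraphSAGE-style systems can be sketched in a few lines: rather than convolving over a fixed grid, each node aggregates features from a small random sample of its neighbors. The sketch below (illustrative only; all names and shapes are assumptions, and real systems batch this across threads and NUMA domains) shows one mean-aggregation layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_neighbors(adj, v, k):
    """Sample k neighbors of v with replacement (fall back to self)."""
    nbrs = adj[v]
    if not nbrs:
        return [v]
    idx = rng.integers(0, len(nbrs), size=k)
    return [nbrs[i] for i in idx]

def sage_layer(adj, feats, W_self, W_nbr, k=5):
    """One GraphSAGE-style layer: mean-aggregate sampled neighbor
    features, combine with the node's own features, apply ReLU."""
    out = []
    for v in range(len(adj)):
        nbr_feats = feats[sample_neighbors(adj, v, k)]
        h = feats[v] @ W_self + nbr_feats.mean(axis=0) @ W_nbr
        out.append(np.maximum(h, 0.0))
    return np.vstack(out)

# Tiny example: 4 nodes, 3-dim input features, 2-dim output
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
feats = rng.normal(size=(4, 3))
W_self = rng.normal(size=(3, 2))
W_nbr = rng.normal(size=(3, 2))
print(sage_layer(adj, feats, W_self, W_nbr).shape)  # (4, 2)
```

Because each node looks at only k sampled neighbors regardless of its true degree, the per-node work is bounded even on skewed-degree graphs, which is what makes this approach a candidate for billion-edge scale.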


Research Grants

Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes

DARPA/NVIDIA $25,000,000

2010-06-01

Goal: Develop highly parallel, security enabled, power efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks


Center for Adaptive Supercomputing Software for Multithreaded Architectures (CASS-MT): Analyzing Massive Social Networks

Department of Defense $24,000,000

2008-08-01

Exascale Streaming Data Analytics for social networks: understanding communities, intentions, population dynamics, pandemic spread, transportation and evacuation.


Proactive Detection of Insider Threats with Graph Analysis at Multiple Scales (PRODIGAL), under Anomaly Detection at Multiple Scales (ADAMS)

DARPA $9,000,000

2011-05-01

This paper reports on insider threat detection research, during which a prototype system (PRODIGAL) was developed and operated as a testbed for exploring a range of detection and analysis methods. The data and test environment, system components, and the core method of unsupervised detection of insider threat leads are presented to document this work and benefit others working in the insider threat domain...


Challenge Applications and Scalable Metrics (CHASM) for Ubiquitous High Performance Computing

DARPA $7,500,000

2010-06-01

Develop highly parallel, security enabled, power efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.

SHARP: Software Toolkit for Accelerating Graph Algorithms on Hive Processors

DARPA $6,760,425

2017-04-23

The aim of SHARP is to enable platform independent implementation of fast, scalable and approximate, static and streaming graph algorithms. SHARP will develop a software tool-kit for seamless acceleration of graph analytics (GA) applications, for a first of its kind collection of graph processors...


GRATEFUL: GRaph Analysis Tackling power EFficiency, Uncertainty, and Locality

DARPA $2,929,819

2012-10-19

Think of the perfect embedded computer. Think of a computer so energy-efficient that it can last 75 times longer than today’s systems. Researchers at Georgia Tech are helping the Defense Advanced Research Projects Agency (DARPA) develop such a computer as part of an initiative called Power Efficiency Revolution for Embedded Computing Technologies, or PERFECT. “The program is looking at how do we come to a new paradigm of computing where running time isn’t necessarily the constraint, but how much power and battery that we have available is really the new constraint,” says David Bader, executive director of high-performance computing at the School of Computational Science and Engineering. If the project is successful, it could result in computers far smaller and orders of magnitude more efficient than today’s machines. It could also mean that the computer mounted tomorrow on an unmanned aircraft or ground vehicle, or even worn by a soldier, would use less energy than a larger device while still being as powerful. Georgia Tech’s part in the DARPA-led PERFECT effort is called GRATEFUL, which stands for Graph Analysis Tackling power-Efficiency, Uncertainty and Locality. Headed by Bader and co-investigator Jason Riedy, GRATEFUL focuses on algorithms that would process vast stores of data and turn it into a graphical representation in the most energy-efficient way possible.


Articles

Tailoring parallel alternating criteria search for domain specific MIPs: Application to maritime inventory routing

Computers & Operations Research

Lluís-Miquel Munguía, Shabbir Ahmed, David A Bader, George L Nemhauser, Yufen Shao, Dimitri J Papageorgiou

2019

Parallel Alternating Criteria Search (PACS) relies on the combination of computer parallelism and Large Neighborhood Searches to attempt to deliver high quality solutions to any generic Mixed-Integer Program (MIP) quickly. While general-purpose primal heuristics are widely used due to their universal application, they are usually outperformed by domain-specific heuristics when optimizing a particular problem class.


High-Performance Phylogenetic Inference

Bioinformatics and Phylogenetics

David A Bader, Kamesh Madduri

2019

Software tools based on the maximum likelihood method and Bayesian methods are widely used for phylogenetic tree inference. This article surveys recent research on parallelization and performance optimization of state-of-the-art tree inference tools. We outline advances in shared-memory multicore parallelization, optimizations for efficient Graphics Processing Unit (GPU) execution, as well as large-scale distributed-memory parallelization.


Numerically approximating centrality for graph ranking guarantees

Journal of Computational Science

Eisha Nathan, Geoffrey Sanders, David A Bader

2018

Many real-world datasets can be represented as graphs. Using iterative solvers to approximate graph centrality measures allows us to obtain a ranking vector on the nodes of the graph, consisting of a number for each vertex in the graph identifying its relative importance. In this work the centrality measures we use are Katz Centrality and PageRank. Given an approximate solution, we use the residual to accurately estimate how much of the ranking matches the ranking given by the exact solution.
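Katz Centrality, one of the two measures named above, solves the linear system (I − αA)x = 1 and can be approximated iteratively, with the residual indicating how close the approximate ranking is to the exact one. A minimal sketch of that idea (illustrative only; the paper's actual solvers and ranking guarantees are more sophisticated):

```python
import numpy as np

def katz_centrality(A, alpha=0.1, tol=1e-10, max_iter=1000):
    """Approximate Katz centrality by iterating x <- alpha*A@x + 1,
    which converges when alpha < 1/lambda_max(A). Returns the
    ranking vector and the final residual norm ||b - (I - alpha*A)x||."""
    n = A.shape[0]
    b = np.ones(n)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_new = alpha * (A @ x) + b
        if np.linalg.norm(x_new - x, 1) < tol:
            x = x_new
            break
        x = x_new
    residual = np.linalg.norm(b - (x - alpha * (A @ x)), 1)
    return x, residual

# Path graph 0-1-2: the middle vertex should rank highest.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
x, res = katz_centrality(A)
print(np.argmax(x))  # 1
```

A small residual certifies that the computed scores are close to the exact solution, which is what lets an approximate solver still guarantee the top of the ranking.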


Ranking in dynamic graphs using exponential centrality

International Conference on Complex Networks and their Applications

Eisha Nathan, James Fairbanks, David Bader

2017

Many large datasets from several fields of research such as biology or society can be represented as graphs. Additionally in many real applications, data is constantly being produced, leading to the notion of dynamic graphs. A heavily studied problem is identification of the most important vertices in a graph. This can be done using centrality measures, where a centrality metric computes a numerical value for each vertex in the graph.


Scalable and High Performance Betweenness Centrality on the GPU [Best Student Paper Finalist]

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

A. McLaughlin, D. A. Bader

2014-11-01

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost that prevents the examination of large graphs of interest. Prior GPU implementations suffer from large local data structures and inefficient graph traversals that limit scalability and performance. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near-linear speedup and performance exceeding tens of GTEPS when running betweenness centrality on 192 GPUs.
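The standard sequential baseline that GPU implementations of betweenness centrality are measured against is Brandes' algorithm: one shortest-path search per source, followed by a reverse pass that accumulates pair dependencies. A compact CPU sketch for unweighted graphs (illustrative only, not the paper's GPU code):

```python
from collections import deque, defaultdict

def betweenness(adj):
    """Brandes' algorithm for unweighted betweenness centrality:
    one BFS per source, then accumulate pair dependencies in
    reverse order of discovery."""
    bc = defaultdict(float)
    for s in adj:
        sigma = defaultdict(int); sigma[s] = 1  # shortest-path counts
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the BFS frontier inward.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return dict(bc)

# Path 0-1-2: every shortest path between 0 and 2 passes through 1.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(adj)[1])  # 2.0 (each unordered pair counted twice)
```

The per-source searches are independent, which is the parallelism that both multi-GPU and hybrid implementations exploit; the hard part on GPUs is doing each traversal efficiently on graphs of arbitrary diameter.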


STINGER: High performance data structure for streaming graphs [Best Paper Award]

IEEE Conference on High Performance Extreme Computing

D. Ediger, R. McColl, J. Riedy, D. A. Bader

2012-09-01

The current research focus on “big data” problems highlights the scale and complexity of analytics required and the high rate at which data may be changing. In this paper, we present our high performance, scalable and portable software, Spatio-Temporal Interaction Networks and Graphs Extensible Representation (STINGER), that includes a graph data structure that enables these applications. Key attributes of STINGER are fast insertions, deletions, and updates on semantic graphs with skewed degree distributions. We demonstrate a process of algorithmic and architectural optimizations that enable high performance on the Cray XMT family and Intel multicore servers. Our implementation of STINGER on the Cray XMT processes over 3 million updates per second on a scale-free graph with 537 million edges.
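The interface STINGER exposes can be illustrated with a toy analogue: typed, timestamped edges with constant-expected-time insert, delete, and update. The sketch below is only a hash-based stand-in (real STINGER uses blocked adjacency lists tuned for the Cray XMT and multicore caches, which this does not attempt to model).

```python
import time
from collections import defaultdict

class StreamingGraph:
    """Toy analogue of a STINGER-like interface: edges carry a
    semantic type and a timestamp, and insert/delete are O(1)
    expected via hash-based adjacency."""

    def __init__(self):
        # adj[u][v] = (edge_type, last_modified_timestamp)
        self.adj = defaultdict(dict)

    def insert_edge(self, u, v, etype="default"):
        now = time.time()  # re-inserting an edge refreshes its timestamp
        self.adj[u][v] = (etype, now)
        self.adj[v][u] = (etype, now)

    def delete_edge(self, u, v):
        self.adj[u].pop(v, None)
        self.adj[v].pop(u, None)

    def degree(self, u):
        return len(self.adj[u])

g = StreamingGraph()
g.insert_edge("alice", "bob", etype="follows")
g.insert_edge("alice", "carol", etype="mentions")
g.delete_edge("alice", "bob")
print(g.degree("alice"))  # 1
```

Keeping per-edge type and time metadata is what lets streaming analytics ask questions like "which relationships changed in the last minute" without rescanning the whole graph.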

