Data Works MD

About Us

Data Works MD consists of professionals, students, and enthusiasts living and working in the Maryland area who are interested in topics related to data science, data analytics, data products, software engineering, machine learning, and other data engineering topics.


Register for one of our upcoming events!


Recent videos of our events can be found below. More are available on YouTube.


July 24, 2021

Introducing Datawave - Scalable Data Ingest and Query

In this talk, we introduce Datawave, a complete ingest, query, and analytic framework for Accumulo. Datawave, recently open-sourced by the National Security Agency, capitalizes on Accumulo's capabilities, provides an API for working with structured and unstructured data, and boasts a robust, flexible, and scalable backend. We'll do a deep dive into Datawave's project layout, table structures, and APIs, and demonstrate the Datawave quickstart, a tool that makes it incredibly easy to hit the ground running with Accumulo and Datawave without having to develop a complete application.


June 17, 2021

Graph Analytics - Rich Relationships and Powerful Insights

A demo-driven exploration of graph analytics to identify criminals, discover trolls, analyze social networks, and more, using community detection, centrality, link prediction, and graph embeddings for incorporation into machine learning models, along with creating graphs from Wikipedia via Wikidata. The talk includes all you need to get started, plus a review and use of two graph query languages, Cypher and SPARQL, across multiple environments including Neo4j, Amazon Neptune, and NVIDIA CUDA graphs for large-scale graph processing using GPUs. Relationships are what it is all about.


April 17, 2021

The Role of Data During Apocalyptic Times

The COVID-19 pandemic is the most profound health crisis to impact the United States and the world in the past 100 years. One critical challenge since the beginning of the pandemic has been building accurate models to inform organizations’ responses. Numerous models and analytics emerged to address disease spread, hospital utilization, PPE demand and allocation, vaccine allocation, and mortality. The efficacy of these models and analytics depends on quality data that provides a high degree of trust. This talk will describe our experiences at the Johns Hopkins University Applied Physics Laboratory since the beginning of the COVID-19 pandemic in curating and building high-quality data pipelines to inform the global response.


March 20, 2021

Data Science Product Management

When put into service solving customer needs, data science can be a critical differentiator for digital products in ever-more-competitive markets. But productizing data science presents a unique set of challenges, and often leaves product managers and data scientists struggling to find common ground and a shared language. In this talk, product coach and consultant Matt LeMay shares the lessons he's learned building bridges between product management and data science at companies like Bitly, Songza, and Spotify. Expect a candid, direct, and entertaining conversation about mistakes made, lessons learned, and suggestions for how to move forward.


February 24, 2021

ML Design Patterns and Designing ML Infrastructure

Design patterns are formalized best practices to solve common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods to help engineers tackle frequently occurring problems that crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful in the context of adding flexibility, resilience and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects.


January 16, 2021

Malware Detection, Enabled by Machine Learning

With the volume of new malware created each year growing, and market opportunities for malware reuse expanding, protecting systems can’t rely solely on downloading a vendor’s updated virus signature files. Our customers need ways to detect and cordon likely threats by using data retrieved from a combination of static and behavioral characteristics and comparing it to other classes of “good” versus “bad” files. Optimally, the solution cordons risky files, force-ranks them according to their likelihood of causing harm, correlates some metadata to help with further learning and to provide context to analysts, and lets an analyst “release” a file after further analysis and a request from a user. That feedback is then relayed back into the model to support further tuning.


Interesting articles, tools, and tutorials. More are available in our newsletter archive.

Data Works MD June 2021 Issue

DataOps, data platforms, best practices for Docker, ...

Data Works MD May 2021 Issue

Product teams, defensible ML, time series forecasting, DAX, ...

Data Works MD April 2021 Issue

ML in game development, graphing COVID, Pokemon or Big Data, ...


We are proudly supported by the following organizations.