Our monthly events feature presentations and discussions from local experts.
Our monthly newsletter features links to interesting articles, tutorials, and tools related to data science, analytics, and big data.
Data Works MD consists of professionals, students, and enthusiasts living and working in the Maryland area who are interested in topics related to data science, data analytics, data products, software engineering, machine learning, and other data engineering topics.
Register for one of our upcoming events!
September 18, 2021
The DAX Conference 2021 will focus on data science, analytics, and general data exploration. Engineers, data scientists, analytic developers, system architects, and business leaders are encouraged to share their experiences and present a topic that would be of interest to the local data community. Expected attendees include engineers, thought leaders, business leaders, and professionals from local government, government defense and intelligence agencies, start-up companies, large data analytic and data science companies, and local universities.
June 17, 2021
Demo-driven exploration of graph analytics to identify criminals, discover trolls, analyze social networks, and more, using community detection, centrality, link prediction, and graph embedding for incorporation into machine learning models, along with creating graphs from Wikipedia via Wikidata. Includes all you need to get started, plus a review and use of two graph query languages, Cypher and SPARQL, across multiple environments including Neo4j, Amazon Neptune, and Nvidia CUDA graphs for large-scale graph processing using GPUs. Relationships are what it is all about.
July 24, 2021
In this talk, we introduce Datawave, a complete ingest, query, and analytic framework for Accumulo. Datawave, recently open-sourced by the National Security Agency, capitalizes on Accumulo's capabilities, provides an API for working with structured and unstructured data, and boasts a robust, flexible, and scalable backend. We'll do a deep dive into Datawave's project layout, table structures, and APIs in addition to demonstrating the Datawave quickstart—a tool that makes it incredibly easy to hit the ground running with Accumulo and Datawave without having to develop a complete application.
The COVID-19 pandemic is the most profound health crisis to impact the United States and the world in the past 100 years. One critical challenge since the beginning of the pandemic has been building accurate models to inform organizations’ responses. Numerous models and analytics emerged to address disease spread, hospital utilization, PPE demand and allocation, vaccine allocation, and mortality. The efficacy of these models and analytics depends on quality data that provides a high degree of trust. This talk will describe our experiences at the Johns Hopkins University Applied Physics Laboratory since the beginning of the COVID-19 pandemic in curating and building high-quality data pipelines to inform the global response.
When put into service solving customer needs, data science can be a critical differentiator for digital products in ever-more-competitive markets. But productizing data science presents a unique set of challenges, and often leaves product managers and data scientists struggling to find common ground and a shared language. In this talk, product coach and consultant Matt LeMay shares the lessons he's learned building bridges between product management and data science at companies like Bitly, Songza, and Spotify. Expect a candid, direct, and entertaining conversation about mistakes made, lessons learned, and suggestions for how to move forward.
Design patterns are formalized best practices to solve common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods to help engineers tackle frequently occurring problems that crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful in the context of adding flexibility, resilience and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects.
With the volume of new malware created each year growing, along with expanding market opportunities for malware reuse, protecting systems can’t rely solely on downloading a vendor’s updated virus signature files. Our customers need ways to detect and cordon likely threats by using data retrieved from a combination of static and behavioral characteristics and comparing it to other classes of “good” versus “bad” files. Optimally, the solution cordons risky files, force-ranks them according to their likelihood of causing harm, correlates some metadata to help with further learning and to provide context to analysts, and lets an analyst “release” a file after further analysis and a request from a user. That feedback is then relayed back into the model to support further tuning.
Edge computing is a distributed computing model in which computing takes place near the physical location where data is being collected and analyzed, rather than on a centralized server or in the cloud. According to Gartner, "91% of today’s data is created and processed in centralized data centers. By 2022 about 75% of all data will need analysis and action at the edge."
A geographic information system (GIS) is a framework for gathering, managing, and analyzing data. Rooted in the science of geography, GIS integrates many types of data. GIS data is used for a variety of purposes including mapping, urban planning, agriculture, and banking. Join us in October to learn how you can use Python to explore, analyze, and work with GIS data.
We are proudly supported by the following organizations.
Erias Ventures was founded to serve its customers with an entrepreneurship mindset. We value taking action, having the courage to commit, and persevering through challenges and failures.
Varen Technologies is a trusted industry leader delivering innovative solutions in cyber security, analytics, augmented intelligence, Agile Software Development and IT/maintenance for our partner clients including the federal government, Department of Defense, Homeland Security and other Cyber Defense organizations.
ClearEdge is a mission-driven technology thought leader grown out of the Intelligence Community that provides software engineering, big data, cloud, data analytics, and data science solutions and services. We are committed to exceeding our customers’ expectations by attracting and retaining top-tier engineers and experts.
Captivation is a provider of mission-focused software engineering services supporting the Department of Defense and Intelligence Community. Join Captivation to work with true software experts and build something to be proud of.