Our monthly events feature presentations and discussions from local experts.
Our monthly newsletter features links to interesting articles, tutorials, and tools related to data science, analytics, and big data.
Data Works MD consists of professionals, students, and enthusiasts living and working in the Maryland area who are interested in data science, data analytics, data products, software engineering, machine learning, and related data engineering topics.
Register for one of our upcoming events!
May 8, 2021
Looking for a job? Looking to hire someone? Trying to get your project started? Grab your best pitch and come share with the Data Works MD community.
Recent videos of our events can be found below. More are available on our YouTube channel.
The COVID-19 pandemic is the most profound health crisis to impact the United States and the world in the past 100 years. Since the beginning of the pandemic, one critical challenge has been building accurate models to inform organizations’ responses. Numerous models and analytics emerged to address disease spread, hospital utilization, PPE demand and allocation, vaccine allocation, and mortality. The efficacy of these models and analytics depends on high-quality data that earns a high degree of trust. This talk will describe our experiences at the Johns Hopkins University Applied Physics Laboratory, since the beginning of the COVID-19 pandemic, in curating data and building high-quality data pipelines to inform the global response.
When used to solve customer needs, data science can be a critical differentiator for digital products in ever-more-competitive markets. But productizing data science presents a unique set of challenges and often leaves product managers and data scientists struggling to find common ground and a shared language. In this talk, product coach and consultant Matt LeMay shares the lessons he's learned building bridges between product management and data science at companies like Bitly, Songza, and Spotify. Expect a candid, direct, and entertaining conversation about mistakes made, lessons learned, and suggestions for how to move forward.
Design patterns are formalized best practices for solving common problems when designing a software system. As machine learning moves from being a research discipline to a software engineering one, it is useful to catalog tried-and-proven methods that help engineers tackle problems that frequently crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful for adding flexibility, resilience, and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects.
As the volume of new malware created each year grows, and as market opportunities for malware reuse expand, protecting systems can’t rely solely on downloading a vendor’s updated virus signature files. Our customers need ways to detect and cordon off likely threats by using data drawn from a combination of static and behavioral characteristics and comparing it against known classes of “good” and “bad” files. Ideally, the solution cordons off risky files, ranks them by their likelihood of causing harm, correlates metadata to support further learning and give analysts context, and lets an analyst “release” a file after further analysis and a request from a user. Oh, and that feedback should be relayed back into the model to support further tuning.
Edge computing is a distributed computing model in which computation takes place near the physical location where data is collected and analyzed, rather than on a centralized server or in the cloud. According to Gartner, "91% of today’s data is created and processed in centralized data centers. By 2022 about 75% of all data will need analysis and action at the edge."
A geographic information system (GIS) is a framework for gathering, managing, and analyzing data. Rooted in the science of geography, GIS integrates many types of data. GIS data is used for a variety of purposes including mapping, urban planning, agriculture, and banking. Join us in October to learn how you can use Python to explore, analyze, and work with GIS data.
Interesting articles, tools, and tutorials. More are available in our newsletter archive.
Gender equality in AI, Julia as a replacement for Python, a framework for easier documentation, ...
Data-driven company, data leakage, sentiment analysis, and top Python libraries, ...
AlphaFold, Netflix, 2020 trends, awesome data engineering, ...
We are proudly supported by the following organizations.
Varen Technologies is a trusted industry leader delivering innovative solutions in cyber security, analytics, augmented intelligence, Agile software development, and IT/maintenance for partner clients, including the federal government, the Department of Defense, Homeland Security, and other cyber defense organizations.
ClearEdge is a mission-driven technology thought leader, grown out of the Intelligence Community, that provides software engineering, big data, cloud, data analytics, and data science solutions and services. We are committed to exceeding our customers’ expectations by attracting and retaining top-tier engineers and experts.