Our monthly events feature presentations and discussions from local experts.
Our monthly newsletter features links to interesting articles, tutorials, and tools related to data science, analytics, and big data.
Data Works MD consists of professionals, students, and enthusiasts living and working in the Maryland area who are interested in topics related to data science, data analytics, data products, software engineering, machine learning, and other data engineering topics.
Register for one of our upcoming events!
March 20, 2021
When put into service solving customer needs, data science can be a critical differentiator for digital products in ever-more-competitive markets. But productizing data science presents a unique set of challenges, and often leaves product managers and data scientists struggling to find common ground and a shared language. In this talk, product coach and consultant Matt LeMay shares the lessons he's learned building bridges between product management and data science at companies like Bitly, Songza, and Spotify. Expect a candid, direct, and entertaining conversation about mistakes made, lessons learned, and suggestions for how to move forward.
April 3, 2021
Looking for a job? Looking to hire someone? Trying to get your project started? Grab your best pitch and come share with the Data Works MD community.
Recent videos of our events can be found below. More are available on YouTube.
Design patterns are formalized best practices to solve common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods to help engineers tackle frequently occurring problems that crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful in the context of adding flexibility, resilience and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects.
With the scale of new malware being created each year growing, as well as the expanding market opportunities for malware reuse, protecting systems can’t rely solely on downloading a vendor’s updated virus signature files. Our customers need ways to detect and cordon likely threats by using data retrieved from a combination of static and behavioral characteristics and comparing it to other classes of “good” versus “bad” files. Optimally, the solution cordons risky files, force-ranks them according to their likelihood of causing harm, correlates some metadata to help with further learning and to provide context to analysts, and lets an analyst “release” a file after further analysis and a request from a user. That feedback is then relayed back into the model to support further tuning.
Edge computing is a distributed computing model in which computing takes place near the physical location where data is being collected and analyzed, rather than on a centralized server or in the cloud. According to Gartner, "91% of today’s data is created and processed in centralized data centers. By 2022 about 75% of all data will need analysis and action at the edge."
A geographic information system (GIS) is a framework for gathering, managing, and analyzing data. Rooted in the science of geography, GIS integrates many types of data. GIS data is used for a variety of purposes including mapping, urban planning, agriculture, and banking. Join us in October to learn how you can use Python to explore, analyze, and work with GIS data.
In partnership with TEDCO, we are featuring speakers from two successful Maryland-based, data-focused companies: Yet Analytics and Protenus. Protenus will discuss how they built their Protenus Healthcare Compliance Analytics platform, along with a discussion of Random Forests. Yet Analytics will discuss the unique approach of their xAPI as a specification based in the world of semantic technology and talk about how xAPI is implemented for data simulation, analytics, and advanced visualization and reporting in the learning and training space.
As a field, we often hear about success stories. This is true in research, where a publishing incentive can pressure authors to focus on consistently exceeding state-of-the-art results. It is also true in industry, where companies attempt to attract engineering talent by describing how impressive their production ML systems are. However, every practitioner here knows that in engineering and in ML, the road to success is paved with failures. The field of ML in production is new, so it lacks cautionary tales about what can go wrong with models.
Interesting articles, tools, and tutorials. More are available in our newsletter archive.
Data-driven company, data leakage, sentiment analysis, and top Python libraries, ...
AlphaFold, Netflix, 2020 trends, awesome data engineering, ...
State of AI in 2020, evil data science, data orchestration, and how to win Kaggle competitions, ...
We are proudly supported by the following organizations.
Varen Technologies is a trusted industry leader delivering innovative solutions in cyber security, analytics, augmented intelligence, Agile Software Development and IT/maintenance for our partner clients including the federal government, Department of Defense, Homeland Security and other Cyber Defense organizations.
ClearEdge is a mission-driven technology thought leader grown out of the Intelligence Community that provides software engineering, big data, cloud, data analytics, and data science solutions and services. We are committed to exceeding our customers’ expectations by attracting and retaining top-tier engineers and experts.