

Shopify's machine learning platform team builds the infrastructure, tools and abstracted layers to help data scientists streamline, accelerate and simplify their machine learning workflows. There are many different kinds of machine learning use cases at Shopify, internal and external. Internal use cases are being developed and used in specialized domains like fraud detection and revenue predictions. External use cases are merchant and buyer facing, and include projects such as product categorization and recommendation systems.

At Shopify we build for the long term, and last year we decided to redesign our machine learning platform. We need a machine learning platform that can handle different (often conflicting) requirements, inputs, data types, dependencies and integrations. The platform should be flexible enough to support the different aspects of building machine learning solutions in production, and enable our data scientists to use the best tools for the job.

In this post, we walk through how we built Merlin, our magical new machine learning platform. We dive into the architecture, working with the platform, and a product use case.

Our new machine learning platform is based on an open source stack and technologies. Using open source tooling end-to-end was important to us because we wanted to both draw from and contribute to the most up-to-date technologies and their communities, and to stay agile in evolving the platform to meet our users' needs.

Merlin's objective is to enable Shopify's teams to train, test, deploy, serve and monitor machine learning models efficiently and quickly.

Scalability: robust infrastructure that can scale up our machine learning workflows.
Fast Iterations: tools that reduce friction and increase productivity for our data scientists and machine learning engineers by minimizing the gap between prototyping and production.
Flexibility: users can use any libraries or packages they need for their models.

For the first iteration of Merlin, we focused on enabling training and batch inference on the platform.

Merlin Architecture

Figure: A high-level diagram of Merlin's architecture.

Merlin gives our users the tools to run their machine learning workflows. Typically, large-scale data modeling and processing at Shopify happens in other parts of our data platform, using tools such as Spark. The data and features are then saved to our data lake or Pano, our feature store. Merlin uses these features and datasets as inputs to the machine learning tasks it runs, such as preprocessing, training, and batch inference.

With Merlin, each use case runs in a dedicated environment that can be defined by its tasks, dependencies and required resources. We call these environments Merlin Workspaces.
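To make the idea of a Merlin Workspace more concrete, here is a minimal sketch of how an environment "defined by its tasks, dependencies and required resources" could be modeled in Python. This is not Merlin's actual API or configuration format: the `WorkspaceSpec` and `Resources` classes, their field names, the validation rules and the example values are all hypothetical, chosen only to illustrate the three ingredients the post names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Resources:
    """Compute requested for a workspace (hypothetical fields, not Merlin's schema)."""
    cpus: int = 1
    memory_gb: int = 4
    gpus: int = 0


@dataclass
class WorkspaceSpec:
    """Hypothetical description of one use case's dedicated environment."""
    name: str
    tasks: List[str] = field(default_factory=list)          # e.g. preprocessing, training
    dependencies: List[str] = field(default_factory=list)   # packages the use case needs
    resources: Resources = field(default_factory=Resources)

    def validate(self) -> None:
        """Raise ValueError if the spec is incomplete or inconsistent."""
        if not self.name:
            raise ValueError("workspace needs a name")
        if not self.tasks:
            raise ValueError("workspace needs at least one task")
        if self.resources.cpus < 1 or self.resources.memory_gb < 1:
            raise ValueError("workspace needs at least 1 CPU and 1 GB of memory")
        if self.resources.gpus < 0:
            raise ValueError("GPU count cannot be negative")


# Example: an illustrative workspace for a product categorization use case.
spec = WorkspaceSpec(
    name="product-categorization",
    tasks=["preprocessing", "training", "batch_inference"],
    dependencies=["scikit-learn", "pandas"],
    resources=Resources(cpus=8, memory_gb=32, gpus=1),
)
spec.validate()
```

Keeping tasks, dependencies and resources in one declarative object is one way to let each use case carry its own, possibly conflicting, requirements without affecting other workspaces.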
