Case Study: How Pinterest Built a Stream Processing Platform with Apache Flink
Format: Online
Language: English
Published: [Place of publication not identified]: O'Reilly Media, Inc., 2022; Sebastopol, CA: O'Reilly Media Inc.
Edition: 1st edition
Online access: https://learning.oreilly.com/library/view/-/0636920672371/?ar
Summary: Facing rapid growth and competition in its online business, Pinterest's tech stack demanded large-scale stateful online processing technology to unlock several top-priority initiatives. From Lambda-function-style microservices to Kafka Streams and Spark Streaming, Pinterest explored all of these options and decided to go with a single open source stream processing platform, powered by Apache Flink. Starting from a legacy batch-only data stack, and with varying levels of expertise among its users, Pinterest's stream processing platform has evolved in several stages into a reliable platform that enables data-driven products and timely decision-making. In the first stage, facing demand to run large stateful applications, we built connectors that work with Pinterest's infrastructure and streamlined user education with embedded training to fine-tune each use case. In the next stage, we onboarded real-time ads ingestion, which predicts user matches with 50%+ accuracy in near real time. We also onboarded the unified calculation of tier-1 advertiser spend, which has a very stringent SLO, on a very tight timeline. As the platform grew, we needed a more scalable way to serve engineers without a stream processing or infrastructure background, in order to unblock real-time machine learning use cases. Reprocessing, backfill sources, live-debugging automation, verification and deployment automation, and a dependency-injection-based job editor were therefore built into the platform and widely adopted, bringing a large number of jobs to production in a relatively short time. Eventually, features such as unified shopping catalog indexing, content safety and trust, and near-real-time deduplication of the platform's 4B+ images were built and launched.

With additional investment in technology and ecosystem teams, Pinterest's stream processing platform is accelerating the move of machine learning use cases, online processing, and data warehousing to near real time. This case study reviews how Pinterest chose Apache Flink as the technology behind its stream processing platform, how the platform enabled critical use cases and a user base that scaled out and evolved along with product innovation, and the lessons learned in implementing and developing the platform.

What you will learn, and how to apply it. By the end of this case study, the viewer will understand:
- How and why Pinterest chose Apache Flink over other stream processing offerings
- How Pinterest uses stream ...
Description: 1 online resource (1 video file, approximately 58 min.)