Tagged | data-pipeline
-
Auto-Diagnosis and Remediation in Netflix Data Platform
(netflixtechblog.com) -
Presentation: Robust Foundation for Data Pipelines at Scale - Lessons from Netflix
(www.infoq.com) -
Redesigning Etsy’s Machine Learning Platform
(codeascraft.com) -
Evolving LinkedIn’s analytics tech stack
(engineering.linkedin.com) -
Applying the Micro Batching Pattern to Data Transfer
(engineering.salesforce.com)#data-pipeline #software-engineering #software-design #software-architecture
-
4 Key Design Principles and Guarantees of Streaming Databases
(www.confluent.io)#data-pipeline #software-architecture #DBMS #data-engineering
-
Running ML Inference Services in Shared Hosting Environments
(engblog.nextdoor.com) -
Function pipelines: Building functional programming into PostgreSQL using custom operators
(blog.timescale.com) -
Automating Data Protection at Scale, Part 2
(medium.com) -
The Airflow Smart Sensor Service
(medium.com) -
How Airbnb Enables Consistent Data Consumption at Scale
(medium.com)#data-pipeline #software-architecture #scaling #distributed-systems
-
Sourcerer: Data Ingestion at Myntra
(medium.com) -
Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework
(eng.uber.com) -
Infrastructure Design for Real-time Machine Learning Inference
(databricks.com)#data-pipeline #software-engineering #infra #machine-learning
-
Enabling Seamless Kafka Async Queuing with Consumer Proxy
(eng.uber.com)#data-pipeline #software-architecture #distributed-systems #apache-kafka
-
Pinterest’s Analytics as a Platform on Druid (Part 2 of 3)
(medium.com) -
Building Scalable Streaming Pipelines for Near Real-Time Features
(eng.uber.com) -
Pinterest’s Analytics as a Platform on Druid (Part 1 of 3)
(medium.com) -
Data Lineage at Slack
(slack.engineering) -
Lambda Learner: Nearline learning on data streams
(engineering.linkedin.com) -
Real-time Einstein Insights Using Kafka Streams
(engineering.salesforce.com)#data-pipeline #software-engineering #software-architecture #machine-learning
-
Pinot Real-Time Ingestion with Cloud Segment Storage
(eng.uber.com) -
Unified Flink Source at Pinterest: Streaming Data Processing
(stackshare.io) -
Data Movement in Netflix Studio via Data Mesh
(netflixtechblog.com) -
Building Data Pipelines Using Kotlin
(engineering.salesforce.com) -
From daily dashboards to enterprise grade data pipelines
(engineering.linkedin.com) -
Processing ETL tasks with Ratchet
(engineering.grab.com) -
Unified Flink Source at Pinterest: Streaming Data Processing
(medium.com) -
Analyzing Customer Issues to Improve User Experience
(eng.uber.com) -
How Airbnb Measures Future Value to Standardize Tradeoffs
(medium.com) -
Presentation: Designing IoT Data Pipelines for Deep Observability
(www.infoq.com) -
POV: A streaming/communication platform for the data mesh
(blog.octo.com) -
The Evolution of Data Science Workbench
(eng.uber.com) -
Optimizing Analytics Data Processing on eBay’s New Open-Source-Based Platform
(tech.ebayinc.com) -
What actually is a Data Mesh? And is it really a thing?
(blog.scottlogic.com) -
Why Elasticsearch is an indispensable component of the Adyen stack
(www.elastic.co) -
Building a Hyper Self-Service, Distributed Tracing and Feedback System for Rule & Machine Learning (ML) Predictions
(engineering.grab.com)#data-pipeline #infra #machine-learning #scaling #distributed-systems
-
Building a Label-Based Enforcement Pipeline for Trust & Safety
(medium.com) -
Vinted Search Scaling Chapter 4: Query Log
(engineering.vinted.com) -
From Vendor to In-house: How eBay Reimagined Its Analytics Landscape
(tech.ebayinc.com) -
From pipeline to beyond
(tech.gc.com)#data-pipeline #software-engineering #software-architecture #performance
-
Automating Merchant Live Monitoring with Real-Time Analytics: Charon
(eng.uber.com) -
Stream Processing is Nothing Without Action
(www.confluent.io) -
Building Personalisation at Scale
(lambda.grofers.com) -
Performance Testing a data pipeline at scale
(engineering.wingify.com)#data-pipeline #software-engineering #testing #performance #practices
-
Broadcom Modernizes Machine Learning and Anomaly Detection with ksqlDB
(www.confluent.io) -
Turbine: Facebook’s service management platform for stream processing
(engineering.fb.com) -
An Architecture for Secure COVID-19 Contact Tracing
(blog.cloudera.com) -
Preventing Fraud and Fighting Account Takeovers with Kafka Streams
(www.confluent.io)#data-pipeline #software-architecture #security #distributed-systems
-
Classifying 4M Reddit posts in 4k subreddits: an end-to-end machine learning pipeline
(towardsdatascience.com) -
Scaling Machine Learning
(towardsdatascience.com) -
Building a scalable online product recommender with Keras, Docker, GCP, and GKE
(blog.insightdatascience.com)#data-pipeline #software-architecture #infra #machine-learning #cloud
-
Presentation: Monitoring and Tracing @Netflix Streaming Data Infrastructure
(www.infoq.com) -
Presentation: From Spark to Elasticsearch and Back - Learning Large-scale Models for Content Recommendation
(www.infoq.com) -
Knowing PySpark and Kafka: A 100 Million Events Use-Case
(towardsdatascience.com)#data-pipeline #software-engineering #software-architecture #infra
-
MLOps: not as Boring as it Sounds
(itnext.io)#data-pipeline #software-engineering #infra #machine-learning
-
A Scalable Prediction Engine for Automating Structured Data Prep
(towardsdatascience.com)#data-pipeline #data-science #software-engineering #machine-learning
-
Data Sentinel: Automating data validation
(engineering.linkedin.com)#data-pipeline #dev-tools #data-science #software-architecture
-
Under the Hood of Uber ATG’s Machine Learning Infrastructure and Versioning Control Platform for Self-Driving Vehicles
(eng.uber.com)#data-pipeline #software-engineering #infra #machine-learning
-
Tubi: Scaling Up Machine Experimentation with Scylla and Scala
(www.scylladb.com)#data-pipeline #software-engineering #machine-learning #DBMS
-
Beyond fashion: Deep Learning with Catalyst
(evilmartians.com)#data-pipeline #deep-learning #software-engineering #image-processing
-
Supporting Spark as a First-Class Citizen in Yelp’s Computing Platform
(engineeringblog.yelp.com)#data-pipeline #distributed-systems #apache-spark #big-data #backend
-
Building an Adaptive, Multi-Tenant Stream Bus with Kafka and Golang
(eng.lyft.com) -
How to build a real-time fraud detection pipeline using Faust and MLFlow
(towardsdatascience.com) -
How to enable data scientists to stop managing ETL pipelines and get back to doing data science: Part I
(tech.wayfair.com) -
pakkr™ (Part I), One Pipeline to Rule Them All
(medium.com) -
Streaming Machine Learning with Tiered Storage and Without a Data Lake
(www.confluent.io)#data-pipeline #machine-learning #distributed-systems #apache-kafka
-
Streams and Monk – How Yelp is Approaching Kafka in 2020
(engineeringblog.yelp.com) -
Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing
(eng.uber.com)#data-pipeline #software-architecture #distributed-systems #data-engineering
-
Asynchronous Programming: A Cautionary tale
(medium.com) -
Streams and Tables in Apache Kafka: Elasticity, Fault Tolerance, and Other Advanced Concepts
(www.confluent.io) -
Streams and Tables in Apache Kafka: Processing Fundamentals
(www.confluent.io) -
Streams and Tables in Apache Kafka: A Primer
(www.confluent.io) -
Pipeline to the Cloud – Streaming On-Premises Data for Cloud Analytics
(www.confluent.io)#data-pipeline #distributed-systems #apache-kafka #analytics
-
Plumbing At Scale
(engineering.grab.com)#data-pipeline #software-architecture #scaling #distributed-systems
-
Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform
(eng.lyft.com) -
How ads indexing works at Pinterest
(medium.com)#data-pipeline #software-architecture #scaling #data-engineering
-
Streaming Cassandra into Kafka in (Near) Real-Time: Part 2
(engineeringblog.yelp.com)#data-pipeline #distributed-systems #real-time #data-engineering
-
Uber’s Data Platform in 2019: Transforming Information to Intelligence
(eng.uber.com)#data-pipeline #scaling #distributed-systems #data-engineering
-
Scalable and non-blocking event ingestion pipeline? Here’s how
(engineering.vinted.com) -
Presentation: Batch Processing in 2019
(www.infoq.com)#data-pipeline #software-architecture #backend #data-engineering
-
Streaming Cassandra into Kafka in (Near) Real-Time: Part 1
(engineeringblog.yelp.com)#data-pipeline #software-architecture #distributed-systems #data-engineering
-
Eng Blog: Real-time User Signal Serving for Feature Engineering
(medium.com) -
Presentation: Future of Data Engineering
(www.infoq.com) -
Dropbox Predicts What File You Need Next With Content-Specific ML Pipelines
(www.infoq.com) -
Evolution of Data Ingestion and Product Instrumentation at Prezi
(engineering.prezi.com) -
An inside look at LinkedIn’s data pipeline monitoring system
(engineering.linkedin.com)#data-pipeline #software-architecture #monitoring #data-engineering
-
Using Grab’s Trust Counter Service to Detect Fraud Successfully
(engineering.grab.com)#data-pipeline #software-architecture #machine-learning #analytics
-
🚂 On Track with Apache Kafka – Building a Streaming ETL Solution with Rail Data
(www.confluent.io) -
Presentation: Building and Operating a Serverless Data Pipeline
(www.infoq.com)#data-pipeline #software-architecture #scaling #cloud #serialization
-
Real-time experiment analytics at Pinterest using Apache Flink
(medium.com) -
Scaling a Mature Data Pipeline — Managing Overhead
(medium.com)#data-pipeline #software-architecture #scaling #distributed-systems
-
Migrating our ETL pipeline to Luigi on a Cloud
(devblog.songkick.com) -
Low Latency and High Throughput in CAL Ingress
(tech.ebayinc.com) -
Introduction to Stream Mining
(towardsdatascience.com) -
Article: Rethinking Flink’s APIs for a Unified Data Processing Framework
(www.infoq.com) -
A look inside Kafka Mirrormaker 2
(blog.cloudera.com) -
Machine Learning Powered Content Moderation: Computer Vision Applications at Expedia
(towardsdatascience.com) -
Design Decisions for the First Embedded Analytics Open-Source Framework
(blog.statsbot.co)#data-pipeline #software-design #software-architecture #analytics #web
-
Demand Forecasting Tech Stack @ Walmart
(medium.com) -
Presentation: Metrics-driven Machine Learning Development at Salesforce Einstein
(www.infoq.com)#data-pipeline #software-engineering #machine-learning #scaling #production
-
BIG Data, Fast Data - Part I
(www.thoughtworks.com) -
Bringing Data Sources Together with PipelineWise
(tech.transferwise.com) -
Using Graph Processing for Kafka Stream Visualizations
(www.confluent.io)#data-pipeline #apache-kafka #graph-processing #visualisation
-
Cultivating your Data Lake
(stackshare.io)#data-pipeline #software-architecture #infra #data-engineering
-
Auditing Content Features in FollowFeed
(engineering.linkedin.com) -
Building a Fault-Tolerant Data Pipeline for Chatbots
(engineering.salesforce.com)#data-pipeline #software-engineering #software-architecture #distributed-systems #backend
-
Pin2Interest: A scalable system for content classification
(medium.com) -
Data Engineering in Badoo: Handling 20 Billion Events per Day
(www.infoq.com)#data-pipeline #software-architecture #scaling #data-engineering
-
Our Journey to Optimal Job Sizes for Apache Spark
(engineering.salesforce.com)#data-pipeline #software-architecture #distributed-systems #apache-spark #backend
-
Presentation: A Dive Into Streams @LinkedIn With Brooklin
(www.infoq.com)#data-pipeline #stream-processing #software-architecture #backend
-
Improving the scalability of a Spark pipeline for conversion attribution
(medium.com) -
Presentation: Streaming Log Analytics with Kafka
(www.infoq.com) -
Accelerating NiFi flows delivery: Part 1
(blog.octo.com)#data-pipeline #software-architecture #performance #optimisation #data-engineering
-
Open Sourcing Brooklin: Near Real-Time Data Streaming at Scale
(engineering.linkedin.com)#data-pipeline #software-architecture #scaling #distributed-systems #data-engineering
-
Simulacra And Selection
(multithreaded.stitchfix.com) -
Recommendation Systems at Scale — Making Grab’s everyday app super
(towardsdatascience.com) -
Catwalk: Serving Machine Learning Models at Scale
(engineering.grab.com)#data-pipeline #software-architecture #machine-learning #scaling #backend
-
Using Virtual Private Clusters for Testing Apache Samza
(engineering.linkedin.com) -
Distributed Deep Learning Pipelines with PySpark and Keras
(towardsdatascience.com) -
Presentation: Petastorm: A Light-Weight Approach to Building ML Pipelines
(www.infoq.com)#data-pipeline #machine-learning #big-data #data-engineering
-
Building A Scalable Data Management System for Computer Vision Tasks
(medium.com)#data-pipeline #software-architecture #image-processing #data-engineering
-
Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend
(multithreaded.stitchfix.com)#data-pipeline #software-engineering #practices #data-engineering
-
Presentation: Docker Data Science Pipeline
(www.infoq.com)#data-pipeline #software-architecture #docker #distributed-systems #hadoop
-
Introducing LINE Games analytics environment
(engineering.linecorp.com)#data-pipeline #software-architecture #big-data #data-engineering
-
Presentation: Productionizing H2O Models with Apache Spark
(www.infoq.com)#data-pipeline #data-science #software-engineering #machine-learning #apache-spark
-
Preventing Pipeline Calls from Crashing Redis Clusters
(engineering.grab.com) -
Real-time data processing for monitoring and reporting — A practical use case of spark structured…
(medium.com)#data-pipeline #stream-processing #distributed-systems #apache-spark #data-engineering
-
How Natural Language Processing Helps LinkedIn Members Get Support Easily
(engineering.linkedin.com)#data-pipeline #software-architecture #machine-learning #NLP
-
How we reduced the time complexity from 18 days to 4.5 minutes.
(hackernoon.com)#data-pipeline #software-engineering #performance #optimisation
-
Building efficient data pipelines using TensorFlow
(towardsdatascience.com) -
Building Pin cohesion
(medium.com)#data-pipeline #machine-learning #image-processing #search #tensor-flow
-
Presentation: Michelangelo - Machine Learning @Uber
(www.infoq.com)#data-pipeline #data-science #machine-learning #data-engineering
-
Kafka Streams’ Take on Watermarks and Triggers
(www.confluent.io)#data-pipeline #stream-processing #distributed-systems #apache-kafka
-
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
(eng.uber.com)#data-pipeline #distributed-systems #hadoop #data-engineering
-
Bullet Updates - Windowing, Apache Pulsar PubSub, Configuration-based Data Ingestion, and More
(yahooeng.tumblr.com)#data-pipeline #software-architecture #backend #data-engineering
-
How we simplified our Data Ingestion & Transformation Process
(engineering.grab.com)#data-pipeline #software-architecture #distributed-systems #backend
-
How We Built an Automated Anomaly Detection System onto a Streaming Pipeline
(engineering.salesforce.com) -
Managing Uber’s Data Workflows at Scale
(eng.uber.com)#data-pipeline #DBMS #scaling #distributed-systems #big-data
-
Journey to Event Driven – Part 3: The Affinity Between Events, Streams and Serverless
(www.confluent.io) -
Lambda architecture— how to build a Big data pipeline part 1
(towardsdatascience.com) -
Real-Time Streaming and Anomaly detection Pipeline on AWS
(towardsdatascience.com) -
Improving Stream Data Quality with Protobuf Schema Validation
(www.confluent.io) -
Sysmon Security Event Processing in Real Time with KSQL and HELK
(www.confluent.io)#data-pipeline #software-architecture #apache-kafka #real-time
-
Presentation: The Whys and Hows of Database Streaming
(www.infoq.com)#data-pipeline #stream-processing #DBMS #distributed-systems
-
Presentation: Patterns of Streaming Applications
(www.infoq.com)#data-pipeline #stream-processing #software-architecture #distributed-systems
-
Complementary Item Recommendations at eBay Scale
(www.ebayinc.com)#data-pipeline #software-architecture #machine-learning #big-data
-
A Beginner’s Perspective on Kafka Streams: Building Real-Time Walkthrough Detection
(www.confluent.io)#data-pipeline #stream-processing #distributed-systems #apache-kafka
-
Improving Stream Data Quality With Protobuf Schema Validation
(deliveroo.engineering) -
Building a Scalable Event Pipeline with Heroku and Salesforce
(engineering.salesforce.com) -
Bridging Offline and Nearline Computations with Apache Calcite
(engineering.linkedin.com)#data-pipeline #software-architecture #distributed-systems #backend
-
A Lean and Scalable Data Pipeline To Capture Large Scale Events and Support Experimentation Platform
(engineering.grab.com) -
Zendesk ML Model Building Pipeline on AWS Batch: Monitoring and Load Testing
(medium.com)#data-pipeline #software-architecture #machine-learning #AWS
-
Presentation: Designing Automated Pipelines for Unseen Custom Data
(www.infoq.com) -
Data Science Project Flow for Startups
(towardsdatascience.com) -
Presentation: Crisis to Calm: Story of Data Validation @ Netflix
(www.infoq.com) -
Running Apache Airflow At Lyft
(eng.lyft.com)#data-pipeline #software-architecture #distributed-systems #backend
-
Boosting Big Data workloads with Presto Auto Scaling
(www.eventbrite.com) -
Providing Metadata Discovery on Large-Volume Data Sets
(www.ebayinc.com) -
How Pinterest runs Kafka at scale
(medium.com)#data-pipeline #software-architecture #scaling #apache-kafka #backend
-
Scaling Spark Streaming for Logging Event Ingestion
(medium.com) -
Using Apache Kafka to Drive Cutting-Edge Machine Learning
(www.confluent.io)#data-pipeline #software-architecture #machine-learning #backend #systems
-
Measuring What Makes Readers Subscribe to The New York Times
(open.nytimes.com) -
Kafka Connect Deep Dive – Converters and Serialization Explained
(www.confluent.io)#data-pipeline #distributed-systems #apache-kafka #internals #backend
-
Druid @ Airbnb Data Platform
(medium.com)#data-pipeline #software-architecture #analytics #big-data #druid
-
Proactive Data Pipeline Alerting with Pulse
(blog.cloudera.com) -
Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads
(eng.uber.com)#data-pipeline #software-architecture #distributed-systems #big-data #backend
-
ATM Fraud Detection with Apache Kafka and KSQL
(www.confluent.io) -
Imperative Loop or Functional Stream Pipeline? Beware of the Performance Impact!
(blog.jooq.org) -
Building the Contacts Platform at LinkedIn
(engineering.linkedin.com)#data-pipeline #software-architecture #distributed-systems #backend
-
Open Sourcing Mirus
(engineering.salesforce.com)#data-pipeline #software-architecture #distributed-systems #apache-kafka
-
Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning
(eng.uber.com)#data-pipeline #deep-learning #software-architecture #big-data
-
Embeddings@Twitter
(blog.twitter.com) -
Converting a Batch Job to Real-time
(blog.wallaroolabs.com) -
Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache Hadoop
(ubereng.wpengine.com) -
Keystone Real-time Stream Processing Platform
(medium.com)#data-pipeline #stream-processing #software-architecture #real-time
-
Data Wrangling with Apache Kafka and KSQL
(www.confluent.io) -
Serializable ACID Transactions on Streaming Data
(data-artisans.com) -
Multi-Agent Reinforcement Learning in Beer Distribution Game
(towardsdatascience.com) -
Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy
(hackernoon.com) -
Scaling Uber’s Customer Support Ticket Assistant (COTA) System with Deep Learning
(eng.uber.com)#data-pipeline #deep-learning #data-science #software-architecture
-
Presentation: ML Data Pipelines for Real-time Fraud Prevention @PayPal
(www.infoq.com) -
Utilizing Elixir as a lightweight tool to store real-time metrics data
(blog.wallaroolabs.com) -
Upcoming Improvements to Scylla Streaming Performance
(www.scylladb.com) -
The Dawn of Zendesk’s Machine Learning Model Building Platform with AWS Batch
(medium.com)#data-pipeline #software-architecture #infra #machine-learning
-
M3: Uber’s Open Source, Large-scale Metrics Platform for Prometheus
(eng.uber.com) -
How we build a robust analytics platform using Spark, Kafka and Cassandra Lambda architecture
(medium.com) -
Databook: Turning Big Data into Knowledge with Metadata at Uber
(eng.uber.com) -
Event Triggered Customer Segmentation
(blog.wallaroolabs.com) -
Blueprint: Qualitative and Quantitative Clickstream Event Analysis
(medium.com) -
Building a Graph Data Pipeline With Zeppelin Spark and Neo4j
(towardsdatascience.com) -
Keeping Counts In Sync
(developers.soundcloud.com)#data-pipeline #software-architecture #distributed-systems #apache-kafka
-
Comparing Apache Spark, Storm, Flink and Samza stream processing engines - Part 1
(blog.scottlogic.com) -
Transforming Financial Forecasting with Data Science and Machine Learning at Uber
(eng.uber.com) -
Real-time Streaming Pattern: Triggering Alerts
(blog.wallaroolabs.com) -
How we built a data pipeline with Lambda Architecture using Spark/Spark Streaming
(medium.com)#data-pipeline #distributed-systems #data-stream #apache-spark
-
GUI-fying the Machine Learning Workflow: Towards Rapid Discovery of Viable Pipelines
(towardsdatascience.com) -
Productionizing ML with Workflows at Twitter
(blog.twitter.com)#data-pipeline #software-architecture #machine-learning #big-data
-
Implementing Time Windowing in an Evented Streaming System
(blog.wallaroolabs.com) -
Presentation: Streaming SQL to Unify Batch & Stream Processing w/ Apache Flink @Uber
(www.infoq.com) -
Presentation: Simplifying ML Workflows with Apache Beam
(www.infoq.com) -
Metacat: Making Big Data Discoverable and Meaningful at Netflix
(medium.com) -
Introducing Commute Time for Jobs
(engineering.linkedin.com) -
Fast Order Search Using Yelp’s Data Pipeline and Elasticsearch
(engineeringblog.yelp.com) -
The EventHorizon Saga
(codeascraft.com) -
Streaming with Wallaroo: Fast Algorithmic Trading Checks
(blog.wallaroolabs.com) -
Looking under the hood of the Eventbrite data pipeline!
(www.eventbrite.com) -
Utilizing MapReduce Combiners and HyperLogLog++ to process millions of queries over datasets with billions of records
(liveramp.com) -
Challenges of monitoring sparse data, and what to do about it.
(engblog.nextdoor.com) -
Many-to-Many Relationships Using Kafka
(jobs.zalando.com)#data-pipeline #stream-processing #microservices #event-driven
-
Confluent.io – Part 2: BUILD A STREAMING PIPELINE
(blog.octo.com) -
New data pipeline management platform at Khan Academy
(engineering.khanacademy.org) -
Gimel: PayPal’s Analytics Data Processing Platform
(www.paypal-engineering.com) -
Optimizing CAL Report Hadoop MapReduce Jobs
(www.ebayinc.com) -
Continuous Deployment with Spark Streaming (Part II)
(eng.wealthfront.com) -
5 tips for architecting fast data applications
(www.oreilly.com) -
A brief introduction to two data processing architectures — Lambda and Kappa for Big Data
(towardsdatascience.com) -
Tuning the Kafka Connect Cassandra Source (part 2)
(medium.com) -
Performance testing a low-latency stream processing system
(blog.wallaroolabs.com) -
HTTP Analytics for 6M requests per second using ClickHouse
(blog.cloudflare.com) -
Air Traffic Controller: Member-First Notifications at LinkedIn
(engineering.linkedin.com) -
Introducing LogFeeder - A log collection system
(engineeringblog.yelp.com) -
Building a scalable ELK stack
(webuild.envato.com) -
How to hack Spark to do some data lineage
(blog.octo.com) -
Creating a musical (data) pipeline
(devblog.songkick.com) -
A Scikit-learn pipeline in Wallaroo
(blog.wallaroolabs.com) -
Making 30x performance improvements on Yelp’s MySQLStreamer
(engineeringblog.yelp.com) -
Idiomatic Python Stream Processing in Wallaroo
(blog.wallaroolabs.com) -
How Apache Kafka Inspired Our Platform Events Architecture
(engineering.salesforce.com)#data-pipeline #software-architecture #apache-kafka #event-driven
-
From big data to fast data
(www.oreilly.com) -
Go Go, Go! Stream Processing for Go
(blog.wallaroolabs.com) -
Return to the Temple of ELK-emental evil, Part 1
(kickstarter.engineering) -
Scaling Gradient Boosted Trees for CTR Prediction - Part I
(engineeringblog.yelp.com)#data-pipeline #data-science #machine-learning #apache-spark
-
Our Journey to a Near Perfect Log Pipeline
(engineering.salesforce.com) -
Surviving Data Loss
(jobs.zalando.com) -
Stateful Multi-Stream Processing in Python with Wallaroo
(blog.wallaroolabs.com)#data-pipeline #stream-processing #distributed-systems #python
-
Running Kafka Streams applications in AWS
(jobs.zalando.com)#data-pipeline #stream-processing #distributed-systems #apache-kafka
-
Building Data Science Pipelines with Luigi and Jupyter Notebooks
(intoli.com) -
Event Stream Analytics at Walmart with Druid
(medium.com) -
Incremental Data Capture for Oracle Databases at LinkedIn: Then and Now
(engineering.linkedin.com) -
Dali Views: Functions as a Service for Big Data
(engineering.linkedin.com) -
Data quality checkers
(drivy.engineering) -
Taking KSQL for a Spin Using Real-time Device Data
(www.confluent.io) -
Why we used Pony to write Wallaroo
(blog.wallaroolabs.com)#data-pipeline #stream-processing #software-architecture #design-choice
-
Publishing with Apache Kafka at The New York Times
(open.nytimes.com) -
Big Data Processing at Spotify: The Road to Scio (Part 1)
(labs.spotify.com) -
Streaming Data Pipelines with Brooklin
(engineering.linkedin.com) -
Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform
(eng.uber.com) -
Using Kafka Streams API for predictive budgeting
(medium.com) -
Building Complex Data Pipelines with Unified Analytics Platform
(databricks.com) -
SoundCloud's Data Science Process
(developers.soundcloud.com) -
How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka
(www.confluent.io) -
How Stitch Consolidates A Billion Records Per Day
(stackshare.io) -
Zalando Fulfillment Solutions and our FAST Replenishment Algorithm
(jobs.zalando.com) -
Serving Top Comments in Professional Social Networks
(engineering.linkedin.com) -
Semantic Search — Innovation at scale!
(medium.com) -
Stream Processing with Apache Flink and DC/OS
(mesosphere.com) -
Streaming SQL in Apache Flink, KSQL, and Stream Processing for Everyone
(data-artisans.com) -
Steering oceans of content to the world
(code.facebook.com) -
Genie in a Box: Making Spark Easy for Stitch Fix Data Scientists
(multithreaded.stitchfix.com) -
The Simplest Useful Kafka Connect Data Pipeline In The World … or Thereabouts (Part 1)
(www.confluent.io) -
Sankey Diagrams: Six Tools for Visualizing Flow Data
(www.azavea.com) -
Exploring Presto and Zeppelin for fast data analytics and visualization
(medium.com)#data-pipeline #data-visualisation #apache-Zeppelin #prestoDB
-
Cube Planner – Build an Apache Kylin OLAP Cube Efficiently and Intelligently
(www.ebaytechblog.com) -
BigDB - an ad data pipeline for LINE
(engineering.linecorp.com) -
Engineering Uber Trip Distance and Duration Predictions in Real Time with ELK
(eng.uber.com) -
Presto - a small step for DevOps engineer but a big step for BigData analyst
(allegro.tech) -
Delivering Billions of Messages Exactly Once
(segment.com) -
Deep learning on Apache Spark and Apache Hadoop with Deeplearning4j
(blog.cloudera.com) -
The Modern Architecture of Search
(tech.zalando.com) -
Building a Real-Time Streaming ETL Pipeline in 20 Minutes
(www.confluent.io) -
The data engineering ecosystem in 2017
(blog.insightdatascience.com)