Auto-Diagnosis and Remediation in Netflix Data Platform

(netflixtechblog.com)

#data-pipeline #infra #data-engineering

Presentation: Robust Foundation for Data Pipelines at Scale - Lessons from Netflix

(www.infoq.com)

#data-pipeline #performance #scaling #data-engineering

Redesigning Etsy’s Machine Learning Platform

(codeascraft.com)

#data-pipeline #software-architecture #machine-learning

Evolving LinkedIn’s analytics tech stack

(engineering.linkedin.com)

#data-pipeline #infra #analytics #data-engineering

Applying the Micro Batching Pattern to Data Transfer

(engineering.salesforce.com)

#data-pipeline #software-engineering #software-design #software-architecture

4 Key Design Principles and Guarantees of Streaming Databases

(www.confluent.io)

#data-pipeline #software-architecture #DBMS #data-engineering

Running ML Inference Services in Shared Hosting Environments

(engblog.nextdoor.com)

#data-pipeline #infra #devops #machine-learning

Function pipelines: Building functional programming into PostgreSQL using custom operators

(blog.timescale.com)

#data-pipeline #DBMS #PostgreSQL

Automating Data Protection at Scale, Part 2

(medium.com)

#data-pipeline #software-engineering #scaling #practices

The Airflow Smart Sensor Service

(medium.com)

#data-pipeline #distributed-systems #backend

How Airbnb Enables Consistent Data Consumption at Scale

(medium.com)

#data-pipeline #software-architecture #scaling #distributed-systems

Sourcerer: Data Ingestion at Myntra

(medium.com)

#data-pipeline #software-architecture #scaling

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

(eng.uber.com)

#data-pipeline #software-architecture #analytics #real-time

Infrastructure Design for Real-time Machine Learning Inference

(databricks.com)

#data-pipeline #software-engineering #infra #machine-learning

Enabling Seamless Kafka Async Queuing with Consumer Proxy

(eng.uber.com)

#data-pipeline #software-architecture #distributed-systems #apache-kafka

Pinterest’s Analytics as a Platform on Druid (Part 2 of 3)

(medium.com)

#data-pipeline #DBMS #analytics

Building Scalable Streaming Pipelines for Near Real-Time Features

(eng.uber.com)

#data-pipeline #scaling #distributed-systems #real-time

Pinterest’s Analytics as a Platform on Druid (Part 1 of 3)

(medium.com)

#data-pipeline #DBMS #analytics

Data Lineage at Slack

(slack.engineering)

#data-pipeline #monitoring #infra

Lambda Learner: Nearline learning on data streams

(engineering.linkedin.com)

#data-pipeline #software-architecture #machine-learning

Real-time Einstein Insights Using Kafka Streams

(engineering.salesforce.com)

#data-pipeline #software-engineering #software-architecture #machine-learning

Pinot Real-Time Ingestion with Cloud Segment Storage

(eng.uber.com)

#data-pipeline #software-architecture #scaling #real-time

Unified Flink Source at Pinterest: Streaming Data Processing

(stackshare.io)

#data-pipeline #software-architecture #distributed-systems

Data Movement in Netflix Studio via Data Mesh

(netflixtechblog.com)

#data-pipeline #dev-tools #infra

Building Data Pipelines Using Kotlin

(engineering.salesforce.com)

#data-pipeline #software-engineering #kotlin

From daily dashboards to enterprise grade data pipelines

(engineering.linkedin.com)

#data-pipeline #software-architecture #data-engineering

Processing ETL tasks with Ratchet

(engineering.grab.com)

#data-pipeline #software-architecture

Unified Flink Source at Pinterest: Streaming Data Processing

(medium.com)

#data-pipeline #software-architecture #data-engineering

Analyzing Customer Issues to Improve User Experience

(eng.uber.com)

#data-pipeline #software-architecture #machine-learning

How Airbnb Measures Future Value to Standardize Tradeoffs

(medium.com)

#data-pipeline #analytics #math #statistics

Presentation: Designing IoT Data Pipelines for Deep Observability

(www.infoq.com)

#data-pipeline #monitoring #devops #IoT

POV: A streaming/communication platform for the data mesh

(blog.octo.com)

#data-pipeline #software-engineering #software-architecture

The Evolution of Data Science Workbench

(eng.uber.com)

#data-pipeline #data-science #software-engineering

Optimizing Analytics Data Processing on eBay’s New Open-Source-Based Platform

(tech.ebayinc.com)

#data-pipeline #analytics #big-data #data-engineering

What actually is a Data Mesh? And is it really a thing?

(blog.scottlogic.com)

#data-pipeline #infra #data-engineering

Why Elasticsearch is an indispensable component of the Adyen stack

(www.elastic.co)

#data-pipeline #infra #search #scaling

Building a Hyper Self-Service, Distributed Tracing and Feedback System for Rule & Machine Learning (ML) Predictions

(engineering.grab.com)

#data-pipeline #infra #machine-learning #scaling #distributed-systems

Building a Label-Based Enforcement Pipeline for Trust & Safety

(medium.com)

#data-pipeline #software-architecture #data-engineering

Vinted Search Scaling Chapter 4: Query Log

(engineering.vinted.com)

#data-pipeline #search #scaling #data-engineering

From Vendor to In-house: How eBay Reimagined Its Analytics Landscape

(tech.ebayinc.com)

#data-pipeline #analytics #data-engineering

From pipeline to beyond

(tech.gc.com)

#data-pipeline #software-engineering #software-architecture #performance

Automating Merchant Live Monitoring with Real-Time Analytics: Charon

(eng.uber.com)

#data-pipeline #software-architecture #analytics #real-time

Stream Processing is Nothing Without Action

(www.confluent.io)

#data-pipeline #software-architecture #apache-kafka

Building Personalisation at Scale

(lambda.grofers.com)

#data-pipeline #data-science #machine-learning

Performance Testing a data pipeline at scale

(engineering.wingify.com)

#data-pipeline #software-engineering #testing #performance #practices

Broadcom Modernizes Machine Learning and Anomaly Detection with ksqlDB

(www.confluent.io)

#data-pipeline #machine-learning #apache-kafka

Turbine: Facebook’s service management platform for stream processing

(engineering.fb.com)

#data-pipeline #scaling #distributed-systems #research

An Architecture for Secure COVID-19 Contact Tracing

(blog.cloudera.com)

#data-pipeline #software-architecture #privacy

Preventing Fraud and Fighting Account Takeovers with Kafka Streams

(www.confluent.io)

#data-pipeline #software-architecture #security #distributed-systems

Classifying 4M Reddit posts in 4k subreddits: an end-to-end machine learning pipeline

(towardsdatascience.com)

#data-pipeline #machine-learning #NLP

Scaling Machine Learning

(towardsdatascience.com)

#data-pipeline #infra #machine-learning #scaling

Building a scalable online product recommender with Keras, Docker, GCP, and GKE

(blog.insightdatascience.com)

#data-pipeline #software-architecture #infra #machine-learning #cloud

Presentation: Monitoring and Tracing @Netflix Streaming Data Infrastructure

(www.infoq.com)

#data-pipeline #monitoring #infra #devops

Presentation: From Spark to Elasticsearch and Back - Learning Large-scale Models for Content Recommendation

(www.infoq.com)

#data-pipeline #infra #machine-learning #big-data

Knowing PySpark and Kafka: A 100 Million Events Use-Case

(towardsdatascience.com)

#data-pipeline #software-engineering #software-architecture #infra

MLOps: not as Boring as it Sounds

(itnext.io)

#data-pipeline #software-engineering #infra #machine-learning

A Scalable Prediction Engine for Automating Structured Data Prep

(towardsdatascience.com)

#data-pipeline #data-science #software-engineering #machine-learning

Data Sentinel: Automating data validation

(engineering.linkedin.com)

#data-pipeline #dev-tools #data-science #software-architecture

Under the Hood of Uber ATG’s Machine Learning Infrastructure and Versioning Control Platform for Self-Driving Vehicles

(eng.uber.com)

#data-pipeline #software-engineering #infra #machine-learning

Tubi: Scaling Up Machine Experimentation with Scylla and Scala

(www.scylladb.com)

#data-pipeline #software-engineering #machine-learning #DBMS

Beyond fashion: Deep Learning with Catalyst

(evilmartians.com)

#data-pipeline #deep-learning #software-engineering #image-processing

Supporting Spark as a First-Class Citizen in Yelp’s Computing Platform

(engineeringblog.yelp.com)

#data-pipeline #distributed-systems #apache-spark #big-data #backend

Building an Adaptive, Multi-Tenant Stream Bus with Kafka and Golang

(eng.lyft.com)

#data-pipeline #distributed-systems #apache-kafka #GoLang

How to build a real-time fraud detection pipeline using Faust and MLFlow

(towardsdatascience.com)

#data-pipeline #data-science #infra #machine-learning

How to enable data scientists to stop managing ETL pipelines and get back to doing data science: Part I

(tech.wayfair.com)

#data-pipeline #data-science #infra #data-engineering

pakkr™ (Part I), One Pipeline to Rule Them All

(medium.com)

#data-pipeline #data-science #machine-learning

Streaming Machine Learning with Tiered Storage and Without a Data Lake

(www.confluent.io)

#data-pipeline #machine-learning #distributed-systems #apache-kafka

Streams and Monk – How Yelp is Approaching Kafka in 2020

(engineeringblog.yelp.com)

#data-pipeline #apache-kafka #backend #data-engineering

Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing

(eng.uber.com)

#data-pipeline #software-architecture #distributed-systems #data-engineering

Asynchronous Programming : A Cautionary tale

(medium.com)

#data-pipeline #software-architecture #microservices #async

Streams and Tables in Apache Kafka: Elasticity, Fault Tolerance, and Other Advanced Concepts

(www.confluent.io)

#data-pipeline #DBMS #distributed-systems #data-engineering

Streams and Tables in Apache Kafka: Processing Fundamentals

(www.confluent.io)

#data-pipeline #DBMS #distributed-systems #apache-kafka

Streams and Tables in Apache Kafka: A Primer

(www.confluent.io)

#data-pipeline #DBMS #apache-kafka #data-engineering

Pipeline to the Cloud – Streaming On-Premises Data for Cloud Analytics

(www.confluent.io)

#data-pipeline #distributed-systems #apache-kafka #analytics

Plumbing At Scale

(engineering.grab.com)

#data-pipeline #software-architecture #scaling #distributed-systems

Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform

(eng.lyft.com)

#data-pipeline #machine-learning #cloud #data-engineering

How ads indexing works at Pinterest

(medium.com)

#data-pipeline #software-architecture #scaling #data-engineering

Streaming Cassandra into Kafka in (Near) Real-Time: Part 2

(engineeringblog.yelp.com)

#data-pipeline #distributed-systems #real-time #data-engineering

Uber’s Data Platform in 2019: Transforming Information to Intelligence

(eng.uber.com)

#data-pipeline #scaling #distributed-systems #data-engineering

Scalable and non-blocking event ingestion pipeline? Here’s how

(engineering.vinted.com)

#data-pipeline #software-architecture #scaling

Presentation: Batch Processing in 2019

(www.infoq.com)

#data-pipeline #software-architecture #backend #data-engineering

Streaming Cassandra into Kafka in (Near) Real-Time: Part 1

(engineeringblog.yelp.com)

#data-pipeline #software-architecture #distributed-systems #data-engineering

Eng Blog: Real-time User Signal Serving for Feature Engineering

(medium.com)

#data-pipeline #software-architecture #real-time #backend

Presentation: Future of Data Engineering

(www.infoq.com)

#data-pipeline #DBMS #scaling #data-engineering

Dropbox Predicts What File You Need Next With Content-Specific ML Pipelines

(www.infoq.com)

#data-pipeline #data-science #machine-learning

Evolution of Data Ingestion and Product Instrumentation at Prezi

(engineering.prezi.com)

#data-pipeline #software-design #software-architecture

An inside look at LinkedIn’s data pipeline monitoring system

(engineering.linkedin.com)

#data-pipeline #software-architecture #monitoring #data-engineering

Using Grab’s Trust Counter Service to Detect Fraud Successfully

(engineering.grab.com)

#data-pipeline #software-architecture #machine-learning #analytics

🚂 On Track with Apache Kafka – Building a Streaming ETL Solution with Rail Data

(www.confluent.io)

#data-pipeline #scaling #distributed-systems #apache-kafka

Presentation: Building and Operating a Serverless Data Pipeline

(www.infoq.com)

#data-pipeline #software-architecture #scaling #cloud #serialization

Real-time experiment analytics at Pinterest using Apache Flink

(medium.com)

#data-pipeline #distributed-systems #analytics #big-data

Scaling a Mature Data Pipeline — Managing Overhead

(medium.com)

#data-pipeline #software-architecture #scaling #distributed-systems

Migrating our ETL pipeline to Luigi on a Cloud

(devblog.songkick.com)

#data-pipeline #software-architecture #migration #cloud

Low Latency and High Throughput in CAL Ingress

(tech.ebayinc.com)

#data-pipeline #software-architecture #performance #systems

Introduction to Stream Mining

(towardsdatascience.com)

#data-pipeline #stream-processing #data-science #analytics

Article: Rethinking Flink’s APIs for a Unified Data Processing Framework

(www.infoq.com)

#data-pipeline #stream-processing #software-architecture

A look inside Kafka Mirrormaker 2

(blog.cloudera.com)

#data-pipeline #distributed-systems #apache-kafka

Machine Learning Powered Content Moderation: Computer Vision Applications at Expedia

(towardsdatascience.com)

#data-pipeline #AI #machine-learning #image-processing

Design Decisions for the First Embedded Analytics Open-Source Framework

(blog.statsbot.co)

#data-pipeline #software-design #software-architecture #analytics #web

Demand Forecasting Tech Stack @ Walmart

(medium.com)

#data-pipeline #data-science #software-architecture #infra

Presentation: Metrics-driven Machine Learning Development at Salesforce Einstein

(www.infoq.com)

#data-pipeline #software-engineering #machine-learning #scaling #production

BIG Data, Fast Data - Part I

(www.thoughtworks.com)

#data-pipeline #IoT #networking #big-data #real-time

Bringing Data Sources Together with PipelineWise

(tech.transferwise.com)

#data-pipeline #software-architecture #backend

Using Graph Processing for Kafka Stream Visualizations

(www.confluent.io)

#data-pipeline #apache-kafka #graph-processing #visualisation

Cultivating your Data Lake

(stackshare.io)

#data-pipeline #software-architecture #infra #data-engineering

Auditing Content Features in FollowFeed

(engineering.linkedin.com)

#data-pipeline #software-architecture #infra #backend

Building a Fault-Tolerant Data Pipeline for Chatbots

(engineering.salesforce.com)

#data-pipeline #software-engineering #software-architecture #distributed-systems #backend

Pin2Interest: A scalable system for content classification

(medium.com)

#data-pipeline #data-science #NLP #big-data

Data Engineering in Badoo: Handling 20 Billion Events per Day

(www.infoq.com)

#data-pipeline #software-architecture #scaling #data-engineering

Our Journey to Optimal Job Sizes for Apache Spark

(engineering.salesforce.com)

#data-pipeline #software-architecture #distributed-systems #apache-spark #backend

Presentation: A Dive Into Streams @LinkedIn With Brooklin

(www.infoq.com)

#data-pipeline #stream-processing #software-architecture #backend

Improving the scalability of a Spark pipeline for conversion attribution

(medium.com)

#data-pipeline #algorithms #performance #data-engineering

Presentation: Streaming Log Analytics with Kafka

(www.infoq.com)

#data-pipeline #distributed-systems #apache-kafka #backend

Accelerating NiFi flows delivery: Part 1

(blog.octo.com)

#data-pipeline #software-architecture #performance #optimisation #data-engineering

Open Sourcing Brooklin: Near Real-Time Data Streaming at Scale

(engineering.linkedin.com)

#data-pipeline #software-architecture #scaling #distributed-systems #data-engineering

Simulacra And Selection

(multithreaded.stitchfix.com)

#data-pipeline #data-science #math

Recommendation Systems at Scale — Making Grab’s everyday app super

(towardsdatascience.com)

#data-pipeline #data-science #scaling #analytics

Catwalk: Serving Machine Learning Models at Scale

(engineering.grab.com)

#data-pipeline #software-architecture #machine-learning #scaling #backend

Using Virtual Private Clusters for Testing Apache Samza

(engineering.linkedin.com)

#data-pipeline #software-engineering #testing #backend

Distributed Deep Learning Pipelines with PySpark and Keras

(towardsdatascience.com)

#data-pipeline #deep-learning #distributed-systems #python

Presentation: Petastorm: A Light-Weight Approach to Building ML Pipelines

(www.infoq.com)

#data-pipeline #machine-learning #big-data #data-engineering

Building A Scalable Data Management System for Computer Vision Tasks

(medium.com)

#data-pipeline #software-architecture #image-processing #data-engineering

Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend

(multithreaded.stitchfix.com)

#data-pipeline #software-engineering #practices #data-engineering

Presentation: Docker Data Science Pipeline

(www.infoq.com)

#data-pipeline #software-architecture #docker #distributed-systems #hadoop

Introducing LINE Games analytics environment

(engineering.linecorp.com)

#data-pipeline #software-architecture #big-data #data-engineering

Presentation: Productionizing H2O Models with Apache Spark

(www.infoq.com)

#data-pipeline #data-science #software-engineering #machine-learning #apache-spark

Preventing Pipeline Calls from Crashing Redis Clusters

(engineering.grab.com)

#data-pipeline #debugging #software-architecture #redis

Real-time data processing for monitoring and reporting — A practical use case of spark structured…

(medium.com)

#data-pipeline #stream-processing #distributed-systems #apache-spark #data-engineering

How Natural Language Processing Helps LinkedIn Members Get Support Easily

(engineering.linkedin.com)

#data-pipeline #software-architecture #machine-learning #NLP

How we reduced the time complexity from 18 days to 4.5 minutes.

(hackernoon.com)

#data-pipeline #software-engineering #performance #optimisation

Building efficient data pipelines using TensorFlow

(towardsdatascience.com)

#data-pipeline #machine-learning #performance #tensor-flow

Building Pin cohesion

(medium.com)

#data-pipeline #machine-learning #image-processing #search #tensor-flow

Presentation: Michelangelo - Machine Learning @Uber

(www.infoq.com)

#data-pipeline #data-science #machine-learning #data-engineering

Kafka Streams’ Take on Watermarks and Triggers

(www.confluent.io)

#data-pipeline #stream-processing #distributed-systems #apache-kafka

DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

(eng.uber.com)

#data-pipeline #distributed-systems #hadoop #data-engineering

Bullet Updates - Windowing, Apache Pulsar PubSub, Configuration-based Data Ingestion, and More

(yahooeng.tumblr.com)

#data-pipeline #software-architecture #backend #data-engineering

How we simplified our Data Ingestion & Transformation Process

(engineering.grab.com)

#data-pipeline #software-architecture #distributed-systems #backend

How We Built an Automated Anomaly Detection System onto a Streaming Pipeline

(engineering.salesforce.com)

#data-pipeline #stream-processing #automation #backend

Managing Uber’s Data Workflows at Scale

(eng.uber.com)

#data-pipeline #DBMS #scaling #distributed-systems #big-data

Journey to Event Driven – Part 3: The Affinity Between Events, Streams and Serverless

(www.confluent.io)

#data-pipeline #serverless #apache-kafka #event-driven

Lambda architecture— how to build a Big data pipeline part 1

(towardsdatascience.com)

#data-pipeline #software-architecture #big-data

Real-Time Streaming and Anomaly detection Pipeline on AWS

(towardsdatascience.com)

#data-pipeline #stream-processing #AWS #real-time

Improving Stream Data Quality with Protobuf Schema Validation

(www.confluent.io)

#data-pipeline #stream-processing #software-architecture

Sysmon Security Event Processing in Real Time with KSQL and HELK

(www.confluent.io)

#data-pipeline #software-architecture #apache-kafka #real-time

Presentation: The Whys and Hows of Database Streaming

(www.infoq.com)

#data-pipeline #stream-processing #DBMS #distributed-systems

Presentation: Patterns of Streaming Applications

(www.infoq.com)

#data-pipeline #stream-processing #software-architecture #distributed-systems

Complementary Item Recommendations at eBay Scale

(www.ebayinc.com)

#data-pipeline #software-architecture #machine-learning #big-data

A Beginner’s Perspective on Kafka Streams: Building Real-Time Walkthrough Detection

(www.confluent.io)

#data-pipeline #stream-processing #distributed-systems #apache-kafka

Improving Stream Data Quality With Protobuf Schema Validation

(deliveroo.engineering)

#data-pipeline #data-stream #backend

Building a Scalable Event Pipeline with Heroku and Salesforce

(engineering.salesforce.com)

#data-pipeline #scaling #apache-kafka #event-driven

Bridging Offline and Nearline Computations with Apache Calcite

(engineering.linkedin.com)

#data-pipeline #software-architecture #distributed-systems #backend

A Lean and Scalable Data Pipeline To Capture Large Scale Events and Support Experimentation Platform

(engineering.grab.com)

#data-pipeline #software-architecture #scaling #backend

Zendesk ML Model Building Pipeline on AWS Batch: Monitoring and Load Testing

(medium.com)

#data-pipeline #software-architecture #machine-learning #AWS

Presentation: Designing Automated Pipelines for Unseen Custom Data

(www.infoq.com)

#data-pipeline #automation #machine-learning #big-data

Data Science Project Flow for Startups

(towardsdatascience.com)

#data-pipeline #data-science #software-engineering

Presentation: Crisis to Calm: Story of Data Validation @ Netflix

(www.infoq.com)

#data-pipeline #scaling #backend #availability

Running Apache Airflow At Lyft

(eng.lyft.com)

#data-pipeline #software-architecture #distributed-systems #backend

Boosting Big Data workloads with Presto Auto Scaling

(www.eventbrite.com)

#data-pipeline #infra #scaling #big-data

Providing Metadata Discovery on Large-Volume Data Sets

(www.ebayinc.com)

#data-pipeline #search #analytics #big-data

How Pinterest runs Kafka at scale

(medium.com)

#data-pipeline #software-architecture #scaling #apache-kafka #backend

Scaling Spark Streaming for Logging Event Ingestion

(medium.com)

#data-pipeline #logging #apache-spark #event-driven

Using Apache Kafka to Drive Cutting-Edge Machine Learning

(www.confluent.io)

#data-pipeline #software-architecture #machine-learning #backend #systems

Measuring What Makes Readers Subscribe to The New York Times

(open.nytimes.com)

#data-pipeline #analytics #data-modeling

Kafka Connect Deep Dive – Converters and Serialization Explained

(www.confluent.io)

#data-pipeline #distributed-systems #apache-kafka #internals #backend

Druid @ Airbnb Data Platform

(medium.com)

#data-pipeline #software-architecture #analytics #big-data #druid

Proactive Data Pipeline Alerting with Pulse

(blog.cloudera.com)

#data-pipeline #monitoring #infra #logging

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

(eng.uber.com)

#data-pipeline #software-architecture #distributed-systems #big-data #backend

ATM Fraud Detection with Apache Kafka and KSQL

(www.confluent.io)

#data-pipeline #machine-learning #apache-kafka #SQL

Imperative Loop or Functional Stream Pipeline? Beware of the Performance Impact!

(blog.jooq.org)

#data-pipeline #performance #data-stream #java

Building the Contacts Platform at LinkedIn

(engineering.linkedin.com)

#data-pipeline #software-architecture #distributed-systems #backend

Open Sourcing Mirus

(engineering.salesforce.com)

#data-pipeline #software-architecture #distributed-systems #apache-kafka

Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

(eng.uber.com)

#data-pipeline #deep-learning #software-architecture #big-data

Embeddings@Twitter

(blog.twitter.com)

#data-pipeline #machine-learning #NLP

Converting a Batch Job to Real-time

(blog.wallaroolabs.com)

#data-pipeline #stream-processing #real-time

Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache Hadoop

(ubereng.wpengine.com)

#data-pipeline #software-architecture #big-data #hadoop

Keystone Real-time Stream Processing Platform

(medium.com)

#data-pipeline #stream-processing #software-architecture #real-time

Data Wrangling with Apache Kafka and KSQL

(www.confluent.io)

#data-pipeline #data-stream #big-data

Serializable ACID Transactions on Streaming Data

(data-artisans.com)

#data-pipeline #stream-processing #software-architecture

Multi-Agent Reinforcement Learning in Beer Distribution Game

(towardsdatascience.com)

#data-pipeline #data-science #algorithms

Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy

(hackernoon.com)

#data-pipeline #big-data #python #backend

Scaling Uber’s Customer Support Ticket Assistant (COTA) System with Deep Learning

(eng.uber.com)

#data-pipeline #deep-learning #data-science #software-architecture

Presentation: ML Data Pipelines for Real-time Fraud Prevention @PayPal

(www.infoq.com)

#data-pipeline #machine-learning #security

Utilizing Elixir as a lightweight tool to store real-time metrics data

(blog.wallaroolabs.com)

#data-pipeline #elixir #real-time

Upcoming Improvements to Scylla Streaming Performance

(www.scylladb.com)

#data-pipeline #stream-processing #performance #scyllaDB

The Dawn of Zendesk’s Machine Learning Model Building Platform with AWS Batch

(medium.com)

#data-pipeline #software-architecture #infra #machine-learning

M3: Uber’s Open Source, Large-scale Metrics Platform for Prometheus

(eng.uber.com)

#data-pipeline #monitoring #analytics #systems

How we build a robust analytics platform using Spark, Kafka and Cassandra Lambda architecture

(medium.com)

#data-pipeline #software-architecture #infra #big-data

Databook: Turning Big Data into Knowledge with Metadata at Uber

(eng.uber.com)

#data-pipeline #software-architecture #big-data

Event Triggered Customer Segmentation

(blog.wallaroolabs.com)

#data-pipeline #data-stream #event-driven

Blueprint: Qualitative and Quantitative Clickstream Event Analysis

(medium.com)

#data-pipeline #analytics #event-driven

Building a Graph Data Pipeline With Zeppelin Spark and Neo4j

(towardsdatascience.com)

#data-pipeline #big-data #graph

Keeping Counts In Sync

(developers.soundcloud.com)

#data-pipeline #software-architecture #distributed-systems #apache-kafka

Comparing Apache Spark, Storm, Flink and Samza stream processing engines - Part 1

(blog.scottlogic.com)

#data-pipeline #stream-processing

Transforming Financial Forecasting with Data Science and Machine Learning at Uber

(eng.uber.com)

#data-pipeline #data-science #machine-learning

Real-time Streaming Pattern: Triggering Alerts

(blog.wallaroolabs.com)

#data-pipeline #data-stream #real-time

How we built a data pipeline with Lambda Architecture using Spark/Spark Streaming

(medium.com)

#data-pipeline #distributed-systems #data-stream #apache-spark

GUI-fying the Machine Learning Workflow: Towards Rapid Discovery of Viable Pipelines

(towardsdatascience.com)

#data-pipeline #infra #machine-learning

Productionizing ML with Workflows at Twitter

(blog.twitter.com)

#data-pipeline #software-architecture #machine-learning #big-data

Implementing Time Windowing in an Evented Streaming System

(blog.wallaroolabs.com)

#data-pipeline #stream-processing #event-driven

Presentation: Streaming SQL to Unify Batch & Stream Processing w/ Apache Flink @Uber

(www.infoq.com)

#data-pipeline #stream-processing #SQL

Presentation: Simplifying ML Workflows with Apache Beam

(www.infoq.com)

#data-pipeline #machine-learning #big-data

Metacat: Making Big Data Discoverable and Meaningful at Netflix

(medium.com)

#data-pipeline #infra #backend

Introducing Commute Time for Jobs

(engineering.linkedin.com)

#data-pipeline #big-data #GeoData #maps

Fast Order Search Using Yelp’s Data Pipeline and Elasticsearch

(engineeringblog.yelp.com)

#data-pipeline #search #elastisearch

The EventHorizon Saga

(codeascraft.com)

#data-pipeline #event-queue #backend

Streaming with Wallaroo: Fast Algorithmic Trading Checks

(blog.wallaroolabs.com)

#data-pipeline #data-stream

Looking under the hood of the Eventbrite data pipeline!

(www.eventbrite.com)

#data-pipeline #software-architecture #big-data #backend

Utilizing MapReduce Combiners and HyperLogLog++ to process millions of queries over datasets with billions of records

(liveramp.com)

#data-pipeline #big-data

Challenges of monitoring sparse data, and what to do about it.

(engblog.nextdoor.com)

#data-pipeline #monitoring #analytics

Many-to-Many Relationships Using Kafka

(jobs.zalando.com)

#data-pipeline #stream-processing #microservices #event-driven

Confluent.io – Part 2: BUILD A STREAMING PIPELINE

(blog.octo.com)

#data-pipeline #stream-processing #apache-kafka

New data pipeline management platform at Khan Academy

(engineering.khanacademy.org)

#data-pipeline #software-engineering #infra

Gimel: PayPal’s Analytics Data Processing Platform

(www.paypal-engineering.com)

#data-pipeline #data-analytics

Optimizing CAL Report Hadoop MapReduce Jobs

(www.ebayinc.com)

#data-pipeline #distributed-systems #hadoop

Continuous Deployment with Spark Streaming (Part II)

(eng.wealthfront.com)

#data-pipeline #automation #CI-CD #apache-spark

5 tips for architecting fast data applications

(www.oreilly.com)

#data-pipeline #software-architecture #performance #backend

A brief introduction to two data processing architectures — Lambda and Kappa for Big Data

(towardsdatascience.com)

#data-pipeline #big-data

Tuning the Kafka Connect Cassandra Source (part 2)

(medium.com)

#data-pipeline #apache-kafka #cassandra

Performance testing a low-latency stream processing system

(blog.wallaroolabs.com)

#data-pipeline #stream-processing #testing #performance

HTTP Analytics for 6M requests per second using ClickHouse

(blog.cloudflare.com)

#data-pipeline #analytics #backend #data-center

Air Traffic Controller: Member-First Notifications at LinkedIn

(engineering.linkedin.com)

#data-pipeline #software-architecture #backend

Introducing LogFeeder - A log collection system

(engineeringblog.yelp.com)

#data-pipeline #logging #elastisearch

Building a scalable ELK stack

(webuild.envato.com)

#data-pipeline #data-stream #event-queue #elastisearch

How to hack Spark to do some data lineage

(blog.octo.com)

#data-pipeline #apache-spark #big-data #hadoop

Creating a musical (data) pipeline

(devblog.songkick.com)

#data-pipeline #big-data

A Scikit-learn pipeline in Wallaroo

(blog.wallaroolabs.com)

#data-pipeline #machine-learning #python

Making 30x performance improvements on Yelp’s MySQLStreamer

(engineeringblog.yelp.com)

#data-pipeline #DBMS #optimisation #MySql

Idiomatic Python Stream Processing in Wallaroo

(blog.wallaroolabs.com)

#data-pipeline #stream-processing #python

How Apache Kafka Inspired Our Platform Events Architecture

(engineering.salesforce.com)

#data-pipeline #software-architecture #apache-kafka #event-driven

From big data to fast data

(www.oreilly.com)

#data-pipeline #analytics #big-data

Go Go, Go! Stream Processing for Go

(blog.wallaroolabs.com)

#data-pipeline #stream-processing #GoLang

Return to the Temple of ELK-emental evil, Part 1

(kickstarter.engineering)

#data-pipeline #AWS #cloud #elastisearch

Scaling Gradient Boosted Trees for CTR Prediction - Part I

(engineeringblog.yelp.com)

#data-pipeline #data-science #machine-learning #apache-spark

Our Journey to a Near Perfect Log Pipeline

(engineering.salesforce.com)

#data-pipeline #infra #logging #backend

Surviving Data Loss

(jobs.zalando.com)

#data-pipeline #AWS #apache-kafka #apache-zookeeper

Stateful Multi-Stream Processing in Python with Wallaroo

(blog.wallaroolabs.com)

#data-pipeline #stream-processing #distributed-systems #python

Running Kafka Streams applications in AWS

(jobs.zalando.com)

#data-pipeline #stream-processing #distributed-systems #apache-kafka

Building Data Science Pipelines with Luigi and Jupyter Notebooks

(intoli.com)

#data-pipeline #data-science #visualisation #Jupyter

Event Stream Analytics at Walmart with Druid

(medium.com)

#data-pipeline #data-stream #analytics #druid

Incremental Data Capture for Oracle Databases at LinkedIn: Then and Now

(engineering.linkedin.com)

#data-pipeline #infra #big-data #backend

Dali Views: Functions as a Service for Big Data

(engineering.linkedin.com)

#data-pipeline #software-architecture #big-data

Data quality checkers

(drivy.engineering)

#data-pipeline #devops #CI-CD

Taking KSQL for a Spin Using Real-time Device Data

(www.confluent.io)

#data-pipeline #stream-processing #apache-kafka #SQL

Why we used Pony to write Wallaroo

(blog.wallaroolabs.com)

#data-pipeline #stream-processing #software-architecture #design-choice

Publishing with Apache Kafka at The New York Times

(open.nytimes.com)

#data-pipeline #infra #apache-kafka #backend

Big Data Processing at Spotify: The Road to Scio (Part 1)

(labs.spotify.com)

#data-pipeline #big-data #scala

Streaming Data Pipelines with Brooklin

(engineering.linkedin.com)

#data-pipeline #stream-processing #backend

Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform

(eng.uber.com)

#data-pipeline #infra #data-stream #backend

Using Kafka Streams API for predictive budgeting

(medium.com)

#data-pipeline #stream-processing #apache-kafka #big-data

Building Complex Data Pipelines with Unified Analytics Platform

(databricks.com)

#data-pipeline #software-architecture #infra #data-stream

SoundCloud's Data Science Process

(developers.soundcloud.com)

#data-pipeline #software-engineering #data-analytics

How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka

(www.confluent.io)

#data-pipeline #machine-learning #apache-kafka

How Stitch Consolidates A Billion Records Per Day

(stackshare.io)

#data-pipeline #microservices #data-stream #backend

Zalando Fulfillment Solutions and our FAST Replenishment Algorithm

(jobs.zalando.com)

#data-pipeline #algorithms #data-analytics

Serving Top Comments in Professional Social Networks

(engineering.linkedin.com)

#data-pipeline #machine-learning #data-analytics

Semantic Search — Innovation at scale!

(medium.com)

#data-pipeline #search

Stream Processing with Apache Flink and DC/OS

(mesosphere.com)

#data-pipeline #stream-processing #distributed-systems

Streaming SQL in Apache Flink, KSQL, and Stream Processing for Everyone

(data-artisans.com)

#data-pipeline #stream-processing #analytics #real-time

Steering oceans of content to the world

(code.facebook.com)

#data-pipeline #infra #big-data #backend

Genie in a Box: Making Spark Easy for Stitch Fix Data Scientists

(multithreaded.stitchfix.com)

#data-pipeline #infra #backend

The Simplest Useful Kafka Connect Data Pipeline In The World … or Thereabouts (Part 1)

(www.confluent.io)

#data-pipeline #distributed-systems #apache-kafka

Sankey Diagrams: Six Tools for Visualizing Flow Data

(www.azavea.com)

#data-pipeline #data-visualisation #analytics

Exploring Presto and Zeppelin for fast data analytics and visualization

(medium.com)

#data-pipeline #data-visualisation #apache-Zeppelin #prestoDB

Cube Planner – Build an Apache Kylin OLAP Cube Efficiently and Intelligently

(www.ebaytechblog.com)

#data-pipeline #data-analytics #OLAP

BigDB - an ad data pipeline for LINE

(engineering.linecorp.com)

#data-pipeline #infra #DBMS #big-data

Engineering Uber Trip Distance and Duration Predictions in Real Time with ELK

(eng.uber.com)

#data-pipeline #machine-learning #algorithms

Presto - a small step for DevOps engineer but a big step for BigData analyst

(allegro.tech)

#data-pipeline #infra #DBMS #distributed-systems

Delivering Billions of Messages Exactly Once

(segment.com)

#data-pipeline #software-architecture #web-backend #scaling

Deep learning on Apache Spark and Apache Hadoop with Deeplearning4j

(blog.cloudera.com)

#data-pipeline #deep-learning #noSQL

The Modern Architecture of Search

(tech.zalando.com)

#data-pipeline #search #information-retrieval

Building a Real-Time Streaming ETL Pipeline in 20 Minutes

(www.confluent.io)

#dev #data-pipeline #ETL #stream-processing

The data engineering ecosystem in 2017

(blog.insightdatascience.com)

#dev #data-pipeline #data-science

Tagged | data-pipeline