Tagged | data-engineering
-
Designing Tinder
(highscalability.com)#software-design #software-architecture #infra #data-engineering
-
Auto-Diagnosis and Remediation in Netflix Data Platform
(netflixtechblog.com) -
Improving Reliability: Building a Vitess Balancer to Minimize MySQL Downtime
(product.hubspot.com) -
Pinterest Druid Holiday Load Testing
(stackshare.io) -
Presentation: Robust Foundation for Data Pipelines at Scale - Lessons from Netflix
(www.infoq.com) -
Evolving LinkedIn’s analytics tech stack
(engineering.linkedin.com) -
eBay’s Global Secondary Indexes
(tech.ebayinc.com) -
MemQ: An efficient, scalable cloud native PubSub system
(medium.com)#software-architecture #scaling #distributed-systems #data-engineering
-
How Uber Migrated Financial Data from DynamoDB to Docstore
(eng.uber.com)#software-engineering #software-architecture #DBMS #migration #data-engineering
-
Scaling Apache Druid for Real-Time Cloud Analytics at Confluent
(www.confluent.io) -
4 Key Design Principles and Guarantees of Streaming Databases
(www.confluent.io)#data-pipeline #software-architecture #DBMS #data-engineering
-
CarbonJ: A high performance, high-scale, drop-in replacement for carbon-cache and carbon-relay
(engineering.salesforce.com)#software-architecture #scaling #distributed-systems #data-engineering
-
How we built a forever-free serverless SQL database
(www.cockroachlabs.com) -
Introducing uGroup: Uber’s Consumer Management Framework
(eng.uber.com)#software-architecture #distributed-systems #data-engineering
-
Processing billions of events in real time at Twitter
(blog.twitter.com)#software-architecture #scaling #distributed-systems #data-engineering
-
How to ETL at Petabyte-Scale with Trino
(engineering.salesforce.com) -
Improving HDFS I/O Utilization for Efficiency
(eng.uber.com)#performance #distributed-systems #big-data #data-engineering
-
Scaling indexing and search - Algolia New Search Architecture Part 2
(highscalability.com) -
An Engineer's Guide to Building a Database for Data-Intensive Applications
(www.singlestore.com) -
Evolution of Region Assignment in the Apache HBase Architecture — Part 3
(engineering.salesforce.com)#performance #scaling #distributed-systems #data-engineering
-
Search indexing optimisation
(engineering.grab.com) -
Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot
(eng.uber.com)#software-architecture #distributed-systems #data-engineering
-
Migrating to Elasticsearch with dense vector for Carousell Spotlight search engine
(medium.com) -
Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes
(engineering.linkedin.com) -
Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System
(eng.uber.com)#software-architecture #performance #distributed-systems #data-engineering
-
Pinterest’s Analytics as a Platform on Druid (Part 3 of 3)
(medium.com) -
Cost-Efficient Open Source Big Data Platform at Uber
(eng.uber.com)#optimisation #distributed-systems #big-data #data-engineering
-
Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data
(eng.uber.com) -
How we built a general purpose key value store for Facebook with ZippyDB
(engineering.fb.com) -
How we scaled the size of Pinterest’s ad corpus by 60x
(medium.com) -
How Airbnb Built “Wall” to prevent data bugs
(medium.com)#software-design #software-architecture #QA #data-engineering
-
The Antidote for Data Architecture Complexity: A Unified Database
(www.singlestore.com) -
How to Identify and Tune a Problematic Query with SQL EXPLAIN
(www.cockroachlabs.com) -
‘Orders Near You’ and User-Facing Analytics on Real-Time Geospatial Data
(eng.uber.com) -
Building scalable near-real time indexing on HBase
(medium.com) -
From daily dashboards to enterprise grade data pipelines
(engineering.linkedin.com) -
Unified Flink Source at Pinterest: Streaming Data Processing
(medium.com) -
Interactive Querying with Apache Spark SQL at Pinterest
(medium.com) -
Improving data processing efficiency using partial deserialization of Thrift
(medium.com) -
Article: Building Latency Sensitive User Facing Analytics via Apache Pinot
(www.infoq.com)#distributed-systems #analytics #real-time #data-engineering
-
Scaling Worldwide Parcel Logistics with SingleStore and Vectorized
(www.singlestore.com) -
Grab App at Scale with Scylla
(www.scylladb.com) -
Consolidating Facebook storage infrastructure with Tectonic file system
(engineering.fb.com) -
Giving the power of data in hands of your data analyst
(lambda.grofers.com) -
Realtime and databases — a discussion on coupling versus modularity
(ably.com) -
Optimisation using Sparklens
(medium.com) -
Building a Version-Controlled Data Aquarium
(benchling.engineering) -
Sharding, simplification, and Twitter’s ads serving platform
(blog.twitter.com) -
How We Built REGIONAL BY ROW for Row-Level Data Homing & Distribution
(www.cockroachlabs.com) -
Optimizing Analytics Data Processing on eBay’s New Open-Source-Based Platform
(tech.ebayinc.com) -
What actually is a Data Mesh? And is it really a thing?
(blog.scottlogic.com) -
Integrated BlobDB
(rocksdb.org) -
The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System
(engineering.linkedin.com)#scaling #distributed-systems #analytics #big-data #data-engineering
-
Building a Label-Based Enforcement Pipeline for Trust & Safety
(medium.com) -
Vinted Search Scaling Chapter 4: Query Log
(engineering.vinted.com) -
From Vendor to In-house: How eBay Reimagined Its Analytics Landscape
(tech.ebayinc.com) -
Shallow Mirror
(medium.com) -
Presentation: Change Data Capture for Distributed Databases @Netflix
(www.infoq.com) -
Let me automate that for you II, Electric Bugaloo
(tech.gc.com)#software-engineering #automation #performance #data-engineering
-
How we made DISTINCT queries up to 8000x faster on PostgreSQL
(blog.timescale.com) -
How Airbnb Achieved Metric Consistency at Scale
(medium.com)#software-architecture #distributed-systems #analytics #data-engineering
-
On Coordinated Omission
(www.scylladb.com) -
Adopting RocksDB within Manhattan
(blog.twitter.com) -
Attack of the Delta Clones (Against Disaster Recovery Availability Complexity)
(databricks.com)#software-architecture #infra #distributed-systems #data-engineering
-
FullContact: Improving the Graph by Transitioning to Scylla
(www.scylladb.com) -
Solving for the cardinality of set intersection at scale with Pinot and Theta Sketches
(engineering.linkedin.com) -
The Design of Strongly Consistent Global Secondary Indexes in Apache Phoenix — Part 1
(engineering.salesforce.com) -
Scylla’s New IO Scheduler
(www.scylladb.com) -
Powering Messaging Enabledness with Yelp's Data Infrastructure
(engineeringblog.yelp.com) -
Detecting Image Similarity in (Near) Real-time Using Apache Flink
(medium.com)#software-architecture #machine-learning #image-processing #data-engineering
-
Pinterest Flink Deployment Framework
(stackshare.io) -
ReversingLabs: Serving File Reputation for Twenty Billion Files
(www.scylladb.com) -
One billion files in Ozone
(blog.cloudera.com) -
Presentation: Scalable, Cloud-native Data Applications by Example
(www.infoq.com) -
iFood Relies on Scylla to Deliver Over 100 Million Events a Month to Restaurants
(www.scylladb.com) -
Learning Multi-dimensional indices: The next big thing in OLAP DBs
(towardsdatascience.com) -
Powering Pinterest Ads Analytics with Apache Druid
(stackshare.io) -
Real-Time Data Replication with ksqlDB
(www.confluent.io) -
How we scaled Graphite to 100,000 writes per second.
(medium.com) -
Presentation: Streaming a Million likes/second: Real-time Interactions on Live Video
(www.infoq.com)#software-architecture #scaling #distributed-systems #data-engineering
-
Enabling HDFS Federation Having 1B File System Objects
(tech.ebayinc.com) -
Getting storage engines ready for fast storage devices
(engineering.mongodb.com) -
Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads
(www.cockroachlabs.com) -
Augury: Insights into Industrial IoT Time-Series Data
(www.scylladb.com) -
Using Kafka to Throttle QPS on MySQL Shards in Bulk Write APIs
(stackshare.io)#software-architecture #DBMS #scaling #apache-kafka #data-engineering
-
How we improved latency through projection in Espresso
(engineering.linkedin.com)#software-architecture #DBMS #distributed-systems #data-engineering
-
Architecture for High-Throughput Low-Latency Big Data Pipeline on Cloud
(towardsdatascience.com) -
Bucketisation: Using cassandra for time series data scans.
(medium.com) -
How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience
(netflixtechblog.com)#DBMS #distributed-systems #analytics #real-time #data-engineering
-
Presentation: Data Mesh Paradigm Shift in Data Platform Architecture
(www.infoq.com) -
Nauto: Achieving Consistency in an Eventually Consistent Environment
(www.scylladb.com) -
Integrating Elasticsearch and ksqlDB for Powerful Data Enrichment and Analytics
(www.confluent.io) -
How to enable data scientists to stop managing ETL pipelines and get back to doing data science: Part I
(tech.wayfair.com) -
Building a Materialized Cache with ksqlDB
(www.confluent.io) -
FireEye: Providing Real-Time Threat Analysis using a Graph Database
(www.scylladb.com)#DBMS #analytics #real-time #graph-processing #data-engineering
-
Presentation: Snowflake Architecture: Building a Data Warehouse for the Cloud
(www.infoq.com) -
Spotify Unwrapped: How we brought you a decade of data
(labs.spotify.com) -
Data Migrations Don’t Have to Come with Downtime
(engblog.nextdoor.com) -
Fanatics: Using Scylla for Online Order Capture
(www.scylladb.com) -
Infinite Storage in Confluent Platform
(www.confluent.io)#distributed-systems #apache-kafka #big-data #data-engineering
-
Streams and Monk – How Yelp is Approaching Kafka in 2020
(engineeringblog.yelp.com) -
Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing
(eng.uber.com)#data-pipeline #software-architecture #distributed-systems #data-engineering
-
Speeding Up SELECT Queries with Parquet Page Indexes
(blog.cloudera.com) -
Stop the Insanity: Eliminating Data Infrastructure Sprawl
(www.memsql.com) -
Maximizing Disk Utilization with Incremental Compaction
(www.scylladb.com) -
Streams and Tables in Apache Kafka: Elasticity, Fault Tolerance, and Other Advanced Concepts
(www.confluent.io) -
Engineering SQL Support on Apache Pinot at Uber
(eng.uber.com) -
Reliably Upgrading Apache Airflow at Slack’s Scale
(slack.engineering) -
Comcast: Sprinting from Cassandra to Scylla
(www.scylladb.com)#software-architecture #performance #distributed-systems #data-engineering
-
Streams and Tables in Apache Kafka: Topics, Partitions, and Storage Fundamentals
(www.confluent.io) -
Streams and Tables in Apache Kafka: A Primer
(www.confluent.io) -
Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform
(eng.lyft.com) -
Presentation: Practical Change Data Streaming Use Cases with Apache Kafka & Debezium
(www.infoq.com) -
How I'm Engineering a Versioned Database Storage Engine for Byte-Addressable NVM
(hackernoon.com) -
Presentation: Scaling Beyond a Billion Transactions Per Day with Sub-second Responses
(www.infoq.com)#software-architecture #infra #performance #scaling #data-engineering
-
How ads indexing works at Pinterest
(medium.com)#data-pipeline #software-architecture #scaling #data-engineering
-
Streaming Cassandra into Kafka in (Near) Real-Time: Part 2
(engineeringblog.yelp.com)#data-pipeline #distributed-systems #real-time #data-engineering
-
The Story Behind MemSQL’s Skiplist Indexes
(www.memsql.com) -
DBLog: A Generic Change-Data-Capture Framework
(medium.com) -
Uber’s Data Platform in 2019: Transforming Information to Intelligence
(eng.uber.com)#data-pipeline #scaling #distributed-systems #data-engineering
-
GokuL: Extending time series data storage to serve beyond one day
(medium.com) -
How Scylla Scaled to One Billion Rows a Second
(www.scylladb.com) -
Podcast: Josh Wills on Building Resilient Data Engineering and Machine Learning Products at Slack
(www.infoq.com)#software-architecture #machine-learning #scaling #data-engineering
-
Presentation: Batch Processing in 2019
(www.infoq.com)#data-pipeline #software-architecture #backend #data-engineering
-
Streaming Cassandra into Kafka in (Near) Real-Time: Part 1
(engineeringblog.yelp.com)#data-pipeline #software-architecture #distributed-systems #data-engineering
-
Presentation: Future of Data Engineering
(www.infoq.com) -
Reducing Multi-Region Latency with Follower Reads
(www.cockroachlabs.com) -
Availability and Region Failure: Joint Consensus in CockroachDB
(www.cockroachlabs.com)#DBMS #scaling #distributed-systems #reliability #data-engineering
-
Using Kafka to throttle QPS on MySQL shards in bulk write APIs
(medium.com) -
A Glimpse into the World of Embedded Database Feat. RocksDB
(medium.com) -
Maximizing Performance via Concurrency While Minimizing Timeouts in Distributed Databases
(www.scylladb.com)#DBMS #performance #distributed-systems #concurrency #data-engineering
-
Unpacking Competitive Benchmark Claims
(www.cockroachlabs.com) -
Spotify’s Event Delivery – Life in the Cloud
(labs.spotify.com) -
Parallel Commits: An Atomic Commit Protocol For Globally Distributed Transactions
(www.cockroachlabs.com) -
Optimizing Search Index Generation using secondary cache
(medium.com)#performance #distributed-systems #big-data #caching #data-engineering
-
Building columnar compression in a row-oriented database
(blog.timescale.com) -
How We Built a Vectorized SQL Engine
(www.cockroachlabs.com) -
Open Sourcing Amundsen: A Data Discovery And Metadata Platform
(eng.lyft.com) -
An inside look at LinkedIn’s data pipeline monitoring system
(engineering.linkedin.com)#data-pipeline #software-architecture #monitoring #data-engineering
-
Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store
(towardsdatascience.com) -
2019 @Scale Conference recap
(engineering.fb.com) -
ML Platform Meetup: Infra for Contextual Bandits and Reinforcement Learning
(medium.com) -
Evolving Michelangelo Model Representation for Flexibility at Scale
(eng.uber.com) -
Delta: A Data Synchronization and Enrichment Platform
(medium.com)#software-architecture #algorithms #distributed-systems #data-engineering
-
The Beauty of a Shared-Nothing SQL DBMS for Skewed Database Sizes
(www.memsql.com) -
Compression in Scylla, Part Two
(www.scylladb.com) -
How LinkedIn customizes Apache Kafka for 7 trillion messages per day
(engineering.linkedin.com)#performance #scaling #distributed-systems #apache-kafka #data-engineering
-
Compression in Scylla, Part One
(www.scylladb.com) -
What Makes Apache Flink Scale?
(medium.com) -
How Shopify Manages Petabyte Scale MySQL Backup and Restore
(engineering.shopify.com) -
Adaptive Throttling of Indexing for Improved Query Responsiveness
(medium.com) -
Multiplexing (Mux) in ProxySQL: Use Case
(www.percona.com) -
PinalyticsDB: A Time Series Database on top of Hbase
(medium.com) -
Shared Transactional Tables: The Foundation of Next Generation Big Data Warehousing
(blog.cloudera.com) -
Presentation: Datadog: A Real-time Metrics Database for One Quadrillion Points/Day
(www.infoq.com) -
Joining Petabytes of Data Per Day: How LiveRamp Powers its Matching Product
(liveramp.com) -
Presentation: Scaling DB Access for Billions of Queries per Day @PayPal
(www.infoq.com) -
Presentation: CockroachDB: Architecture of a Geo-distributed SQL Database
(www.infoq.com)#software-architecture #DBMS #algorithms #distributed-systems #data-engineering
-
Time-Based Anti-Patterns for Caching Time-Series Data
(www.scylladb.com) -
A Technical Introduction to MemSQL
(www.memsql.com)#software-architecture #DBMS #distributed-systems #data-engineering
-
Cultivating your Data Lake
(stackshare.io)#data-pipeline #software-architecture #infra #data-engineering
-
Replicating PostgreSQL into MemSQL’s Columnstore
(www.memsql.com) -
How to manage your Snowflake spend with Periscope and dbt
(about.gitlab.com) -
Presto Infrastructure at Lyft
(eng.lyft.com)#infra #scaling #distributed-systems #backend #data-engineering
-
Building a distributed time-series database on PostgreSQL
(blog.timescale.com)#software-architecture #DBMS #time-series #PostgreSQL #data-engineering
-
Adventures in big data wonderland: Going down the Pinterest Path
(medium.com) -
Data Hub: A Generalized Metadata Search & Discovery Tool
(engineering.linkedin.com) -
Presentation: Tackling Computing Challenges @CERN
(www.infoq.com) -
Data Engineering in Badoo: Handling 20 Billion Events per Day
(www.infoq.com)#data-pipeline #software-architecture #scaling #data-engineering
-
Data first, SLA always
(engineering.grab.com)#software-design #software-architecture #backend #data-engineering
-
Advances in Spam Detection on Tumblr
(engineering.tumblr.com) -
Extending Hive Replication: Transactional Tables, External Tables, and Statistics
(blog.cloudera.com) -
OrderedAppend: An optimization for range partitioning
(blog.timescale.com) -
Maptype — fast doc-value lookups for map data in Elasticsearch
(engineeringblog.yelp.com) -
Improving the scalability of a Spark pipeline for conversion attribution
(medium.com) -
Implementing constraint exclusion for faster query performance
(timescale.ghost.io) -
A Scalable SQL Database Powers Real-Time Analytics at Uber
(www.memsql.com) -
Fast Parallel Testing at Databricks with Bazel
(databricks.com) -
Accelerating NiFi flows delivery: Part 1
(blog.octo.com)#data-pipeline #software-architecture #performance #optimisation #data-engineering
-
Presentation: Automatic Clustering at Snowflake
(www.infoq.com)#infra #DBMS #scaling #distributed-systems #data-engineering
-
Petabyte Scale Data Deduplication
(engineering.mixpanel.com) -
Making Apache Spark Effortless for All of Uber
(eng.uber.com)#software-architecture #DBMS #distributed-systems #apache-spark #data-engineering
-
Open Sourcing Brooklin: Near Real-Time Data Streaming at Scale
(engineering.linkedin.com)#data-pipeline #software-architecture #scaling #distributed-systems #data-engineering
-
Auto-Tuning Pinot Real-Time Consumption
(engineering.linkedin.com) -
Expediting Data Fixes and Data Migrations
(engineering.linkedin.com)#software-engineering #infra #scaling #practices #data-engineering
-
OIL+VCache: File abstraction for distributed systems
(code.fb.com) -
Query Plan Caching in CockroachDB
(www.cockroachlabs.com) -
Pilosa: A Scalable High Performance Bitmap Database Index
(hackernoon.com) -
Kafka Listeners – Explained
(www.confluent.io)#software-architecture #distributed-systems #apache-kafka #internals #data-engineering
-
CockroachDB Change Data Capture: Transactionally and Horizontally Scalable
(www.cockroachlabs.com) -
Improving Performance and Capacity for Espresso with New Netty Framework
(engineering.linkedin.com)#software-architecture #performance #distributed-systems #data-engineering
-
Community-Focused Feed Optimization
(engineering.linkedin.com)#data-science #software-architecture #machine-learning #analytics #data-engineering
-
Building a Scalable Search Architecture
(www.confluent.io) -
Putting Machine Learning Models into Production
(blog.cloudera.com)#data-science #machine-learning #big-data #production #data-engineering
-
Star-Tree Index: Powering Fast Aggregations on Pinot
(engineering.linkedin.com) -
Streaming Data from the Universe with Apache Kafka
(www.confluent.io)#software-architecture #distributed-systems #apache-kafka #data-engineering
-
Presentation: Machine Learning Engineering - A New Yet Not so New Paradigm
(www.infoq.com)#software-engineering #machine-learning #practices #data-engineering
-
Presentation: Petastorm: A Light-Weight Approach to Building ML Pipelines
(www.infoq.com)#data-pipeline #machine-learning #big-data #data-engineering
-
Rethinking the Database Materialized View as an Index
(blog.timescale.com) -
HDFS Erasure Coding in Production
(blog.cloudera.com) -
Presentation: People You May Know: Fast Recommendations Over Massive Data
(www.infoq.com)#performance #distributed-systems #real-time #graphDB #data-engineering
-
Bringing scalable real-time analytics to the enterprise
(www.oreilly.com)#DBMS #scaling #distributed-systems #podcast #data-engineering
-
Delos: Simple, flexible storage for the Facebook control plane
(code.fb.com) -
Building A Scalable Data Management System for Computer Vision Tasks
(medium.com)#data-pipeline #software-architecture #image-processing #data-engineering
-
Log Compacted Topics in Apache Kafka
(towardsdatascience.com) -
Migrating a Big Data Environment to the Cloud, Part 4
(liveramp.com)#software-architecture #big-data #migration #cloud #data-engineering
-
Presentation: Michelangelo Palette: A Feature Engineering Platform at Uber
(www.infoq.com)#data-science #machine-learning #distributed-systems #data-engineering
-
Grafana Labs at KubeCon: Awesome Query Performance with Cortex
(grafana.com)#software-architecture #DBMS #noSQL #performance #data-engineering
-
Intelligent computing in Snowflake
(towardsdatascience.com) -
Presentation: Life of a Distributed Graph Database Query
(www.infoq.com)#DBMS #distributed-systems #graph-processing #data-engineering
-
Workload Prioritization: Running OLTP and OLAP Traffic on the Same Superhighway
(www.scylladb.com) -
A Richer Activity, Part 1
(medium.com) -
Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend
(multithreaded.stitchfix.com)#data-pipeline #software-engineering #practices #data-engineering
-
Introducing LINE Games analytics environment
(engineering.linecorp.com)#data-pipeline #software-architecture #big-data #data-engineering
-
Accelerating Machine Learning with the Feature Store Service
(technology.condenast.com) -
MetricsDB: TimeSeries Database for storing metrics at Twitter
(blog.twitter.com)#software-architecture #DBMS #analytics #time-series #data-engineering
-
Continuous aggregates: faster queries with automatically maintained materialized views
(blog.timescale.com) -
Introducing Data Compaction in Ambry
(engineering.linkedin.com)#software-architecture #DBMS #compression #media #data-engineering
-
Apache Kafka Data Access Semantics: Consumers and Membership
(www.confluent.io) -
Railyard: how we rapidly train machine learning models with Kubernetes
(stripe.com)#data-science #software-engineering #software-architecture #kubernetes #data-engineering
-
MySQL InnoDB Sorted Index Builds
(www.percona.com) -
Real-time data processing for monitoring and reporting — A practical use case of spark structured…
(medium.com)#data-pipeline #stream-processing #distributed-systems #apache-spark #data-engineering
-
An intuitive understanding of the LAMB optimizer
(towardsdatascience.com) -
Beam: A Distributed Knowledge Graph Store
(www.ebayinc.com)#DBMS #distributed-systems #GoLang #semantic-data #data-engineering
-
Presentation: YugaByte DB - A Planet-scale Database for Low Latency Transactional Apps
(www.infoq.com)#DBMS #performance #scaling #distributed-systems #data-engineering
-
Troubleshooting Data Engineering Software
(engineering.linecorp.com)#debugging #performance #distributed-systems #backend #data-engineering
-
Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber
(eng.uber.com) -
Better to Give and to Receive: Alibaba’s Open-source Contributions to Flink
(hackernoon.com) -
How Bloomberg Tracks Hundreds of Billions of Data Points Daily with MetricTank and Grafana
(grafana.com) -
How eBay Governs its Big Data Fabric
(www.ebayinc.com) -
Amundsen — Lyft’s data discovery & metadata engine
(eng.lyft.com) -
Presentation: Michelangelo - Machine Learning @Uber
(www.infoq.com)#data-pipeline #data-science #machine-learning #data-engineering
-
Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…
(medium.com)#software-architecture #infra #scaling #backend #data-engineering
-
How to Reindex One Billion Documents in One Hour at SoundCloud
(developers.soundcloud.com) -
Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark Jobs
(eng.uber.com) -
Pro Tips: How Booking.com Handles Millions of Metrics Per Second with Graphite
(grafana.com) -
Solving Big Data Challenges with Data Science at Uber
(eng.uber.com)#DBMS #scaling #distributed-systems #big-data #data-engineering
-
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
(eng.uber.com)#data-pipeline #distributed-systems #hadoop #data-engineering
-
Bullet Updates - Windowing, Apache Pulsar PubSub, Configuration-based Data Ingestion, and More
(yahooeng.tumblr.com)#data-pipeline #software-architecture #backend #data-engineering
-
Transparent Hierarchical Storage Management with Apache Kudu and Impala
(blog.cloudera.com)