Tagged | hadoop
-
HDFS Erasure Coding in Production
(blog.cloudera.com) -
Presentation: Docker Data Science Pipeline
(www.infoq.com) #data-pipeline #software-architecture #docker #distributed-systems #hadoop
-
Small Files, Big Foils: Addressing the Associated Metadata and Application Challenges
(blog.cloudera.com) #software-architecture #distributed-systems #hadoop #systems
-
Partition Management in Hadoop
(blog.cloudera.com) -
Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber
(eng.uber.com) -
How eBay Governs its Big Data Fabric
(www.ebayinc.com) -
Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark Jobs
(eng.uber.com) -
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
(eng.uber.com) #data-pipeline #distributed-systems #hadoop #data-engineering
-
Open Sourcing TonY: Native Support of TensorFlow on Hadoop
(engineering.linkedin.com) -
Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache Hadoop
(ubereng.wpengine.com) -
Unriddling Big Data file formats
(www.thoughtworks.com) -
YARN FairScheduler Preemption Deep Dive
(blog.cloudera.com) -
Optimizing CAL Report Hadoop MapReduce Jobs
(www.ebayinc.com) -
Scaling Uber’s Hadoop Distributed File System for Growth
(eng.uber.com) -
How to hack Spark to do some data lineage
(blog.octo.com) -
Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity
(engineering.linkedin.com) -
Fishing for graphs in a Hadoop data lake
(www.oreilly.com) -
Hadoop Delegation Tokens Explained
(blog.cloudera.com) -
From Hadoop and Cassandra to Kafka Streams
(tech.finn.no)
2017