Tagged | hadoop

HDFS Erasure Coding in Production
(blog.cloudera.com)

#performance #distributed-systems #hadoop #data-engineering
Presentation: Docker Data Science Pipeline
(www.infoq.com)

#data-pipeline #software-architecture #docker #distributed-systems #hadoop
Small Files, Big Foils: Addressing the Associated Metadata and Application Challenges
(blog.cloudera.com)

#software-architecture #distributed-systems #hadoop #systems
Partition Management in Hadoop
(blog.cloudera.com)

#noSQL #scaling #distributed-systems #hadoop
Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber
(eng.uber.com)

#distributed-systems #big-data #hadoop #data-engineering
How eBay Governs its Big Data Fabric
(www.ebayinc.com)

#distributed-systems #big-data #hadoop #data-engineering
Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark Jobs
(eng.uber.com)

#distributed-systems #big-data #hadoop #data-engineering
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
(eng.uber.com)

#data-pipeline #distributed-systems #hadoop #data-engineering
Open Sourcing TonY: Native Support of TensorFlow on Hadoop
(engineering.linkedin.com)

#data-science #big-data #tensor-flow #hadoop
Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache Hadoop
(ubereng.wpengine.com)

#data-pipeline #software-architecture #big-data #hadoop
Unriddling Big Data file formats
(www.thoughtworks.com)

#big-data #hadoop #filesystem
YARN FairScheduler Preemption Deep Dive
(blog.cloudera.com)

#distributed-systems #hadoop #yarn #scheduler
Optimizing CAL Report Hadoop MapReduce Jobs
(www.ebayinc.com)

#data-pipeline #distributed-systems #hadoop
Scaling Uber’s Hadoop Distributed File System for Growth
(eng.uber.com)

#scaling #big-data #hadoop #filesystem
How to hack Spark to do some data lineage
(blog.octo.com)

#data-pipeline #apache-spark #big-data #hadoop
Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity
(engineering.linkedin.com)

#distributed-systems #big-data #hadoop
Fishing for graphs in a Hadoop data lake
(www.oreilly.com)

#big-data #hadoop #graph
Hadoop Delegation Tokens Explained
(blog.cloudera.com)

#big-data #hadoop #data-center #tokens
From Hadoop and Cassandra to Kafka Streams
(tech.finn.no)

#stream-processing #apache-kafka #hadoop #cassandra