Efficient Point in Polygon Joins via PySpark and BNG Geospatial Indexing

(databricks.com)

#DBMS #algorithms #performance #big-data #GeoData

How to ETL at Petabyte-Scale with Trino

(engineering.salesforce.com)

#scaling #big-data #data-engineering

Improving HDFS I/O Utilization for Efficiency

(eng.uber.com)

#performance #distributed-systems #big-data #data-engineering

Efficiently Managing the Supply and Demand on Uber’s Big Data Platform

(eng.uber.com)

#software-architecture #infra #distributed-systems #big-data

Cost-Efficient Open Source Big Data Platform at Uber

(eng.uber.com)

#optimisation #distributed-systems #big-data #data-engineering

Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data

(eng.uber.com)

#software-engineering #big-data #data-engineering

Evolution of search engines architecture - Algolia New Search Architecture Part 1

(highscalability.com)

#software-architecture #search #big-data

‘Orders Near You’ and User-Facing Analytics on Real-Time Geospatial Data

(eng.uber.com)

#software-architecture #big-data #GeoData #data-engineering

Interactive Querying with Apache Spark SQL at Pinterest

(medium.com)

#DBMS #scaling #big-data #data-engineering

Consolidating Facebook storage infrastructure with Tectonic file system

(engineering.fb.com)

#software-architecture #big-data #systems #data-engineering

Optimizing Analytics Data Processing on eBay’s New Open-Source-Based Platform

(tech.ebayinc.com)

#data-pipeline #analytics #big-data #data-engineering

The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System

(engineering.linkedin.com)

#scaling #distributed-systems #analytics #big-data #data-engineering

Fraud Detection: Using Relational Graph Learning to Detect Collusion

(eng.uber.com)

#algorithms #scaling #big-data #graph-processing

Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting

(eng.uber.com)

#machine-learning #analytics #time-series #big-data

Fusing Elasticsearch with neural networks to identify data

(blog.twitter.com)

#machine-learning #search #big-data

The Journey of Corpus

(developers.soundcloud.com)

#software-engineering #analytics #practices #big-data

Categorizing Products at Scale

(engineering.shopify.com)

#software-design #machine-learning #algorithms #big-data

Presentation: Beyond the Distributed Monolith: Rearchitecting the Big Data Platform

(www.infoq.com)

#software-architecture #scaling #microservices #big-data

Merging Telemetry and Logs from Microservices at Scale with Apache Spark

(devblogs.nvidia.com)

#performance #microservices #GPU #big-data

Learning Multi-dimensional indices: The next big thing in OLAP DBs

(towardsdatascience.com)

#DBMS #big-data #data-engineering

Presentation: From Spark to Elasticsearch and Back - Learning Large-scale Models for Content Recommendation

(www.infoq.com)

#data-pipeline #infra #machine-learning #big-data

Presentation: Computational Propaganda - How Algorithms Influence our Decisions

(www.infoq.com)

#machine-learning #algorithms #analytics #big-data

Architecture for High-Throughput Low-Latency Big Data Pipeline on Cloud

(towardsdatascience.com)

#DBMS #big-data #cloud #data-engineering

Contextual Topic Identification

(blog.insightdatascience.com)

#data-science #machine-learning #NLP #big-data #text-analysis

Supporting Spark as a First-Class Citizen in Yelp’s Computing Platform

(engineeringblog.yelp.com)

#data-pipeline #distributed-systems #apache-spark #big-data #backend

The Causal Analysis of Cannibalization in Online Products

(codeascraft.com)

#analytics #big-data #graph

Auto-Generated Knowledge Graphs

(towardsdatascience.com)

#AI #NLP #big-data

Leveraging “spot” instances to drive down costs

(www.eventbrite.com)

#devops #optimisation #big-data #cloud

Deep Learning for Anomaly Detection

(blog.cloudera.com)

#deep-learning #data-science #analytics #big-data

Spotify Unwrapped: How we brought you a decade of data

(labs.spotify.com)

#analytics #big-data #data-engineering

Fanatics: Using Scylla for Online Order Capture

(www.scylladb.com)

#DBMS #scaling #big-data #data-engineering

Infinite Storage in Confluent Platform

(www.confluent.io)

#distributed-systems #apache-kafka #big-data #data-engineering

Bayesian Product Ranking at Wayfair

(tech.wayfair.com)

#data-science #algorithms #big-data #math

Keeping LinkedIn professional by detecting and removing inappropriate profiles

(engineering.linkedin.com)

#machine-learning #analytics #big-data

For Your Ears Only: Personalizing Spotify Home with Machine Learning

(labs.spotify.com)

#data-science #machine-learning #big-data #tensor-flow

Engineering SQL Support on Apache Pinot at Uber

(eng.uber.com)

#DBMS #distributed-systems #SQL #big-data #data-engineering

The Winding Road to Better Machine Learning Infrastructure Through Tensorflow Extended and Kubeflow

(labs.spotify.com)

#data-science #infra #machine-learning #big-data

How Scylla Scaled to One Billion Rows a Second

(www.scylladb.com)

#scaling #distributed-systems #big-data #data-engineering

Pretraining BERT with Layer-wise Adaptive Learning Rates

(devblogs.nvidia.com)

#machine-learning #algorithms #GPU #big-data

Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations

(eng.uber.com)

#algorithms #big-data #graph-processing #math

Our Transition to Machine Learning in Search Ranking to Match Customers and Professionals

(engineering.thumbtack.com)

#data-science #machine-learning #search #big-data

Powered by AI: Instagram’s Explore recommender system

(instagram-engineering.com)

#data-science #machine-learning #big-data

Large Graph Visualization Tools and Approaches

(towardsdatascience.com)

#big-data #visualisation #graph

Spotify’s Event Delivery – Life in the Cloud

(labs.spotify.com)

#software-architecture #big-data #data-engineering

Optimizing Search Index Generation using secondary cache

(medium.com)

#performance #distributed-systems #big-data #caching #data-engineering

Interpretability in ML: Identifying anomalies, influencers, and root causes

(www.elastic.co)

#data-science #machine-learning #analytics #big-data

Griffin, an anti-fraud risk rule engine making billions of predictions daily

(engineering.grab.com)

#data-science #software-engineering #software-architecture #algorithms #big-data

Semantics at Scale: BERT + Elasticsearch

(towardsdatascience.com)

#search #scaling #big-data #text-analysis

Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store

(towardsdatascience.com)

#machine-learning #big-data #data-engineering

Presto for ad hoc interactive Big Data Analytics at Salesforce

(engineering.salesforce.com)

#DBMS #analytics #time-series #big-data

Searchable Ground Truth: Querying Uncommon Scenarios in Self-Driving Car Development

(eng.uber.com)

#software-engineering #AI #big-data #visualisation

Real-time experiment analytics at Pinterest using Apache Flink

(medium.com)

#data-pipeline #distributed-systems #analytics #big-data

What Makes Apache Flink Scale?

(medium.com)

#analytics #big-data #systems #data-engineering

How Shopify Manages Petabyte Scale MySQL Backup and Restore

(engineering.shopify.com)

#DBMS #scaling #big-data #data-engineering

MaRS: How Facebook keeps maps current and accurate

(engineering.fb.com)

#data-science #algorithms #big-data #GeoData #maps

Shared Transactional Tables: The Foundation of Next Generation Big Data Warehousing

(blog.cloudera.com)

#DBMS #distributed-systems #big-data #data-engineering

Pilosa: A Scalable High Performance Bitmap Database Index

(hackernoon.com)

#DBMS #algorithms #performance #big-data

How Map Matching Failures can be used for Map Making

(eng.lyft.com)

#algorithms #big-data #GeoData #maps

PinText: A Multitask Text Embedding System in Pinterest

(medium.com)

#data-science #machine-learning #NLP #big-data

BIG Data, Fast Data - Part I

(www.thoughtworks.com)

#data-pipeline #IoT #networking #big-data #real-time

Adventures in big data wonderland: Going down the Pinterest Path

(medium.com)

#software-architecture #DBMS #big-data #data-engineering

Pin2Interest: A scalable system for content classification

(medium.com)

#data-pipeline #data-science #NLP #big-data

Labeling, transforming, and structuring training data sets for machine learning

(www.oreilly.com)

#data-science #big-data #research #podcast

Data Hub: A Generalized Metadata Search & Discovery Tool

(engineering.linkedin.com)

#software-architecture #search #big-data #data-engineering

Detecting and Preventing Abuse on LinkedIn Using Isolation Forests

(engineering.linkedin.com)

#data-science #algorithms #analytics #big-data

Presentation: Tackling Computing Challenges @CERN

(www.infoq.com)

#performance #big-data #computation #data-engineering

Unifying visual embeddings for visual search at Pinterest

(medium.com)

#machine-learning #image-processing #search #big-data #research

Code as Craft: Understand the role of Style in e-commerce shopping

(codeascraft.com)

#data-science #machine-learning #image-processing #big-data

The science behind consolidating Answer Bot production Models: Part 1

(medium.com)

#NLP #big-data #text-analysis #bots

Moving from Data-Driven to AI-Driven: The Next Step in the Evolution of Business Workflows

(multithreaded.stitchfix.com)

#data-science #AI #machine-learning #big-data

Semantic Graphs

(blog.imaginea.com)

#NLP #big-data #graph-processing #semantic-data

Lynx: Identifying Wayfair Customers’ Functional Needs

(tech.wayfair.com)

#data-science #algorithms #analytics #big-data

Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want

(multithreaded.stitchfix.com)

#data-science #machine-learning #analytics #big-data

Presto at Pinterest

(medium.com)

#software-architecture #infra #scaling #big-data #backend

Presentation: Reinforcement Learning: A Gentle Introduction with a Real Application

(www.infoq.com)

#data-science #machine-learning #big-data

Gaining Insights in a Simulated Marketplace with Machine Learning at Uber

(eng.uber.com)

#data-science #machine-learning #analytics #big-data

Presentation: Scaling Deep Learning to Petaflops and Beyond!

(www.infoq.com)

#machine-learning #performance #big-data #neural-net

Putting Machine Learning Models into Production

(blog.cloudera.com)

#data-science #machine-learning #big-data #production #data-engineering

The quest for high-quality data

(www.oreilly.com)

#data-science #practices #big-data

Presentation: Petastorm: A Light-Weight Approach to Building ML Pipelines

(www.infoq.com)

#data-pipeline #machine-learning #big-data #data-engineering

Rethinking the Database Materialized View as an Index

(blog.timescale.com)

#DBMS #time-series #big-data #data-engineering

Presentation: Applying Deep Learning to Airbnb Search

(www.infoq.com)

#deep-learning #data-science #machine-learning #search #big-data

Modeling the Unseen

(tech.instacart.com)

#data-science #machine-learning #analytics #big-data

Log Compacted Topics in Apache Kafka

(towardsdatascience.com)

#DBMS #apache-kafka #big-data #data-engineering

Migrating a Big Data Environment to the Cloud, Part 4

(liveramp.com)

#software-architecture #big-data #migration #cloud #data-engineering

Migrating a Big Data Environment to the Cloud, Part 3

(liveramp.com)

#software-engineering #infra #big-data #migration #cloud

A Richer Activity, Part 1

(medium.com)

#DBMS #SQL #big-data #data-engineering

Presentation: Massive Scale Anomaly Detection Framework

(www.infoq.com)

#data-science #machine-learning #big-data #statistics

Introducing LINE Games analytics environment

(engineering.linecorp.com)

#data-pipeline #software-architecture #big-data #data-engineering

Accelerating Machine Learning with the Feature Store Service

(technology.condenast.com)

#data-science #machine-learning #big-data #data-engineering

Hybrid Search: Building a textual and visual discovery experience at Pinterest

(medium.com)

#machine-learning #image-processing #search #big-data

Deep Learning for Single Cell Biology

(towardsdatascience.com)

#deep-learning #machine-learning #big-data #biotech

Presentation: Forecasting in Complex Systems

(www.infoq.com)

#data-science #algorithms #big-data #math

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

(eng.uber.com)

#data-science #data-analytics #big-data #math

Driving Business Decisions Using Data Science and Machine Learning

(engineering.linkedin.com)

#data-science #software-engineering #big-data

Evaluating the Unsupervised Learning of Disentangled Representations

(ai.googleblog.com)

#data-science #machine-learning #big-data #research

Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber

(eng.uber.com)

#distributed-systems #big-data #hadoop #data-engineering

Extracting knowledge from knowledge graphs.

(towardsdatascience.com)

#data-science #AI #big-data #graph-processing

Better to Give and to Receive: Alibaba’s Open-source Contributions to Flink

(hackernoon.com)

#DBMS #big-data #opensource #data-engineering

How Bloomberg Tracks Hundreds of Billions of Data Points Daily with MetricTank and Grafana

(grafana.com)

#scaling #time-series #big-data #data-engineering

How eBay Governs its Big Data Fabric

(www.ebayinc.com)

#distributed-systems #big-data #hadoop #data-engineering

Presentation: Fairness, Transparency, and Privacy in AI @LinkedIn

(www.infoq.com)

#data-science #AI #machine-learning #big-data

Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark Jobs

(eng.uber.com)

#distributed-systems #big-data #hadoop #data-engineering

Pro Tips: How Booking.com Handles Millions of Metrics Per Second with Graphite

(grafana.com)

#performance #scaling #big-data #data-engineering

Lessons from our journey to enable global code search with Elasticsearch on GitLab.com

(about.gitlab.com)

#noSQL #search #big-data #elastisearch

Solving Big Data Challenges with Data Science at Uber

(eng.uber.com)

#DBMS #scaling #distributed-systems #big-data #data-engineering

Presentation: Would You Have Clicked on What We Would Have Recommended?

(www.infoq.com)

#machine-learning #big-data #AB-Testing #recommender

Tackling Bias in Machine Learning

(blog.insightdatascience.com)

#data-science #machine-learning #big-data

Productionizing ML with workflows at Twitter

(blog.twitter.com)

#software-engineering #software-architecture #machine-learning #big-data #production

Transparent Hierarchical Storage Management with Apache Kudu and Impala

(blog.cloudera.com)

#distributed-systems #big-data #backend #data-engineering

Rendezvous Architecture for Data Science in Production

(towardsdatascience.com)

#data-science #software-architecture #DBMS #distributed-systems #big-data

Managing Uber’s Data Workflows at Scale

(eng.uber.com)

#data-pipeline #DBMS #scaling #distributed-systems #big-data

How Netflix Uses AI and Machine Learning

(becominghuman.ai)

#machine-learning #scaling #big-data

Three Principles of Data Warehouse Development

(www.toptal.com)

#DBMS #big-data #OLAP #data-center

3 reasons to add deep learning to your time series toolkit

(www.oreilly.com)

#deep-learning #data-science #time-series #big-data

Lambda architecture— how to build a Big data pipeline part 1

(towardsdatascience.com)

#data-pipeline #software-architecture #big-data

Understanding Supply & Demand in Ride-hailing Through the Lens of Data

(engineering.grab.com)

#software-design #algorithms #analytics #big-data

Complementary Item Recommendations at eBay Scale

(www.ebayinc.com)

#data-pipeline #software-architecture #machine-learning #big-data

Learning Hiring Preferences: The AI Behind LinkedIn Jobs

(engineering.linkedin.com)

#machine-learning #NLP #big-data #text-analysis

Contextualizing Airbnb by Building Knowledge Graph

(medium.com)

#data-science #machine-learning #NLP #big-data #graph

Understanding Customer Churning with Big Data Analytics

(towardsdatascience.com)

#data-analytics #big-data #visualisation

Big Data Metrics Discovery

(engineering.salesforce.com)

#software-architecture #distributed-systems #big-data #backend

Explainable Reasoning over Knowledge Graphs for Recommendation

(www.ebayinc.com)

#data-science #big-data #neural-net #graph-processing

Interactive Visual Search

(www.ebayinc.com)

#machine-learning #image-processing #algorithms #search #big-data

Introducing Feast

(towardsdatascience.com)

#data-science #machine-learning #DBMS #big-data

Keeping It Classy: How Quizlet uses hierarchical classification to label content with academic…

(towardsdatascience.com)

#software-architecture #infra #machine-learning #big-data

Why we've chosen Snowflake ❄️ as our Data Warehouse

(drivy.engineering)

#infra #DBMS #big-data #backend

A Deep Dive Into Data Quality

(towardsdatascience.com)

#data-science #SQL #big-data

Presentation: Designing Automated Pipelines for Unseen Custom Data

(www.infoq.com)

#data-pipeline #automation #machine-learning #big-data

Presentation: Nearline Recommendations for Active Communities @LinkedIn

(www.infoq.com)

#machine-learning #big-data #graph-processing #recommender

Generating Twitter Ego-Networks & Detecting Ego-Communities

(towardsdatascience.com)

#data-analytics #big-data #graph-processing #visualisation #social-networks

Using Economic Graph Data to Power the LinkedIn Salary Product

(engineering.linkedin.com)

#machine-learning #NLP #big-data #text-analysis

HyperLogLog in Presto: A significantly faster way to handle cardinality estimation

(code.fb.com)

#DBMS #performance #SQL #big-data

Implementing the Netflix Media Database

(medium.com)

#software-architecture #DBMS #big-data #internals #media

Boosting Big Data workloads with Presto Auto Scaling

(www.eventbrite.com)

#data-pipeline #infra #scaling #big-data

TagOverflow — Correlating Tags in Stackoverflow

(towardsdatascience.com)

#algorithms #big-data #graph-processing #visualisation

Providing Metadata Discovery on Large-Volume Data Sets

(www.ebayinc.com)

#data-pipeline #search #analytics #big-data

Predicting real-time availability of 200 million grocery items in US/Canada stores

(tech.instacart.com)

#machine-learning #data-analytics #big-data #real-time

Tag-based Navigation of a Fashion Catalog

(jobs.zalando.com)

#data-science #algorithms #big-data #graph-processing

Seven Tips for Visual Search at Scale

(www.ebayinc.com)

#machine-learning #image-processing #search #big-data

Splitting Stateful Services across Continents at Instagram

(www.infoq.com)

#DBMS #scaling #distributed-systems #big-data

The Best Data Visualizations for Grabbing Readers’ Attention

(hackernoon.com)

#data-visualisation #analytics #big-data

New fastMRI open source AI research tools from Facebook and NYU School of Medicine

(code.fb.com)

#AI #machine-learning #big-data #biotech

The Fundamental Problem of Search

(www.eventbrite.com)

#algorithms #search #information-retrieval #big-data

Handling Imbalanced Datasets in Deep Learning

(towardsdatascience.com)

#deep-learning #data-science #big-data

Presentation: Big Data and Deep Learning: A Tale of Two Systems

(www.infoq.com)

#deep-learning #data-science #machine-learning #big-data

boundary-layer : Declarative Airflow Workflows

(codeascraft.com)

#software-architecture #big-data #cloud

Druid @ Airbnb Data Platform

(medium.com)

#data-pipeline #software-architecture #analytics #big-data #druid

Netflix MediaDatabase — Media Timeline Data Model

(medium.com)

#software-architecture #DBMS #scaling #big-data #media

Splitting Millions of Source Code Identifiers with Deep Learning

(blog.sourced.tech)

#deep-learning #analytics #big-data #parsing

ModaNet: A Large-scale Street Fashion Dataset with Polygon Annotations

(www.ebayinc.com)

#machine-learning #image-processing #big-data #neural-net

Horizon: The first open source reinforcement learning platform for large-scale products and services

(code.fb.com)

#data-science #machine-learning #scaling #big-data

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

(eng.uber.com)

#data-pipeline #software-architecture #distributed-systems #big-data #backend

Uber Introduces PyML: Their Secret Weapon for Rapid Machine Learning Development

(towardsdatascience.com)

#data-science #machine-learning #big-data #python

Turnilo — let’s change the way people explore Big Data

(allegro.tech)

#data-visualisation #analytics #big-data

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

(eng.uber.com)

#software-architecture #distributed-systems #big-data #systems

Using machine learning to index text from billions of images

(blogs.dropbox.com)

#machine-learning #image-processing #search #big-data

An Introduction to AI at LinkedIn

(engineering.linkedin.com)

#AI #machine-learning #NLP #big-data

Managing data store locality at scale with Akkio

(code.fb.com)

#DBMS #distributed-systems #big-data #systems

Building Google Dataset Search and Fostering an Open Data Ecosystem

(ai.googleblog.com)

#data-science #AI #big-data #semantic-data

Architecture of Nautilus, the new Dropbox search engine

(blogs.dropbox.com)

#software-architecture #search #scaling #big-data #filesystem

Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

(eng.uber.com)

#data-pipeline #deep-learning #software-architecture #big-data

Big Data Governance: Hive Metastore Listener for Apache Atlas Use Cases

(www.ebayinc.com)

#software-architecture #distributed-systems #big-data

Open Sourcing TonY: Native Support of TensorFlow on Hadoop

(engineering.linkedin.com)

#data-science #big-data #tensor-flow #hadoop

Introducing Oak: an Open Source Scalable Key-Value Map for Big Data Analytics

(yahooeng.tumblr.com)

#noSQL #data-analytics #big-data

Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache Hadoop

(ubereng.wpengine.com)

#data-pipeline #software-architecture #big-data #hadoop

Progress for big data in Kubernetes

(www.oreilly.com)

#distributed-systems #kubernetes #big-data

Scaling neural machine translation to bigger data sets with faster training and inference

(code.fb.com)

#data-science #machine-learning #big-data

Data Wrangling with Apache Kafka and KSQL

(www.confluent.io)

#data-pipeline #data-stream #big-data

Data Warehousing and ETLs

(medium.com)

#data-analytics #data-visualisation #big-data

An introduction to Druid, your Interactive Analytics at (big) Scale

(towardsdatascience.com)

#time-series #big-data #druid

Unriddling Big Data file formats

(www.thoughtworks.com)

#big-data #hadoop #filesystem

Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy

(hackernoon.com)

#data-pipeline #big-data #python #backend

Leveraging Elastic Demand for Forecasting

(tech.instacart.com)

#data-analytics #big-data

From Big Data to micro-services: how to serve Spark-trained models through AWS lambdas

(towardsdatascience.com)

#microservices #AWS #serverless #big-data

Parallelizing Feature Engineering with Dask

(towardsdatascience.com)

#data-science #big-data #python #parallel-computing

Learning Market Dynamics for Optimal Pricing

(medium.com)

#data-science #data-analytics #big-data

Developing a Bioinformatics Database for Disulfide Bonds Research

(www.toptal.com)

#DBMS #big-data #biotech

How we build a robust analytics platform using Spark, Kafka and Cassandra Lambda architecture

(medium.com)

#data-pipeline #software-architecture #infra #big-data

Databook: Turning Big Data into Knowledge with Metadata at Uber

(eng.uber.com)

#data-pipeline #software-architecture #big-data

Comparing Billions of Rows per Day

(segment.com)

#infra #scaling #big-data

Building a Graph Data Pipeline With Zeppelin Spark and Neo4j

(towardsdatascience.com)

#data-pipeline #big-data #graph

FastText: Under the Hood

(towardsdatascience.com)

#machine-learning #NLP #big-data #text-analysis

Distributed graphs processing with Spark GraphX

(hackernoon.com)

#noSQL #distributed-systems #big-data #graph-processing

Migrating Messenger storage to optimize performance

(code.fb.com)

#software-architecture #performance #big-data #migration

H3: Uber’s Hexagonal Hierarchical Spatial Index

(eng.uber.com)

#data-visualisation #big-data #GeoData #maps

Migrating Messenger storage to optimize performance

(code.facebook.com)

#software-architecture #infra #scaling #big-data

Productionizing ML with Workflows at Twitter

(blog.twitter.com)

#data-pipeline #software-architecture #machine-learning #big-data

Presentation: The Future of Distributed Databases Is Relational

(www.infoq.com)

#DBMS #distributed-systems #big-data

Presentation: Simplifying ML Workflows with Apache Beam

(www.infoq.com)

#data-pipeline #machine-learning #big-data

Presentation: Gimel: PayPal’s Analytics Data Platform

(www.infoq.com)

#SQL #analytics #big-data #storage

Introducing Commute Time for Jobs

(engineering.linkedin.com)

#data-pipeline #big-data #GeoData #maps

Solr: Improving performance for Batch Indexing

(blog.box.com)

#performance #distributed-systems #big-data #apache-solr

How Microservices Could Save Medical IoT

(nordicapis.com)

#IoT #microservices #big-data #biotech

Apache Spark - Performance

(blog.scottlogic.com)

#performance #distributed-systems #apache-spark #big-data

Data Retrieval and Cleaning: Tracking Migratory Patterns

(www.dataquest.io)

#data-science #big-data

Centrifuge: a reliable system for delivering billions of events per day

(segment.com)

#scaling #big-data #backend #event-driven

Structure & Attribute Based Graph Partitioning

(medium.com)

#algorithms #big-data #graph-processing

Looking under the hood of the Eventbrite data pipeline!

(www.eventbrite.com)

#data-pipeline #software-architecture #big-data #backend

Exploring The GitHub Archive

(blog.wallaroolabs.com)

#stream-processing #big-data #python

Balanced Partitioning and Hierarchical Clustering at Scale

(ai.googleblog.com)

#algorithms #big-data #graph-processing #research

Utilizing MapReduce Combiners and HyperLogLog++ to process millions of queries over datasets with billions of records

(liveramp.com)

#data-pipeline #big-data

Then and Now: The Rethinking of Time Series Data at Wayfair

(tech.wayfair.com)

#software-architecture #noSQL #time-series #big-data

Introducing Semantic Experiences with Talk to Books and Semantris

(research.googleblog.com)

#deep-learning #big-data #text-analysis

Simon Moss on using artificial intelligence to fight financial crimes

(www.oreilly.com)

#AI #data-analytics #big-data #podcast

Give Meaning to 100 billion Events a Day - The Analytics Pipeline at Teads

(highscalability.com)

#data-analytics #big-data #event-queue

Scaling Uber’s Hadoop Distributed File System for Growth

(eng.uber.com)

#scaling #big-data #hadoop #filesystem

Extracting Signals From the News

(eng.datafox.com)

#data-analytics #big-data #text-analysis

A brief introduction to two data processing architectures — Lambda and Kappa for Big Data

(towardsdatascience.com)

#data-pipeline #big-data

Search Federation Architecture at LinkedIn

(engineering.linkedin.com)

#search #big-data

The Evolution of Data at Reddit

(redditblog.com)

#big-data #data-center

Data Analysis with Spark

(jobs.zalando.com)

#data-analytics #apache-spark #big-data

Under the hood: Suicide prevention tools powered by AI

(code.facebook.com)

#deep-learning #machine-learning #big-data #text-analysis

A Cornucopia of Area Rugs: Will a Diverse Set of Choices Help Customers Find More of What They Love?

(tech.wayfair.com)

#data-science #search #big-data

How to hack Spark to do some data lineage

(blog.octo.com)

#data-pipeline #apache-spark #big-data #hadoop

Creating a musical (data) pipeline

(devblog.songkick.com)

#data-pipeline #big-data

Mis-employing radar charts to distinguish multidimensional data

(towardsdatascience.com)

#data-visualisation #big-data

How to add full text search to your website

(medium.com)

#search #big-data #elastisearch

Cross-Lingual End-to-End Product Search with Deep Learning

(jobs.zalando.com)

#deep-learning #search #big-data #text-analysis

Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity

(engineering.linkedin.com)

#distributed-systems #big-data #hadoop

From big data to fast data

(www.oreilly.com)

#data-pipeline #analytics #big-data

Using Synthetic Data Modeling to Enhance Machine Learning

(engineering.salesforce.com)

#machine-learning #data-analytics #big-data

Caviar’s Word2Vec Tagging For Menu Item Recommendations

(medium.com)

#big-data #text-analysis #word2vec

Time Series Forecasting with Splunk. Part I. Intro & Kalman Filter.

(towardsdatascience.com)

#machine-learning #data-analytics #time-series #big-data

Scaling Time Series Data Storage — Part I

(medium.com)

#DBMS #scaling #time-series #big-data

PageRank in Spark

(developers.soundcloud.com)

#search #apache-spark #big-data

Omphalos, Uber’s Parallel and Language-Extensible Time Series Backtesting Tool

(eng.uber.com)

#machine-learning #data-analytics #time-series #big-data

Fishing for graphs in a Hadoop data lake

(www.oreilly.com)

#big-data #hadoop #graph

Mapping Medium’s Tags

(medium.engineering)

#machine-learning #big-data #text-analysis #word2vec

Faster E-commerce Search

(www.ebayinc.com)

#DBMS #search #performance #big-data

The frequency of tags on Stack Overflow

(towardsdatascience.com)

#data-visualisation #analytics #big-data

Evolving search recommendations on Pinterest

(medium.com)

#search #big-data #recommender

The Art of Effective Visualization of Multi-dimensional Data

(towardsdatascience.com)

#data-science #big-data #visualisation

Bad Design Is Bad for Your Health: Why Data Visualization Details Matter

(engineering.cerner.com)

#data-visualisation #big-data

Big Data: Information visualization techniques

(towardsdatascience.com)

#data-analytics #data-visualisation #big-data

Out of Core Genomics

(towardsdatascience.com)

#data-analytics #big-data #biotech

Large-Scale Health Data Analytics with OHDSI

(blog.cloudera.com)

#DBMS #data-analytics #big-data

How machine learning will accelerate data management systems

(www.oreilly.com)

#machine-learning #DBMS #big-data #podcast

Hadoop Delegation Tokens Explained

(blog.cloudera.com)

#big-data #hadoop #data-center #tokens

DeepVariant: Highly Accurate Genomes With Deep Neural Networks

(research.googleblog.com)

#deep-learning #big-data #research #biotech

[Episode 01] Airbnb, Machine Learning & the Future of Travel

(mesosphere.com)

#data-science #AI #machine-learning #big-data #podcast

Incremental Data Capture for Oracle Databases at LinkedIn: Then and Now

(engineering.linkedin.com)

#data-pipeline #infra #big-data #backend

Dali Views: Functions as a Service for Big Data

(engineering.linkedin.com)

#data-pipeline #software-architecture #big-data

Rebuilding the Segment Leaderboards Infrastructure — Part 3: Design of the New System

(medium.com)

#stream-processing #apache-kafka #big-data #backend #cassandra

The Global Heatmap, Now 6x Hotter

(medium.com)

#data-visualisation #analytics #big-data #maps

Big Data Processing at Spotify: The Road to Scio (Part 1)

(labs.spotify.com)

#data-pipeline #big-data #scala

Airflow: The Missing Context

(hackernoon.com)

#data-science #big-data #crawling

Using Kafka Streams API for predictive budgeting

(medium.com)

#data-pipeline #stream-processing #apache-kafka #big-data

Big Dataset: All Reddit Comments – Analyzing with ClickHouse

(www.percona.com)

#analytics #big-data #visualisation

One Million Tables in MySQL 8.0

(www.percona.com)

#scaling #MySql #big-data

The Search for Better Search at Reddit - Because, certainly, we’ve solved it this time

(redditblog.com)

#software-architecture #infra #search #big-data

Exploring and Visualizing an Open Global Dataset

(research.googleblog.com)

#AI #machine-learning #data-analytics #big-data

Steering oceans of content to the world

(code.facebook.com)

#data-pipeline #infra #big-data #backend

IMDb Data in a Graph Database

(www.percona.com)

#big-data #graph-processing #graphDB

Implementing Temporal Graphs with Apache TinkerPop and HGraphDB

(blog.cloudera.com)

#noSQL #data-analytics #big-data #graph-processing

Breaking the “curse of dimensionality” in Genomics using “wide” Random Forests

(databricks.com)

#noSQL #data-analytics #big-data

Building the Activity Graph, Part 2

(engineering.linkedin.com)

#noSQL #big-data #graph-processing

BigDB - an ad data pipeline for LINE

(engineering.linecorp.com)

#data-pipeline #infra #DBMS #big-data

Engineering Data Analytics with Presto and Parquet at Uber

(eng.uber.com)

#software-architecture #infra #data-analytics #big-data
