Tagged | infra
-
Capacity Recommendation Engine: Throughput and Utilization Based Predictive Scaling
(eng.uber.com) -
FOQS: Making a distributed priority queue disaster-ready
(engineering.fb.com)#software-engineering #software-architecture #infra #distributed-systems
-
Designing Tinder
(highscalability.com)#software-design #software-architecture #infra #data-engineering
-
Auto-Diagnosis and Remediation in Netflix Data Platform
(netflixtechblog.com) -
Spike detection in Alert Correlation
(engineering.linkedin.com) -
Snaring the Bad Folks
(netflixtechblog.com) -
Evolving LinkedIn’s analytics tech stack
(engineering.linkedin.com) -
Article: Design Patterns for Serverless Systems
(www.infoq.com) -
The Case of the Recursive Resolvers
(slack.engineering) -
Introducing Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler
(aws.amazon.com) -
Modernizing Our Search Stack
(engblog.nextdoor.com) -
Running ML Inference Services in Shared Hosting Environments
(engblog.nextdoor.com) -
Meet Ottr: A Serverless Public Key Infrastructure Framework
(medium.com)#software-engineering #software-architecture #infra #security
-
Distributed Firewall (DFW): Network security at the host level at LinkedIn
(engineering.linkedin.com) -
Autonomous testing of services at scale
(engineering.fb.com) -
Building the Next Evolution of Cloud Networks at Slack – A Retrospective
(slack.engineering) -
Network Validation Evolution at Hostinger
(www.hostinger.com) -
Powering Security Reports with Cartography and Flyte
(eng.lyft.com) -
Threat modeling the Kubernetes Agent: from MVC to continuous improvement
(about.gitlab.com) -
What are traces, and how SQL (yes, SQL) and OpenTelemetry can help us get more value out of traces to build better software
(blog.timescale.com) -
Infrastructure Observability for Changing the Spend Curve
(slack.engineering) -
Understanding How Facebook Disappeared from the Internet
(blog.cloudflare.com) -
Groot: eBay’s Event-graph-based Approach for Root Cause Analysis
(tech.ebayinc.com) -
Partitioning GitHub’s relational databases to handle scale
(github.blog) -
Distributed tier merge: How LinkedIn tackles stragglers in search index build
(engineering.linkedin.com) -
Faster Flink adoption with self-service diagnosis tool at Pinterest
(medium.com) -
Presentation: User Simulation for Rapid Outage Mitigation
(www.infoq.com) -
The Show Must Go On: Securing Netflix Studios At Scale
(netflixtechblog.com) -
CacheLib, Facebook’s open source caching engine for web-scale services
(engineering.fb.com) -
Infrastructure Design for Real-time Machine Learning Inference
(databricks.com)#data-pipeline #software-engineering #infra #machine-learning
-
Myntra’s BCP/DR Journey
(medium.com)#software-engineering #software-design #software-architecture #infra
-
HTTP/2 in infrastructure: Ambry network stack refactoring
(engineering.linkedin.com) -
Service Architecture at SoundCloud — Part 2: Value-Added Services
(developers.soundcloud.com) -
Efficiently Managing the Supply and Demand on Uber’s Big Data Platform
(eng.uber.com)#software-architecture #infra #distributed-systems #big-data
-
Data Lineage at Slack
(slack.engineering) -
Applying flame graphs outside of performance analysis
(blog.twitter.com) -
Open-sourcing a more precise time appliance
(engineering.fb.com) -
Risk-driven backbone management during COVID-19 and beyond
(engineering.fb.com) -
How Uber Achieves Operational Excellence in the Data Quality Experience
(eng.uber.com) -
How we’re making Dropbox data centers 100% carbon neutral
(dropbox.tech) -
Canary rollouts — how we’re trying to safely deploy to Kubernetes
(lambda.grofers.com) -
Video: Infrastructure Engineering at HubSpot
(product.hubspot.com) -
Data Movement in Netflix Studio via Data Mesh
(netflixtechblog.com) -
Containerizing Apache Hadoop Infrastructure at Uber
(eng.uber.com) -
Automating root cause analysis for infrastructure systems
(research.fb.com) -
Presentation: History of Infra as Code
(www.infoq.com) -
No, we don’t use Kubernetes
(ably.com) -
Secure provisioning of LoadBalancer Services on Kubernetes using Kyverno
(lambda.grofers.com) -
Presentation: True Observability Needs High-Cardinality
(www.infoq.com) -
Article: Solving Mysteries Faster With Observability
(www.infoq.com) -
Efficient and Reliable Compute Cluster Management at Scale
(eng.uber.com) -
Presentation: Pragmatic Performance - Tales from the Trenches
(www.infoq.com) -
Network hose: Managing uncertain network demand with model simplicity
(engineering.fb.com) -
Managing key-values in Consul using ConsulKV CRD
(lambda.grofers.com) -
How Clever Secures Infrastructure Secrets Using AWS SSM Parameter Store
(engineering.clever.com) -
Reducing data transfer costs with a Docker registry cache
(lambda.grofers.com) -
Scaling our inventory cache reads to 1000X
(medium.com) -
Article: Building Reliable Software Systems with Chaos Engineering
(www.infoq.com) -
Scaling of Uber’s API gateway
(eng.uber.com) -
Block Aggregator: Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Retries
(tech.ebayinc.com) -
The Legends of Runeterra CI/CD Pipeline
(technology.riotgames.com) -
What actually is a Data Mesh? And is it really a thing?
(blog.scottlogic.com) -
Why Elasticsearch is an indispensable component of the Adyen stack
(www.elastic.co) -
Scaling Club Leaderboard Infrastructure for Millions of Users
(medium.com) -
Building a Hyper Self-Service, Distributed Tracing and Feedback System for Rule & Machine Learning (ML) Predictions
(engineering.grab.com)#data-pipeline #infra #machine-learning #scaling #distributed-systems
-
Presentation: Safe and Fast Deploys at Planet Scale
(www.infoq.com) -
The Architecture of Uber’s API gateway
(eng.uber.com) -
Chaos Experimentation, an open-source framework built on top of Envoy Proxy
(eng.lyft.com) -
How Facebook’s Project SEISMIC helps bring greener telecom infrastructure
(research.fb.com) -
Monitoring PostgreSQL with a modern software stack
(tech.showmax.com) -
Using kafka-merge-purge to Deal with Failure in an Event-Driven System at FLYERALARM
(www.confluent.io) -
LyftLearn: ML Model Training Infrastructure built on Kubernetes
(eng.lyft.com) -
Our Journey to Continuous Delivery at Grab (Part 2)
(engineering.grab.com) -
Presentation: Change Data Capture for Distributed Databases @Netflix
(www.infoq.com) -
Dropping cache didn’t drop cache
(blog.twitter.com) -
Netflix Drive
(netflixtechblog.com) -
Google Provides a Peek into the Architecture of Colossus - Its Storage Foundation
(www.infoq.com) -
Scaling Kubernetes with Assurance at Pinterest
(stackshare.io) -
Dependable realtime banking with Kafka and Ably
(ably.com) -
Presentation: A Sticky Situation: How Netflix Gains Confidence in Changes
(www.infoq.com) -
How to Successfully Hand Over Systems
(developers.soundcloud.com) -
Attack of the Delta Clones (Against Disaster Recovery Availability Complexity)
(databricks.com)#software-architecture #infra #distributed-systems #data-engineering
-
Containers at the edge: it’s not what you think, or maybe it is
(blog.cloudflare.com) -
How Facebook encodes your videos
(engineering.fb.com) -
Uber Implements Disaster Recovery for Multi-Region Kafka
(www.infoq.com)#software-architecture #infra #distributed-systems #apache-kafka
-
Pharos - Searching Nearby Drivers on Road Network at Scale
(engineering.grab.com) -
Scaling Cache Infrastructure at Pinterest
(stackshare.io) -
Visualising Complex Systems
(sbg.technology) -
Cloud Jewels: Estimating kWh in the Cloud
(codeascraft.com) -
Why we enabled Geo on the staging environment for GitLab.com
(about.gitlab.com) -
A case study on the migration of Housings’ application servers
(medium.com) -
How LINE messaging servers prepare for New Year’s traffic
(engineering.linecorp.com) -
Multi-zone Cluster Management at Wayfair with Kubernetes
(tech.wayfair.com) -
An investigation into Kafka Log Compaction
(medium.com) -
Embrace and Replace: Migrating ZooKeeper into Kubernetes
(product.hubspot.com) -
“What's the worst that could happen?”: A worked example of how we deal with live incidents.
(sbg.technology) -
Building a platform: Learnings from our pursuit for leverage
(engineering.linkedin.com) -
Chaos Engineering comes to Ruby
(medium.com) -
Scaling Machine Learning
(towardsdatascience.com) -
February service disruptions post-incident analysis
(github.blog) -
Introducing Quicksilver: Configuration Distribution at Internet Scale
(blog.cloudflare.com)#software-architecture #infra #distributed-systems #internet
-
Challenges using Prometheus at scale
(sysdig.com) -
Building a scalable online product recommender with Keras, Docker, GCP, and GKE
(blog.insightdatascience.com)#data-pipeline #software-architecture #infra #machine-learning #cloud
-
Presentation: Monitoring and Tracing @Netflix Streaming Data Infrastructure
(www.infoq.com) -
On the shoulders of giants: recent changes in Internet traffic
(blog.cloudflare.com) -
What the heck is Backstage anyway?
(labs.spotify.com) -
Building a more accurate time service at Facebook scale
(engineering.fb.com)#software-engineering #software-architecture #infra #networking #systems
-
Periskop: Exception Monitoring Service
(developers.soundcloud.com) -
Presentation: From Spark to Elasticsearch and Back - Learning Large-scale Models for Content Recommendation
(www.infoq.com) -
Knowing PySpark and Kafka: A 100 Million Events Use-Case
(towardsdatascience.com)#data-pipeline #software-engineering #software-architecture #infra
-
Presentation: Securing Your CI/CD Pipeline
(www.infoq.com) -
Solving DNS lookup failures in Kubernetes
(tech.findmypast.com) -
MLOps: not as Boring as it Sounds
(itnext.io)#data-pipeline #software-engineering #infra #machine-learning
-
Presentation: InfraCoding with Terraform: Writing Tests for Infrastructure-as-Code
(www.infoq.com) -
Kubernetes Autoscaling 101: Cluster Autoscaler, Horizontal Pod Autoscaler, and Vertical Pod…
(levelup.gitconnected.com) -
Presentation: "This Website is Not Secured" You Had One Job: Configuring the Edge Proxy!
(www.infoq.com) -
Introducing Dispatch
(netflixtechblog.com) -
Under the Hood of Uber ATG’s Machine Learning Infrastructure and Versioning Control Platform for Self-Driving Vehicles
(eng.uber.com)#data-pipeline #software-engineering #infra #machine-learning
-
Terraforming RDS — Part 1
(tech.instacart.com) -
Presentation: Helm 3: A Mariner's Delight
(www.infoq.com) -
Building our in-house virtual device lab “Caroufarm”
(medium.com) -
PubSub: A conceptual deep-dive
(www.ably.io) -
New & Improved Terapeak Research 2.0 in eBay Seller Hub
(tech.ebayinc.com) -
Presentation: Data Mesh Paradigm Shift in Data Platform Architecture
(www.infoq.com) -
Why Cloudflare Chose AMD EPYC for Gen X Servers
(blog.cloudflare.com) -
Gen X Performance Tuning
(blog.cloudflare.com) -
Running Online Services at Riot: Part VI
(technology.riotgames.com) -
Developing the Antman Project
(engineering.linecorp.com) -
How Spotify Aligned CDN Services for a Lightning Fast Streaming Experience
(labs.spotify.com) -
Cloudflare’s Gen X: Servers for an Accelerated Future
(blog.cloudflare.com) -
Debugging Distributed Systems: 3 Common Distributed Tracing Challenges & How to Overcome Them
(blog.overops.com) -
Open sourcing DataHub: LinkedIn’s metadata search and discovery platform
(engineering.linkedin.com) -
Your Circuit Breaker is Misconfigured
(engineering.shopify.com) -
Article: Service Mesh Ultimate Guide: Managing Service-to-Service Communications in the Era of Microservices
(www.infoq.com) -
How to build a real-time fraud detection pipeline using Faust and MLFlow
(towardsdatascience.com) -
Presentation: Deploy on Friday!
(www.infoq.com) -
K8s Vertical Pod Autoscaling
(itnext.io) -
How to enable data scientists to stop managing ETL pipelines and get back to doing data science: Part I
(tech.wayfair.com) -
Observability on Heroku: How to Monitor Apps on a Managed Infrastructure
(hackernoon.com) -
Presentation: ML's Hidden Tasks: A Checklist for Developers When Building ML Systems
(www.infoq.com)#data-science #software-engineering #infra #machine-learning
-
The Distributed Data Mesh as a Solution to Centralized Data Monoliths
(www.infoq.com)#software-architecture #infra #devops #microservices #backend
-
How We Built a Scalable Architecture for Real-Time Recommendations
(clevertap.com) -
Ultron: ML Inferencing Platform @Walmart Labs
(medium.com)#data-science #software-engineering #infra #machine-learning
-
Presentation: Shifting Left with Cloud Native CI/CD
(www.infoq.com) -
Presentation: Programming the Cloud: Empowering Developers to Do Infrastructure
(www.infoq.com) -
Presentation: CI/CD for Machine Learning
(www.infoq.com) -
How to Continuously Profile Tens of Thousands of Production Servers
(engineering.salesforce.com) -
Stop the Insanity: Eliminating Data Infrastructure Sprawl
(www.memsql.com) -
Operating Load Testing Infrastructure at Scale
(tech.just-eat.com) -
A Scientific Approach to Capacity Planning
(tech.wayfair.com) -
Intelligent DNS based load balancing at Dropbox
(blogs.dropbox.com) -
Finding Optimal Data Centers (Part 1)
(medium.com) -
Presentation: Managing Failure Modes in Microservice Architectures
(www.infoq.com) -
How Kubernetes Helps Solve Cloud Complexity
(www.weave.works) -
Prometheus at Prezi: replacing 10 years of anti-patterns
(engineering.prezi.com) -
CloudFormation To Terraform
(deliveroo.engineering) -
Presentation: From Idea to Dev to Ops
(www.infoq.com) -
Provisioning Infrastructure for Stateful Services in Public Cloud: An HBase Use Case (Part II)
(engineering.salesforce.com) -
High Availability for Self-Managed Kubernetes Clusters at DT One
(www.infoq.com) -
Experience Running Spotify’s Event Delivery System in the Cloud
(www.infoq.com) -
Presentation: Scaling Beyond a Billion Transactions Per Day with Sub-second Responses
(www.infoq.com)#software-architecture #infra #performance #scaling #data-engineering
-
Provisioning Infrastructure for Stateful Services in Public Cloud: An HBase Use Case (Part I)
(engineering.salesforce.com) -
Uber Infrastructure in 2019: Improving Reliability, Driving Customer Satisfaction
(eng.uber.com) -
Hypothesis Testing in Production
(engineering.remind.com) -
Controlled Chaos with Fault Injection Testing
(technology.riotgames.com) -
Presentation: Evolution of Edge @Netflix
(www.infoq.com) -
The Winding Road to Better Machine Learning Infrastructure Through Tensorflow Extended and Kubeflow
(labs.spotify.com) -
Presentation: Building Reactive Pipelines: How to Go from Scalable Apps to (Ridiculously) Scalable Systems
(www.infoq.com) -
Transforming the Management of Application Configurations & Secrets at 24 Hour Fitness
(stackshare.io) -
How Monzo Isolated Their Microservices Using Kubernetes Network Policies
(www.infoq.com) -
EKS + Fargate = Extensibility of Kubernetes + Serverless Benefits
(itnext.io) -
Presentation: Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
(www.infoq.com) -
Building a Kubernetes Platform at Pinterest
(stackshare.io) -
Faster PostgreSQL connection recovery
(www.theguardian.com) -
Infrastructure Resilience: Handling Invalid Configuration in the Envoy Proxy
(itnext.io) -
How Shopify Implements Custom Autoscaling Rules in Kubernetes
(www.infoq.com) -
Article: Adoption of Cloud-Native Architecture, Part 1: Architecture Evolution and Maturity
(www.infoq.com) -
Optimizing Observability with Jaeger, M3, and XYS at Uber
(eng.uber.com) -
Egnyte Architecture: Lessons learned in building and scaling a multi petabyte content platform
(highscalability.com)#software-architecture #infra #scaling #distributed-systems #internet
-
Evolution of Zulily’s Airflow Infrastructure
(zulily-tech.com) -
G-Scout Enterprise and Cloud Security at Etsy
(codeascraft.com) -
7 Ways We Put Kubernetes to Work at Salesforce
(engineering.salesforce.com) -
Monitoring server applications with Vortex
(blogs.dropbox.com) -
The Two Most Important Challenges with an API Gateway when Adopting Kubernetes
(itnext.io) -
Heroku's Journey to Automated Continuous Deployment
(www.infoq.com) -
The Consul outage that never happened
(about.gitlab.com) -
Zendesk’s Global Mesh Network- Part 1
(medium.com) -
The Journey to Kubernetes High Availability - Part 4
(tech.findmypast.com) -
Presentation: Multi-language Infrastructure as Code
(www.infoq.com) -
When you deserve better (systems)
(tech.gc.com)#software-engineering #software-architecture #infra #distributed-systems
-
Presentation: Making a Lion Bulletproof: SRE in Banking
(www.infoq.com) -
Load Balancers - Whats, Hows, and Whens
(hackernoon.com) -
Learnings from the journey to continuous deployment
(engineering.linkedin.com) -
On-premise HA Kubernetes cluster
(itnext.io) -
Prometheus introduction - How we show new colleagues the power of fire
(tech.showmax.com) -
Presentation: How to Evolve Kubernetes Resource Management Model
(www.infoq.com) -
Presentation: What Breaks Our Systems: A Taxonomy of Black Swans
(www.infoq.com) -
Presentation: How Did Things Go Right? Learning More From Incidents
(www.infoq.com) -
Presentation: Alibaba Container Platform Infrastructure - a Kubernetes Approach
(www.infoq.com) -
A Kubernetes crime story
(engineering.prezi.com) -
Scalability Tuning on a Tess.IO Cluster
(tech.ebayinc.com) -
Jason L. van Brackel on Seamless Kubernetes Adoption for Development Teams
(semaphoreci.com) -
Operating Apache Kafka Clusters 24/7 Without A Global Ops Team
(eng.lyft.com) -
Evolving Regional Evacuation
(medium.com) -
How Sqreen handles 50,000 requests every minute in a write-heavy environment
(stackshare.io) -
Enhancing Bandaid load balancing at Dropbox by leveraging real-time backend server load information
(blogs.dropbox.com) -
Solving manageability challenges at scale with Nuage
(engineering.linkedin.com) -
A web performance issue
(medium.com) -
Presentation: Building Resilient Serverless Systems
(www.infoq.com) -
Kubernetes: A simple overview
(www.oreilly.com) -
Achieving a tenfold increase in Varnish throughput by replacing libvmod‑curl with native request restarts
(tech.showmax.com) -
Demand Forecasting Tech Stack @ Walmart
(medium.com) -
eBay's Hyperscale Platforms
(tech.ebayinc.com) -
Three Strategies For Designing The Caching In Large Scale Distributed System
(hackernoon.com) -
Podcast: Yuri Shkuro on Tracing Distributed Systems Using Jaeger
(www.infoq.com) -
Federated GraphQL Server at Scale: Zillow Rental Manager Real-time Chat Application
(www.zillow.com) -
Article: Cellery: A Code-First Approach to Deploy Applications on Kubernetes
(www.infoq.com) -
Cultivating your Data Lake
(stackshare.io)#data-pipeline #software-architecture #infra #data-engineering
-
Auditing Content Features in FollowFeed
(engineering.linkedin.com) -
How to manage your Snowflake spend with Periscope and dbt
(about.gitlab.com) -
Presto Infrastructure at Lyft
(eng.lyft.com)#infra #scaling #distributed-systems #backend #data-engineering
-
RunBMC: OCP hardware spec solves data center BMC pain points
(blogs.dropbox.com) -
Presentation: The State of Serverless Computing
(www.infoq.com) -
SQL Prober: Black-box Monitoring in Managed CockroachDB
(www.cockroachlabs.com) -
Performance Tuning Postgres within our TLS Infrastructure
(engineering.squarespace.com) -
Upgrading Pinterest operational metrics
(medium.com)#software-architecture #monitoring #infra #microservices #backend
-
Life of Image
(tech.showmax.com) -
Fast and flexible observability with canonical log lines
(stripe.com) -
Evolution of Netflix Conductor:
(medium.com) -
Building a real-time anomaly detection system for time series at Pinterest
(medium.com) -
Fault Tolerance in Distributed Systems: Tracing with Apache Kafka and Jaeger
(www.confluent.io) -
What Is Observability & How to Measure the Quality of Microservices
(semaphoreci.com) -
Implementation of a monitoring strategy for products based on microservices
(engineering.salesforce.com) -
Presentation: The Service Mesh: It's About Traffic
(www.infoq.com) -
Presentation: Automatic Clustering at Snowflake
(www.infoq.com)#infra #DBMS #scaling #distributed-systems #data-engineering
-
Petabyte Scale Data Deduplication
(engineering.mixpanel.com) -
Elasticsearch - clustering on AWS with optional auto-scaling
(blog.scottlogic.com) -
Article: The Pipeline Driven Organization - Enabling True Continuous Delivery
(www.infoq.com) -
Resiliency Doctor – A tool to achieve resiliency in hybrid cloud application ecosystems
(medium.com) -
Presto at Pinterest
(medium.com) -
Expediting Data Fixes and Data Migrations
(engineering.linkedin.com)#software-engineering #infra #scaling #practices #data-engineering
-
A Comprehensive Guide to the Realtime Tech Stack
(www.pubnub.com) -
Presentation: On a Deep Journey Towards Five Nines
(www.infoq.com) -
Autoscaling AWS Step Functions Activities
(engineeringblog.yelp.com) -
A History of Integrator: Scaling Software Deployment Automation
(tech.wayfair.com) -
Athena: Automated Build Health Monitoring at Dropbox Engineering
(www.infoq.com) -
Introduction to Kubernetes Security
(www.weave.works) -
How to get started with site reliability engineering (SRE)
(www.oreilly.com) -
Efficient, reliable cluster management at scale with Tupperware
(code.fb.com) -
Painting a Picture of Your Infrastructure in Minutes
(labs.spotify.com) -
Alerting on SLOs like Pros
(developers.soundcloud.com) -
How PostgreSQL and Grafana Can Improve Monitoring Together
(grafana.com) -
Migrating a Big Data Environment to the Cloud, Part 3
(liveramp.com) -
Presentation: Instrumentation, Observability & Monitoring of Machine Learning Models
(www.infoq.com) -
Building Facebook’s service encryption infrastructure
(code.fb.com) -
Monitoring at eBay with Druid
(www.ebayinc.com) -
Extending DHCPLB: The path from load balancer to server
(code.fb.com) -
Kubernetes Future: VMs, Containers, or Hypervisor?
(www.infoq.com) -
Argo: Workflow Engine for Kubernetes
(itnext.io) -
Athena: Our automated build health management system
(blogs.dropbox.com) -
Observability with the Elastic Stack
(stackshare.io) -
Article: Towards Building an Open API For Consolidating and Federating Service Meshes
(www.infoq.com) -
Highly Available Postgres Databases
(tech.findmypast.com) -
Learning DevOps as a Software Engineer
(jobs.zalando.com) -
xdpcap: XDP Packet Capture
(blog.cloudflare.com) -
Presentation: Airbnb’s Great Migration: Building Services at Scale
(www.infoq.com) -
The history of infrastructure at Zendesk (Part 3) — Foundation team forming and evolving
(medium.com) -
Presentation: Reducing Risk of Credential Compromise @Netflix
(www.infoq.com) -
Presentation: Develop Hundreds of Kubernetes Services at Scale with Airbnb
(www.infoq.com) -
Improving Key Expiration in Redis
(blog.twitter.com) -
From bare-metal to Kubernetes
(highscalability.com) -
One Year of Load Balancing
(blog.algolia.com)#software-architecture #infra #scaling #networking #load-balancing
-
Network Boot To The Rescue
(tech.showmax.com) -
Presentation: Lessons from 300k+ Lines of Infrastructure Code
(www.infoq.com) -
Performance Monitoring Best Practices: Wayfair at InfluxDays NYC
(tech.wayfair.com) -
Improving the User Experience with Uber’s Customer Obsession Ticket Routing Workflow and Orchestration Engine
(eng.uber.com)#software-engineering #software-architecture #infra #backend
-
Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…
(medium.com)#software-architecture #infra #scaling #backend #data-engineering
-
Reinventing Facebook’s data center network
(code.fb.com) -
How Airbnb Simplified the Kubernetes Workflow for 1000+ Engineers
(www.infoq.com)#software-engineering #infra #automation #devops #kubernetes
-
Datadog Log Management from Zero to One
(medium.com) -
Using Machine Learning to Ensure the Capacity Safety of Individual Microservices
(eng.uber.com) -
Re-Platforming Data @BigCommerce: five second latency on Petabytes of data
(www.bigeng.io)#software-architecture #infra #performance #backend #latency
-
How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix
(medium.com) -
What is Identity Infrastructure?
(auth0.com) -
Design Of A Modern Cache—Part Deux
(highscalability.com) -
Patterns for asynchronous read models in infrastructure without order guarantee
(blog.arkency.com) -
How we used delayed replication for disaster recovery with PostgreSQL
(about.gitlab.com) -
Fishing For Correlations
(tech.gc.com) -
Presentation: Chaos Engineering with Containers
(www.infoq.com)#software-engineering #infra #QA #testing #chaos-engineering
-
A Hybrid Cloud Approach from FraudGuard.io that Handles 50M Requests a Day
(highscalability.com) -
The scalable fabric behind our growing data center network
(blogs.dropbox.com) -
Automating Datacenter Operations at Dropbox
(blogs.dropbox.com) -
Keeping It Classy: How Quizlet uses hierarchical classification to label content with academic…
(towardsdatascience.com) -
Rethinking data center design for Singapore
(code.fb.com) -
Why we've chosen Snowflake ❄️ as our Data Warehouse
(drivy.engineering) -
Designing resilient systems: Circuit Breakers or Retries? (Part 2)
(engineering.grab.com)#software-architecture #infra #distributed-systems #availability
-
Should You Build An API Gateway In-House?
(nordicapis.com) -
Presentation: DevOps for the Database
(www.infoq.com) -
The history of infrastructure at Zendesk (Part 2) — the messy middle
(medium.com) -
Presentation: Human-centric Machine Learning Infrastructure @Netflix
(www.infoq.com) -
Open Sourcing Bro-Sysmon
(engineering.salesforce.com) -
Bigtable Autoscaler: saving money and time using managed storage
(labs.spotify.com) -
Boosting Big Data workloads with Presto Auto Scaling
(www.eventbrite.com) -
Cache warming: Agility for a stateful service
(medium.com) -
Globalizing Player Accounts
(engineering.riotgames.com) -
Troubleshooting a Connection Timeout Issue with tcp_tw_recycle Enabled
(www.ebayinc.com) -
How to run Docker and get more sleep than I did
(engineering.gusto.com) -
Greenplum for Kubernetes Operator
(engineering.pivotal.io) -
Observability at Scale: Building Uber’s Alerting Ecosystem
(eng.uber.com) -
Dynamic configuration at Twitter
(blog.twitter.com) -
Coding Conversations: The “Perfect Storm” that Brought Down LinkedIn.com
(engineering.linkedin.com) -
Automating Terraform: Infrastructure as Code as a Service
(blog.scottlogic.com) -
How we use AWS Batch at Zendesk to Build All The Machine Learning Models
(medium.com) -
Proactive Data Pipeline Alerting with Pulse
(blog.cloudera.com) -
Optimizing Cluster Resources for Kubernetes Team Development
(www.weave.works) -
Terraform AWS Cloud - Sane Infrastructure Management
(www.toptal.com) -
Optimising an AWS microservice - Part 1
(technology.skybettingandgaming.com) -
Immutable Infrastructure Using Packer, Ansible, and Terraform
(itnext.io) -
Understanding Production: What can you measure?
(www.opsian.com)#software-engineering #monitoring #infra #profiling #production
-
Getting started with monitoring for developers
(blog.bugsnag.com) -
Presentation: Next Gen Networking Infrastructure with Rust
(www.infoq.com) -
Presentation: Programming in Hostile Environments
(www.infoq.com) -
Lumen: Custom, Self-Service Dashboarding For Netflix
(medium.com) -
Dropbox traffic infrastructure: Edge network
(blogs.dropbox.com) -
Use Custom Packet Framing for Microservices Messaging
(blog.codeship.com) -
Eureka, Zuul, and Cloud Configuration - Local Development
(engineering.pivotal.io) -
How We Store Data in the Cloud at Auth0
(auth0.com) -
Kelsey Hightower and Chris Gaun on serverless and Kubernetes
(www.oreilly.com) -
Modernizing Applications for Kubernetes
(www.digitalocean.com) -
Presentation: No Microservice Is an Island
(www.infoq.com) -
Presentation: Design Microservice Architectures the Right Way
(www.infoq.com)#software-design #software-architecture #infra #microservices
-
Presentation: Efficient Service Communication with gRPC
(www.infoq.com) -
Hands-Off Deployment with Canary
(developers.soundcloud.com) -
The history of infrastructure at Zendesk — constant tradeoffs
(medium.com) -
Auth0 Architecture: Running In Multiple Cloud Providers And Regions
(highscalability.com) -
Kubernetes Monitoring with Prometheus, the ultimate guide (part 1)
(sysdig.com) -
Article: Testing Programmable Infrastructure - a Year On
(www.infoq.com) -
Article: How the Boston Children’s Hospital Is Innovating on Top of an Open Cloud
(www.infoq.com) -
Beyond Web and Worker: Evolution of the Modern Web App on Heroku
(blog.heroku.com) -
How we improved the observability of a Go project
(medium.com) -
Presentation: Lyft's Envoy: Embracing a Service Mesh
(www.infoq.com) -
The Dawn of Zendesk’s Machine Learning Model Building Platform with AWS Batch
(medium.com)#data-pipeline #software-architecture #infra #machine-learning
-
Nuage: Making Data Systems Management Scalable
(engineering.linkedin.com) -
How we build a robust analytics platform using Spark, Kafka and Cassandra Lambda architecture
(medium.com) -
Comparing Billions of Rows per Day
(segment.com) -
Presentation: Observability to Better Serverless Apps
(www.infoq.com) -
Securing New Products at Clever
(engineering.clever.com) -
Presentation: CRI Runtimes Deep Dive: Who's Running My Kubernetes Pod!?
(www.infoq.com) -
Location-Aware Distribution: Configuring servers at scale
(code.fb.com) -
Presentation: Control Planes: Designing Infrastructure for Rapid Iteration
(www.infoq.com) -
Article: Building an API Gateway with the Ballerina Programming Language
(www.infoq.com) -
Netflix SIRT releases Diffy: A Differencing Engine for Digital Forensics in the Cloud
(medium.com) -
Presentation: Chick-Fil-A: Milking the Most Out of 1000's of K8s Clusters
(www.infoq.com) -
Do you need a service mesh?
(www.oreilly.com) -
Evolution of Telemetry at Bloomberg
(grafana.com) -
Spiral: Self-tuning services via real-time machine learning
(code.fb.com) -
Refactoring Thrift schemas at Pinterest
(medium.com) -
GUI-fying the Machine Learning Workflow: Towards Rapid Discovery of Viable Pipelines
(towardsdatascience.com) -
Migrating Messenger storage to optimize performance
(code.facebook.com) -
Scaling Network Automation at Facebook Using Zero-Touch Provisioning
(www.infoq.com) -
Strategies for Decomposing a System into Microservices
(www.infoq.com) -
From startup to web scale: A gentle introduction to scaling
(hackernoon.com) -
Metacat: Making Big Data Discoverable and Meaningful at Netflix
(medium.com) -
StatePoint Liquid Cooling System: A new, more efficient way to cool a data center
(code.facebook.com) -
Dev and prod parity
(blog.box.com) -
Automated cluster management and recovery for Rocksplicator
(medium.com) -
Red Hat Summit: Building production-ready containers
(developers.redhat.com) -
A Practical Introduction to Logstash
(www.elastic.co) -
Scaling the Facebook backbone through Zero Touch Provisioning
(code.facebook.com) -
How Raygun Processes Millions of Error Events Per Second
(stackshare.io) -
Making LinkedIn's Organic Feed Handle Peak Traffic
(engineering.linkedin.com) -
Building Services at Airbnb, Part 2
(medium.com) -
Meet Our New Network Infrastructure
(www.hostinger.com) -
An Overview of the Service Mesh and Its Tooling Options
(blog.codeship.com) -
How to Build an Effective Initial Deployment Pipeline
(www.toptal.com) -
New data pipeline management platform at Khan Academy
(engineering.khanacademy.org) -
Risk vs. Reward: A Guide to Understanding Software Containers
(www.toptal.com) -
Google: Addressing Cascading Failures
(highscalability.com) -
Lessons Learned — A Year Of Going “Fully Serverless” In Production
(hackernoon.com) -
Intuition Engineering at Allegro with Phobos
(allegro.tech) -
The next step in Facebook's AI hardware infrastructure
(code.facebook.com) -
2 Fast 2 Furious: migrating Medium’s codebase without slowing down
(medium.engineering) -
Fabric Aggregator: A flexible solution to our traffic demand
(code.facebook.com) -
Solution Deep Dive: Building a Highly Available Web Application with Web Processing and Storing Capabilities Using MongoDB and Elk Stack
(www.digitalocean.com) -
Migrating from Heroku to AWS with Kubernetes and without stopping production
(hackernoon.com) -
SHIFT Commerce's Journey: Deconstructing Monolithic Applications into Services
(blog.heroku.com) -
Scaling Infrastructure Management with Grail
(eng.uber.com) -
Caching Internal Service Calls at Yelp
(engineeringblog.yelp.com) -
Project Nimble: Region Evacuation Reimagined
(medium.com) -
Add Some Smarts To Your Change Data Capture
(medium.com) -
Tobi Knaup and Gou Rao on stateful containers
(www.oreilly.com) -
Introduction to Istio; It Makes A Mesh Of Things
(developers.redhat.com) -
Exploring new frontiers in CI/CD and DevOps
(hackernoon.com) -
Debugging Production with Event Logging
(www.zillow.com) -
Meet Bandaid, the Dropbox service proxy
(blogs.dropbox.com) -
Global connectivity: Working together to bring more people online
(code.facebook.com) -
Chaos Engineering using Amazon EC2 Systems Manager
(hackernoon.com) -
Implementing Model-Agnosticism in Uber’s Real-Time Anomaly Detection Platform
(eng.uber.com) -
The Quest for Availability.
(hackernoon.com) -
How we built rearranging Pins
(medium.com) -
How production engineers support global events on Facebook
(code.facebook.com) -
HelloFresh: Navigating the rough seas of environment scaling with Aho
(highscalability.com) -
What’s All the FaaS About?
(www.pubnub.com) -
Buzzword Central — Microservices, Serverless, Functions, Programmable Infrastructure
(www.pubnub.com) -
Kubernetes for dev infrastructure
(hackernoon.com) -
Machine Learning for a Secure, Available, and Performant Infrastructure
(engineering.salesforce.com) -
Revitalize Gilt City's Order Processing with Serverless Architecture
(tech.gilt.com) -
The Universal Polling System
(code.hootsuite.com) -
Mesos Executor
(allegro.tech) -
Microservices at scale
(www.oreilly.com) -
Fishing for Hackers 2 – Kubernetes Boogaloo
(sysdig.com) -
The Road to Cloud Native
(blog.codeship.com) -
Technical Challenges We Encountered When Moving to a Serverless Architecture in AWS
(engineering.skybettingandgaming.com) -
Gathering Metrics from Your Infrastructure and Applications
(www.digitalocean.com) -
Setting up a Machine Learning Framework for Production
(code.hootsuite.com) -
Our Journey to a Near Perfect Log Pipeline
(engineering.salesforce.com) -
Seedfinder – Infrastructure to Improve Sample Balance in Online A/B Tests
(www.thumbtack.com) -
Scaling PostgreSQL at Thumbtack: Load Balancing And Health Checks
(stackshare.io) -
From on-prem to AWS to ECS and beyond. The past 5 years at Arthrex Digital Media.
(hackernoon.com) -
Play by Play: Moving the NYT Games Platform to GCP With Zero Downtime
(open.nytimes.com) -
Breaking down the monolith with AWS Step Functions
(engineeringblog.yelp.com) -
The Evolution of Security at Riot
(engineering.riotgames.com) -
Scaling PostgreSQL: load balancing and healthchecks
(www.thumbtack.com) -
Docker as Build Environment
(hackernoon.com) -
Chaos Testing for Docker Containers
(hackernoon.com) -
Incremental Data Capture for Oracle Databases at LinkedIn: Then and Now
(engineering.linkedin.com) -
Common MongoDB Topologies
(www.percona.com) -
How To Create a High Availability Setup with Heartbeat and Floating IPs on Ubuntu 16.04
(www.digitalocean.com) -
Moving Towards Full Stack Automation
(www.hostinger.com) -
Transactions in Apache Kafka
(www.confluent.io) -
Serverless Dynamic Web Pages in AWS
(engineering.monsanto.com) -
Chaos Engineering at Twilio with Ratequeue HA
(twilioinc.wpengine.com) -
Resilience Engineering at LinkedIn with Project Waterbear
(engineering.linkedin.com) -
Reliability under abnormal conditions — Part Two
(www.thoughtworks.com) -
How Sentry Receives 20 Billion Events Per Month While Preparing to Handle Twice That
(stackshare.io) -
Fixing the Plumbing: How We Identify and Stop Slow Latency Leaks at LinkedIn
(engineering.linkedin.com) -
Rebuilding the Segment Leaderboards Infrastructure — Part 2: First Principles of a New System
(medium.com) -
Improving Our Video Experience
(open.nytimes.com) -
Performance Left Right and Center.
(engineering.skybettingandgaming.com) -
Live Video Transmuxing/Transcoding: FFmpeg vs TwitchTranscoder, Part II
(blog.twitch.tv) -
Departures: Building a Docker Container-Based Deployment Platform at Condé Nast
(technology.condenast.com) -
Publishing with Apache Kafka at The New York Times
(open.nytimes.com) -
Infrastructure Monitoring with TICK Stack
(blog.codeship.com) -
Video series: Modernizing Java Apps for IT Pros
(blog.docker.com) -
Stretching Spokes
(githubengineering.com) -
How Shopify Governs Containers at Scale with Grafeas and Kritis
(shopifyengineering.myshopify.com) -
Transit and Peering: How your requests reach GitHub
(githubengineering.com) -
Developer Experience Lessons Operating a Serverless-like Platform At Netflix — Part II
(medium.com) -
Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform
(eng.uber.com) -
Event First Development - Moving Towards Kafka Pipeline Applications
(jobs.zalando.com) -
Building Complex Data Pipelines with Unified Analytics Platform
(databricks.com) -
OpenSource Metric Based Monitoring
(www.codementor.io) -
Getting Started with Building Realtime API Infrastructure
(becominghuman.ai) -
How does it work? Docker! Part 4: Control your Swarm!
(blog.octo.com) -
Microservices Architecture As A Large-Scale Refactoring Tool
(blog.avenuecode.com) -
How Cloudflare Streams
(blog.cloudflare.com) -
What Is Immutable Infrastructure?
(www.digitalocean.com) -
Geo Key Manager: How It Works
(blog.cloudflare.com) -
Our journey from Redis 2 to Redis 3 while not taking the site down.
(engineering.skybettingandgaming.com) -
Machine Learning for Nginx Logs - Identifying Operational Issues with Your Website
(www.elastic.co) -
Disaster Recovery for Multi-Datacenter Apache Kafka Deployments
(www.confluent.io) -
How to Safely Throttle High Traffic APIs
(nordicapis.com) -
Scaling Event Sourcing for Netflix Downloads, Episode 2
(medium.com) -
Docker image digests
(engineering.remind.com) -
Implementing a queue for LINE LIVE PC transmission
(engineering.linecorp.com) -
Stepping Up the Cloud Security Game
(labs.spotify.com) -
Autoscaling based on request queuing
(medium.com) -
Overcoming AWS Complexity with SaltStack patterns
(eng.lyft.com) -
HTTP/2 for Developers
(fly.io) -
An Introduction to Load Testing
(www.digitalocean.com) -
Delivering Dot
(blog.cloudflare.com) -
9 Tips for a Painless Microservices Migration
(engineering.invisionapp.com) -
A Developer's Introduction to Geotraffic
(fly.io) -
Continuous Delivery for DC/OS With Spinnaker
(engineering.cerner.com) -
Moving Our Trading Engine to AWS
(engineering.skybettingandgaming.com) -
Behind the Scenes of our Transition to a Multi-Cloud Environment
(metamarkets.com) -
The Search for Better Search at Reddit - Because, certainly, we’ve solved it this time
(redditblog.com) -
Optimizing web servers for high throughput and low latency
(blogs.dropbox.com) -
Continuous Automation at Texas A&M University
(blog.chef.io) -
Deploying GrootFS to Pivotal Web Services (PWS)
(engineering.pivotal.io) -
Keeping an eye on our network
(githubengineering.com) -
How does it work? Docker! Part 2: Swarm networking
(blog.octo.com) -
Rapid release at massive scale
(code.facebook.com) -
LogDevice: a distributed data store for logs
(code.facebook.com) -
Moving Real-Time Data Flow Across Cloud Providers
(metamarkets.com) -
Introducing Social Hash Partitioner, a scalable distributed hypergraph partitioner
(research.fb.com) -
The Skinny on Fat, Thin, Hollow, and Uber
(developers.redhat.com) -
Game of Lambdas
(rea.tech) -
Infrastructure As Code With humidifier-reservoir
(eng.localytics.com) -
Finding yourself in the world of backend architecture
(upday.github.io) -
Understanding Failure Modes in Message and Event-based Systems
(multithreaded.stitchfix.com) -
Container Metadata – Understanding Metrics, Labels, & Tags
(sysdig.com) -
Steering oceans of content to the world
(code.facebook.com) -
5 Technologies We Have Used At Hootsuite to Build a Flexible Distributed Data Pipeline
(code.hootsuite.com)#dev-tools #software-architecture #infra #distributed-systems #backend
-
Sensitive data storage made easy.
(www.codelitt.com) -
Autoscaling applications @ PayPal
(www.paypal-engineering.com) -
Scaling Contextual Conversation Suggestions Over 500 Million Members
(engineering.linkedin.com) -
Designing a Microservices Architecture for Failure
(blog.risingstack.com) -
How Hootsuite does Microservices
(code.hootsuite.com) -
Genie in a Box : Making Spark Easy for Stitch Fix Data Scientists
(multithreaded.stitchfix.com) -
Monitoring Your Asynchronous Python Web Applications Using Prometheus
(blog.codeship.com) -
Migrating Existing Datastores
(engineering.grab.com) -
Sundial or AWS Batch, Why not both?
(tech.gilt.com) -
Dynamically Routing Requests Across Different Stacks with VCL
(redditblog.com) -
How to Route SSL Traffic to a Kubernetes Application
(fly.io) -
Inside a SoundCloud Microservice
(developers.soundcloud.com) -
API, Zero Downtime Deployment and SQL Migration: Theory and Case Study
(blog.octo.com) -
7 Interesting Parallels Between the Invention of Tiny Satellites and Cloud Computing
(highscalability.com) -
ChAP: Chaos Automation Platform
(medium.com) -
AWS infrastructure setup: the CleverTap way
(clevertap.com) -
How to use Cloudflare for Service Discovery
(blog.cloudflare.com) -
Lambda@Edge – Intelligent Processing of HTTP Requests at the Edge
(aws.amazon.com) -
Discovering Flynn
(blog.octo.com) -
Serverless Continuous Delivery with Databricks and AWS CodePipeline
(databricks.com) -
BigDB - an ad data pipeline for LINE
(engineering.linecorp.com) -
Securing W Magazine: Our Migration to HTTPS
(technology.condenast.com) -
Engineering Data Analytics with Presto and Parquet at Uber
(eng.uber.com) -
Developer Experience Lessons Operating a Serverless-like Platform at Netflix
(medium.com)#software-architecture #infra #serverless #deployment #api-backend
-
Configuring Containerized Services
(developers.redhat.com) -
High-reliability OCSP stapling and why it matters
(blog.cloudflare.com) -
Helm secrets a missing piece in Kubernetes
(lab.getbase.com) -
Introducing BDDA, the infrastructure workflow we use for Kubernetes
(developer.atlassian.com) -
Building an Internal Cloud with Docker and CoreOS
(shopifyengineering.myshopify.com) -
MySQL infrastructure testing automation at GitHub
(githubengineering.com) -
How we designed our Kubernetes infrastructure on AWS
(developer.atlassian.com) -
Exactly-once Semantics are Possible: Here’s How Kafka Does it
(www.confluent.io) -
Stupidly Simple DDoS Protocol (SSDP) generates 100 Gbps DDoS
(blog.cloudflare.com) -
Presto - a small step for DevOps engineer but a big step for BigData analyst
(allegro.tech) -
Scaling the Elastic Stack in a Microservices Architecture @ Rightmove
(www.elastic.co) -
Understanding Linux Container Scheduling
(engineering.squarespace.com) -
Instrumenting Sidekiq
(drivy.engineering) -
Declarative Infrastructure with the Jsonnet Templating Language
(databricks.com)