Can someone explain the difference between Apache Mesos, Apache Spark, and Kuber...

wowmsi · on Nov 4, 2014

Mesos and Kubernetes have a lot in common: both are useful for managing (virtual) compute resources for a given application. Primarily, as an app developer, you are interested in specifying the required resources at a high level: 10 front end instances, 20 middle tier, and 3 back end database instances. You may not care how the resources are provisioned as long as some underlying SLA is satisfied. This is useful for boosting service uptime, since in any long-running service nodes may go offline at any time and the framework will re-provision the resources automatically, often within a few minutes.
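The "declare what you want, let the framework re-provision" idea can be sketched as a tiny reconciliation step. This is only an illustration, not the API of Mesos or Kubernetes: the tier names, the `DESIRED` mapping, and the `reconcile` function are all hypothetical.

```python
from collections import Counter

# Hypothetical desired state, using the counts from the example above.
DESIRED = {"frontend": 10, "middle": 20, "database": 3}

def reconcile(desired, running):
    """Return how many instances of each tier must be started (positive)
    or stopped (negative) so the running set matches the desired counts."""
    actual = Counter(running)
    return {
        tier: want - actual[tier]
        for tier, want in desired.items()
        if want != actual[tier]
    }

# Simulate a node going offline and taking two front end instances with it.
running = ["frontend"] * 8 + ["middle"] * 20 + ["database"] * 3
print(reconcile(DESIRED, running))  # {'frontend': 2}
```

A real framework would run a step like this continuously and then actually launch the missing instances somewhere in the cluster.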

Mesos is useful for sharing resources across multiple clusters (e.g., multiple departments using Hadoop within the same organization may want to share resources), whereas Kubernetes seems to focus on containerized applications using Docker. In addition, Mesos implements dominant resource fairness (DRF) scheduling, which has some nice properties [0].
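For anyone curious what DRF does, here is a minimal sketch of the progressive-filling idea: keep handing one task to whichever user currently has the smallest dominant share (their largest fraction of any single resource). The function name and data layout are my own assumptions, not Mesos code, and it assumes every user has a non-zero demand. The numbers are the worked example from the paper in [0].

```python
def drf_allocate(capacity, demands):
    """Give one task at a time to the user with the smallest dominant share,
    stopping when that user's next task no longer fits in the cluster."""
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}
    share = {u: 0.0 for u in demands}
    while True:
        user = min(share, key=share.get)  # lowest dominant share goes next
        demand = demands[user]
        if any(used[r] + demand[r] > capacity[r] for r in capacity):
            break  # cluster is full for this user; stop allocating
        for r in capacity:
            used[r] += demand[r]
        tasks[user] += 1
        # Dominant share: the user's largest fraction of any one resource.
        share[user] = max(tasks[user] * demand[r] / capacity[r] for r in capacity)
    return tasks

# 9 CPUs and 18 GB total; A's tasks are memory-heavy, B's are CPU-heavy.
capacity = {"cpu": 9, "mem": 18}
demands = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
print(drf_allocate(capacity, demands))  # {'A': 3, 'B': 2}
```

Both users end up with a dominant share of 2/3: A holds 12 of 18 GB and B holds 6 of 9 CPUs.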

Lastly, Spark is just an application running on top of Mesos or Kubernetes. Using my earlier example, Spark and Hadoop can run in a single datacenter on top of Mesos without statically partitioning the cluster a priori, each having the illusion that it owns the entire datacenter.

[0] https://www.usenix.org/event/nsdi11/tech/full_papers/Hindman...

shepardrtc · on Nov 4, 2014

In simple terms:

Apache Mesos is a distributed system that is kind of a "bottom layer" for computations and storage (whether in-memory or on disk).

Apache Spark is a distributed application that runs on top of Mesos and does computations that take advantage of cluster computing. It can do classic MapReduce or other algorithms that you write using its API.

Kubernetes is a distributed system that runs Docker in a cluster. Docker is a way to run sandboxed applications. Kubernetes can run on Mesos.

bjt · on Nov 4, 2014

I'll give it a shot, with the disclaimer that this explanation is based on conference talks I've seen and documentation I've read.

Spark is a successor to Hadoop, aiming to perform big distributed data crunching jobs more quickly by not limiting itself to a map-reduce paradigm and by holding more data in memory.

Apache Mesos is a resource scheduler for a cluster of machines. It is architected to be agnostic about the kind of application running on it, but I get the impression it's primarily used to make things like Hadoop and Spark run on the same cluster without stepping on each other's toes (and not used so much for long-running services like a web app). An application says to Mesos "hey I'd like to run job X", outlining the resources that it needs, and then Mesos looks at the cluster to see the best place to run the job. Mesos has some Docker integration, but it's weird. Instead of launching your job and creating cgroups directly as it usually does, Mesos has to run a proxy process that talks to Docker and asks it to do it. (This is because Docker no longer has a standalone mode, which is my biggest gripe about Docker.)
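The "I'd like to run job X, find it a place" step described above can be sketched roughly like this. It is a toy best-fit placement, not how Mesos is implemented (real Mesos hands resource offers to frameworks, which accept or decline them); the `place` function, node names, and resource keys are all made up for illustration.

```python
def place(job, nodes):
    """Pick the node that fits the job and leaves the least spare CPU (best fit).
    Deducts the job's resources from that node and returns its name,
    or returns None if no node has room."""
    candidates = [
        (free["cpus"] - job["cpus"], name)
        for name, free in nodes.items()
        if free["cpus"] >= job["cpus"] and free["mem_mb"] >= job["mem_mb"]
    ]
    if not candidates:
        return None
    _, chosen = min(candidates)
    nodes[chosen]["cpus"] -= job["cpus"]
    nodes[chosen]["mem_mb"] -= job["mem_mb"]
    return chosen

# Hypothetical free resources per machine.
nodes = {
    "node-a": {"cpus": 8, "mem_mb": 16384},
    "node-b": {"cpus": 2, "mem_mb": 4096},
}
job_x = {"cpus": 2, "mem_mb": 2048}
print(place(job_x, nodes))  # node-b
```

Best fit packs small jobs onto small gaps so the big machine stays free for big jobs; a production scheduler would weigh many more factors.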

Kubernetes is solving a similar problem to Mesos: how to allocate resources in a cluster to all the jobs that you might want to run on it. It's explicitly focused on "containers", though not necessarily just Docker containers. (The README at https://github.com/GoogleCloudPlatform/kubernetes makes no mention of Docker.) I get the impression it's focused more on long-running processes like web servers and less on ephemeral Hadoop-style jobs.

vertex-four · on Nov 4, 2014

> and not used so much for long-running services like a web app

Mesosphere Marathon, which sits on top of Mesos, makes doing long-running services easier; it starts services, restarts them if machines crash, etc. Where Mesos is the resource scheduler, Marathon is more like a *nix init system.

Aurora is supposed to do much the same thing, but is in Incubation at the moment. There's also "Singularity", which does a bunch of stuff including managing long-running services and one-off services through an HTTP API and webapp.

Mesos is supposed to be treated sort of as a "cluster kernel", with "frameworks" which sit on top of it and use it to schedule things. It's a lot more versatile than "I want to deploy a bunch of services" - it's more along the lines of having your own EC2 that you can request resources from on-demand.

dln_eintr · on Nov 5, 2014

Kubernetes also fits very well with Mesos for scheduling.

Check out https://github.com/mesosphere/kubernetes-mesos

vertex-four · on Nov 4, 2014

Apache Mesos is based on Twitter's expertise in deploying their cluster, Kubernetes on Google's. They do things in different ways. Spark seems to be in the same area as Hadoop, so not relevant to the conversation.

23david · on Nov 4, 2014

Yep, but there's an important difference I think...

Mesos (some customizations, but largely the same as open-source Apache Mesos) IS what Twitter uses to deploy and manage their clusters. Battle-hardened at scale running diverse production workloads.

With Kubernetes, we're told that it is built using architectural and philosophical principles proven to work at scale on Google's production systems. But it's a fairly clean-room built-from-scratch implementation and although developing quickly, is still immature and untested.

hendzen · on Nov 4, 2014

No. Apache Mesos was originally developed at the UC Berkeley AMPLab as a research project. Twitter was a very early adopter and subsequently hired the Mesos author/creator.