Spark on Kubernetes. How spark nature of statefulness is maintained in Kubernetes?

I am experimenting Spark2.3 on a K8s cluster. Wondering how does the checkpoint work? Where is it stored? If the main driver dies, what happens to the existing processing?

In case of consuming from Kafka, how does the offset maintained? I tried to lookup online but could not find any answer to those questions. Our application is consuming a lot of Kafka data so it is essential to be able to restart and pick up from where it was stopped.

Any gotchas on running Spark Streaming on K8s?

edited Nov 14 '18 at 22:33

Rico

28.4k95066

asked Nov 14 '18 at 22:21

Vijay Ram

12512

add a comment |

I am experimenting Spark2.3 on a K8s cluster. Wondering how does the checkpoint work? Where is it stored? If the main driver dies, what happens to the existing processing?

Any gotchas on running Spark Streaming on K8s?

edited Nov 14 '18 at 22:33

Rico

28.4k95066

asked Nov 14 '18 at 22:21

Vijay Ram

12512

add a comment |

I am experimenting Spark2.3 on a K8s cluster. Wondering how does the checkpoint work? Where is it stored? If the main driver dies, what happens to the existing processing?

Any gotchas on running Spark Streaming on K8s?

edited Nov 14 '18 at 22:33

Rico

28.4k95066

asked Nov 14 '18 at 22:21

Vijay Ram

12512

I am experimenting Spark2.3 on a K8s cluster. Wondering how does the checkpoint work? Where is it stored? If the main driver dies, what happens to the existing processing?

Any gotchas on running Spark Streaming on K8s?

apache-spark kubernetes spark-streaming

edited Nov 14 '18 at 22:33

Rico

28.4k95066

asked Nov 14 '18 at 22:21

Vijay Ram

12512

edited Nov 14 '18 at 22:33

Rico

28.4k95066

asked Nov 14 '18 at 22:21

Vijay Ram

12512

edited Nov 14 '18 at 22:33

Rico

28.4k95066

edited Nov 14 '18 at 22:33

Rico

28.4k95066

edited Nov 14 '18 at 22:33

Rico

28.4k95066

asked Nov 14 '18 at 22:21

Vijay Ram

12512

asked Nov 14 '18 at 22:21

Vijay Ram

12512

asked Nov 14 '18 at 22:21

Vijay Ram

12512

add a comment |

1 Answer
1

active

oldest

votes

The Kubernetes Spark Controller doesn't know anything about checkpointing, AFAIK. It's just a way for Kubernetes to schedule your Spark driver and the Workers that it needs to run a job.

Storing the offset is really up to your application and where you want to store the Kafka offset, so that when it restarts it picks up that offset and starts consuming from there. This is an example on how to store it in Zookeeper.

You could, for example, write ZK offset manager functions in Scala:

import com.metamx.common.scala.Logging
import org.apache.curator.framework.CuratorFramework
...
object OffsetManager extends Logging {

 def getOffsets(client: CuratorFramework,
 ... = 

 

 def setOffsets(client: CuratorFramework,
 ... = 

 
 ...

Another way would be storing your Kafka offsets in something reliable like HDFS.

answered Nov 14 '18 at 22:50

Rico

28.4k95066

Thanks for comments. So.. stateful is not provided by kubernetes. application's responsibility to take care of that. Is that what it is?

– Vijay Ram
Nov 15 '18 at 12:12

Yes, that is correct.

– Rico
Nov 15 '18 at 15:00

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309614%2fspark-on-kubernetes-how-spark-nature-of-statefulness-is-maintained-in-kubernete%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

The Kubernetes Spark Controller doesn't know anything about checkpointing, AFAIK. It's just a way for Kubernetes to schedule your Spark driver and the Workers that it needs to run a job.

You could, for example, write ZK offset manager functions in Scala:

import com.metamx.common.scala.Logging
import org.apache.curator.framework.CuratorFramework
...
object OffsetManager extends Logging {

 def getOffsets(client: CuratorFramework,
 ... = 

 

 def setOffsets(client: CuratorFramework,
 ... = 

 
 ...

Another way would be storing your Kafka offsets in something reliable like HDFS.

answered Nov 14 '18 at 22:50

Rico

28.4k95066

Thanks for comments. So.. stateful is not provided by kubernetes. application's responsibility to take care of that. Is that what it is?

– Vijay Ram
Nov 15 '18 at 12:12

Yes, that is correct.

– Rico
Nov 15 '18 at 15:00

add a comment |

The Kubernetes Spark Controller doesn't know anything about checkpointing, AFAIK. It's just a way for Kubernetes to schedule your Spark driver and the Workers that it needs to run a job.

You could, for example, write ZK offset manager functions in Scala:

import com.metamx.common.scala.Logging
import org.apache.curator.framework.CuratorFramework
...
object OffsetManager extends Logging {

 def getOffsets(client: CuratorFramework,
 ... = 

 

 def setOffsets(client: CuratorFramework,
 ... = 

 
 ...

Another way would be storing your Kafka offsets in something reliable like HDFS.

answered Nov 14 '18 at 22:50

Rico

28.4k95066

Thanks for comments. So.. stateful is not provided by kubernetes. application's responsibility to take care of that. Is that what it is?

– Vijay Ram
Nov 15 '18 at 12:12

Yes, that is correct.

– Rico
Nov 15 '18 at 15:00

add a comment |

The Kubernetes Spark Controller doesn't know anything about checkpointing, AFAIK. It's just a way for Kubernetes to schedule your Spark driver and the Workers that it needs to run a job.

You could, for example, write ZK offset manager functions in Scala:

import com.metamx.common.scala.Logging
import org.apache.curator.framework.CuratorFramework
...
object OffsetManager extends Logging {

 def getOffsets(client: CuratorFramework,
 ... = 

 

 def setOffsets(client: CuratorFramework,
 ... = 

 
 ...

Another way would be storing your Kafka offsets in something reliable like HDFS.

answered Nov 14 '18 at 22:50

Rico

28.4k95066

The Kubernetes Spark Controller doesn't know anything about checkpointing, AFAIK. It's just a way for Kubernetes to schedule your Spark driver and the Workers that it needs to run a job.

You could, for example, write ZK offset manager functions in Scala:

import com.metamx.common.scala.Logging
import org.apache.curator.framework.CuratorFramework
...
object OffsetManager extends Logging {

 def getOffsets(client: CuratorFramework,
 ... = 

 

 def setOffsets(client: CuratorFramework,
 ... = 

 
 ...

Another way would be storing your Kafka offsets in something reliable like HDFS.

answered Nov 14 '18 at 22:50

Rico

28.4k95066

answered Nov 14 '18 at 22:50

Rico

28.4k95066

answered Nov 14 '18 at 22:50

Rico

28.4k95066

answered Nov 14 '18 at 22:50

Rico

28.4k95066

Thanks for comments. So.. stateful is not provided by kubernetes. application's responsibility to take care of that. Is that what it is?

– Vijay Ram
Nov 15 '18 at 12:12

Yes, that is correct.

– Rico
Nov 15 '18 at 15:00

add a comment |

Thanks for comments. So.. stateful is not provided by kubernetes. application's responsibility to take care of that. Is that what it is?

– Vijay Ram
Nov 15 '18 at 12:12

Yes, that is correct.

– Rico
Nov 15 '18 at 15:00

Thanks for comments. So.. stateful is not provided by kubernetes. application's responsibility to take care of that. Is that what it is?

– Vijay Ram
Nov 15 '18 at 12:12

Yes, that is correct.

– Rico
Nov 15 '18 at 15:00

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Odtnhj