Where to run the processing code in Kafka?










I am trying to set up a data pipeline using Kafka.
Data goes in (via producers), gets processed, enriched, and cleaned, and moves out to different databases or storage (via consumers or Kafka Connect).

But where do you run the actual pipeline processing code that enriches and cleans the data? Should it be part of the producers or the consumers? I think I missed something.










      apache-kafka






      asked Nov 15 '18 at 19:33









user2409399

3 Answers






In a data pipeline, a Kafka client can serve as both a consumer and a producer.

For example, if raw data is streamed into ClientA, where it is cleaned before being passed on to ClientB for enrichment, then ClientA acts as a consumer (listening to a topic for raw data) and as a producer (publishing the cleaned data to another topic).

Where you draw those boundaries is a separate question.
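A minimal sketch of that consume-transform-produce loop with the plain Java clients (the topic names, group id, and the trivial trim() "cleaning" step are placeholders, not anything from the question):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class CleaningClient {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "cleaning-client");   // hypothetical group id
            consumerProps.put("key.deserializer", StringDeserializer.class.getName());
            consumerProps.put("value.deserializer", StringDeserializer.class.getName());

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer", StringSerializer.class.getName());
            producerProps.put("value.serializer", StringSerializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

                consumer.subscribe(Collections.singletonList("raw-data"));   // hypothetical input topic

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        String cleaned = record.value().trim();   // stand-in for real cleaning logic
                        producer.send(new ProducerRecord<>("cleaned-data", record.key(), cleaned));
                    }
                }
            }
        }
    }

Such a client can run anywhere that can reach the brokers; it is just an ordinary application.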







                    answered Nov 15 '18 at 20:45









akim

It can be part of either the producer or the consumer.

Or you could set up a dedicated environment for something like Kafka Streams processes or a KSQL cluster.
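For the Kafka Streams route, the processing code lives in a plain application that you deploy wherever you like. A minimal sketch, assuming String keys and values and made-up topic names and transformations:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class EnrichmentApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pipeline-enrichment");   // hypothetical app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> raw = builder.stream("raw-data");   // hypothetical input topic

            raw.filter((key, value) -> value != null && !value.isEmpty())   // drop empty records ("cleaning")
               .mapValues(value -> value.trim().toUpperCase())              // stand-in for real enrichment
               .to("enriched-data");                                        // hypothetical output topic

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Kafka Streams handles partitioning, rebalancing, and state for you, so scaling out is just running more instances of the same application.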







                            answered Nov 15 '18 at 22:12









cricket_007

It is possible either way. Consider all the options and choose the one that suits you best. Let's assume you have a source of raw data, in CSV files or some database (Oracle), and you want to do your ETL work and load the result into different datastores.

1) Use Kafka Connect to produce your data to Kafka topics. Have a consumer that consumes off of these topics (it could be Kafka Streams, KSQL, Akka, or Spark), and produce back to a Kafka topic for further use, or to some datastore (basically any sink). This has the benefit of ingesting your data with little or no code, since Kafka Connect source connectors are easy to set up (see the sketch below).

2) Write custom producers and do your transformations in the producers before writing to a Kafka topic, or write directly to a sink unless you want to reuse the produced data for further processing. Then read from the Kafka topic, do any further processing, and write the result back to a persistent store.

It all boils down to your design choices: the throughput you need from the system and how complicated your data structure is.
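As a sketch of the Kafka Connect half of option 1, a standalone source connector configuration could look roughly like this, using the FileStreamSource connector that ships with Kafka (the file path and topic name are placeholders). It would be started with bin/connect-standalone.sh together with a worker properties file:

    # file-source.properties -- hypothetical standalone source connector config
    name=raw-csv-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    file=/tmp/raw-data.csv
    topic=raw-data

For the Oracle case, a JDBC source connector (for example Confluent's kafka-connect-jdbc) plays the same role, again without custom code.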







                                    answered Nov 16 '18 at 6:51









Achilleus
