Ignoring Repeated Sets of Logging
I'm working on a program that logs a lot of information, some of which tends to repeat itself in certain situations. I've been tasked with preventing this over-logging, and I need some direction on what to do. I can already suppress the same message when it repeats back to back, but it becomes a lot trickier to prevent variously sized sets of unique log messages from repeating.
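For reference, the back-to-back suppression I have now is essentially the following (a simplified sketch, not the real code; the Logger class name and the console sink are just placeholders):

    // Simplified sketch (illustrative only): drop a message when it is
    // identical to the one logged immediately before it.
    #include <iostream>
    #include <string>

    class Logger {
    public:
        void log(const std::string& msg) {
            if (msg == last_) return;      // same message in a row: suppress it
            last_ = msg;
            std::cout << msg << '\n';      // stand-in for the real log sink
        }
    private:
        std::string last_;
    };

    int main() {
        Logger logger;
        for (char c : std::string("aaaaab"))   // only one 'a' gets through
            logger.log(std::string(1, c));
    }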



To break the problem down, I first reduced it to a smaller one in which single characters stand in for unique log messages.



Input



aaaaababababacdefgfggfabcddggddgg


Output



abacdefgfabcdg
0 : Hash: 100, Msg: d
1 : Hash: 103, Msg: g


This is the output of my current program, which seems to work for repeated sequences of one or two unique characters. The characters shown in a row are the messages that actually get logged, and the next two lines show the contents of the buffer I currently use to compare sequences. So if I append "xz" to the original input, I get the following output:



abacdefgfabcdgdg
0 : Hash: 120, Msg: x
1 : Hash: 122, Msg: z


As you can see, the d and g now get "logged", and based on what is in the sequence buffer, the next sequences that would be disallowed are "xz" or "z".
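The sketch below is a rough reconstruction of how the sequence buffer behaves, based only on the description above (assumed behaviour, not my actual implementation; the hashing and the SequenceBuffer name are illustrative): it stores a hash for each message of the most recently repeated unit and suppresses any message that would start or continue replaying that unit, whether from the beginning ("xz") or from the middle ("z").

    // Illustrative reconstruction of the sequence buffer (assumed behaviour,
    // not the real implementation).
    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    struct SequenceBuffer {
        std::vector<std::size_t> unit;   // hashes of the last repeated unit, e.g. {hash("x"), hash("z")}
        bool replaying = false;          // are we currently inside a replay of the unit?
        std::size_t pos = 0;             // next expected position while replaying

        void remember(const std::vector<std::string>& messages) {
            unit.clear();
            for (const auto& m : messages)
                unit.push_back(std::hash<std::string>{}(m));
            replaying = false;
        }

        // True if logging msg would start or continue a replay of the unit.
        bool suppress(const std::string& msg) {
            if (unit.empty()) return false;
            const std::size_t h = std::hash<std::string>{}(msg);
            if (replaying && h == unit[pos]) {       // continuing the replay
                pos = (pos + 1) % unit.size();
                return true;
            }
            for (std::size_t i = 0; i < unit.size(); ++i) {
                if (h == unit[i]) {                  // a lone "z" also starts a replay
                    replaying = true;
                    pos = (i + 1) % unit.size();
                    return true;
                }
            }
            replaying = false;                       // pattern broken: log normally again
            return false;
        }
    };

    int main() {
        SequenceBuffer buf;
        buf.remember({"x", "z"});                    // the unit that just repeated
        std::cout << std::boolalpha
                  << buf.suppress("z") << ' '        // true: replays the unit mid-way
                  << buf.suppress("x") << ' '        // true: continues the replay
                  << buf.suppress("q") << '\n';      // false: pattern broken
    }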



Does anyone know of an actual algorithm that can detect/prevent repeated unique sequences?



I've looked at this but it doesn't quite fit my needs.



More Examples




Input -> Desired Output



  1. ABCABCACD -> ABCACD

  2. AAABABACBCBCB -> ABACB

  3. ABBABBA -> ABABA

  4. ACDACDDDCDC -> ACDC



The letters represent unique log messages and I'd like to prevent the repeated sets of log messages from showing up.
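To make the examples easy to experiment with, here is a small test harness (hypothetical code, for illustration only). The filter it plugs in is a strawman rather than the algorithm I'm asking for: it drops a message whenever it completes an immediate repetition of the block logged right before it, which reproduces examples 1, 2 and 4 but collapses example 3 to ABA.

    // Strawman filter: squash a block of characters that immediately repeats
    // the block just before it. Handles examples 1, 2 and 4 above, but not
    // example 3 -- which is the kind of case I need help with.
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    std::string squashImmediateRepeats(const std::string& input) {
        std::string out;
        for (char c : input) {
            out.push_back(c);
            bool erased = true;
            while (erased) {                 // keep squashing while a repeat remains
                erased = false;
                for (std::size_t k = 1; 2 * k <= out.size(); ++k) {
                    if (out.compare(out.size() - k, k, out, out.size() - 2 * k, k) == 0) {
                        out.erase(out.size() - k);   // drop the repeated block of length k
                        erased = true;
                        break;
                    }
                }
            }
        }
        return out;
    }

    int main() {
        const std::vector<std::pair<std::string, std::string>> examples = {
            {"ABCABCACD",     "ABCACD"},
            {"AAABABACBCBCB", "ABACB"},
            {"ABBABBA",       "ABABA"},
            {"ACDACDDDCDC",   "ACDC"},
        };
        for (const auto& [input, desired] : examples) {
            const std::string got = squashImmediateRepeats(input);
            std::cout << input << " -> " << got
                      << (got == desired ? "" : "  (desired: " + desired + ")") << '\n';
        }
    }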
  • What is the size of the problem? I.e. give us a rough estimate of how many log messages/sec. Do you want the messages to be unique for the entire log? That could be hard with a large log. One way to restrict problem size would be to process the log in blocks, and only guarantee that a message is unique within its own block.

    – Alex
    Nov 14 '18 at 20:49

  • This sounds bad. There is already a lot of overhead in logging; why would you want to make it worse by tracking logged messages in the process that logs them? Why would you only want to see that first message? It looks like what you really need is a mainstream logger that provides filtering, and/or to re-evaluate how and what you are logging. If this is for some real-world production project, I'd surely raise an eyebrow. Not enough context is provided to really suggest anything.

    – Christopher Pisz
    Nov 14 '18 at 21:01

  • Look up compression algorithms. You are compressing the output (logging) stream.

    – Thomas Matthews
    Nov 14 '18 at 21:16

  • Do you need to log date or time stamps?

    – Thomas Matthews
    Nov 14 '18 at 21:18

  • The problem I would like to solve is that, if the application is in a broken state, it can fill the logs with a repeated sequence of messages, and troubleshooting then becomes impossible because the useful log messages have since been overwritten. @ChristopherPisz

    – RAZ_Muh_Taz
    Nov 14 '18 at 21:54

c++ algorithm logging

edited Nov 14 '18 at 22:39
asked Nov 14 '18 at 20:27 by RAZ_Muh_Taz