Can Beehive detect a Snowden-like actor?





















In a seminar, one of the authors of "Beehive: Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks" said that this system can prevent actions like Snowden's.



From the conclusion of their paper:




Beehive improves on signature-based approaches to detecting security incidents. Instead, it flags suspected security incidents in hosts based on behavioral analysis. In our evaluation, Beehive detected malware infections and policy violations that went otherwise unnoticed by existing, state-of-the-art security tools and personnel.




Can Beehive or a similar system prevent a Snowden-type action?





























  • 40




    Simple answer: No, most certainly not. Snowden was someone who had privileged access and had the authority and reason to mass-download content (he was a sysadmin).
    – forest
    Nov 7 at 11:29






  • 3




    But in the training case, they model everybody according to their behavior. So, after the training, a mass download will be a behavioral change that will produce an alert signal.
    – kelalaka
    Nov 7 at 11:31







  • 8




    Unless mass-downloading is 1) not common and 2) it's not possible to just throttle the download.
    – forest
    Nov 7 at 11:43







  • 13




    Why is a "mass download" even considered suspicious? My first thought was that there will be some sort of constant "mass" downloading during everyday usage. What is a mass download? 1 MB? 500 MB? 5 GB? 500 GB? ...
    – Croll
    Nov 7 at 12:48






  • 8




    @Croll If your organisation has one million files, any one person probably doesn't need to access anywhere close to that many in order to do their job (most files won't be related to their work). If somebody starts trying to download all one million over a day or two, that's suspicious. Even a small percentage of that one million could be suspicious. 1% of one million is 10,000 files. How many people working for your organisation need to access 10,000 files over the span of 48 hours to do their job? Very few (if any).
    – Anthony Grist
    Nov 7 at 16:56
















































malware antimalware corporate-policy detection incident-response














edited Nov 7 at 12:52









Johnny











asked Nov 7 at 11:27









kelalaka





































5 Answers

















up vote
129
down vote













A backup operator will have the permissions and behavioral markers of someone who moves lots of data around. So will any sysadmin where there's no dedicated backup operator in place.



Snowden was a sysadmin. He would have known all the protection protocols in place. He could just impersonate anyone, from any area, download things, impersonate the next one, and keep doing that.



If it's common knowledge that there's no bulletproof protection against a dedicated attacker, imagine a trusted internal dedicated attacker with sysadmin privileges.






















  • 159




    TL;dr: you can't protect yourself against yourself.
    – Braiam
    Nov 7 at 13:14























up vote
19
down vote













Anomaly detection systems like Beehive make it easier than before to dig through lots of data and detect suspicious behavior. This means that an analyst can focus on the more relevant data, process more data in less time, and use more detailed input data for the analysis. This raises the chance that somebody detects unwanted behavior.



The Beehive paper claims (and I have no reason to doubt this) that the system can detect more incidents than commonly used systems. It does not claim that the system can detect every incident, or even say what fraction of all incidents it could detect. Thus, it might be that other systems detect only 10% of all incidents and Beehive detects 20%, which is good but not really satisfactory.



Could such a system detect somebody like Snowden? This depends a lot on how much data is collected for analysis, of what kind and in what detail; on how strict the existing security policies are in the first place, so that policy violations can be logged; and on how much Snowden's illegal (as seen by the NSA) activities differed from his usual work activity. The more they differ, the more likely they are to be detected by an anomaly detection system. But the more similar illegal and legal activities are in terms of the logged data, the less likely it is that illegal activities will be reported as anomalies.



In other words: it could help to detect some Snowden-type actions, but it will not detect all of them. Preventing such actions would be even more difficult; more likely is earlier detection after some harm has already been done, thus limiting the impact.
























  • 2




    And the false positives... Wow, imagine you got promoted to a System Admin position and then suddenly you have federal agents show up at your door...
    – Nelson
    2 days ago






  • 6




    @Nelson Federal agents will be at your door long before that if you're in the running for a sysadmin position. Get ready for looooads of profiling and interviews.
    – Lightness Races in Orbit
    2 days ago

















up vote
12
down vote













Snowden's intent was data exfiltration, and he was also a system admin. So he had access to large amounts of data that normal users didn't, and he would have had a different pattern of interaction with the network. If Beehive had been in place, it might have logged that he was doing something, but anyone intent on data exfiltration would have known how to bypass alerting: make your pattern of data exfiltration "normal" from the time the system starts being trained, and it won't be flagged as anomalous activity. Snowden could have had a pattern of dumping 16 GB a day to a USB thumb drive, and as long as he didn't suddenly change his techniques, Beehive wouldn't have flagged him.
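The evasion described above can be illustrated with a toy per-user baseline. This is only a sketch under loud assumptions: it is not Beehive's actual algorithm, and the function name `is_anomalous`, the 3-sigma threshold, and all the daily volumes are hypothetical values chosen for illustration.

```python
from statistics import mean, stdev

def is_anomalous(history_gb, today_gb, threshold=3.0):
    """Flag today's volume if it deviates from this user's own baseline
    by more than `threshold` standard deviations (hypothetical rule)."""
    mu = mean(history_gb)
    sigma = stdev(history_gb)
    if sigma == 0:
        return today_gb != mu
    return abs(today_gb - mu) / sigma > threshold

# Hypothetical user who copied about 16 GB every day while the baseline was learned.
steady_exfiltrator = [16.0, 15.5, 16.5, 16.2, 15.8, 16.1, 15.9]
# Hypothetical user who normally moves very little data.
ordinary_user = [0.2, 0.3, 0.1, 0.25, 0.15, 0.2, 0.3]

print(is_anomalous(steady_exfiltrator, 16.0))  # False: 16 GB is this user's "normal"
print(is_anomalous(ordinary_user, 16.0))       # True: a sudden change from the baseline
```

The same 16 GB transfer is flagged for one user and ignored for the other, because the model only measures deviation from whatever behavior it was trained on.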



I'm working on some custom ways to detect this kind of pattern at work, but right now I don't know of anything automated that will do a good job.














New contributor




RG1 is a new contributor to this site.
























    up vote
    8
    down vote













    No, it can't.



    And the quote that you pulled clearly explained why not, and how people came to claim that it could.



    What Beehive might be able to do is tell you that a Snowden-style attack has taken place (even though, going by @ThoriumBR's answer, a Snowden would not have been prevented).



    What you (or that guy) claim is that it could PREVENT such an attack, which is far, far different.
    Beehive crawls logs and (maybe, I didn't read too much) combines that with some advanced analysis.
    This means that even if your analysis-and-flagging system runs in real time, it would probably be too late.



    [Just imagine where Beehive comes in:



    Suspicious action -> security program -> log -> Beehive extracts data -> Beehive analysis -> flag thrown -> intervention?



    This is far too late (and it assumes that the logs are evaluated in real time).]



    Logs are for retroactive investigation, not real-time intervention.



    What you could do is produce a pseudo-log for every action, have it analysed by Beehive, and execute the action only once it has been greenlit.
    The enormous overhead and noticeable delay would make that approach a really hard sell to any manager, though. [Also, building evaluation mechanisms into your platform instead of using logs would be far better.]
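The difference between the two orderings discussed in this answer can be sketched in a few lines. Everything here is hypothetical: `looks_suspicious` is a stand-in for whatever analysis is used (it is not Beehive's logic), and the 1000-file cutoff and the action record are made up for illustration.

```python
def looks_suspicious(action):
    # Hypothetical stand-in for the analysis step; not Beehive's logic.
    return action["files"] > 1000

def log_then_analyse(action, exfiltrated, alerts):
    """Log-based detection: the action runs first, the log is analysed afterwards."""
    exfiltrated.append(action["files"])   # the data has already left
    if looks_suspicious(action):
        alerts.append(action["user"])     # the flag is raised after the fact

def analyse_then_execute(action, exfiltrated, alerts):
    """Inline gating: the action runs only once it has been greenlit."""
    if looks_suspicious(action):
        alerts.append(action["user"])
        return                            # blocked before any data moves
    exfiltrated.append(action["files"])

action = {"user": "admin", "files": 50_000}

leaked_a, alerts_a = [], []
log_then_analyse(action, leaked_a, alerts_a)

leaked_b, alerts_b = [], []
analyse_then_execute(action, leaked_b, alerts_b)

print(leaked_a, alerts_a)  # [50000] ['admin']  -> detected, not prevented
print(leaked_b, alerts_b)  # [] ['admin']       -> prevented, at the cost of a check on every action
```

Both variants raise the same alert; only the second one sits in the execution path, which is where the overhead and delay mentioned above would come from.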














    New contributor




    Hobbamok is a new contributor to this site.













    • 6




      And the false positives. Job promotions will be a nightmare, as will department changes.
      – Nelson
      2 days ago











    • As a sysadmin, could one simply alter the logs?
      – paulj
      2 days ago










    • @paulj Not if the logs are sent to a remote server or forward-sealed, but that only applies to logs that were already generated. A sysadmin could, of course, forge any subsequent logs.
      – forest
      yesterday










    • Incidentally (and unrelatedly), modern file systems do have pseudo-logs, which are finalized much more quickly than something like Beehive could match
      – jpaugh
      yesterday
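The comment above about forward-sealed logs can be made concrete with a simplified sketch. It assumes a scheme in which each entry is tagged with an HMAC key that is then replaced by its hash, and the initial key is kept away from the host by whoever verifies the log. The function names `evolve`, `seal`, and `verify` are hypothetical, and this is not the scheme of any particular logging product.

```python
import hashlib
import hmac

def evolve(key: bytes) -> bytes:
    # One-way key update: the previous key cannot be derived from the new one.
    return hashlib.sha256(b"evolve" + key).digest()

def seal(entries, initial_key):
    """Tag each entry with the current key, then evolve the key.
    Returns the sealed log and the key that remains on the host."""
    key = initial_key
    sealed = []
    for entry in entries:
        tag = hmac.new(key, entry.encode(), hashlib.sha256).hexdigest()
        sealed.append((entry, tag))
        key = evolve(key)
    return sealed, key

def verify(sealed, initial_key):
    """Replay the key sequence from the initial key and check every tag."""
    key = initial_key
    for entry, tag in sealed:
        expected = hmac.new(key, entry.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):
            return False
        key = evolve(key)
    return True

initial_key = b"kept off the host by the verifier"
log, current_key = seal(["login admin", "read file A"], initial_key)

# Rewriting an already sealed entry fails verification: the old key is gone.
tampered = [("login guest", log[0][1])] + log[1:]
print(verify(log, initial_key))       # True
print(verify(tampered, initial_key))  # False

# A sysadmin holding the current key can still forge subsequent entries.
forged_tag = hmac.new(current_key, b"nothing happened", hashlib.sha256).hexdigest()
print(verify(log + [("nothing happened", forged_tag)], initial_key))  # True
```

The last check reproduces the caveat from the comment: sealing protects entries that were already written, not entries produced after the host is under the attacker's control.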


















    up vote
    2
    down vote













    First of all, there is a very important distinction between being able to detect a "Snowden-like" actor and being able to prevent one. As far as I have seen, Beehive makes no claims about preventing one, but rather seems to promise the ability to give you alerts that suspicious activity is happening in your network. Sure, not as good, but still considered a "holy grail" in some research communities.



    With that said, I'm extremely doubtful that Beehive is able to meet those expectations. Machine learning can do quite well at extracting complex patterns from large piles of data with reliable identities. For example, differentiating between pictures of cats and dogs is extremely reliable; we can all do it 99+% of the time, yet if I had to state the exact algorithm for taking in 100x100 pixels and determining cat vs dog, I would have no idea how to do that. But I can supply you with 100,000 such images and let ML methods figure out a rule that reliably differentiates between the two based on the values of 100x100 pixels. If I do things right, the rules created by ML should even work on new images of cats and dogs, assuming no huge changes in the new data (i.e., if I only used labs and tabby cats in the training data and then try to get it to identify a terrier... good luck). That's ML's strength.



    Determining "suspicious behavior" is a much more difficult issue. We don't have 100,000s of samples of confirmed bad behavior, and we don't even really have 100,000s of samples of confirmed good behavior! Worse yet, a good ML method that worked yesterday doesn't work today; unlike cats and dogs in photos, adversaries try really hard to trick you. Most people I know working on ML for cyber security have accepted that the idea of purely automated detection is beyond our grasp at the moment, but perhaps we can build tools to automate very specific repetitive tasks that a security analyst needs to do over and over, thus making them more efficient.



    With that said, the authors of Beehive seem to have skipped that lesson and claim that they've solved this problem. I'm highly suspicious of the performance, especially given that the methods they suggest are the first ones an ML researcher might think to try and have routinely been rejected as not useful. For example, they suggest using PCA to identify outliers in logs. This, and variations of it, have been tried hundreds of times, and the result is always that the security analyst shuts off the "automated detection" because they get so many false positives that it costs way more time than it saves.
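The false-positive problem is largely a base-rate effect, which a little arithmetic makes visible. The numbers below are entirely hypothetical (they are not taken from the Beehive paper or from any real deployment), and `alert_breakdown` is an illustrative helper, not part of any tool.

```python
def alert_breakdown(malicious_events, benign_events, detection_rate, false_positive_rate):
    """Return (true alerts, false alerts, fraction of alerts that are real)."""
    true_alerts = malicious_events * detection_rate
    false_alerts = benign_events * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision

# Hypothetical day: 10 malicious events hidden among roughly a million benign ones,
# with a detector that catches 90% of attacks and misfires on 1% of benign events.
true_alerts, false_alerts, precision = alert_breakdown(
    malicious_events=10,
    benign_events=999_990,
    detection_rate=0.9,
    false_positive_rate=0.01,
)

print(f"true alerts:  {true_alerts:.0f}")
print(f"false alerts: {false_alerts:.0f}")
print(f"share of alerts that are real: {precision:.4%}")
```

Under these assumptions an analyst would see roughly 9 real alerts buried in about 10,000 false ones, so fewer than one alert in a thousand is worth the time spent on it.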



    Of course, in all these methods, the devil is in the details, and the details of these types of methods never really get exposed in published work ("we used PCA to look for outliers in server logs" is an extremely vague statement). It's always possible that they have some super clever way of preprocessing the data before applying their methods that didn't make it into the paper. But I'd be willing to bet my right arm that no user of Beehive will be able to reliably differentiate between "Snowden-like" behavior and non-adversarial real-world use of a network in real time.
















    New contributor




    Cliff AB is a new contributor to this site.

































































      answered Nov 7 at 11:41









      ThoriumBR




















      up vote
      19
      down vote













      Anomaly detection systems like Beehive make it easier than before to dig through lots of data and detect suspicious behavior. This means that it is possible for an analyst to focus on the more relevant data, process more data in shorter time and also use more detailed input data for the analysis. This way the chance is higher than before that somebody can detect unwanted behavior.



      It is claimed (and I have no reason to doubt this claim) in the Beehive paper that the system can detect more incidents than the usually used systems - but it is not claimed that the system can detect every incident or even how much of all incidents it could detect. Thus, it might be that other systems only detect 10% of all incidents and Beehive detects 20%, which is good but not really satisfactory.



      Could such a system detect somebody like Snowden? This depends a lot on how much and what kind and what detail of data is collected for analysis, how strict the existing security policies are in the first place so that policy violations can be logged and how much the illegal (as seen by the NSA) activities of Snowden differed from his usual work activity. The more it differs the more likely it can be detected by anomaly detection system. But the more similar illegal and legal activities are in terms of the logged data, the less likely is that illegal activities will be reported as anomaly.



      In other words: it could help to detect some Snowden type actions but it will not detect all Snowden type actions. And preventing such actions would be even more difficult, more likely is a more early detection after some harm was already done and thus limiting the impact.






      share|improve this answer


















      • 2




        And the false positives... Wow, imagine you got promoted to a System Admin position and then suddenly you have federal agents show up at your door...
        – Nelson
        2 days ago






      • 6




        @Nelson Federal agents will be at your door long before that if you're in the running for a sysadmin position. Get ready for looooads of profiling and interviews.
        – Lightness Races in Orbit
        2 days ago














      up vote
      19
      down vote













      Anomaly detection systems like Beehive make it easier than before to dig through lots of data and detect suspicious behavior. This means that it is possible for an analyst to focus on the more relevant data, process more data in shorter time and also use more detailed input data for the analysis. This way the chance is higher than before that somebody can detect unwanted behavior.



      It is claimed (and I have no reason to doubt this claim) in the Beehive paper that the system can detect more incidents than the usually used systems - but it is not claimed that the system can detect every incident or even how much of all incidents it could detect. Thus, it might be that other systems only detect 10% of all incidents and Beehive detects 20%, which is good but not really satisfactory.



      Could such a system detect somebody like Snowden? This depends a lot on how much and what kind and what detail of data is collected for analysis, how strict the existing security policies are in the first place so that policy violations can be logged and how much the illegal (as seen by the NSA) activities of Snowden differed from his usual work activity. The more it differs the more likely it can be detected by anomaly detection system. But the more similar illegal and legal activities are in terms of the logged data, the less likely is that illegal activities will be reported as anomaly.



      In other words: it could help to detect some Snowden type actions but it will not detect all Snowden type actions. And preventing such actions would be even more difficult, more likely is a more early detection after some harm was already done and thus limiting the impact.









      • 2




        And the false positives... Wow, imagine you got promoted to a System Admin position and then suddenly you have federal agents show up at your door...
        – Nelson
        2 days ago






      • 6




        @Nelson Federal agents will be at your door long before that if you're in the running for a sysadmin position. Get ready for looooads of profiling and interviews.
        – Lightness Races in Orbit
        2 days ago














      edited Nov 7 at 12:12

























      answered Nov 7 at 12:02









      Steffen Ullrich

      110k12191256






      up vote
      12
      down vote













      Snowden's intent was data exfiltration, and he was also a system admin. So he had access to large amounts of data that normal users didn't, and his pattern of interaction with the network would have been different. If Beehive had been in place, it might have logged that he was doing something, but anyone intent on data exfiltration would have known how to bypass alerting: make your pattern of data exfiltration look "normal" from the time the system starts being trained, and it won't be flagged as anomalous activity. Snowden could have had a pattern of dumping 16GB a day to a USB thumb drive, and as long as he didn't suddenly change his techniques, Beehive wouldn't have flagged him.

      I'm working on some custom ways to detect this kind of pattern at work, but right now I don't know of anything automated that does a good job.
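To make the "train it on your bad habits" point concrete, here is a minimal sketch. It is not Beehive's algorithm; it assumes a much simpler stand-in detector (a plain z-score against each user's own history), and the daily volumes are invented for illustration. The function name `is_anomalous` and the threshold of 3 standard deviations are my own assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's value if it lies more than `threshold` standard
    deviations away from the mean of the user's own history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation at all in the baseline: any change counts as anomalous.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical daily outbound volume in GB during the training window.
ordinary_user = [0.4, 0.6, 0.5, 0.7, 0.5, 0.6, 0.4, 0.5, 0.6, 0.5]
patient_insider = [15.8, 16.1, 16.0, 15.9, 16.2, 16.0, 15.9, 16.1, 16.0, 16.0]

# The same 16 GB day is a huge outlier for one user and routine for the other.
print(is_anomalous(ordinary_user, 16.0))    # True
print(is_anomalous(patient_insider, 16.0))  # False
```

Any detector that only compares a user against that user's own baseline shares this weakness to some degree, whatever the underlying model is.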








          answered Nov 7 at 14:54









          RG1

          1312






              up vote
              8
              down vote













              No, it can't.



              And the quote that you pulled clearly explained why not, and how people came to claim that it could.



              What Beehive might be able to do is tell you that a Snowden-style attack has taken place (even though, going by @ThoriumBR, a Snowden would not have been prevented).



              What you (or that guy) claim is that it could PREVENT such an attack, which is far, far different.
              Beehive crawls logs and (maybe; I didn't read too closely) combines that with some advanced analysis.
              That means that even if your analysis-and-flagging system runs in real time, it would probably be too late.



              [Just imagine where Beehive comes in:

              Suspicious action -> security program -> log -> Beehive extracts data -> Beehive analysis -> flag thrown -> intervention?

              This is far too late (and it assumes that the logs are evaluated in real time).]



              Logs are for retroactive investigation, not real-time intervention.



              What you could do is produce a pseudo-log for every action, have Beehive analyse it, and execute the action only once it has been greenlit.
              The enormous overhead and noticeable delay would make that approach a really hard sell to any manager, though. [Also, building evaluation mechanisms into your platform instead of using logs would be far better.]
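A toy comparison of the two approaches, just to show why the position in the pipeline matters. Everything here is hypothetical: `log_based` models a detector that only sees an event `delay` ticks after it happened and alerts once it has seen more than `limit` events, while `inline_gate` models a check that runs before each action. Neither is how Beehive actually works.

```python
def log_based(events, limit, delay):
    """Return how many actions completed before a log-based alert fired.

    `events` is a sorted list of timestamps; the detector sees each event
    `delay` ticks late and alerts when it has seen more than `limit` events.
    """
    if len(events) <= limit:
        return len(events)  # the alert never fires
    # The (limit + 1)-th event becomes visible to the detector at this time.
    alert_time = events[limit] + delay
    return sum(1 for t in events if t <= alert_time)


def inline_gate(events, limit):
    """Return how many actions ran when each one is checked before it runs
    and everything beyond `limit` is refused."""
    return min(len(events), limit)


# One file access per tick, a limit of 10, and a 30-tick logging/analysis delay.
accesses = list(range(100))
print(log_based(accesses, limit=10, delay=30))  # 41
print(inline_gate(accesses, limit=10))          # 10
```

The inline gate stops at the limit, but it pays for that by sitting in the path of every single action, which is exactly the overhead problem mentioned above.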








              • 6




                And the false positives. Job promotions will be a nightmare, as will department changes.
                – Nelson
                2 days ago











              • As a sysadmin, could one simply alter the logs?
                – paulj
                2 days ago










              • @paulj Not if the logs are sent to a remote server or forward-sealed, but that only applies to logs that were already generated. A sysadmin could, of course, forge any subsequent logs.
                – forest
                yesterday










              • Incidentally (and unrelatedly), modern file systems do have pseudo-logs, which are finalized much more quickly than something like Beehive could match
                – jpaugh
                yesterday

















              answered Nov 7 at 14:20









              Hobbamok

              1813






              up vote
              2
              down vote













              First of all, there is a very important distinction between being able to detect a "Snowden-like" actor and being able to prevent one. As far as I have seen, Beehive makes no claims about preventing one, but rather seems to promise the ability to give you alerts that suspicious activity is happening in your network. Sure, not as good, but still considered a "holy grail" in some research communities.



              With that said, I'm extremely doubtful that Beehive is able to meet those expectations. Machine learning can do quite well at extracting complex patterns from large piles of data with reliable identities. For example, differentiating between pictures of cats and dogs is extremely reliable; we can all do it 99+% of the time. Yet if I had to state the exact algorithm for taking in 100x100 pixels and determining cat vs. dog, I would have no idea how to do that. But I can supply you with 100,000 such images and let ML methods figure out a rule that reliably differentiates between the two based on the values of the 100x100 pixels. If I do things right, the rules created by ML should even work on new images of cats and dogs, assuming no huge changes in the new data (e.g., if I only used labs and tabby cats in the training data and then try to get it to identify a terrier... good luck). That's ML's strength.



              Determining "suspicious behavior" is a much more difficult issue. We don't have hundreds of thousands of samples of confirmed bad behavior, and we don't even really have hundreds of thousands of samples of confirmed good behavior! Worse yet, an ML method that worked well yesterday doesn't work today; unlike cats and dogs in photos, adversaries try really hard to trick you. Most people I know working on ML for cyber security have accepted that purely automated detection is beyond our grasp at the moment, but perhaps we can build tools to automate very specific repetitive tasks that a security analyst needs to do over and over, thus making the analyst more efficient.



              With that said, the authors of Beehive seem to have skipped that lesson and claim that they've solved this problem. I'm highly suspicious of the performance, especially given that the methods they suggest are the first ones an ML researcher might think to try, and they have routinely been rejected as not useful. For example, they suggest using PCA to identify outliers in logs. This, and variations of it, has been tried hundreds of times, and the result is always that the security analyst shuts off the "automated detection" because it produces so many false positives that it costs far more time than it saves.
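To put rough numbers on the false-positive problem, here is a small base-rate calculation. The figures are made up purely for illustration and are not taken from the Beehive paper: 100,000 hosts, 10 of them actually compromised, a detector that catches 90% of compromised hosts and wrongly flags 1% of clean ones. The function name `alert_breakdown` is likewise just for this sketch.

```python
def alert_breakdown(n_hosts, n_compromised, detection_rate, false_positive_rate):
    """Return (true alerts, false alerts, fraction of alerts that are real)."""
    true_alerts = n_compromised * detection_rate
    false_alerts = (n_hosts - n_compromised) * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# Hypothetical figures, chosen only to illustrate the base-rate effect.
true_alerts, false_alerts, precision = alert_breakdown(
    n_hosts=100_000,
    n_compromised=10,
    detection_rate=0.9,
    false_positive_rate=0.01,
)

print(f"real incidents flagged: {true_alerts:.0f}")
print(f"clean hosts flagged:    {false_alerts:.0f}")
print(f"share of alerts that are real: {precision:.2%}")
```

Under these assumptions fewer than one alert in a hundred is real, so an analyst would have to work through roughly a thousand false alarms to find nine incidents. That is the situation in which people switch the detector off.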



              Of course, in all these methods the devil is in the details, and the details of these types of methods never really get exposed in published work ("we used PCA to look for outliers in server logs" is an extremely vague statement). It's always possible that they have some super clever way of preprocessing the data before applying their methods that didn't make it into the paper. But I'd be willing to bet my right arm that no user of Beehive will be able to reliably differentiate between "Snowden-like" behavior and non-adversarial real-world use of a network in real time.








                  up vote
                  2
                  down vote










                  up vote
                  2
                  down vote

















                  edited yesterday






























                  answered yesterday









                  Cliff AB




























                       
