How to get the exact count of lines in a very large text file in R? [duplicate]










0
















This question already has an answer here:



  • Get the number of lines in a text file using R

    5 answers



I have multiple files with over 1,000,000 lines each, and I need to know the exact number of lines in each document using R. How can I achieve that?










r

asked Nov 13 '18 at 12:15

Daniel Gießing
63

marked as duplicate by hrbrmstr Nov 13 '18 at 12:38


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.















  • There are several different ways you could go. I'm not sure what's most efficient with files of that size, but an easy solution would be something like length(readLines(filename))

    – duckmayr
    Nov 13 '18 at 12:20











  • Works so far; I thought this one would fail on large documents.

    – Daniel Gießing
    Nov 13 '18 at 12:26
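
The one-shot length(readLines(filename)) approach does hold every line in memory at once. If that ever becomes a problem with files of this size, the same count can be taken in fixed-size chunks over a connection, keeping memory use flat. A minimal sketch of that idea; the function name n_lines and the chunk size of 100,000 lines are arbitrary choices:

n_lines <- function(path, chunk = 1e5) {
  con <- file(path, open = "r")
  on.exit(close(con))                    # always close the connection, even on error
  n <- 0
  repeat {
    batch <- readLines(con, n = chunk)   # read at most `chunk` lines per pass
    if (length(batch) == 0) break        # character(0) signals end of file
    n <- n + length(batch)
  }
  n
}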
















1 Answer


















2














1) wc This should be quite fast. First determine the filenames. We have assumed all files in the current directory whose extension is .txt; change as needed. Then run wc -l on each file and form a data frame from the output.



(If you are on Windows, install Rtools and ensure that Rtools\bin is on your PATH.)



# all files in the current directory with extension .txt
filenames <- dir(pattern = "[.]txt$")
# run "wc -l" on one file and capture its output, e.g. "1000000 file.txt"
wc <- function(x) shell(paste("wc -l", x), intern = TRUE)
# one row per file: the line count and the filename
DF <- read.table(text = sapply(filenames, wc), col.names = c("count", "filename"))
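
Note that shell() exists only on Windows. On macOS or Linux the same idea can be run through system2(), which also avoids building the command line by hand; a sketch, assuming wc is on the PATH (the wrapper name wc_l is ours):

# invoke wc directly and capture its stdout, e.g. "1000000 file.txt"
wc_l <- function(x) system2("wc", args = c("-l", shQuote(x)), stdout = TRUE)
DF <- read.table(text = sapply(filenames, wc_l), col.names = c("count", "filename"))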


2) count.fields An alternative approach is to use count.fields. This does not make use of any external commands. filenames is from above.



# count.fields returns one entry per record, so its length is the line count;
# "\1" (byte 0x01) serves as a separator that should never occur in text data
sapply(filenames, function(x) length(count.fields(x, sep = "\1")))
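
Both options above still parse every line of text. When the files get really large, a common pure-R alternative is to count newline bytes directly with readBin, which avoids building any character vectors at all. This is our own sketch of that idea, not part of the original answer; note that it undercounts by one when the final line has no trailing newline:

count_newlines <- function(path, chunk = 1e6) {
  con <- file(path, open = "rb")         # open in binary mode
  on.exit(close(con))
  n <- 0
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk)
    if (length(bytes) == 0) break        # end of file
    n <- n + sum(bytes == as.raw(10L))   # 0x0A is the "\n" byte
  }
  n
}

sapply(filenames, count_newlines)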





































answered Nov 13 '18 at 12:29
edited Nov 13 '18 at 12:37

G. Grothendieck
146k 9 129 233












