How to get the exact count of lines in a very large text file in R? [duplicate]
This question already has an answer here:
Get the number of lines in a text file using R
5 answers
I have multiple files with over 1,000,000 lines each, and I need to know the exact number of lines in each file using R. How can I achieve that?
r
marked as duplicate by hrbrmstr
Nov 13 '18 at 12:38
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
There are several different ways you could go. I'm not sure what's most efficient with files of that size, but an easy solution would be something like
length(readLines(filename))
– duckmayr
Nov 13 '18 at 12:20
Works so far. I thought this one would fail on large documents.
– Daniel Gießing
Nov 13 '18 at 12:26
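If memory is the worry with length(readLines(filename)), one possible variation is to read the file in chunks through an open connection, so that only one chunk of lines is held at a time. This is only a sketch: count_lines_chunked and chunk_size are names made up for illustration and are not from the comments above.

```r
# Hypothetical helper: count lines by reading chunk_size lines at a time,
# so the whole file is never held in memory at once.
count_lines_chunked <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0  # double, to avoid integer overflow on huge files
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break  # nothing left to read
    total <- total + length(chunk)
  }
  total
}
```

For several files, something like sapply(filenames, count_lines_chunked) should then give one count per file.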
asked Nov 13 '18 at 12:15
Daniel Gießing
1 Answer
1) wc This should be quite fast. First determine the filenames. We have assumed all files in the current directory whose extension is .txt. Change as needed. Then for each file run wc -l and form a data frame from the output.
(If you are on Windows then install Rtools and ensure that Rtools\bin is on your PATH. Note that shell is only available on Windows; on other platforms use system in its place.)
filenames <- dir(pattern = "[.]txt$")
# run wc -l on one file and capture its output line: the count followed by the filename
wc <- function(x) shell(paste("wc -l", x), intern = TRUE)
DF <- read.table(text = sapply(filenames, wc), col.names = c("count", "filename"))
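A variation on the same idea that is not tied to Windows might use system2 instead. This is a sketch under assumptions: wc_count is a hypothetical helper name, and a wc executable is assumed to be on the PATH. Keep in mind that wc -l counts newline characters, so a final line with no trailing newline is not included in its count.

```r
# Hypothetical helper: call the external wc program and return the count.
# Assumes a `wc` executable can be found on the PATH.
wc_count <- function(path) {
  out <- system2("wc", args = c("-l", shQuote(path)), stdout = TRUE)
  # wc prints the count followed by the filename, possibly with leading spaces
  as.numeric(strsplit(trimws(out), "[[:space:]]+")[[1]][1])
}

filenames <- dir(pattern = "[.]txt$")
counts <- vapply(filenames, wc_count, numeric(1))
```

Quoting the path with shQuote keeps filenames containing spaces from being split into separate arguments.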
2) count.fields An alternative approach is to use count.fields. This does not make use of any external commands. filenames is from above.
# sep is a character assumed not to occur in the files, so each line is a single field
sapply(filenames, function(x) length(count.fields(x, sep = "\1")))
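Since the question asks for an exact count, it may be worth noting that count.fields skips blank lines by default (blank.lines.skip = TRUE) and also has default quote and comment characters. A sketch that switches those defaults off could look like the following; count_lines_cf is a hypothetical name, and the separator is simply assumed not to appear in the data.

```r
# Hypothetical helper: count every line, including blank ones.
# "\1" is a control character assumed not to occur in the files.
count_lines_cf <- function(path) {
  length(count.fields(path,
                      sep = "\1",
                      quote = "",               # do not treat quotes specially
                      blank.lines.skip = FALSE, # keep blank lines in the count
                      comment.char = ""))       # do not treat '#' as a comment
}
```

It can be applied to all files with sapply(filenames, count_lines_cf).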
edited Nov 13 '18 at 12:37
answered Nov 13 '18 at 12:29
G. Grothendieck