GNU parallel: deleting line from joblog breaks parallel updating it
If you run GNU parallel with --joblog path/to/logfile and then delete a line from that logfile while parallel is running, GNU parallel is no longer able to append future completed jobs to it.
Execute this MWE:
#!/usr/bin/bash
parallel -j1 -n0 --joblog log sleep 1 ::: $(seq 10) &
sleep 5 && sed -i '$ d' log
If you run tail -f log prior to execution, you can see that parallel keeps writing to this file. However, if you cat log after 10 seconds, you will see that nothing was written to the actual file on disk after the third entry or so.
What's the reason behind this? Is there a way to delete something from the file and still have GNU parallel write to it?
Some background as to why this happened:
Using GNU parallel, I started a few jobs on remote machines with --sshloginfile. I then needed to pkill a few jobs on one of the machines because a colleague needed to use it (and I subsequently removed the machine from the sshloginfile so that parallel wouldn't reuse it for new runs). If you pkill those processes started on the remote machine, they get an Exitval of 0 (it looks like they finished without issues; you can't tell that they were killed). I wanted to remove them immediately from the joblog so that when I restart parallel with --resume later, parallel can have a look at the joblog and determine what's missing.
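(For reference: Exitval is the seventh column of the tab-separated joblog, so a check like the sketch below, which assumes the default joblog format, finds nothing to rerun after such a pkill.)

awk -F'\t' 'NR > 1 && $7 != 0' log   # print joblog lines whose Exitval is non-zero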
Turns out, this was a bad idea, as now my joblog is useless.
gnu-parallel
AFAIK, it's always a "bad idea" TM to allow multiple programs to write to the same file! – Mark Setchell, Nov 15 '18 at 19:29

Yeah, you are of course right. I just expected parallel not to hold on to the log file, but to just append once a new job was finished. – mSSM, Nov 15 '18 at 22:20
1 Answer
While @MarkSetchell is absolutely right in his comment, the root problem here is that man sed is lying:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
sed -i does not edit files in place. What it does is make a temporary file in the same directory, copy the input file to the temporary file while doing the editing, and finally rename the temporary file to the input file's name. Similar to this:
sed '$ d' log > sedXxO11P
mv sedXxO11P log
It is clear that the original log and sedXxO11P have different inodes - let us call them ino1 and ino2. GNU Parallel has ino1 open and really does not know about the existence of ino2. GNU Parallel will happily append to ino1 completely unaware that when it closes the file, the file will vanish because it has already been unlinked.
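You can watch this happen: ls -i prints the inode number, so running it before and after the sed (a sketch; the actual numbers will differ on your system) shows the file being replaced rather than edited:

ls -i log            # e.g. 1181057 log  (ino1)
sed -i '$ d' log
ls -i log            # a different number (ino2): a new file now carries the old name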
So you need to change the content of the file without changing the inode:
#!/usr/bin/bash
seq 10 | parallel -j1 -n0 --joblog log sleep 1 &
sleep 5
# Obvious race condition here:
# Anything appended to log before sed is done is lost.
# This can be avoided by suspending parallel while running this
tmp=$RANDOM$$
cp log $tmp
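# The trick: open $tmp for reading, unlink it, and let sed rewrite
# log through redirection. '>log' truncates the existing file, so the
# edited content lands in the same inode parallel still has open.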
(rm $tmp; sed '$ d' >log) < $tmp
wait
cat log
This works right now. But do not expect this to be a supported feature - ever.
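The suspension mentioned in the script's comment could look like the sketch below (it assumes parallel was started in the background of the same shell, so $! is its PID, and that pausing it with SIGSTOP/SIGCONT is acceptable for your jobs):

parallel -j1 -n0 --joblog log sleep 1 ::: $(seq 10) &
pid=$!
sleep 5
kill -STOP "$pid"                      # pause parallel so nothing is appended mid-edit
tmp=$RANDOM$$
cp log "$tmp"
(rm "$tmp"; sed '$ d' >log) < "$tmp"   # same inode-preserving rewrite as above
kill -CONT "$pid"                      # resume parallel
wait
cat log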
Hi, thanks for providing this “fix”. You are probably right that this is way too insecure to be used. Maybe as an alternative question: if I kill a job on a remote host, how do I go about resuming it? If I understand the --resume and --resume-failed flags correctly, there is no way parallel would know to resume my jobs. If I recall correctly, Exitval was reported as 0 (= success). Together with the folder pointed to by --results being populated, it looks to parallel as if all went well. – mSSM, Nov 21 '18 at 20:14

It looks like when I pkill a process started by parallel, it should report an Exitval of 143, but it doesn't. Any idea why? – mSSM, Nov 21 '18 at 20:50

@mSSM I cannot reproduce that. Are you using the current release or an ancient, buggy version? – Ole Tange, Nov 21 '18 at 21:12

Ask a new question with an MCVE. – Ole Tange, Nov 21 '18 at 21:17

I will crosscheck and submit. Thank you. – mSSM, Nov 21 '18 at 22:05