Bulk Unload from Redshift to S3 Interrupted
I wrote a Python script that does a bulk unload of all tables within a schema to S3; the schema scales to petabytes of data. While the script was running fine, it was interrupted by a network disconnection.
Now I'm in the middle of an unload job and unsure how to resume from the last point of failure. I'm debating rerunning everything from the start, this time on a Jenkins slave so the job doesn't depend on my workstation, but I don't want to lose the hours of unloading that have already completed. Is there a way to resume from where it stopped? My concern is that it's hard to work out the resume point when the files arrive as compressed archives.
What are the best practices I could use to avoid this in the future?
Any strategies to be able to pick up from where it left off?
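One practice worth considering: make each per-table UNLOAD safe to rerun, so a restart can simply re-issue the statement for any table that didn't finish. Redshift's UNLOAD supports the ALLOWOVERWRITE option (a retry replaces partial output instead of failing) and MANIFEST (a completed unload writes a manifest listing exactly which files it produced, so presence of the manifest marks the table as done). A minimal sketch of building such a statement; the helper name and the sample bucket/role values are illustrative, not from the original script:

```python
def build_unload_sql(schema, table, bucket, iam_role):
    """Build an UNLOAD statement that is safe to rerun: ALLOWOVERWRITE lets a
    retry replace partial output, and MANIFEST records which files a
    completed unload produced (its presence signals the table finished)."""
    return (
        "unload ('select * from {0}.{1}') "
        "to 's3://{2}/{0}/{1}/' "
        "iam_role '{3}' "
        "manifest allowoverwrite gzip;"
    ).format(schema, table, bucket, iam_role)
```

On restart, a table whose S3 prefix contains a manifest object can be skipped; any other table is rerun, and ALLOWOVERWRITE cleans up its partial files.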
Part of my code snippet that does unload:
import sys

# Function to unload the data from tables into S3.
def unloadData(passed_tables, schema, cur):
    # Loop through each table from the specified schema.
    for table in passed_tables:
        # Extract the table name string and store it in a variable.
        i = table[0]
        try:
            # UNLOAD query to migrate the table to S3.
            unload = '''unload ('select * from {0}.{1}') to 's3://<bucket>/{2}/{3}/'
            iam_role 'arn:aws:iam::*****:role/***';'''.format(schema, i, schema, i)
            cur.execute(unload)
            print("Unload in progress! Check S3 bucket in a while to confirm.")
        except Exception as e:
            print("Failed to unload data! Try again or check the query.")
            print("Exception: %s" % str(e), file=sys.stderr)
            sys.exit(1)
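Since the unload runs one table at a time, one common way to make the loop resumable is to record each completed table in a small checkpoint file and skip those names on restart. A minimal sketch; the checkpoint file name and helper names are illustrative, not part of the original script:

```python
import os

CHECKPOINT_FILE = "unload_checkpoint.txt"  # hypothetical local checkpoint path

def load_completed(path=CHECKPOINT_FILE):
    """Return the set of table names whose unload already finished."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def mark_completed(table, path=CHECKPOINT_FILE):
    """Append a table name to the checkpoint file once its UNLOAD succeeds."""
    with open(path, "a") as f:
        f.write(table + "\n")
```

Inside unloadData, you would read the set once before the loop, `continue` past any table already in it, and call mark_completed(i) right after cur.execute returns, so a rerun after a disconnection only redoes the table that was in flight.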
python database amazon-redshift database-migration fault-tolerance
asked Nov 15 '18 at 0:04, edited Nov 15 '18 at 0:40
Praneeth Turlapati
63