Bulk Unload from Redshift to S3 Interrupted










I wrote a Python script that bulk unloads all the tables within a schema to S3; the data involved scales to petabytes. The script was running fine until it got interrupted by a network disconnection.



Now I'm partway through the unload job and unsure how to resume from the point of failure. I'm debating rerunning everything from the start (this time on a Jenkins slave), but I don't want to lose the hours of unloading that have already completed. Is there a way to resume from where it stopped? My concern is that it's hard to pinpoint a resume point when the files land in S3 as ZIP files.



What best practices could I use to avoid this in the future?



Are there any strategies for picking up from where it left off?
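The only resume idea I've come up with so far is to skip tables whose unload prefix already contains objects, combined with a checkpoint file that is written only after each UNLOAD returns. Below is an untested sketch (it assumes boto3, and the helper names and checkpoint path are mine, not part of my script); I'm not sure the prefix check alone is sound, since UNLOAD writes one file per slice, so a prefix can exist even when a table only half finished:

    import boto3

    s3 = boto3.client("s3")

    def already_unloaded(bucket, schema, table_name):
        # any object under the table's unload prefix means a previous run reached it
        prefix = "{0}/{1}/".format(schema, table_name)
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        return resp.get("KeyCount", 0) > 0

    def load_checkpoint(path):
        # tables recorded as fully unloaded by a previous run
        try:
            with open(path) as f:
                return set(line.strip() for line in f)
        except FileNotFoundError:
            return set()

    def record_checkpoint(path, table_name):
        # append only after cur.execute(unload) returns, so a crash
        # never records a partially unloaded table
        with open(path, "a") as f:
            f.write(table_name + "\n")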



The part of my script that does the unload:



    import sys

    # function to unload the data from the tables of a schema into S3
    def unloadData(passed_tables, schema, cur):
        # loop through each table from the specified schema
        for table in passed_tables:
            # extract the table name string from the result row
            table_name = table[0]
            try:
                # UNLOAD statement to migrate the table to S3
                unload = '''unload ('select * from {0}.{1}')
                            to 's3://<bucket>/{2}/{3}/'
                            iam_role 'arn:aws:iam::*****:role/***';'''.format(
                    schema, table_name, schema, table_name)
                cur.execute(unload)
                print("Unload in progress! Check S3 bucket in a while to confirm.")
            except Exception as e:
                print("Failed to unload data! Try again or check the query.")
                print("Exception: %s" % str(e), file=sys.stderr)
                sys.exit(1)
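For context, the function above is driven by code along these lines (a reconstructed, simplified sketch, not my exact script; I'm assuming a psycopg2 connection here, and the endpoint, credentials, and schema name are placeholders):

    import psycopg2

    # connect to the cluster and fetch the table list for the schema,
    # then hand it to unloadData() above
    conn = psycopg2.connect(host="<redshift-endpoint>", port=5439,
                            dbname="<db>", user="<user>", password="<password>")
    cur = conn.cursor()
    schema = "<schema>"

    # pg_tables yields one row per table; unloadData() reads row[0] as the name
    cur.execute("select tablename from pg_tables where schemaname = %s", (schema,))
    passed_tables = cur.fetchall()

    unloadData(passed_tables, schema, cur)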









Tags: python, database, amazon-redshift, database-migration, fault-tolerance






asked Nov 15 '18 at 0:04 · edited Nov 15 '18 at 0:40
Praneeth Turlapati





















