Tensorflow batch_join read all images in last batch

Similar to the question "Tensorflow batch_join's allow_smaller_final_batch doesn't work?", I want to pass a batch of images to TensorFlow.

I have 365 images on disk, and my batch size is 100. This means that the last run must take 65 images, but I cannot achieve that.

Here is what I managed to do, following Eypros' answer:

# One load_images op per reader thread; batch_join takes a list of them.
image_list = [load_images(input_queue.dequeue()) for _ in range(nthreads)]

image_batch = tf.train.batch_join(image_list, batch_size=100,
                                  enqueue_many=True, allow_smaller_final_batch=True,
                                  capacity=10)
for n in range(3):
    print(n, len(sess.run([image_batch])))

# Stop the queue runner threads, then ask for the final (smaller) batch.
coord.request_stop()
coord.join(threads)

print(n + 1, len(sess.run([image_batch])))

I get the expected output:

0 100
1 100
2 100
3 10

But if I set capacity to 65, I don't get the desired 65 files in the last batch; I only get 20. I should add that this happens when nthreads = 4. When I reduce the number of threads, the outcome is even worse.

What I tried was to query the input queue and sleep a bit before calling coord.request_stop():

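import time

# Check how many filenames are still waiting in the input queue.
numq = sess.run(input_queue.size())
print('after', n, 'batches, input_queue size:', numq)
if numq > 0:
    time.sleep(0.08)  # give the reader threads time to drain the queue
    numq = sess.run(input_queue.size())
    print('after sleep, input_queue size:', numq)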

This helps a bit, but if the sleep is too long (i.e. the input queue size drops to 0), my last sess.run() hangs indefinitely. I don't know why.

I hate the sleep() hack. I am looking for a clean, efficient and reliable way to consume all images in a multithreaded session.
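For comparison, here is how the same "consume everything, including a short last batch" requirement can be met without any sleep() in plain Python threads. This is only an analogy, not TensorFlow's queue runner mechanism: it uses the standard queue and threading modules and an end-of-input sentinel, and the names load_image, worker, batches and SENTINEL are hypothetical helpers made up for this sketch.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-input marker used only in this sketch


def load_image(path):
    # Stand-in for real image decoding; returns a fake "image".
    return "pixels of " + path


def worker(input_q, output_q):
    # Load paths until the sentinel arrives, then forward the sentinel.
    while True:
        path = input_q.get()
        if path is SENTINEL:
            output_q.put(SENTINEL)
            return
        output_q.put(load_image(path))


def batches(paths, batch_size, nthreads):
    input_q, output_q = queue.Queue(), queue.Queue()
    for p in paths:
        input_q.put(p)
    for _ in range(nthreads):
        input_q.put(SENTINEL)  # one sentinel per worker thread

    threads = [threading.Thread(target=worker, args=(input_q, output_q))
               for _ in range(nthreads)]
    for t in threads:
        t.start()

    batch, finished = [], 0
    while finished < nthreads:  # stop once every worker has signalled the end
        item = output_q.get()
        if item is SENTINEL:
            finished += 1
            continue
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, smaller batch
    for t in threads:
        t.join()


paths = ["img_%03d.jpg" % i for i in range(365)]
sizes = [len(b) for b in batches(paths, batch_size=100, nthreads=4)]
print(sizes)
```

Because the consumer stops only after it has seen one sentinel from every worker, the size of the last batch does not depend on timing or on the number of threads.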

I notice that tf.train.batch_join is deprecated, but I don't know how to convert my simple logic to the suggested tf.data.Dataset.interleave(...).batch(batch_size). That is, I don't understand what to write for interleave. Would my problem be easily resolved if I used Dataset?

edited Nov 12 at 21:28

asked Nov 11 at 19:06

Alex Cohn


As far as using tf.data goes: since you have stated that you have 365 images, I do not think the interleave method of tf.data is necessary, as in general it is used to interleave elements of datasets to form one dataset. You can build a data pipeline through the standard methods that will work very well. This tutorial can help you with getting an introduction to that. They have a basic image pipeline set up using tf.data!
– kvish
Nov 12 at 19:03

Tags: python tensorflow tensorflow-datasets
