TensorFlow batch_join: read all images in the last batch









Similar to "Tensorflow batch_join's allow_smaller_final_batch doesn't work?", I want to pass a batch of images to TensorFlow.



I have 365 images on disk, and my batch size is 100. This means that the last run must take 65 images, but I cannot achieve that.
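
The expected batch sizes can be checked with a minimal plain-Python sketch (the variable names are illustrative and not part of the actual pipeline):

```python
num_images = 365
batch_size = 100

# Split the total into full batches plus one smaller final batch, if any.
full_batches, remainder = divmod(num_images, batch_size)
expected_sizes = [batch_size] * full_batches + ([remainder] if remainder else [])
print(expected_sizes)
```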



Here is what I managed to do, reproducing Eypros' answer:



    # batch_join takes a list with one entry per enqueuing thread.
    image_list = []
    for _ in range(nthreads):
        image_list.append(load_images(input_queue.dequeue()))

    image_batch = tf.train.batch_join(image_list, batch_size=100,
                                      enqueue_many=True, allow_smaller_final_batch=True,
                                      capacity=10)
    for n in range(3):
        print(n, len(sess.run([image_batch])))

    coord.request_stop()
    coord.join(threads)

    print(n + 1, len(sess.run([image_batch])))


I get the expected output:

    0 100
    1 100
    2 100
    3 10




But if I set capacity to 65, I don't get the desired 65 files in the last batch; I only get 20. I should add that this happens when nthreads = 4. When I reduce the number of threads, the outcome is even worse.



What I tried was to query the input queue and sleep a bit before coord.request_stop():



    numq = sess.run(input_queue.size())
    print('after ', n, ' batches, input_queue size:', numq)
    if numq > 0:
        time.sleep(0.08)
        numq = sess.run(input_queue.size())
        print('after sleep, input_queue size:', numq)


This helps a bit, but if the sleep is too long (i.e. the input queue size drops to 0), my last sess.run() hangs forever. I don't know why.



I hate the sleep() hack. I am looking for a clean, efficient, and reliable way to consume all images in a multithreaded session.



I notice that tf.train.batch_join is deprecated, but I don't know how to convert my simple logic to the suggested tf.data.Dataset.interleave(...).batch(batch_size). That is, I don't understand what to write for interleave. Maybe my problem would be easily resolved if I used Dataset?










  • As far as using tf.data goes: since you have stated that you have 365 images, I do not think the interleave method of tf.data is necessary; in general, it is used to interleave elements of several datasets to form one dataset. You can build a data pipeline through the standard methods that will work very well. This tutorial can help you get an introduction to that. It has a basic image pipeline set up using tf.data!
    – kvish
    Nov 12 at 19:03
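
A pipeline along the lines the comment suggests might look like the sketch below. It is only an illustration under stated assumptions: it assumes TensorFlow 2.x with eager execution, the file names are made up, and the hypothetical `load_image` returns a dummy tensor instead of reading and decoding a file so that the example runs without any images on disk. `num_parallel_calls=4` stands in for the four reader threads from the question.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

NUM_IMAGES = 365
BATCH_SIZE = 100

# Hypothetical file names; a real script would list the image directory instead.
paths = ["img_%03d.jpg" % i for i in range(NUM_IMAGES)]


def load_image(path):
    # Placeholder for reading and decoding the file at `path`;
    # it returns a dummy tensor so the sketch needs no files on disk.
    return tf.zeros([8, 8, 3], dtype=tf.float32)


dataset = (
    tf.data.Dataset.from_tensor_slices(paths)
    .map(load_image, num_parallel_calls=4)  # load elements in parallel
    .batch(BATCH_SIZE)  # drop_remainder defaults to False, so the last batch is smaller
)

# Iterating the dataset ends by itself once every element has been consumed.
batch_sizes = [int(batch.shape[0]) for batch in dataset]
print(batch_sizes)
```

Because the dataset is finite and `batch` keeps the remainder by default, the loop stops on its own after the final partial batch, with no coordinator, queue capacity, or sleep() involved.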














python tensorflow tensorflow-datasets






edited Nov 12 at 21:28

























asked Nov 11 at 19:06









Alex Cohn
