Keras train and validation metric values are different even when using same data (Logistic regression)
I have been trying to better understand the train/validation sequence in the Keras model fit() loop, so I set up a simple experiment: fitting a logistic regression model to input data consisting of a single feature.



I feed the same data for both training and validation, and I set the batch size equal to the total data size. Under those conditions one would expect to obtain exactly the same loss and accuracy for training and validation, but this is not the case.



Here is my code:



Generate some random data with two classes:



import numpy as np

N = 100
x = np.concatenate([np.random.randn(N//2, 1), np.random.randn(N//2, 1) + 2])
y = np.concatenate([np.zeros(N//2), np.ones(N//2)])


And plot the two-class data distribution (one feature x):



import pandas as pd
import seaborn as sns
from matplotlib import pyplot

data = pd.DataFrame({'x': x.ravel(), 'y': y})
sns.violinplot(x='x', y='y', inner='point', data=data, orient='h')
pyplot.tight_layout(pad=0)
pyplot.show()


[Figure: violin plot of the distribution of x for each of the two classes]



Build and fit the Keras model:



import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid', input_dim=1)])
model.compile(optimizer=tf.keras.optimizers.SGD(2), loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=10, validation_data=(x, y), batch_size=N)


Notice that I have specified the same data x and targets y for both training and for validation_data, and that the batch size equals the total data size (batch_size=N), so each epoch consists of exactly one gradient step.



The training results are:



Epoch 1/10
100/100 [==============================] - 1s 5ms/step - loss: 1.4500 - acc: 0.2300 - val_loss: 0.5439 - val_acc: 0.7200
Epoch 2/10
100/100 [==============================] - 0s 18us/step - loss: 0.5439 - acc: 0.7200 - val_loss: 0.4408 - val_acc: 0.8000
Epoch 3/10
100/100 [==============================] - 0s 16us/step - loss: 0.4408 - acc: 0.8000 - val_loss: 0.3922 - val_acc: 0.8300
Epoch 4/10
100/100 [==============================] - 0s 16us/step - loss: 0.3922 - acc: 0.8300 - val_loss: 0.3659 - val_acc: 0.8400
Epoch 5/10
100/100 [==============================] - 0s 17us/step - loss: 0.3659 - acc: 0.8400 - val_loss: 0.3483 - val_acc: 0.8500
Epoch 6/10
100/100 [==============================] - 0s 16us/step - loss: 0.3483 - acc: 0.8500 - val_loss: 0.3356 - val_acc: 0.8600
Epoch 7/10
100/100 [==============================] - 0s 17us/step - loss: 0.3356 - acc: 0.8600 - val_loss: 0.3260 - val_acc: 0.8600
Epoch 8/10
100/100 [==============================] - 0s 18us/step - loss: 0.3260 - acc: 0.8600 - val_loss: 0.3186 - val_acc: 0.8600
Epoch 9/10
100/100 [==============================] - 0s 18us/step - loss: 0.3186 - acc: 0.8600 - val_loss: 0.3127 - val_acc: 0.8700
Epoch 10/10
100/100 [==============================] - 0s 23us/step - loss: 0.3127 - acc: 0.8700 - val_loss: 0.3079 - val_acc: 0.8800


The results show that val_loss and loss are not the same at the end of each epoch, and neither are acc and val_acc. Based on this setup, however, one would expect them to be identical.
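As a sanity check (a minimal sketch using the objects defined above), one can evaluate the model directly after fit(). Since evaluate() uses the final weights, it should reproduce the final val_loss/val_acc rather than the final training loss/acc:

final_loss, final_acc = model.evaluate(x, y, batch_size=N, verbose=0)
print(final_loss, final_acc)  # should match epoch 10's val_loss/val_acc, not its loss/acc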



I have been going through the Keras code, particularly this part:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1364



and so far, all I can say is that the difference is due to the computations taking different paths through the computation graph.



Does anyone have any idea why there is such a difference?
Tags: python tensorflow machine-learning keras deep-learning






asked Nov 12 at 0:23 by Gerges Dib
1 Answer (accepted)
So, after looking more closely at the results: the loss and acc values reported for the training step are computed BEFORE the current batch is used to update the model.



Thus, in the case of a single batch per epoch, the training acc and loss are evaluated when the batch is fed in, and then the model parameters are updated by the optimizer. After the training step is finished, loss and accuracy are computed by feeding in the validation data, which is evaluated with the newly updated model.



This is evident in the training output: the validation accuracy and loss in epoch 1 are equal to the training accuracy and loss in epoch 2, and so on...
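The same effect can be reproduced at the Keras level. A minimal sketch (reusing the model, x, y and N from the question): the metrics returned by train_on_batch should match an evaluate() call made just before the update, while an evaluate() call made just after reflects the updated weights:

loss_before, acc_before = model.evaluate(x, y, batch_size=N, verbose=0)
train_loss, train_acc = model.train_on_batch(x, y)  # computed with the pre-update weights
loss_after, acc_after = model.evaluate(x, y, batch_size=N, verbose=0)
print(train_loss, loss_before)  # should be (nearly) identical
print(loss_after)               # differs: computed with the updated weights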



A quick check using raw TensorFlow confirmed that fetched values are computed before the variables are updated:



import tensorflow as tf
import numpy as np
np.random.seed(1)

x = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="x")
y = tf.placeholder(dtype=tf.float32, shape=(None), name="y")

W = tf.get_variable(name="W", shape=(1, 1), dtype=tf.float32, initializer=tf.constant_initializer(0))
b = tf.get_variable(name="b", shape=1, dtype=tf.float32, initializer=tf.constant_initializer(0))
z = tf.matmul(x, W) + b

error = tf.square(z - y)
obj = tf.reduce_mean(error, name="obj")

opt = tf.train.MomentumOptimizer(learning_rate=0.025, momentum=0.9)
grads = opt.compute_gradients(obj)
train_step = opt.apply_gradients(grads)

N = 100
x_np = np.random.randn(N).reshape(-1, 1)
y_np = 2*x_np + 3 + np.random.randn(N)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2):
        # Fetch the loss and the variable values in the same run call
        # that applies the gradient update.
        res = sess.run([obj, W, b, train_step], feed_dict={x: x_np, y: y_np})
        print('MSE: {}, W: {}, b: {}'.format(res[0], res[1][0, 0], res[2][0]))


Output:



MSE: 14.721437454223633, W: 0.0, b: 0.0
MSE: 13.372591018676758, W: 0.08826743811368942, b: 0.1636980175971985


Since the parameters W and b were initialized to 0, it is clear that the fetched values are still 0 in the first iteration, even though the session was run with the gradient-update op in the fetch list...
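As a practical consequence, if one wants the training log to be directly comparable to the validation metrics, a possible workaround (a sketch, reusing the model, x, y and N from the question; the callback name is my own) is to re-evaluate the training data at the end of each epoch, i.e. after the weights have been updated:

class PostEpochEval(tf.keras.callbacks.Callback):
    # Report training metrics with the post-update weights, mirroring
    # how Keras computes val_loss/val_acc.
    def on_epoch_end(self, epoch, logs=None):
        loss, acc = self.model.evaluate(x, y, batch_size=N, verbose=0)
        print('epoch {}: post-update loss={:.4f}, acc={:.4f}'.format(epoch + 1, loss, acc))

model.fit(x, y, epochs=10, validation_data=(x, y), batch_size=N,
          callbacks=[PostEpochEval()])

With identical training and validation data, the numbers printed by this callback should coincide with the reported val_loss and val_acc.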






answered Nov 12 at 5:17 by Gerges Dib