Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keras functional API can not save right weights in h5 files #40

Open
jiayugedede opened this issue Jul 31, 2023 · 2 comments
Open

Keras functional API can not save right weights in h5 files #40

jiayugedede opened this issue Jul 31, 2023 · 2 comments
Assignees

Comments

@jiayugedede
Copy link

jiayugedede commented Jul 31, 2023

Please go to TF Forum for help and support:

https://discuss.tensorflow.org/tag/keras

If you open a GitHub issue, here is our policy:

It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead).
The form below must be filled out.

Here's why we have that policy:.

Keras developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras):Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ubuntu 20.04 LST
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.11.0
  • Python version: 3.9.0
  • Bazel version (if compiling from source):
  • GPU model and memory: RTX 3090, 24GB memory.
  • Exact command to reproduce:

I have used functional API to build a ResNet neural network algorithm. However, this constructed model can not save the neural networks' weights in an appropriate result. The validated result was very good. But the inference result is very bad.

When I use tf.config.run_functions_eagerly(True) in the model training stage, the inference result is very good. Otherwise, the inference result was very bad. To tackle this problem, I have searched some sample code of Keras.applications. Bud, not working in model weight save.

The implemented ResNet code is shown in follow:

Describe the problem.

import tensorflow.compat.v2 as tf
from keras.regularizers import l2
from keras import layers
from keras.engine import sequential
from keras.engine import training as training_lib
import keras as K

identitys=None


def Bottleneck(inputs, out_channel, name, downsample, strides=1):
    expansion = 4
    key = out_channel * expansion
    identity = inputs
    global  identitys

    if downsample:
        identitys = layers.Conv2D(key, kernel_size=1, strides=strides,
                                  use_bias=False, kernel_initializer='he_normal',
                                  padding="SAME", kernel_regularizer=l2(1.e-5), name=name + "ds_conv")(identity)
        identitys = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name=name + "ds_normal")(identitys)
    else:
        identitys = inputs

    xb = layers.Conv2D(out_channel, kernel_size=1, use_bias=False, kernel_initializer='he_normal',
                       kernel_regularizer=l2(1.e-4), name=name + "Conv2D_1")(inputs)
    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5, name=name + "BN_1")(xb)
    xb = layers.Activation(tf.keras.activations.swish, name=name + "ACT_1")(xb)

    xb = layers.Conv2D(out_channel, kernel_size=3, use_bias=False, strides=strides, padding="SAME",
                       kernel_initializer='he_normal', kernel_regularizer=l2(1.e-4), name=name + "Conv2D_2")(xb)
    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5, name=name + "BN_3")(xb)
    xb = layers.ReLU(name=name + "ReLU")(xb)

    xb = layers.Conv2D(key, kernel_size=1, use_bias=False,
                       kernel_initializer='he_normal',
                       kernel_regularizer=l2(1.e-4),
                       name=name + "Conv2D_3")(xb)

    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5,
                                   name=name + "BN_4")(xb)

    xb = layers.Add(name=name + "addition")([identitys, xb])
    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5,
                                   name=name + "Last_BN")(xb)
    xb = layers.ReLU(name=name + "LastReLU")(xb)

    return xb


def _make_layer(inputs, make_block, channel, block_num, layer_name, strides, down_sample):
    i = 0
    name = layer_name + f"block_{i + 1}_"

    xm = make_block(inputs=inputs, out_channel=channel, name=name, strides=strides, downsample=down_sample)

    for i in range(1, block_num):
        i += 1
        name = layer_name + f"block_{i}_"
        xm = make_block(inputs=xm, out_channel=channel, name=name, strides=1, downsample=False)

    return xm


class ResnetBuilder(object):
    @staticmethod
    def build(block, blocks_num, im_width=224, im_height=224, num_classes=1000):
        img_input = layers.Input(shape=(im_width, im_height, 3),
                                 dtype="float32",
                                 name="layers_inputs")

        x = layers.Conv2D(filters=64, kernel_size=7, strides=2,
                          padding="SAME", use_bias=False,
                          name="layers_conv1")(img_input)  # 把这一行替换成ContourOperator

        x = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="x_FBN")(x)
        x = layers.ReLU(name="FReLU")(x)

        x = layers.DepthwiseConv2D(kernel_size=3, padding="SAME", use_bias=False,
                                   depthwise_initializer=tf.keras.initializers.TruncatedNormal(mean=0.0,
                                                                                               stddev=0.05, seed=None)
                                   , kernel_regularizer=l2(1.e-4), name="FDW")(x)

        x = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="FBN")(x)

        x = layers.Activation(tf.keras.activations.swish, name="FirstACT")(x)

        x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="f_MP")(x)

        x = _make_layer(x, block, 64, block_num=blocks_num[0], layer_name="ml1", strides=1, down_sample=True)

        x = _make_layer(x, block, 128, block_num=blocks_num[1], layer_name="ml2", strides=2, down_sample=True)

        x = _make_layer(x, block, 256, block_num=blocks_num[2], layer_name="ml3",  strides=2, down_sample=True)

        x = _make_layer(x, block, 512, block_num=blocks_num[3], layer_name="ml4",  strides=2, down_sample=True)

        x = layers.GlobalAvgPool2D(name="GAP2D")(x)  # pool + flatten

        x = layers.Dense(num_classes, name="logits")(x)

        predict = layers.Softmax(name="SoftMax")(x)

        model = training_lib.Model(inputs=img_input,
                                   outputs=predict, name="model")


        return model

    @staticmethod
    def resnet101(im_width=448, im_height=448,
                  include_top=True, num_classes=5):
        return ResnetBuilder.build(Bottleneck, [3, 4, 23, 3],
                                   im_width, im_height, num_classes)

    @staticmethod
    def resnet50(im_width=448, im_height=448,
                 include_top=True,
                 num_classes=5, **kwargs):
        return ResnetBuilder.build(Bottleneck, [3, 4, 6, 3],
                                   im_width, im_height, num_classes)

Describe the problem clearly here. Be sure to convey here why it's a bug in Keras or why the requested feature is needed.

Describe the current behavior.

The inference result can not match the validation result.
The inference code is shown in follow:

from keras import Model
from keras.utils import image_utils
import tensorflow as tf
import numpy as np
import os
from test_code import ResnetBuilder

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_visible_devices(devices=gpus[1], device_type='GPU')


def preprocess_image(img_path, target_size=(448, 448)):
    """Preprocess the image by reshape and normalization.

    Args:
        img_path:  A string.
        target_size: A tuple, reshape to this size.
    Return:
        An image ndarray.
    """
    img = image_utils.load_img(img_path, target_size=target_size)
    img = image_utils.img_to_array(img)
    img /= 255.0

    return img


def load_trained_model():
    model = ResnetBuilder.resnet50(448, 448, 5)
    model_name = r"./model.30-.h5"
    model.load_weights(model_name, by_name=True)
    print('model load success.')
    return model


def get_category_name(full_image_path, model):
    img = preprocess_image(full_image_path)
    img_tensor = np.expand_dims(img, axis=0)

    heatmap_model = Model([model.inputs], [model.output])

    predictions = heatmap_model(img_tensor)
    category_id = np.argmax(predictions[0])
    label_name = ['A1', 'A2', 'A3', "A4", "A5"]
    category_name = label_name[category_id]

    return category_name


model = load_trained_model()
model.summary()

image_folder = r"[Image_Path]"
save_path = r"[Save_Path]"

name_list = os.listdir(image_folder)
for file_name in name_list:
    full_image_name = image_folder + "/" + file_name
    category_name = get_category_name(full_image_name, model)
    save_name = category_name + "_" + file_name # just print result, not save image with a new name.
    print(save_name)

Describe the expected behavior.

Saving the right weights in model.h5 files, and inference the right result from the weight model.

Standalone code to reproduce the issue.

Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.

Sorry, inconvenient to provide

@tilakrayal
Copy link
Collaborator

@jiayugedede,
I was facing a different issue/error while executing the above mentioned code. Kindly find the gist of it here and provide the required code to analyse the issue in an effective way. Thank you!

@jiayugedede
Copy link
Author

import tensorflow.compat.v2 as tf
from keras.regularizers import l2
from keras import layers
from keras.engine import sequential
from keras.engine import training as training_lib
import keras as K

identitys=None


def Bottleneck(inputs, out_channel, name, downsample, strides=1):
    expansion = 4
    key = out_channel * expansion
    identity = inputs
    global  identitys

    if downsample:
        identitys = layers.Conv2D(key, kernel_size=1, strides=strides,
                                  use_bias=False, kernel_initializer='he_normal',
                                  padding="SAME", kernel_regularizer=l2(1.e-5), name=name + "ds_conv")(identity)
        identitys = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name=name + "ds_normal")(identitys)
    else:
        identitys = inputs

    xb = layers.Conv2D(out_channel, kernel_size=1, use_bias=False, kernel_initializer='he_normal',
                       kernel_regularizer=l2(1.e-4), name=name + "Conv2D_1")(inputs)
    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5, name=name + "BN_1")(xb)
    xb = layers.Activation(tf.keras.activations.swish, name=name + "ACT_1")(xb)

    xb = layers.Conv2D(out_channel, kernel_size=3, use_bias=False, strides=strides, padding="SAME",
                       kernel_initializer='he_normal', kernel_regularizer=l2(1.e-4), name=name + "Conv2D_2")(xb)
    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5, name=name + "BN_3")(xb)
    xb = layers.ReLU(name=name + "ReLU")(xb)

    xb = layers.Conv2D(key, kernel_size=1, use_bias=False,
                       kernel_initializer='he_normal',
                       kernel_regularizer=l2(1.e-4),
                       name=name + "Conv2D_3")(xb)

    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5,
                                   name=name + "BN_4")(xb)

    xb = layers.Add(name=name + "addition")([identitys, xb])
    xb = layers.BatchNormalization(momentum=0.9,
                                   epsilon=1e-5,
                                   name=name + "Last_BN")(xb)
    xb = layers.ReLU(name=name + "LastReLU")(xb)

    return xb


def _make_layer(inputs, make_block, channel, block_num, layer_name, strides, down_sample):
    i = 0
    name = layer_name + f"block_{i + 1}_"

    xm = make_block(inputs=inputs, out_channel=channel, name=name, strides=strides, downsample=down_sample)

    for i in range(1, block_num):
        i += 1
        name = layer_name + f"block_{i}_"
        xm = make_block(inputs=xm, out_channel=channel, name=name, strides=1, downsample=False)

    return xm


class ResnetBuilder(object):
    @staticmethod
    def build(block, blocks_num, im_width=224, im_height=224, num_classes=1000):
        img_input = layers.Input(shape=(im_width, im_height, 1),
                                 dtype="float32",
                                 name="layers_inputs")

        x = layers.Conv2D(filters=64, kernel_size=7, strides=2,
                          padding="SAME", use_bias=False,
                          name="layers_conv1")(img_input)  # 把这一行替换成ContourOperator

        x = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="x_FBN")(x)
        x = layers.ReLU(name="FReLU")(x)

        x = layers.DepthwiseConv2D(kernel_size=3, padding="SAME", use_bias=False,
                                   depthwise_initializer=tf.keras.initializers.TruncatedNormal(mean=0.0,
                                                                                               stddev=0.05, seed=None)
                                   , kernel_regularizer=l2(1.e-4), name="FDW")(x)

        x = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="FBN")(x)

        x = layers.Activation(tf.keras.activations.swish, name="FirstACT")(x)

        x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="f_MP")(x)

        x = _make_layer(x, block, 64, block_num=blocks_num[0], layer_name="ml1", strides=1, down_sample=True)

        x = _make_layer(x, block, 128, block_num=blocks_num[1], layer_name="ml2", strides=2, down_sample=True)

        x = _make_layer(x, block, 256, block_num=blocks_num[2], layer_name="ml3",  strides=2, down_sample=True)

        x = _make_layer(x, block, 512, block_num=blocks_num[3], layer_name="ml4",  strides=2, down_sample=True)

        x = layers.GlobalAvgPool2D(name="GAP2D")(x)  # pool + flatten

        x = layers.Dense(num_classes, name="logits")(x)

        predict = layers.Softmax(name="SoftMax")(x)

        model = training_lib.Model(inputs=img_input,
                                   outputs=predict, name="model")


        return model

    @staticmethod
    def resnet101(im_width=448, im_height=448,
                  include_top=True, num_classes=5):
        return ResnetBuilder.build(Bottleneck, [3, 4, 23, 3],
                                   im_width, im_height, num_classes)

    @staticmethod
    def resnet50(im_width=448, im_height=448,
                 include_top=True,
                 num_classes=5, **kwargs):
        return ResnetBuilder.build(Bottleneck, [3, 4, 6, 3],
                                   im_width, im_height, num_classes)
import warnings
import numpy as np
from keras.callbacks import Callback


class MultiGPUCheckpointCallback(Callback):
    def __init__(self, filepath, base_model, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1):
        super(MultiGPUCheckpointCallback, self).__init__()
        self.base_model = base_model
        self.monitor = monitor
        self.verbose = verbose
        self.filepath = filepath
        self.save_best_only = save_best_only
        self.save_weights_only = save_weights_only
        self.period = period
        self.epochs_since_last_save = 0

        if mode not in ['auto', 'min', 'max']:
            warnings.warn('ModelCheckpoint mode %s is unknown, '
                          'fallback to auto mode.' % (mode),
                          RuntimeWarning)
            mode = 'auto'

        if mode == 'min':
            self.monitor_op = np.less
            self.best = np.Inf
        elif mode == 'max':
            self.monitor_op = np.greater
            self.best = -np.Inf
        else:
            if 'acc' in self.monitor or self.monitor.startswith('fmeasure'):
                self.monitor_op = np.greater
                self.best = -np.Inf
            else:
                self.monitor_op = np.less
                self.best = np.Inf

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epochs_since_last_save += 1
        if self.epochs_since_last_save >= self.period:
            self.epochs_since_last_save = 0
            filepath = self.filepath.format(epoch=epoch + 1, **logs)
            if self.save_best_only:
                current = logs.get(self.monitor)
                if current is None:
                    warnings.warn('Can save best model only with %s available, '
                                  'skipping.' % (self.monitor), RuntimeWarning)
                else:
                    if self.monitor_op(current, self.best):
                        if self.verbose > 0:
                            print('Epoch %05d: %s improved from %0.5f to %0.5f,'
                                  ' saving model to %s'
                                  % (epoch + 1, self.monitor, self.best,
                                     current, filepath))
                        self.best = current
                        if self.save_weights_only:
                            self.base_model.save_weights(filepath, overwrite=True)
                        else:
                            self.base_model.save(filepath, overwrite=True)
                    else:
                        if self.verbose > 0:
                            print('Epoch %05d: %s did not improve' %
                                  (epoch + 1, self.monitor))
            else:
                if self.verbose > 0:
                    print('Epoch %05d: saving model to %s' % (epoch + 1, filepath))
                if self.save_weights_only:
                    self.base_model.save_weights(filepath, overwrite=True)
                else:
                    self.base_model.save(filepath, overwrite=True)
import tensorflow_addons as tfa
import tensorflow as tf


def scheduler(epoch):
    if epoch < 40:
        return 0.1
    if epoch < 80:
        return 0.06
    if epoch < 150:
        return 0.01
    if epoch < 200:  # 200 epoch之后有一轮准确率提升。
        return 0.006
    if epoch < 230:
        return 0.004
    if epoch < 250:
        return 0.002
    if epoch < 280:
        return 0.001
    if epoch < 300:
        return 0.0006  # 当学习率为0.0006的时候,性能表现最好。
    if epoch < 350:
        return 0.0004
    if epoch < 380:
        return 0.00008
    return 0.00004


# https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/schedules/PiecewiseConstantDecay?hl=en
def AdamOptimizer():
    step = tf.Variable(0, trainable=False)
    schedule = tf.optimizers.schedules.PiecewiseConstantDecay(
        [100, 150, 200, 250], [1e-4, 8e-5, 4e-5, 1e-5, 1e-6])
    # lr and wd can be a function or a tensor
    lr = schedule(step)
    wd = schedule(step)
    optimizer = tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
    return optimizer

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@author: sam
"""
import os
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)



NUM_CLASSES = 10
BatchSize = 128
save_train_path = r"/content/drive/MyDrive/save_weight"
save_train_data = r"/content/drive/MyDrive/save_weight/H5File"
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# expand the channel dimension
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

# make the value of pixels from [0, 255] to [0, 1] for further process
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# convert class vectors to binary class matrics
y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)

with strategy.scope():

  model = ResnetBuilder.resnet50(im_width=28, im_height=28, num_classes=NUM_CLASSES)
  model.build((BatchSize, 28,  28, 1))
  model.summary()

  recall = tf.keras.metrics.Recall()
  precision = tf.keras.metrics.Precision()

  step = tf.Variable(0, trainable=False)
  schedule = tf.optimizers.schedules.PiecewiseConstantDecay([100, 150, 200, 250], [1e-4, 8e-5, 4e-5, 1e-5, 1e-6])
  # lr and wd can be a function or a tensor
  lr = schedule(step)
  wd = schedule(step)
  optimizer = tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)

  model.compile(optimizer=optimizer,loss='categorical_crossentropy', metrics=['accuracy', recall, precision])



checkpoint_path = save_train_data + "/" + "model.{epoch:02d}-" + ".h5"

save_weight = MultiGPUCheckpointCallback(filepath=checkpoint_path,
                                         base_model=model,
                                         save_weights_only=True)

kept = tf.keras.callbacks.ModelCheckpoint(save_train_data + "/" + "model.{epoch:02d}-{val_loss:.4f}-" + ".h5", monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
model.fit(x_train, y_train, batch_size=BatchSize, epochs=30, verbose=1, validation_data=(x_test, y_test), callbacks=[save_weight])
from keras import Model
from keras.utils import image_utils

# gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
# tf.config.experimental.set_visible_devices(devices=gpus[0], device_type='GPU')


def preprocess_image(img_path, target_size=(28, 28)):
    """Preprocess the image by reshape and normalization.

    Args:
        img_path:  A string.
        target_size: A tuple, reshape to this size.
    Return:
        An image ndarray.
    """
    img = image_utils.load_img(img_path, target_size=target_size, color_mode="grayscale")
    img = image_utils.img_to_array(img)
    img /= 255.0

    return img


def load_trained_model():
  with tf.device('/cpu:0'):
    model = ResnetBuilder.resnet101(28, 28, False, 10)
    model_name = r"/content/drive/MyDrive/save_weight/H5File/model.30-.h5"
    model.load_weights(model_name, by_name=True)
    print('model load success.')
  return model


def get_category_name(full_image_path, model):
    img = preprocess_image(full_image_path)
    img_tensor = np.expand_dims(img, axis=0)

    heatmap_model = Model([model.inputs], [model.output])

    predictions = heatmap_model(img_tensor)
    category_id = np.argmax(predictions[0])
    label_name = ['0', '1', '2', "3", "4", "5", "6", "7", "8", "9"]
    category_name = label_name[category_id]
    return category_name


model = load_trained_model()
model.summary()

image_folder = r"/content/drive/MyDrive/save_weight/mnist_img"

name_list = os.listdir(image_folder)
for file_name in name_list:
    full_image_name = image_folder+"/"+ file_name
    category_name = get_category_name(full_image_name, model)
    file_names = file_name.split("_")
    save_name = "Prediction: "+ category_name + "  label: " + file_names[0]
    print(save_name)

Weight can be found in here https://drive.google.com/file/d/12A6VFM8AmKcthUf6J8p_vwtO4jAzJkWg/view?usp=drive_link
Test image can be found in here https://drive.google.com/drive/folders/1-Dyx73axg7swBHiQKuoSTtvKQkKjlzo7?usp=drive_link

@sachinprasadhs sachinprasadhs transferred this issue from keras-team/keras Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants