
How to embed tokenization #2371

Open
piotrkowalczuk opened this issue Oct 19, 2024 · 2 comments
Labels
question Response providing clarification needed. Will not be assigned to a release. (type)

Comments

@piotrkowalczuk

How to embed tokenization ❓

Models created using the Create ML app provide this sleek API that hides some complexity:

/// Model Prediction Input Type
@available(macOS 10.14, iOS 12.0, tvOS 12.0, watchOS 5.0, visionOS 1.0, *)
class ExampleClassifierInput : MLFeatureProvider {

    /// Input text as string value
    var text: String

    var featureNames: Set<String> { ["text"] }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        if featureName == "text" {
            return MLFeatureValue(string: text)
        }
        return nil
    }

    init(text: String) {
        self.text = text
    }
}

While trying to convert the model:

def build_inference_model(weights):
    text = tf.keras.Input(shape=(), dtype=tf.string, name='text')
    input_ids = tf.keras.layers.TextVectorization(
        output_mode='int',
        output_sequence_length=MAX_LENGTH,
        vocabulary='bert_vocabulary.txt',
    )(text)
    attention_mask = tf.cast(tf.not_equal(input_ids, 0), dtype=tf.int32)

    bert = transformers.TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-cased',
        num_labels=len(class_labels))
    bert.set_weights(weights)

    predictions = bert(input_ids=input_ids, attention_mask=attention_mask)
    predicted_label = ArgmaxAndLabelMappingLayer(class_labels, name="category")(predictions.logits)

    model = tf.keras.Model(inputs=text, outputs=predicted_label, name="classifier")

    model.summary()

    return model

I encountered this error:

TypeError: dtype=<class 'coremltools.converters.mil.mil.types.type_str.str'> is unsupported for inputs/outputs of the model.

What do you think is the best way to handle my use case? How far can the coremltools library take me without my having to create a Swift package that wraps the model?
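As far as I know, coremltools does not support string tensors as model inputs, which is why the TextVectorization layer trips the converter; the lookup-and-pad step has to run outside the converted model. Below is a minimal numpy sketch of roughly what that layer does at inference time, assuming whitespace tokenization and a small hypothetical vocabulary (a real BERT vocabulary uses WordPiece subword tokenization, so this only illustrates the pipeline shape, not the exact tokenizer):

```python
import numpy as np

MAX_LENGTH = 8  # hypothetical; must match output_sequence_length used at training time

# Hypothetical vocabulary; index 0 is reserved for padding and index 1 for
# out-of-vocabulary tokens, mirroring TextVectorization's default layout.
vocab = ["", "[UNK]", "the", "movie", "was", "great", "terrible"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def vectorize(text: str) -> np.ndarray:
    """Whitespace-split, map tokens to ids (OOV -> 1), pad/truncate to MAX_LENGTH."""
    ids = [token_to_id.get(tok, 1) for tok in text.lower().split()]
    ids = ids[:MAX_LENGTH] + [0] * max(0, MAX_LENGTH - len(ids))
    return np.asarray(ids, dtype=np.int32)

input_ids = vectorize("The movie was great")
attention_mask = (input_ids != 0).astype(np.int32)  # same mask rule as in the Keras graph
```

With tokenization factored out like this, the rest of the graph (the DistilBERT body plus the argmax/label-mapping layer) takes only integer tensors, which coremltools can handle.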

@piotrkowalczuk piotrkowalczuk added the question Response providing clarification needed. Will not be assigned to a release. (type) label Oct 19, 2024
@TobyRoseman
Collaborator

Hi @piotrkowalczuk - I'm confused. What are you trying to do here? Do you have a Core ML model that you're trying to get predictions from in Python?

@piotrkowalczuk
Author

I have a TensorFlow model that I am trying to convert using the Python tooling, and I found that the conversion has some limitations. I want a user experience similar to what the Create ML app offers: a comparable text classifier trained in the Create ML app has the tokenizer included, and such a model is easier to distribute.
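If the tokenizer cannot live inside the converted model itself, one option is to bundle it with the model at the wrapper level, mirroring the shape of the generated Create ML API. A hypothetical Python sketch (the `predict_fn` stub stands in for a converted Core ML model's prediction call, and the vocabulary, class labels, and tokenization rule are all illustrative assumptions):

```python
import numpy as np

class TextClassifier:
    """Hypothetical wrapper: bundles the tokenizer with a prediction function,
    so callers pass a plain string, as with a Create ML generated class."""

    def __init__(self, vocab, max_length, predict_fn):
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.max_length = max_length
        self.predict_fn = predict_fn  # e.g. a converted model's predict call

    def predict(self, text: str) -> str:
        # Whitespace tokenization with OOV -> 1 and padding id 0; a real BERT
        # pipeline would use WordPiece here instead.
        ids = [self.token_to_id.get(t, 1) for t in text.lower().split()]
        ids = ids[: self.max_length] + [0] * max(0, self.max_length - len(ids))
        input_ids = np.asarray([ids], dtype=np.int32)
        attention_mask = (input_ids != 0).astype(np.int32)
        return self.predict_fn(input_ids, attention_mask)

# Stub standing in for the converted model, so the wrapper can be exercised
# without an actual .mlpackage on hand.
clf = TextClassifier(
    vocab=["", "[UNK]", "good", "bad"],
    max_length=4,
    predict_fn=lambda ids, mask: "positive" if 2 in ids else "negative",
)
```

The same idea ported to Swift would reproduce the `ExampleClassifierInput`-style API from the first comment, but the tokenizer would be shipped as data (the vocabulary file) alongside the model rather than baked into the model graph.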
