Support for TFX 1.4, TF 2.6, Beam 2.33.0 (#58)
* removed unused _float_feature

* Updated dependencies

* updated tfdv notebook

* updated apache beam example

* updated components

* updated interactive notebook, working with tfx 1.4

* airflow updates

* updated beam pipeline example

* kubeflow updates

* updated beam_arg

* added vertex example

* renamed to vertex

* Updated readme
hanneshapke authored Nov 23, 2021
1 parent 35a545a commit c8bf023
Showing 20 changed files with 24,062 additions and 2,795 deletions.
8 changes: 7 additions & 1 deletion README.md
@@ -2,6 +2,10 @@

Code repository for the O'Reilly publication ["Building Machine Learning Pipelines"](http://www.buildingmlpipelines.com) by Hannes Hapke & Catherine Nelson

## Update

* The example code has been updated to work with TFX 1.4.0, TensorFlow 2.6.1, and Apache Beam 2.33.0. A GCP Vertex example (training and serving) was added.

## Set up the demo project

Download the initial dataset. From the root of this repository, execute
@@ -63,7 +67,9 @@ Chapter 14. Code for training a differentially private version of the demo proje

The code was written and tested for version 0.22.

- As of 11/23/21, the examples have been updated to support TFX 1.4.0, TensorFlow 2.6.1, and Apache Beam 2.33.0. A GCP Vertex example (training and serving) was added.

- As of 9/22/20, the interactive pipeline runs on TFX version 0.24.0rc1.
Due to tiny TFX bugs, the pipelines currently don't work on the releases 0.23 and 0.24-rc0. Github issues have been filed with the TFX team specifically for the book pipelines ([Issue 2500](https://github.com/tensorflow/tfx/issues/2500#issuecomment-695363847)). We will update the repository once the issue is resolved.

- As of 9/14/20, TFX only supports Python 3.8 with version >0.24.0rc0.
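
The README update pins the examples to specific library versions. A minimal sketch of how a local environment could be checked against those pins is shown here; it is not part of the commit. The helper name `find_version_mismatches` and the `lookup` parameter are hypothetical, and the distribution names (`tfx`, `tensorflow`, `apache-beam`) are assumed to be the PyPI package names.

```python
from importlib import metadata

# Versions named in the README update of this commit.
# Assumption: the keys are the PyPI distribution names.
EXPECTED_VERSIONS = {
    "tfx": "1.4.0",
    "tensorflow": "2.6.1",
    "apache-beam": "2.33.0",
}


def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `lookup` is injectable so the check can be exercised without the
    packages being installed (hypothetical parameter, for illustration).
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    found = find_version_mismatches(EXPECTED_VERSIONS)
    for package, (wanted, installed) in found.items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```

An exact-match comparison is used because the README names exact versions; a looser check (for example, matching only major and minor versions) may be more practical in some environments.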
15 changes: 4 additions & 11 deletions chapters/data_ingestion/convert_data_to_tfrecords.py
@@ -8,13 +8,7 @@


def _bytes_feature(value):
-    return tf.train.Feature(
-        bytes_list=tf.train.BytesList(value=[value.encode()])
-    )
-
-
-def _float_feature(value):
-    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
+    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode()]))


def _int64_feature(value):
@@ -26,12 +20,13 @@ def clean_rows(row):
        row["zip_code"] = "99999"
    return row

+
def convert_zipcode_to_int(zipcode):
    if isinstance(zipcode, str) and "XX" in zipcode:
        zipcode = zipcode.replace("XX", "00")
    int_zipcode = int(zipcode)
    return int_zipcode


original_data_file = "../../data/consumer_complaints_with_narrative.csv"
tfrecords_filename = "consumer-complaints.tfrecords"
@@ -53,9 +48,7 @@ def convert_zipcode_to_int(zipcode):
                    "company": _bytes_feature(row["company"]),
                    "company_response": _bytes_feature(row["company_response"]),
                    "timely_response": _bytes_feature(row["timely_response"]),
-                    "consumer_disputed": _bytes_feature(
-                        row["consumer_disputed"]
-                    ),
+                    "consumer_disputed": _bytes_feature(row["consumer_disputed"]),
                }
            )
        )
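
The zip code handling touched by the hunks above can be tried without TensorFlow. The sketch below copies `convert_zipcode_to_int` from the diff. The condition inside `clean_rows` is not visible in the hunk, so the emptiness check used here is an assumption, and the sample rows are made up for illustration.

```python
def clean_rows(row):
    # Assumption: the diff elides this condition; an empty zip code is
    # taken here as the trigger for the "99999" placeholder.
    if not row["zip_code"]:
        row["zip_code"] = "99999"
    return row


def convert_zipcode_to_int(zipcode):
    # As in the diff: masked digits ("XX") are replaced by "00" before the cast.
    if isinstance(zipcode, str) and "XX" in zipcode:
        zipcode = zipcode.replace("XX", "00")
    int_zipcode = int(zipcode)
    return int_zipcode


# Hypothetical rows: a masked zip code, a missing one, and a complete one.
sample_rows = [
    {"zip_code": "941XX"},
    {"zip_code": ""},
    {"zip_code": "10001"},
]

converted = [
    convert_zipcode_to_int(clean_rows(dict(row))["zip_code"])
    for row in sample_rows
]
print(converted)  # [94100, 99999, 10001]
```

Each row is copied with `dict(row)` before cleaning because `clean_rows` mutates its argument in place.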