Shown below is a snippet of data generated from gauges and meters fitted into the tank:
| Ser No. | Eng Hrs Run (Hrs) | Vibration | Coolant Temp (°C) | Oil Pressure (kg/cm²) |
| --- | --- | --- | --- | --- |
| 1 | 260 | 270 | 60 | 9 |
| 2 | 961 | 428 | 78 | 8.2 |
| 3 | 517 | 287 | 80 | 8.4 |
This example uses the tf.estimator API; see the TensorFlow Estimators guide for details.
import urllib
import io
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import logging
import traceback

logging.getLogger('tensorflow').setLevel(logging.DEBUG)
print(tf.__version__)
1.13.0-rc1
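The later calls to input_fn refer to df_train and df_eval. Despite the df prefix, these must be paths to CSV files, since input_fn hands them to tf.data.TextLineDataset. If you are following along with just this snippet, a minimal sketch with hypothetical file names:

df_train = "tank_train.csv"  # hypothetical path: training split of the gauge data
df_eval = "tank_eval.csv"    # hypothetical path: evaluation split of the gauge data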
COLUMNS = ["Ser", "EngHrs", "Vibration", "CoolantTemp", "OilPressure"]
RECORDS_ALL = [[0.0], [0.0], [0.0], [0.0], [0.0]]

In general, we normalize the data to prevent the so-called vanishing/exploding gradient problem. In our case, we will skip this as our data is well behaved.
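For reference, a minimal sketch of what per-column normalization could look like if the data were not well behaved. Min-max scaling is an illustrative choice here, not something this tutorial actually applies:

def min_max_normalize(values):
    # Scale a numeric column into the [0, 1] range
    v = np.asarray(values, dtype=np.float32)
    return (v - v.min()) / (v.max() - v.min())

print(min_max_normalize([260, 961, 517]))  # Eng Hrs column -> [0.0, 1.0, ~0.367]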
def input_fn(data_file, batch_size, num_epoch=None):
    # Step 1: parse each CSV line into a features dict and a label
    def parse_csv(value):
        columns = tf.decode_csv(value, record_defaults=RECORDS_ALL)
        features = dict(zip(COLUMNS, columns))
        features.pop('Ser')              # the serial number is not a predictive feature
        labels = features.pop('EngHrs')  # engine hours run is what we predict
        return features, labels

    # Step 2: extract lines from the input file using the Dataset API
    dataset = (tf.data.TextLineDataset(data_file)  # read the text file
               .skip(1)                            # skip the header row
               .map(parse_csv))
    dataset = dataset.repeat(num_epoch)
    dataset = dataset.batch(batch_size)

    # Step 3: create a one-shot iterator over the dataset
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    print("Features %s, Labels %s" % (features, labels))
    return features, labels
The code above is explained step by step below.

Consume the data: extract lines from the input file using the Dataset API.

Import the data: the parse_csv function parses each line with the method tf.decode_csv, which returns the features and the label.
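To see what parse_csv receives from tf.decode_csv, here is a minimal standalone check on a made-up data row (the reading itself is hypothetical):

sample_row = tf.constant("4,700,300,70,8.9")
parsed = tf.decode_csv(sample_row, record_defaults=RECORDS_ALL)
with tf.Session() as sess:
    print(sess.run(parsed))  # [4.0, 700.0, 300.0, 70.0, 8.9]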
Create the iterator: now you are ready for the second step, that is, to create an iterator that returns the elements of the dataset; for this we use make_one_shot_iterator. The following code prints the first batch of the CSV file. You can run it several times with different batch sizes.
next_batch = input_fn(df_train, batch_size=1, num_epoch=None)
with tf.Session() as sess:
    first_batch = sess.run(next_batch)
    print(first_batch)
Features {'Vibration': <tf.Tensor 'IteratorGetNext:2' shape=(?,) dtype=float32>, 'CoolantTemp': <tf.Tensor 'IteratorGetNext:0' shape=(?,) dtype=float32>, 'OilPressure': <tf.Tensor 'IteratorGetNext:1' shape=(?,) dtype=float32>}, Labels Tensor("IteratorGetNext:3", shape=(?,), dtype=float32, device=/device:CPU:0)
X1 = tf.feature_column.numeric_column('Vibration')
X2 = tf.feature_column.numeric_column('CoolantTemp')
X3 = tf.feature_column.numeric_column('OilPressure')
base_columns = [X1, X2, X3]
model = tf.estimator.LinearRegressor(feature_columns=base_columns, model_dir='train')

We have skipped customizing the optimizer and use the default instead. An optimizer helps training converge quickly on the optimum value of the vote (weight, in TensorFlow terminology) that each input feature gets in deciding the outcome.
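If you did want to customize it, the estimator accepts an optimizer argument. A minimal sketch, assuming FTRL with a hand-picked learning rate (both the optimizer choice and the 0.1 value are illustrative, as is the model directory name):

model = tf.estimator.LinearRegressor(
    feature_columns=base_columns,
    model_dir='train_custom',  # hypothetical separate checkpoint directory
    optimizer=tf.train.FtrlOptimizer(learning_rate=0.1))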
model.train(steps=500,
            input_fn=lambda: input_fn(df_train, batch_size=128, num_epoch=None))
num_epoch: an epoch is one pass in which all the data is taken into account.
batch_size: the number of inputs consumed at a time step.
steps: the number of batches considered for training. A quick arithmetic check of how these interact is sketched below.
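Using the values from the training call above: with num_epoch=None the dataset repeats indefinitely, so the steps argument alone bounds training.

steps = 500
batch_size = 128
rows_consumed = steps * batch_size  # 500 batches of 128 rows = 64,000 rows in total
print(rows_consumed)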
If you run through the data too many times, the model starts to remember it; this is called overfitting and is not good. An overfitted model gives accurate results for features that match the training inputs, but its results swing wildly for different sets of values. Contrast this with an underfitted model, where the results are not very accurate. The sweet spot is somewhere in between, and you can hit it by tuning the various estimator parameters.

results = model.evaluate(steps=None,
                         input_fn=lambda: input_fn(df_eval, batch_size=27, num_epoch=1))
for key in results:
    print("{}, was: {}".format(key, results[key]))
average_loss, was: 57064.83984375
label/mean, was: 453.77777099609375
loss, was: 3081501.25
prediction/mean, was: 614.8172607421875
global_step, was: 2110
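One concrete way to tune against overfitting, rather than eyeballing it, is early stopping. A minimal sketch using the TF 1.x contrib hook together with train_and_evaluate (the patience of 1000 steps is an assumption to tune per dataset):

early_stop = tf.contrib.estimator.stop_if_no_decrease_hook(
    model,
    metric_name='loss',
    max_steps_without_decrease=1000)  # assumed patience value
train_spec = tf.estimator.TrainSpec(
    input_fn=lambda: input_fn(df_train, batch_size=128, num_epoch=None),
    hooks=[early_stop])
eval_spec = tf.estimator.EvalSpec(
    input_fn=lambda: input_fn(df_eval, batch_size=27, num_epoch=1))
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)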
prediction_input = {
    'Vibration': [325, 272],
    'CoolantTemp': [65, 75],
    'OilPressure': [8.2, 9.2]
}

def test_input_fn():
    # Build a one-element dataset holding the two candidate readings
    dataset = tf.data.Dataset.from_tensors(prediction_input)
    return dataset

pred_results = model.predict(input_fn=test_input_fn)
for pred in pred_results:
    print("Engine remaining life is %d" % pred['predictions'])
Engine remaining life is 611