Data description
Shown below is a snippet of data generated from gauges and meters fitted into the tank:
| Ser No. | Eng Hrs Run (in Hrs) | Vibration | Coolant Temp (°C) | Oil Pressure (kg/cm²) |
|---------|----------------------|-----------|-------------------|------------------------|
| 1       | 260                  | 270       | 60                | 9                      |
| 2       | 961                  | 428       | 78                | 8.2                    |
| 3       | 517                  | 287       | 80                | 8.4                    |
TensorFlow Implementation
This example uses the tf.estimator and tf.data APIs; see the TensorFlow guide for details.
```python
import urllib
import io
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import logging
import traceback

logging.getLogger('tensorflow').setLevel(logging.DEBUG)
print(tf.__version__)
```

```
1.13.0-rc1
```
1. Define the path and the format of the data
Declare the columns you want to use for training and their types, as shown below. A float column is declared with a default value of [0.].

```python
COLUMNS = ["Ser", "EngHrs", "Vibration", "CoolantTemp", "OilPressure"]
RECORDS_ALL = [[0.0], [0.0], [0.0], [0.0], [0.0]]
```

In general, we would normalize the data to prevent the so-called vanishing/exploding gradient problem. In our case, we skip this because our data is well behaved.
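This step also covers the path of the data. The later steps refer to df_train and df_eval, which are simply the paths to the training and evaluation CSV files. A minimal sketch, with hypothetical file names:

```python
# Hypothetical file names; point these at your own gauge-reading CSVs.
# Each file is assumed to start with a header row naming the COLUMNS above.
df_train = "tank_train.csv"
df_eval = "tank_eval.csv"
```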
2. Define the input_fn function
We need an input function that supplies batches of predictors and labels on request. The full function is shown below.

```python
def input_fn(data_file, batch_size, num_epoch=None):
    # Step 1: parse one CSV line into a features dict and a label
    def parse_csv(value):
        columns = tf.decode_csv(value, record_defaults=RECORDS_ALL)
        features = dict(zip(COLUMNS, columns))
        features.pop('Ser')
        labels = features.pop('EngHrs')
        return features, labels

    # Step 2: extract lines from the input file using the Dataset API
    dataset = (tf.data.TextLineDataset(data_file)  # Read the text file
               .skip(1)                            # Skip the header row
               .map(parse_csv))                    # Parse each record
    dataset = dataset.repeat(num_epoch)
    dataset = dataset.batch(batch_size)

    # Step 3: create a one-shot iterator over the dataset
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    print("Features %s, Labels %s" % (features, labels))
    return features, labels
```
Consume the data: extract lines from the input file using the Dataset API.
- tf.data.TextLineDataset(data_file): this line reads the CSV file line by line
- .skip(1): skip the header row
- .map(parse_csv): parse each record into tensors. You need to define a function to pass to map; here it is called parse_csv.
Import the data: Parse CSV File
This function parses the CSV file with the method tf.decode_csv and returns the features and the label. The code is explained below:
- tf.decode_csv(value, record_defaults=RECORDS_ALL): the method decode_csv uses the output of the TextLineDataset to read the CSV file; record_defaults tells TensorFlow the type of each column.
- dict(zip(COLUMNS, columns)): populate a dictionary with all the columns extracted during this processing step.
- features.pop('EngHrs'): remove the target variable from the features and return it as the label; a standalone sketch of tf.decode_csv follows this list.
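To see what tf.decode_csv does in isolation, here is a minimal, self-contained sketch (the literal input line is hypothetical, taken from row 1 of the data snippet above):

```python
import tensorflow as tf

RECORDS_ALL = [[0.0], [0.0], [0.0], [0.0], [0.0]]

# One raw line, as TextLineDataset would yield it.
line = tf.constant("1,260,270,60,9")
columns = tf.decode_csv(line, record_defaults=RECORDS_ALL)  # five float32 scalars

with tf.Session() as sess:
    print(sess.run(columns))  # [1.0, 260.0, 270.0, 60.0, 9.0]
```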
Create the iterator
Now you are ready for the second step: create an iterator that returns the elements of the dataset. Here we use make_one_shot_iterator.

3. Consume the data

The code below shows how input_fn can be used to feed data to the estimator. The batch size and the number of epochs determine how much data is generated.

- num_epoch: the number of times the full dataset is passed through.
- batch_size: the number of inputs taken at a time.

The output prints the features as a dictionary and the label as an array. The following snippet shows the first line of the CSV file; try running it several times with different batch sizes.
```python
next_batch = input_fn(df_train, batch_size=1, num_epoch=None)
with tf.Session() as sess:
    first_batch = sess.run(next_batch)
    print(first_batch)
```

```
Features {'Vibration': <tf.Tensor 'IteratorGetNext:2' shape=(?,) dtype=float32>, 'CoolantTemp': <tf.Tensor 'IteratorGetNext:0' shape=(?,) dtype=float32>, 'OilPressure': <tf.Tensor 'IteratorGetNext:1' shape=(?,) dtype=float32>}, Labels Tensor("IteratorGetNext:3", shape=(?,), dtype=float32, device=/device:CPU:0)
```
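To build intuition for how batch_size and num_epoch interact, here is a toy sketch on hypothetical in-memory data, independent of the tutorial files:

```python
import tensorflow as tf

# Six toy values, repeated for 2 epochs (12 elements) and batched in fours.
ds = tf.data.Dataset.from_tensor_slices([1., 2., 3., 4., 5., 6.])
ds = ds.repeat(2).batch(4)
nxt = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    try:
        while True:
            print(sess.run(nxt))  # [1 2 3 4], then [5 6 1 2], then [3 4 5 6]
    except tf.errors.OutOfRangeError:
        pass  # both epochs consumed
```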
4. Define the feature column
You need to define the numeric columns as follows, and then gather all the variables into one list.

```python
X1 = tf.feature_column.numeric_column('Vibration')
X2 = tf.feature_column.numeric_column('CoolantTemp')
X3 = tf.feature_column.numeric_column('OilPressure')
base_columns = [X1, X2, X3]
```
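Had normalization been needed (see step 1), one place to apply it is the numeric column itself, via its normalizer_fn argument. A sketch, assuming hypothetical precomputed statistics for the Vibration column:

```python
# Hypothetical mean/std for Vibration; compute these from your training data.
VIBRATION_MEAN, VIBRATION_STD = 330.0, 70.0

X1 = tf.feature_column.numeric_column(
    'Vibration',
    normalizer_fn=lambda x: (x - VIBRATION_MEAN) / VIBRATION_STD)
```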
5. Build the model
You can train the model with the estimator LinearRegressor. Instead of building a new estimator, we use a 'canned estimator' provided by TensorFlow. The trained model will be saved in the 'train' directory.

```python
model = tf.estimator.LinearRegressor(feature_columns=base_columns, model_dir='train')
```

We have skipped customising the optimiser and use the default instead. An optimiser comes in handy to quickly reach the optimum values of the votes (weights, in TensorFlow terminology) that each input feature gets in deciding the outcome.
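If you did want to customise the optimiser, LinearRegressor accepts one via its optimizer argument. A sketch (the learning rate here is an arbitrary, hypothetical choice):

```python
# Same canned estimator, but with an explicit optimizer instead of the default.
model = tf.estimator.LinearRegressor(
    feature_columns=base_columns,
    model_dir='train',
    optimizer=tf.train.FtrlOptimizer(learning_rate=0.1))
```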
6. Train the model
It is now time to train the model. We pass a lambda as the input_fn argument of the train method; it provides a way of iteratively going through the predictors.

```python
model.train(steps=500,
            input_fn=lambda: input_fn(df_train, batch_size=128, num_epoch=None))
```
- num_epoch: the number of times all the data is taken into account.
- batch_size: the number of inputs received at a time step.
- steps: the number of batches considered for training.

For example, with batch_size=128 and steps=500, training consumes 500 × 128 = 64,000 rows in total; num_epoch=None lets the dataset repeat for as long as those steps require.
If you run through the data too many times, the model starts to memorise it; this is called overfitting, and it is not good. An overfitted model gives accurate results for features that match the training inputs, but its results swing wildly on different sets of values. Contrast this with an underfitted model, whose results are not very accurate anywhere. The sweet spot is somewhere in between, and you can hit it by tuning the various estimator parameters.
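One practical way to spot overfitting, sketched here using the estimator's evaluate method (introduced in the next step) and assuming df_train and df_eval are defined as in step 1: evaluate on both splits and compare. A training loss far below the evaluation loss suggests the model is memorising.

```python
# Compare average loss on the training and evaluation splits.
train_metrics = model.evaluate(
    input_fn=lambda: input_fn(df_train, batch_size=128, num_epoch=1))
eval_metrics = model.evaluate(
    input_fn=lambda: input_fn(df_eval, batch_size=27, num_epoch=1))
print("train average_loss: %.1f" % train_metrics['average_loss'])
print("eval average_loss: %.1f" % eval_metrics['average_loss'])
```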
7. Evaluate the model

Now evaluate the fit of your model using the data set aside for evaluation:

```python
results = model.evaluate(steps=None,
                         input_fn=lambda: input_fn(df_eval, batch_size=27, num_epoch=1))
for key in results:
    print("   {}, was: {}".format(key, results[key]))
```
```
average_loss, was: 57064.83984375
label/mean, was: 453.77777099609375
loss, was: 3081501.25
prediction/mean, was: 614.8172607421875
global_step, was: 2110
```
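average_loss is the mean squared error over the evaluation set, so its square root puts the error back into the original units. A quick sketch:

```python
import math

# RMSE from the evaluation metrics: sqrt(57064.84) ≈ 238.9 engine hours.
rmse = math.sqrt(results['average_loss'])
print("RMSE: %.1f hours" % rmse)
```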
8. Prediction
The last step is predicting the label from the values of the feature matrices. Write a dictionary with the values you want to predict for; the model uses three features, so you need to provide a value for each, and it returns a prediction for each row. In the code below, the values are the ones contained in the df_predict CSV file. You need to write a new input function, because the dataset has no label; you can use the Dataset API's from_tensors method.

```python
prediction_input = {
    'Vibration': [325, 272],
    'CoolantTemp': [65, 75],
    'OilPressure': [8.2, 9.2]
}

def test_input_fn():
    dataset = tf.data.Dataset.from_tensors(prediction_input)
    return dataset

pred_results = model.predict(input_fn=test_input_fn)
for pred in pred_results:
    print("Engine remaining life is %d" % pred['predictions'])
```
```
Engine remaining life is 611
```