Time series data modeling to predict the COVID-19 epidemic



Article directory

1. Prepare the data

2. Define the model

3. Train the model

4. Evaluate the model

5. Use the model

6. Save and use the model


        The COVID-19 epidemic in China has been going on for more than three months since it was first discovered. This disaster, which reportedly originated from the consumption of wild game, has affected everyone's life in many ways.

        Some people have been affected in their income, some emotionally, some psychologically, and some in their weight. So when will the domestic COVID-19 epidemic end? When will we be free again?

This article uses TensorFlow 2.0 to build a time series RNN model to predict the end time of the COVID-19 epidemic in China.

1. Prepare the data

 The dataset used in this article comes from tushare and is located in the data directory of this project.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import models, layers, losses, metrics, callbacks
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

df = pd.read_csv("./data/covid-19.csv", sep="\t")
df.plot(x="date", y=["confirmed_num", "cured_num", "dead_num"], figsize=(10, 6))
plt.xticks(rotation=60)

dfdata = df.set_index("date")
dfdiff = dfdata.diff(periods=1).dropna()
dfdiff = dfdiff.reset_index("date")
dfdiff.plot(x="date", y=["confirmed_num", "cured_num", "dead_num"], figsize=(10, 6))
plt.xticks(rotation=60)
dfdiff = dfdiff.drop("date", axis=1).astype("float32")

# Use the window of the previous 8 days as input to predict the data of the current day
WINDOW_SIZE = 8

def batch_dataset(dataset):
    dataset_batched = dataset.batch(WINDOW_SIZE, drop_remainder=True)
    return dataset_batched

ds_data = tf.data.Dataset.from_tensor_slices(
    tf.constant(dfdiff.values, dtype=tf.float32)
).window(WINDOW_SIZE, shift=1).flat_map(batch_dataset)

ds_label = tf.data.Dataset.from_tensor_slices(
    tf.constant(dfdiff.values[WINDOW_SIZE:], dtype=tf.float32))

# The dataset is small, so all training data can go into one batch to improve performance
ds_train = tf.data.Dataset.zip((ds_data, ds_label)).batch(38).cache()
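The input/label alignment produced by this sliding-window pipeline can be checked in plain NumPy. This is a sketch with made-up toy data (12 "days" of 3 features); the real code builds the same pairs with tf.data. Note that zipping truncates to the shorter dataset, so the last window, which has no label yet, is dropped:

```python
import numpy as np

# Toy stand-in for dfdiff.values: 12 days, 3 features (hypothetical numbers)
values = np.arange(12 * 3, dtype=np.float32).reshape(12, 3)
WINDOW_SIZE = 8

# Mimic Dataset.window(WINDOW_SIZE, shift=1) followed by
# batch(WINDOW_SIZE, drop_remainder=True): every run of 8 consecutive
# rows becomes one input sample
windows = [values[i:i + WINDOW_SIZE] for i in range(len(values) - WINDOW_SIZE + 1)]

# Labels start at index WINDOW_SIZE, so window i predicts row i + WINDOW_SIZE
labels = values[WINDOW_SIZE:]

# Like tf.data.Dataset.zip, zip() stops at the shorter sequence,
# dropping the final unlabeled window
pairs = list(zip(windows, labels))
```

With 12 rows this yields 5 windows but only 4 (window, label) pairs, which matches how `tf.data.Dataset.zip` pairs the two datasets above.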

2. Define the model

        There are three ways to build a model with the Keras interface: use Sequential to stack layers in order, use the functional API to build models of arbitrary structure, or subclass the Model base class to build a fully custom model.

        Here we use the functional API to build a model of arbitrary structure.

# Since the numbers of newly confirmed, newly cured, and new death cases cannot be negative, the following structure is designed
class Block(layers.Layer):
    def __init__(self, **kwargs):
        super(Block, self).__init__(**kwargs)

    def call(self, x_input, x):
        x_out = tf.maximum((1 + x) * x_input[:, -1, :], 0.0)
        return x_out

    def get_config(self):
        config = super(Block, self).get_config()
        return config
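The effect of this layer can be sanity-checked in plain NumPy (toy numbers below, shapes assumed from the model: `x_input` is `(batch, window, features)`, `x` is the Dense output `(batch, features)`). The network predicts a relative change `x`, which scales the last day of the input window, and the maximum with 0 clips any negative count:

```python
import numpy as np

# Hypothetical input window: batch=1, window=2 days, 3 features
x_input = np.array([[[5.0, 2.0, 1.0],
                     [4.0, 3.0, 2.0]]])
# Hypothetical predicted relative change per feature
x = np.array([[0.5, -2.0, 0.0]])

last_day = x_input[:, -1, :]              # most recent day: [4, 3, 2]
x_out = np.maximum((1 + x) * last_day, 0.0)
# (1+0.5)*4 = 6;  (1-2)*3 = -3 -> clipped to 0;  (1+0)*2 = 2
```

The middle feature would have gone negative (-3) and is clipped to 0, which is exactly the constraint the comment in the code describes.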
tf.keras.backend.clear_session()
x_input = layers.Input(shape=(None, 3), dtype=tf.float32)
x = layers.LSTM(3, return_sequences=True, input_shape=(None, 3))(x_input)
x = layers.LSTM(3, return_sequences=True, input_shape=(None, 3))(x)
x = layers.LSTM(3, return_sequences=True, input_shape=(None, 3))(x)
x = layers.LSTM(3, input_shape=(None, 3))(x)
x = layers.Dense(3)(x)
# Since the numbers of newly confirmed, newly cured, and new death cases cannot be negative, the following structure is designed
# x = tf.maximum((1 + x) * x_input[:, -1, :], 0.0)
x = Block()(x_input, x)
model = models.Model(inputs=[x_input], outputs=[x])
model.summary()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, None, 3)]         0
_________________________________________________________________
lstm (LSTM)                  (None, None, 3)           84
_________________________________________________________________
lstm_1 (LSTM)                (None, None, 3)           84
_________________________________________________________________
lstm_2 (LSTM)                (None, None, 3)           84
_________________________________________________________________
lstm_3 (LSTM)                (None, 3)                 84
_________________________________________________________________
dense (Dense)                (None, 3)                 12
_________________________________________________________________
block (Block)                (None, 3)                 0
=================================================================
Total params: 348
Trainable params: 348
Non-trainable params: 0
_________________________________________________________________

3. Train the model

        There are usually three ways to train a model: the built-in fit method, the built-in train_on_batch method, and a custom training loop. Here we choose the most common and simplest: the built-in fit method.

        Note: Recurrent neural networks are difficult to tune; several different learning rates need to be tried to achieve good results.

# Custom loss function: the squared error divided by the squared target
class MSPE(losses.Loss):
    def call(self, y_true, y_pred):
        err_percent = (y_true - y_pred) ** 2 / (tf.maximum(y_true ** 2, 1e-7))
        mean_err_percent = tf.reduce_mean(err_percent)
        return mean_err_percent

    def get_config(self):
        config = super(MSPE, self).get_config()
        return config
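The loss can be sanity-checked with a NumPy version (toy numbers below, chosen for illustration): scaling each squared error by the squared target weights errors on large and small counts comparably, and `tf.maximum(y_true**2, 1e-7)` guards against division by zero.

```python
import numpy as np

# NumPy sketch of the MSPE loss defined above
def mspe(y_true, y_pred, eps=1e-7):
    err_percent = (y_true - y_pred) ** 2 / np.maximum(y_true ** 2, eps)
    return err_percent.mean()

# Toy targets and predictions
y_true = np.array([10.0, 2.0, 4.0])
y_pred = np.array([8.0, 1.0, 4.0])

loss = mspe(y_true, y_pred)
# Per-element terms: 4/100 = 0.04, 1/4 = 0.25, 0/16 = 0
```

An absolute error of 2 on a target of 10 contributes far less (0.04) than an error of 1 on a target of 2 (0.25), which is the intended "percentage" behavior.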
import os
import datetime

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=optimizer, loss=MSPE(name="MSPE"))

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = os.path.join('data', 'autograph', stamp)
# Under Python 3, pathlib is recommended to build paths portably across operating systems
# from pathlib import Path
# stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# logdir = str(Path('./data/autograph/' + stamp))

tb_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
# If the loss does not decrease within 100 epochs, halve the learning rate
lr_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5, patience=100)
# If the loss does not improve within 200 epochs, stop training early
stop_callback = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=200)
callbacks_list = [tb_callback, lr_callback, stop_callback]

history = model.fit(ds_train, epochs=500, callbacks=callbacks_list)

4. Evaluate the model

        Evaluating a model generally requires a validation set or a test set. Since the amount of data in this case is small, we only visualize how the loss function evolves over iterations on the training set.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt

def plot_metric(history, metric):
    train_metrics = history.history[metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics, 'bo--')
    plt.title('Training ' + metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_" + metric])
    plt.show()

plot_metric(history, "loss")

5. Use the model

Here we use the model to predict when the epidemic will end, that is, when the number of newly confirmed cases first drops to 0.

# Use dfresult to record existing data together with the predicted future epidemic data
dfresult = dfdiff[["confirmed_num", "cured_num", "dead_num"]].copy()
dfresult.tail()

# Predict the trend over the next 200 days and append the results to dfresult
for i in range(200):
    arr_predict = model.predict(tf.constant(tf.expand_dims(dfresult.values[-38:, :], axis=0)))
    dfpredict = pd.DataFrame(tf.cast(tf.floor(arr_predict), tf.float32).numpy(),
                             columns=dfresult.columns)
    # Note: DataFrame.append was removed in pandas 2.0;
    # there, use pd.concat([dfresult, dfpredict], ignore_index=True) instead
    dfresult = dfresult.append(dfpredict, ignore_index=True)
dfresult.query("confirmed_num==0").head()
# The number of newly confirmed cases drops to 0 on day 55. Day 45 corresponds to March 10,
# so 10 days later, around March 20, new confirmed cases are expected to drop to 0.
# Note: this forecast is optimistic

dfresult.query("cured_num==0").head()
# The number of newly cured cases drops to 0 starting on day 164. Day 45 corresponds to March 10,
# so that is about 4 months later, around July 10.
# Note: this forecast is pessimistic and problematic: summing the predicted daily cured cases
# would exceed the cumulative number of confirmed cases.

dfresult.query("dead_num==0").head()
# The number of new deaths drops to 0 starting on day 60. Day 45 corresponds to March 10,
# so that is about 15 days later, around March 25 (2020-03-25).
# This prediction is reasonable
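Since the model works on daily differences, cumulative totals can be recovered with a running sum plus the values of the first recorded day. A sketch with made-up numbers (in the real pipeline the base row would come from the first row of dfdata, which was dropped by diff/dropna):

```python
import pandas as pd

# Hypothetical daily new cases, the shape of data that dfdiff/dfresult holds
daily = pd.DataFrame({"confirmed_num": [10.0, 5.0, 0.0],
                      "cured_num": [2.0, 3.0, 4.0],
                      "dead_num": [1.0, 0.0, 0.0]})

# Made-up cumulative totals on the base day (the day before the diffs start)
base = pd.Series({"confirmed_num": 100.0, "cured_num": 20.0, "dead_num": 3.0})

# Cumulative totals = base + running sum of the daily differences
cumulative = daily.cumsum() + base
```

This inverse of the earlier `diff(periods=1)` step is also a quick consistency check on the forecasts: the cumulative cured count must never exceed the cumulative confirmed count, which is exactly the problem noted for the cured-case prediction above.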

6. Save and use the model

model.save('./data/tf_model_savedmodel', save_format="tf")
print('export saved model.')

model_loaded = tf.keras.models.load_model('./data/tf_model_savedmodel', compile=False)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model_loaded.compile(optimizer=optimizer, loss=MSPE(name="MSPE"))
model_loaded.predict(ds_train)
