Predict US stocks closing movements
Acknowledgements: I extend my gratitude to Optiver and Kaggle for facilitating this competition. The challenge posed a significant and complex problem in time-series forecasting for financial markets.
Methodology: My approach integrates LightGBM and neural network models, with minimal feature engineering for the neural networks. The objective was to ensemble these models to reduce the variance of the final predictions.
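As a rough illustration of the ensembling idea only (the weights below are placeholders, not the values used in the final submission):

import numpy as np

# Hypothetical blend of LightGBM and NN predictions; the 50/50 weights are
# illustrative placeholders.
def blend_predictions(lgb_preds, nn_preds, w_lgb=0.5, w_nn=0.5):
    return w_lgb * np.asarray(lgb_preds) + w_nn * np.asarray(nn_preds)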
Feature Engineering:
diff() for temporal variation (a sketch is shown below). Deviation features and online learning reduced the error by a large margin.
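A minimal sketch of the diff()-style temporal features, under the assumption that they are per-stock, per-day differences over the 10-second buckets (the column names and lag set are placeholders, not the exact ones used):

import pandas as pd

def add_diff_features(df, cols=('wap', 'imbalance_size'), lags=(1, 2, 3)):
    new_columns = {}
    for col in cols:
        for lag in lags:
            # change over the previous `lag` buckets, within each stock and day
            new_columns[f'{col}_diff_{lag}'] = df.groupby(['stock_id', 'date_id'])[col].diff(lag)
    return pd.concat([df, pd.DataFrame(new_columns)], axis=1)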
The deviation-from-median function used on the raw features:
import pandas as pd

def create_deviation_within_seconds(df, num_features):
    # For each feature, compute its deviation from the cross-sectional median
    # taken over all rows sharing the same date_id and seconds_in_bucket.
    groupby_cols = ['date_id', 'seconds_in_bucket']
    new_columns = {}
    for feature in num_features:
        grouped_median = df.groupby(groupby_cols)[feature].transform('median')
        deviation_col_name = f'deviation_from_median_{feature}'
        new_columns[deviation_col_name] = df[feature] - grouped_median
    return pd.concat([df, pd.DataFrame(new_columns)], axis=1)
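For example, applied to a handful of raw columns (the feature names below follow the competition's raw schema; the exact list used is not stated here):

raw_features = ['imbalance_size', 'matched_size', 'bid_size', 'ask_size', 'wap']
df = create_deviation_within_seconds(df, raw_features)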
Neural Network Architecture: The architecture includes LSTM and ConvNet models, incorporating global stock statistics and deviation features for improved convergence.
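The global stock statistics are not detailed in this post; a minimal sketch, assuming they are per-stock aggregates computed over the training window and merged back onto every row (the chosen columns and statistics are placeholders):

def add_global_stock_stats(df, train_df, cols=('bid_size', 'ask_size', 'wap')):
    # Hypothetical per-stock aggregates computed once on the training window,
    # then merged back onto every row of the working frame.
    stats = train_df.groupby('stock_id')[list(cols)].agg(['median', 'std'])
    stats.columns = [f'global_{c}_{s}' for c, s in stats.columns]
    return df.merge(stats.reset_index(), on='stock_id', how='left')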
I have already published the NN model structure on Kaggle in this post:
https://www.kaggle.com/competitions/optiver-trading-at-the-close/discussion/462639
Validation Strategy: I employed a straightforward time-based split for model validation (a sketch is shown below).
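A minimal sketch of such a split, assuming the split is made on date_id (the cutoff value is illustrative, not the exact boundary used):

def time_based_split(df, cutoff_date_id=390):
    # Train on earlier dates, validate on later ones; no shuffling, so the
    # validation fold never sees information from the future.
    train = df[df['date_id'] <= cutoff_date_id]
    valid = df[df['date_id'] > cutoff_date_id]
    return train, valid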
The extended Convolutional 1D model is shown below:
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Dropout, Activation,
                                     Add, Concatenate, Flatten, Lambda, ZeroPadding1D,
                                     Embedding, Reshape, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay

def apply_conv_layers(input_layer, kernel_sizes, filters=16, do_ratio=0.5):
    # One residual Conv1D branch per kernel size; the branches are concatenated
    # and flattened at the end.
    conv_outputs = []
    for kernel_size in kernel_sizes:
        conv_layer = Conv1D(filters=filters, kernel_size=kernel_size, activation='relu', padding='same')(input_layer)
        conv_layer = BatchNormalization()(conv_layer)
        conv_layer = Dropout(do_ratio)(conv_layer)
        shortcut = conv_layer
        conv_layer = Conv1D(filters=filters, kernel_size=kernel_size, padding='same')(conv_layer)
        conv_layer = BatchNormalization()(conv_layer)
        conv_layer = Activation('relu')(conv_layer)
        # Add the output of the first Conv1D layer (residual shortcut)
        conv_layer = Add()([conv_layer, shortcut])
        conv_outputs.append(conv_layer)
    concatenated_conv = Concatenate(axis=-1)(conv_outputs)
    flattened_conv_output = Flatten()(concatenated_conv)
    return flattened_conv_output

def create_rnn_model_with_residual(window_size, numerical_features, initial_learning_rate=0.001):
    categorical_feature = 'seconds_in_bucket'
    categorical_uniques = {'seconds_in_bucket': 55}
    embedding_dims = {'seconds_in_bucket': 10}

    input_layer = Input(shape=(window_size, len(numerical_features) + 1), name="combined_input")

    # Split the input into numerical and categorical parts
    numerical_input = Lambda(lambda x: x[:, :, :-1], name="numerical_part")(input_layer)
    categorical_input = Lambda(lambda x: x[:, :, -1:], name="categorical_part")(input_layer)
    first_numerical = Lambda(lambda x: x[:, 0])(numerical_input)

    # Difference layers: lagged differences of the numerical sequence
    def create_difference_layer(lag):
        return Lambda(lambda x: x[:, lag:, :] - x[:, :-lag, :], name=f"difference_layer_lag{lag}")

    difference_layers = []
    for lag in range(1, window_size):
        diff_layer = create_difference_layer(lag)(numerical_input)
        padding = ZeroPadding1D(padding=(lag, 0))(diff_layer)  # pad the start so every lag keeps the same sequence length
        difference_layers.append(padding)
    combined_diff_layer = Concatenate(name="combined_difference_layer")(difference_layers)

    # Embedding for the categorical part
    vocab_size, embedding_dim = categorical_uniques[categorical_feature], embedding_dims[categorical_feature]
    embedding = Embedding(vocab_size, embedding_dim, input_length=window_size)(categorical_input)
    embedding = Reshape((window_size, -1))(embedding)
    first_embedding = Lambda(lambda x: x[:, 0])(embedding)

    # Convolutional branches over the numerical, categorical and difference inputs
    kernel_sizes = [2, 3]
    do_ratio = 0.4
    flattened_conv_output = apply_conv_layers(numerical_input, kernel_sizes, do_ratio=do_ratio)
    flattened_conv_output_cat = apply_conv_layers(embedding, kernel_sizes, do_ratio=do_ratio)
    flattened_conv_output_diff = apply_conv_layers(combined_diff_layer, kernel_sizes, do_ratio=do_ratio)

    dense_output = Concatenate(axis=-1)([flattened_conv_output,
                                         flattened_conv_output_cat,
                                         flattened_conv_output_diff,
                                         Reshape((-1,))(combined_diff_layer),
                                         first_numerical,
                                         first_embedding])

    # Dense head
    dense_sizes = [512, 256, 128, 64, 32]
    do_ratio = 0.5
    for size in dense_sizes:
        dense_output = Dense(size, activation='swish')(dense_output)
        dense_output = BatchNormalization()(dense_output)
        dense_output = Dropout(do_ratio)(dense_output)

    # Output layer
    output = Dense(1, name='output_layer')(dense_output)

    # Learning rate schedule
    lr_schedule = ExponentialDecay(
        initial_learning_rate=initial_learning_rate,
        decay_steps=10000,
        decay_rate=0.7,
        staircase=True)

    # Create and compile the model
    model = Model(inputs=input_layer, outputs=output)
    optimizer = Adam(learning_rate=lr_schedule)
    model.compile(optimizer=optimizer, loss="mean_absolute_error")
    return model
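A rough sketch of how the model might be instantiated and trained; the feature list, window size, batch size and epoch count are placeholders, and the dummy arrays only illustrate the expected input shape (numerical channels plus the encoded seconds_in_bucket index in the last channel):

import numpy as np

numerical_features = ['wap', 'bid_price', 'ask_price', 'imbalance_size']  # placeholder list
window_size = 55  # one 10-second bucket per step of the closing auction

model = create_rnn_model_with_residual(window_size, numerical_features)
model.summary()

# Dummy data with the expected shape (samples, window_size, n_numerical + 1).
X_dummy = np.random.rand(256, window_size, len(numerical_features) + 1).astype('float32')
X_dummy[:, :, -1] = np.arange(window_size)  # seconds_in_bucket index 0..54 per step
y_dummy = np.random.rand(256, 1).astype('float32')
model.fit(X_dummy, y_dummy, batch_size=64, epochs=1)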
Posted a year ago
Congrats @nimashahbazi! Hard work pays off! Thanks for sharing your notebook; I got to know new things.
Posted a year ago
· 14th in this Competition
@nimashahbazi you were always at the top end of the LB right from the start till the end. This is a huge achievement in my opinion.
All the best and best wishes for the future!
Congratulations for the prize and all the best!
Posted a year ago
· 7th in this Competition
Thanks Ravi,
Congrats to your team as well.
Posted a year ago
· 106th in this Competition
Hi Nima, thanks a lot for sharing this code. This is by far the best NN solution that has been shared!
There is one line I don't understand.
In the code below, what is the purpose/rationale of the last three inputs (the first three are easy to understand)?
Why add the first numerical and categorical features standalone, plus the diff layer (without conv)?
dense_output = Concatenate(axis=-1)([flattened_conv_output,flattened_conv_output_cat,flattened_conv_output_diff, Reshape((-1,))(combined_diff_layer),first_numerical,first_embedding])
Also, why did you use a different set of features for GBM and CNN?
thanks a lot