Optiver · Featured Code Competition · a year ago

Optiver - Trading at the Close

Predict US stocks closing movements

Nima Shahbazi · 7th in this Competition · Posted a year ago
This post earned a gold medal

Prize-Winner (7th place solution)

Acknowledgements: I extend my gratitude to Optiver and Kaggle for facilitating this competition. The challenge posed a significant and complex problem in time-series forecasting for financial markets.

Methodology: My approach combines LightGBM and neural network models, with minimal feature engineering for the neural networks. The objective was to ensemble these models to reduce the variance of the final predictions.
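As a minimal illustration of such a blend (the weighting is an assumption, not the submitted configuration):

import numpy as np

def blend_predictions(lgbm_pred, nn_pred, w=0.5):
    # Weighted average of the two model families; w = 0.5 is an assumed weight.
    return w * np.asarray(lgbm_pred) + (1 - w) * np.asarray(nn_pred)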

Feature Engineering:

  • LightGBM Features: The main features used include the following (a rough sketch is given after this list):
    • Order Book Imbalance: Leveraging the publicly shared imb1, imb2, etc.
    • Trend Indicators: Employing diff() to capture temporal variation.
    • Volume-Based Cumulatives: Aggregating volumes over time.
    • Global Stock Statistics: Calculating the mean, median, and standard deviation of historical stock data.
    • Deviation Features: Both the tree-based and neural network models benefited from raw features representing deviations from the median, computed by the function shown below.
    • Online Learning: Applied to both the NN and LightGBM models.

Deviation features and online learning reduced the error by a large margin.
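As a rough illustration of some of these tree-model features (the column names follow the competition data, but the specific combinations are my own assumptions, not the winning feature set):

import pandas as pd

def add_example_features(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical sketch: imbalance, trend, cumulative-volume, and global-statistics
    # features in the spirit of the list above; not the exact winning feature set.
    out = df.copy()

    # Order book imbalance (in the style of the publicly shared imb1 / imb2)
    out['imb1'] = (out['bid_size'] - out['ask_size']) / (out['bid_size'] + out['ask_size'])
    out['imb2'] = (out['imbalance_size'] - out['matched_size']) / (out['imbalance_size'] + out['matched_size'])

    # Trend indicators: temporal differences within each stock/day
    out['wap_diff1'] = out.groupby(['stock_id', 'date_id'])['wap'].diff()

    # Volume-based cumulatives over the auction window
    out['cum_matched_size'] = out.groupby(['stock_id', 'date_id'])['matched_size'].cumsum()

    # Global stock statistics computed over historical data
    stats = out.groupby('stock_id')['wap'].agg(['mean', 'median', 'std'])
    stats.columns = [f'wap_hist_{c}' for c in stats.columns]
    return out.merge(stats, left_on='stock_id', right_index=True, how='left')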

The deviation-from-median function applied to the raw features:

import pandas as pd

def create_deviation_within_seconds(df, num_features):
    # For each raw feature, compute its deviation from the cross-sectional median
    # taken over all stocks at the same (date_id, seconds_in_bucket) timestamp.
    groupby_cols = ['date_id', 'seconds_in_bucket']
    new_columns = {}
    for feature in num_features:
        grouped_median = df.groupby(groupby_cols)[feature].transform('median')
        deviation_col_name = f'deviation_from_median_{feature}'
        new_columns[deviation_col_name] = df[feature] - grouped_median
    return pd.concat([df, pd.DataFrame(new_columns)], axis=1)
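
A usage sketch of this function (the list of raw features here is an assumption):

num_features = ['wap', 'imbalance_size', 'matched_size']  # assumed raw features
train = create_deviation_within_seconds(train, num_features)  # train is the competition DataFrame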

Neural Network Architecture: The networks combine LSTM and ConvNet models, incorporating the global stock statistics and deviation features for improved convergence.

I had already published the NN model structure on Kaggle in this post:
https://www.kaggle.com/competitions/optiver-trading-at-the-close/discussion/462639

Validation Strategy: A straightforward time-based split was used for model validation.
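
For instance, a minimal sketch of such a split on date_id (the cutoff is an assumption, not the value actually used):

cutoff = 435  # assumed: hold out roughly the last 45 trading days for validation
train_df = df[df['date_id'] <= cutoff]
valid_df = df[df['date_id'] > cutoff]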

The extended 1D convolutional model is shown below:

# Imports assumed for the TensorFlow Keras API used in the original notebook.
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation, Add,
                                     Concatenate, Flatten, Dropout, Lambda, ZeroPadding1D,
                                     Embedding, Reshape, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay


def apply_conv_layers(input_layer, kernel_sizes, filters=16, do_ratio=0.5):
    conv_outputs = []

    for kernel_size in kernel_sizes:
        conv_layer = Conv1D(filters=filters, kernel_size=kernel_size, activation='relu', padding='same')(input_layer)
        conv_layer = BatchNormalization()(conv_layer)
        conv_layer = Dropout(do_ratio)(conv_layer)

        shortcut = conv_layer

        conv_layer = Conv1D(filters=filters, kernel_size=kernel_size, padding='same')(conv_layer)
        conv_layer = BatchNormalization()(conv_layer)
        conv_layer = Activation('relu')(conv_layer)

        # Add the output of the first Conv1D layer
        conv_layer = Add()([conv_layer, shortcut])
        conv_outputs.append(conv_layer)


    concatenated_conv = Concatenate(axis=-1)(conv_outputs)
    flattened_conv_output = Flatten()(concatenated_conv)

    return flattened_conv_output

def create_rnn_model_with_residual(window_size, numerical_features, initial_learning_rate=0.001):

    categorical_features = 'seconds_in_bucket'
    categorical_uniques  = { 'seconds_in_bucket' : 55}
    embedding_dim        = {'seconds_in_bucket' : 10}

    input_layer = Input(shape=(window_size, len(numerical_features) + 1), name="combined_input")

    # Split the input into numerical and categorical parts
    numerical_input = Lambda(lambda x: x[:, :, :-1], name="numerical_part")(input_layer)
    categorical_input = Lambda(lambda x: x[:, :, -1:], name="categorical_part")(input_layer)

    first_numerical = Lambda(lambda x: x[:, 0])(numerical_input)


    # differencing layers
    def create_difference_layer(lag):
        return Lambda(lambda x: x[:, lag:, :] - x[:, :-lag, :], name=f"difference_layer_lag{lag}")

    difference_layers = []
    for lag in range(1, window_size):
        diff_layer = create_difference_layer(lag)(numerical_input)
        padding = ZeroPadding1D(padding=(lag, 0))(diff_layer)  # Add padding to the beginning of the sequence
        difference_layers.append(padding)
    combined_diff_layer = Concatenate(name="combined_difference_layer")(difference_layers)


    # Embedding for categorical part
    vocab_size, embedding_dim = categorical_uniques[categorical_features], embedding_dim[categorical_features]
    embedding = Embedding(vocab_size, embedding_dim, input_length=window_size)(categorical_input)
    embedding = Reshape((window_size, -1))(embedding)

    first_embedding = Lambda(lambda x: x[:, 0])(embedding)

    # Concatenate numerical input and embedding
    # conv_input = concatenate([enhanced_numerical_input, embedding], axis=-1)

    kernel_sizes = [2,3]
    do_ratio = 0.4

    flattened_conv_output = apply_conv_layers(numerical_input, kernel_sizes, do_ratio=do_ratio)
    flattened_conv_output_cat = apply_conv_layers(embedding, kernel_sizes, do_ratio=do_ratio)
    flattened_conv_output_diff = apply_conv_layers(combined_diff_layer, kernel_sizes, do_ratio=do_ratio)


    dense_output = Concatenate(axis=-1)([flattened_conv_output,flattened_conv_output_cat,flattened_conv_output_diff, Reshape((-1,))(combined_diff_layer),first_numerical,first_embedding])

    dense_sizes = [512, 256, 128, 64, 32]
    do_ratio = 0.5
    for size in dense_sizes:
        dense_output = Dense(size, activation='swish')(dense_output)
        dense_output = BatchNormalization()(dense_output)
        dense_output = Dropout(do_ratio)(dense_output)

    # Output layer
    output = Dense(1, name='output_layer')(dense_output)

    # Learning rate schedule
    lr_schedule = ExponentialDecay(
        initial_learning_rate=initial_learning_rate,
        decay_steps=10000,
        decay_rate=0.7,
        staircase=True)

    # Create and compile the model
    model = Model(inputs=input_layer, outputs=output)
    optimizer = Adam(learning_rate=lr_schedule)

    model.compile(optimizer=optimizer, loss="mean_absolute_error")

    return model
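
A usage sketch (the window size and feature list are assumptions; note that the integer-encoded seconds_in_bucket is expected as the last channel of the input):

num_features = ['wap', 'imbalance_size', 'matched_size', 'bid_size', 'ask_size']  # assumed
model = create_rnn_model_with_residual(window_size=6, numerical_features=num_features)
model.summary()
# X_train has shape (samples, window_size, len(num_features) + 1); the last channel holds the
# integer-encoded seconds_in_bucket (0..54), matching the embedding vocabulary of 55.
# model.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=10, batch_size=1024)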
  • For further exploration, the Optiver Trading Close repository is also available; I will update it with the full winning code.


9 Comments

Posted a year ago

This post earned a bronze medal

Congrats @nimashahbazi! Hard work pays off! Thanks for sharing your notebook; I learned some new things.

Posted a year ago

· 14th in this Competition

This post earned a bronze medal

@nimashahbazi you were at the top end of the LB right from the start to the end. This is a huge achievement in my opinion.
Best wishes for the future!
Congratulations on the prize and all the best!

Nima Shahbazi

Topic Author

Posted a year ago

· 7th in this Competition

Thanks Ravi,
Congrats to your team as well.

Posted a year ago

· 344th in this Competition

This post earned a bronze medal

Congratulations on securing the 7th place in this competition. Thanks for sharing the details of your solution.

Posted a year ago

· 23rd in this Competition

This post earned a bronze medal

Thank you for sharing your great ideas. I learned a lot from your NN architecture!

Nima Shahbazi

Topic Author

Posted a year ago

· 7th in this Competition

This post earned a bronze medal

Thanks, glad to hear that.

Posted a year ago

This post earned a bronze medal

Nice work. Congratulations!

Posted 5 months ago

· 590th in this Competition

Hi Nima, thanks for sharing your solution. Just a simple question: did you spend a lot of time tuning and designing the neural network architecture? I implemented a structure similar to yours, but the performance is much worse.

Posted a year ago

· 106th in this Competition

Hi Nima, thanks a lot for sharing this code. This is by far the best NN solution that has been shared!
There is one line I don't understand.
In the code below, what is the purpose/rationale of the last 3 inputs (the first 3 are easy to understand)?
Why add the first numerical and categorical features standalone, and the diff layer (without conv)?
dense_output = Concatenate(axis=-1)([flattened_conv_output,flattened_conv_output_cat,flattened_conv_output_diff, Reshape((-1,))(combined_diff_layer),first_numerical,first_embedding])
Also, why did you use a different set of features for GBM and CNN?
Thanks a lot!