Predict US stocks closing movements
Thank you to Optiver and Kaggle for hosting this competition. This time I wanted to gain hands-on experience applying deep learning to trading and time-series data, so I didn't focus on extensive feature engineering or boosted-tree models.
Data Preprocessing: Zero imputation to handle missing values, and standard scaling to normalize the features.
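A minimal sketch of this preprocessing step, assuming scikit-learn (the SimpleImputer/StandardScaler combination mentioned in the comments below); train_features and valid_features are hypothetical arrays:

from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Replace missing values with zero, then standardize each feature.
preprocess = make_pipeline(
    SimpleImputer(strategy='constant', fill_value=0),
    StandardScaler(),
)
X_train = preprocess.fit_transform(train_features)  # fit on training data only
X_valid = preprocess.transform(valid_features)      # reuse the training statistics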
Feature Engineering: A total of 35-36 features per model, comprising the raw input features, binary flags derived from the 'seconds_in_bucket' variable, and additional features borrowed from public notebooks such as volume, mid price, and various imbalance measures (liquidity, matched, size, pairwise, harmonic).
Modeling Approach:
All models produced outputs of shape (batch_size, number_stocks, 55). To leverage the time-series nature of the competition with revealed targets, online incremental learning was performed: the models were updated each day using only the newly revealed data (for the decoder).
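The exact update schedule isn't described here; as a rough illustration only (assuming PyTorch, with hypothetical model, optimizer, and per-day tensors x_day/y_day), a daily online update could look like:

import torch.nn.functional as F

def online_update(model, optimizer, x_day, y_day, n_steps=1):
    # Fine-tune on the single newly revealed day before predicting the next one.
    model.train()
    for _ in range(n_steps):
        optimizer.zero_grad()
        loss = F.l1_loss(model(x_day), y_day)  # MAE, the competition metric
        loss.backward()
        optimizer.step()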
Validation Strategy: A simple time-based split was used, with the first 359 days for training and the last 121 days for validation. Because the evaluation metric fluctuated from epoch to epoch, an exponential moving average was used to smooth the values when comparing models. To assess online incremental learning, models were validated on the latest 20 days of data.
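As an example of the smoothing, the per-epoch validation scores can be passed through an EMA before comparing models (a sketch; the smoothing factor 0.3 is an assumption, not stated in the post):

def ema(values, alpha=0.3):
    # Exponentially weighted moving average of a noisy per-epoch metric curve.
    smoothed, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed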
Postprocessing: All models except one were trained with an additional constraint to enforce that the sum of the model outputs is zero.
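One way to build such a constraint into a network (a sketch, assuming PyTorch; not necessarily the author's exact mechanism) is to center the outputs across the stock dimension in the forward pass:

def enforce_zero_sum(outputs):
    # outputs: torch tensor of shape (batch_size, number_stocks, 55).
    # Subtracting the cross-stock mean makes the predictions sum to zero
    # at every timestamp.
    return outputs - outputs.mean(dim=1, keepdim=True)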
Ensembling: The final ensemble consisted of an average of predictions from the 3 transformer models and 1 GRU model.
Final Results: Final submission placed 6th on the private leaderboard with a mean absolute error (MAE) of 5.4285.
Posted a year ago
· 5th in this Competition
Really impressive stuff - well done on the result. Is it at all possible to see the model code for either of the models? Would be really interested.
Posted a year ago
· 6th in this Competition
Thank you, and congrats to your team for an exceptional 5th position. I require some time to prepare the code for sharing, but I'm more than happy to address any specific questions you might have in the meantime.
Posted a year ago
· 14th in this Competition
Great work @danielfg
Congratulations on the prize-winning solution, and best wishes for the future.
I appreciate the different approach and the complete reliance on deep learning models rather than the typical boosted-tree solutions for such problems.
Posted a year ago
· 6th in this Competition
Thank you, and congratulations to your team too.
Posted a year ago
· 106th in this Competition
Hi Daniel, I personally think what you have done here is amazing. Achieving 6th place using fewer than 40 features… Had you used ~100 or more, you would probably have won the competition. Anyway, please post the modelling part of the code if/when you find the time, if it is still something you intend to do. I, along with many other fellow Kagglers, am eager to learn the tricks of transformers for time series (as in: 'stuff that really works').
Posted a year ago
Congratulations!
As for cross-sectional (stock-wise) attention, will you incorporate this into every cross-section during sequence modeling, or only use it after temporal dimension reduction (pooling or using the last timestamp)? I guess using this for every cross-section is not GPU-memory friendly…
Posted a year ago
· 6th in this Competition
"will you incorporate this into every cross-section"
Yes
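A minimal sketch of what attention across the stock dimension at every timestep could look like (assuming PyTorch; the module and shapes are illustrative, not the author's actual code):

import torch.nn as nn

class CrossSectionalAttention(nn.Module):
    # Self-attention across stocks, applied independently at each timestep.
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, time, stocks, dim)
        b, t, s, d = x.shape
        x = x.reshape(b * t, s, d)   # fold time into the batch dimension
        out, _ = self.attn(x, x, x)  # each stock attends to all the others
        return out.reshape(b, t, s, d)

Folding time into the batch keeps each attention call over only the stock axis, but memory still grows with sequence length, which matches the GPU-memory concern raised above.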
Posted a year ago
Congratulations! But why do you call the seq2seq GRU-based model decoder-only? I think it might be called an encoder-decoder paradigm?
Posted a year ago
· 6th in this Competition
I initially designed the network as seq2seq, but ultimately opted for a decoder-only approach in the GRU model for the following reasons:
- It blends better with the transformer model.
- The encoder didn't contribute significantly, and controlling overfitting became challenging.
Posted a year ago
· 552nd in this Competition
Hello Daniel,
Impressive work on the Kaggle Optiver competition! Your approach, especially with the seq2seq transformers and GRU model, caught my eye. Any chance you could share your code or dive a bit into how you tackled the feature engineering and preprocessing? I'm keen to learn from your techniques and apply some insights to my own projects.
Thanks a ton!
Posted a year ago
· 6th in this Competition
I just used features that are already in most of the public notebooks:
from itertools import combinations

# Binary flags marking the windows around the 300s and 480s auction thresholds
# (one minute before and at each threshold).
df['seconds_in_bucket_flag_1'] = df['seconds_in_bucket'] >= 300 - 60
df['seconds_in_bucket_flag_2'] = df['seconds_in_bucket'] >= 300
df['seconds_in_bucket_flag_3'] = df['seconds_in_bucket'] >= 480 - 60
df['seconds_in_bucket_flag_4'] = df['seconds_in_bucket'] >= 480

# Order-book aggregates and imbalance measures.
df["volume"] = df['ask_size'] + df['bid_size']
df["mid_price"] = (df['ask_price'] + df['bid_price']) / 2
df["liquidity_imbalance"] = (df['bid_size'] - df['ask_size']) / df["volume"]
df["matched_imbalance"] = (df['imbalance_size'] - df['matched_size']) / (df['imbalance_size'] + df['matched_size'])
df["size_imbalance"] = df['bid_size'] / df['ask_size']
df['harmonic_imbalance'] = 2 / ((1 / df['bid_size']) + (1 / df['ask_size']))

# Normalized pairwise imbalance for every pair of price columns.
prices = ["reference_price", "far_price", "near_price", "ask_price", "bid_price", "wap"]
for a, b in combinations(prices, 2):
    df[f"{a}_{b}_imb"] = (df[a] - df[b]) / (df[a] + df[b])
For preprocessing, please refer to scikit-learn's StandardScaler and SimpleImputer.