
freqai transformer continual_learning save model error #10034

Open
piaofeifengxinzi opened this issue Apr 1, 2024 · 5 comments
piaofeifengxinzi commented Apr 1, 2024

Describe your environment

  • Operating system: Ubuntu 22.04.3 LTS
  • Python Version: Python 3.10.12
  • CCXT version: ccxt==4.2.78
  • Freqtrade Version: freqtrade 2024.3-dev-7d6d3d38f

Describe the problem:

I'm using freqai backtesting with "continual_learning": true set in the configuration, but I get the error TypeError: cannot pickle '_thread.lock' object. When I set it to false, the error disappears.

When continual_learning is turned on and an old model exists, freqai loads it and continues training from it. However, once training completes, saving the model raises the error above. If "pytrainer": self is commented out, saving no longer errors, but loading then fails at model = zip["pytrainer"].

Steps to reproduce:

  1. freqtrade backtesting --freqaimodel PyTorchTransformerRegressor --strategy strategy4 --config ./user_data/config4temp1.json --timerange 20231002-20240325



Relevant code exceptions or logs


File "/home/xxxx/freqtrade/freqtrade/freqai/freqai_interface.py", line 161, in start
    dk = self.start_backtesting(dataframe, metadata, self.dk, strategy)
  File "/home/xxxx/freqtrade/freqtrade/freqai/freqai_interface.py", line 365, in start_backtesting
    self.dd.save_data(self.model, pair, dk)
  File "/home/xxxx/freqtrade/freqtrade/freqai/data_drawer.py", line 484, in save_data
    model.save(save_path / f"{dk.model_filename}_model.zip")
  File "/home/xxxx/freqtrade/freqtrade/freqai/torch/PyTorchModelTrainer.py", line 190, in save
    torch.save({
  File "/home/xxxx/freqtrade/.venv/lib/python3.10/site-packages/torch/serialization.py", line 629, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol, _disable_byteorder_record)
  File "/home/xxxx/freqtrade/.venv/lib/python3.10/site-packages/torch/serialization.py", line 841, in _save
    pickler.dump(obj)
TypeError: cannot pickle '_thread.lock' object
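
For context, the failure itself is a general pickling limitation rather than anything specific to the strategy: torch.save pickles the whole object it is given, and a _thread.lock cannot be pickled. A minimal standalone sketch (not freqtrade code; DummyTrainer and the file name are made up) that reproduces the same TypeError:

import threading
import torch

class DummyTrainer:
    def __init__(self):
        # stand-in for any attribute that wraps a lock (e.g. a tensorboard writer)
        self.lock = threading.Lock()

torch.save({"pytrainer": DummyTrainer()}, "model.zip")
# -> TypeError: cannot pickle '_thread.lock' object
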
piaofeifengxinzi added the Triage Needed (Issues yet to verify) label on Apr 1, 2024
xmatthias added the freqAI (Issues and PR's related to freqAI) label on Apr 1, 2024

piaofeifengxinzi commented Apr 1, 2024

I also set continual_learning: true, and when I set it to false the error disappeared. Can't continual_learning be used in backtesting? Or, in backtest mode, will the previous model be loaded automatically when training a new one?

piaofeifengxinzi commented

I found the reason: the save method of PyTorchModelTrainer pickles self

torch.save({
    "model_state_dict": self.model.state_dict(),
    "optimizer_state_dict": self.optimizer.state_dict(),
    "model_meta_data": self.model_meta_data,
    "pytrainer": self
}, path)

But I don't know how to fix it.
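
One generic pattern for this kind of pickling error (shown only as a sketch of the idea, not a tested freqtrade patch; the traceback does not say which attribute actually holds the lock, tb_logger is just the most likely suspect, and the class names below are made up) is to drop the unpicklable attribute from the pickled state with __getstate__/__setstate__ on the trainer:

import threading
import torch

class LockFreePickleMixin:
    """Sketch: exclude an unpicklable attribute (here assumed to be 'tb_logger') from pickling."""
    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("tb_logger", None)  # drop the attribute suspected of holding the lock
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.tb_logger = None  # a real logger would have to be re-attached after loading

class DemoTrainer(LockFreePickleMixin):
    def __init__(self):
        self.tb_logger = threading.Lock()  # stand-in for the unpicklable logger
        self.model_meta_data = {}

torch.save({"pytrainer": DemoTrainer()}, "demo.zip")  # no longer raises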

piaofeifengxinzi changed the title from "freqai backtest save model error" to "freqai Transformer continual_learning save model error" on Apr 3, 2024
piaofeifengxinzi changed the title from "freqai Transformer continual_learning save model error" to "freqai transformer continual_learning save model error" on Apr 3, 2024

robcaulk commented Apr 4, 2024

Hello,

continual_learning for PyTorch models may only be supported in Dry/Live modes.

If you need continual_learning in backtesting as well, you can use one of the other 18 supported models, such as XGBoost/Catboost.

You may have noticed in the documentation of continual_learning that this is a highly experimental feature, so if you would like to use it with PyTorch in backtesting, you may need to wait until someone has time to try to debug it for you. I will try my best, but it is not a top priority at the moment since it is an edge case on an experimental feature.

https://www.freqtrade.io/en/stable/freqai-running/#continual-learning

cheers,

rob

piaofeifengxinzi commented

Thank you for your work. Using continual_learning in Dry mode does not work either. This problem may only exist with PyTorchTransformerRegressor; I have not tried continual_learning on the other PyTorch-based models. I think enabling continual_learning on the Transformer model may give better results, so I hope this issue can be resolved.

xmatthias removed the Triage Needed (Issues yet to verify) label on Apr 7, 2024
piaofeifengxinzi commented

I solved this problem by modifying the fit method of PyTorchTransformerRegressor.py as follows:

def fit(self, data_dictionary: Dict, dk: FreqaiDataKitchen, **kwargs) -> Any:
    """
    User sets up the training and test data to fit their desired model here
    :param data_dictionary: the dictionary holding all data for train, test,
        labels, weights
    :param dk: The datakitchen object for the current coin/model
    """

    n_features = data_dictionary["train_features"].shape[-1]
    n_labels = data_dictionary["train_labels"].shape[-1]
    model = PyTorchTransformerModel(
        input_dim=n_features,
        output_dim=n_labels,
        time_window=self.window_size,
        **self.model_kwargs
    )
    model.to(self.device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=self.learning_rate)
    criterion = torch.nn.MSELoss()
    # check if continual_learning is activated, and retrieve the model to continue training
    temp_trainer = self.get_init_model(dk.pair)
    if temp_trainer is None:
        # no previous model for this pair: build a completely fresh trainer
        trainer = PyTorchTransformerTrainer(
            model=model,
            optimizer=optimizer,
            criterion=criterion,
            device=self.device,
            data_convertor=self.data_convertor,
            window_size=self.window_size,
            tb_logger=self.tb_logger,
            **self.trainer_kwargs,
        )
    else:
        # previous model found: reuse its model, optimizer and meta data,
        # but wrap them in a newly constructed trainer
        trainer = PyTorchTransformerTrainer(
            model=temp_trainer.model,
            optimizer=temp_trainer.optimizer,
            model_meta_data=temp_trainer.model_meta_data,
            criterion=criterion,
            device=self.device,
            data_convertor=self.data_convertor,
            window_size=self.window_size,
            tb_logger=self.tb_logger,
            **self.trainer_kwargs,
        )
    trainer.fit(data_dictionary, self.splits)
    return trainer

I don't know if this modification is correct, but it makes it run.
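
For what it is worth, the visible difference from the stock implementation (which, as far as I can tell, reuses the trainer object returned by get_init_model directly) is that only the reloaded model, optimizer and model_meta_data are carried over into a freshly constructed PyTorchTransformerTrainer, so the object that save() later pickles is built entirely from the current run's tb_logger, data_convertor and trainer_kwargs. Whether that is the actual root cause of the '_thread.lock' error still needs confirmation from a maintainer.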
