New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] fetch public trades #9066
base: develop
Are you sure you want to change the base?
[Feature] fetch public trades #9066
Conversation
a5660c6
to
137ee07
Compare
This is awesome! Orderflow has a lot of valuable information, can't wait to start experimenting with this! |
This comment was marked as off-topic.
This comment was marked as off-topic.
@dijvar it's an open PR. Docs will be written before merging the PR (which the PR description also mentions). @TheJoeSchr you may want to look at the conflicts (don't just look at the conflicting lines though ...). I think the diff of #9065 will be helpful to show what changed .... i don't think much else changed ... but these changes WILL impact this PR for sure. |
thanks for the headsup @xmatthias . I think the overlap is minimal, because my "trades" are about L2 data aka orderflow trades and that PR is about trades the user/bot send to the exchange to execute. but the naming overlap is unfortunate, I'm not sure how to handle it so there won't be any more confusion |
Thanks for showing interest, this helps keeping my motivation up to finally write the docs. did you try the |
return types, timings and other issues
needed here to be used for call before analyze also removes need for internal exchange function checking if public_trades is enabled
f98a862
to
f0b26ec
Compare
f0b26ec
to
c49f854
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a few things that i don't like and make a review very difficult
For example:
- changes that don't belong here (see explicit comment)
- docstring (and signature) changes on methods that should not have been changed, causing pointless diffs (
refresh_latest_ohlcv()
- for example) - reintroduction of arrow to exchange (which has been removed in Add datetime helpers, reduce arrow usage to a minimum #8661, and was replaced with internal helpers directly for datetime)
- duplicate methods in the same class (one will overwrite the other)
Due to the above, doing a proper review is currently near impossible - a the diff is unnecessarily large (especially in the exchange file).
Please do a review yourself (for example vscode's github pull requests extension can assist you with this by showing you the diff locally, allowing you to change what you don't like), comparing the differences - and identify what you actually INTENDED to change. That's something that's VERY difficult for me to identify.
Things you think should be changed but are not directly related to this pr (for example the max_calls point) - should either be removed, or extracted into individual PR's.
this becomes increasingly important with bigger PR's - small pr's combining 2-3 things - not really a problem, usually
but in a big feature like this - very problematic - as it draws focus apart - having high potential to introduce unwanted bugs due to these "non-directly" related changes.
Hi @TheJoeSchr, first of all, thank you for your contribution 👍🏽 ; Just a quick question: the 'plot-dataframe' command is expected to work with trades data out of the box or some update are needed? |
I don't think it is - but I'd also not invest time into plot-dataframe at this point - which i do consider a deprecated functionality (though with no immediate plans of removal). |
I'm using plot dataframe extencively, and it is allowing to do advanced indicators development. I'm not convinced that freq UI could cover efficiently such use cases. |
@Axel-CH this is not the place to discuss this topic in depth. There's no plans on having support for plot-dataframe included in this PR. This doesn't exclude eventual followup PR's eventually - but i don't see support for plot-dataframe as a requirement for this feature or pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole exchange area (refresh_latest_trades()
and children of it) will need some tests (ignore the already existing functions - tests for these exit).
The logic "is refresh needed" and consorts is quite prone to bugs (i know that from the ohlcv part) - which tests can help discover.
you should be able to reuse at least part of the logic for refresh_latest_ohlcv()
for these tests - which should get you most of the way (though it's new logic - so needs dedicated tests to ensure it's not broken now - and won't break in the future, either).
# used in _now_is_time_to_refresh_trades | ||
df['candle_end'] = df['candle_start'] + \ | ||
pd.Timedelta(minutes=timeframe_minutes) | ||
df.drop(columns=['datetime'], inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use timeframe_to_next_date()
instead.
You're assuming timeframe in minutes - which may not hold true on longer (or shorter) timeframes.
doing this (and the other comment) should allow us to simplify the code by removing _convert_timeframe_to_pandas_frequency
which is faulty on bigger timeframes.
(_, timeframe_minutes) = _convert_timeframe_to_pandas_frequency(timeframe) | ||
candle_next = candle_start + \ | ||
pd.Timedelta(minutes=timeframe_minutes) | ||
# skip if there are no trades at next candle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - timeframe_to_next_date()
will be your friend.
freqtrade/data/dataprovider.py
Outdated
_candle_type = CandleType.from_string( | ||
candle_type) if candle_type != '' else self._config['candle_type_def'] | ||
data_handler = get_datahandler( | ||
self._config['datadir'], data_format=self._config['dataformat_trades']) | ||
ticks = data_handler.trades_load(pair) | ||
trades_df = public_trades_to_dataframe( | ||
ticks.values.tolist(), pair=pair) | ||
return trades_df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - but the point is - dataprovider is available to the stratgy via self.dp.xxx
- so - if we assume self.dp.trades()
is used within a callback (for whatever reason) - then this should have "lookahead bias protection" for backtesting (truncate data at "current date") - otherwise i can use a callback (say, confirm_trade_entry()
) and reject all trades where the price in 1h is not above current price - though in live, that's info i would never be able to have.
If we don't wanna do that for now - i'm fine with a small hint in the docstring saying "this is not meant to be used in callbacks" or similar - to alert users that this is potentially problematic.
freqtrade/exchange/exchange.py
Outdated
[ticks_pair, new_ticks] = self._download_trades_history(pair, | ||
since=since_ms if since_ms else first_candle_ms, # noqa | ||
until=until, | ||
from_id=from_id) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
df = self.klines((pair, timeframe, candle_type), True) | ||
_calculate_ohlcv_candle_start_and_end(df, timeframe) | ||
timeframe_to_seconds(timeframe) | ||
plr = round(df.iloc[-1]["candle_end"].timestamp()) | ||
now = int(timeframe_to_prev_date(timeframe).timestamp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you reverse this logic - it can be simplified quite significantly (including the removal of the "copy" argument on klines()
).
simply getting the last candle date (df.iloc[-1]['date']
) - and then adding 1x timeframe to it will suffice.
_calculate_ohlcv_candle_start_and_end()
may be useful - but it's doing the calculation on 1000 candles - while we only need it on the most recent one.
else: | ||
until = int(timeframe_to_prev_date(timeframe).timestamp()) * 1000 | ||
all_stored_ticks_df = data_handler.trades_load(f"{pair}-cached") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason to use a "-cached" filename here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to avoid filename collisions. You think we don't need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well i don't know - what's the reason to have this "duplicated"?
like - is there a problem we expect by appending to the already existing files?
Sure, it could delay initial startup (avoid holes in the data) ...
But on the other hand, assume someone would like to backtest over a period of a year - and has a dry-run running at the same time ...
at the end, they'd have the same exact data twice - once in -cached files, once without.
"stacked_imbalance_range": 3, # needs at least this amount of imblance next to each other | ||
"imbalance_volume": 1, # filters out below | ||
"imbalance_ratio": 300 # filters out ratio lower than | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a ratio - isn't 300 a bit excessive? (like - a ratio should go from 0 to 1 (which means 100%)) ?
'scale': {'type': 'number', 'minimum': 0.0}, | ||
'stacked_imbalance_range': {'type': 'number'}, | ||
'imbalance_volume': {'type': 'number'}, | ||
'imbalance_ratio': {'type': 'number'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheJoeSchr I'd apreciate if you could have a look at these
i assume we're able to define ranges for some of these ... though from the docs, the actual range wasn't clear to me - and i (intentionally) didn't want to discover it by reading the code.
Also, i ASSUME most are mandatory .. though i'm not entirely sur, so i'd rather leave that to you.
Orderflow
Summary
This PR enables to use ccxt fetch_public_trades so freqtrade can use trades data for backtesting and trading
Quickstart:
--dl-trades
to fetch trades for timerangeconfig.json
config.json
:TODO
populate_indicators
etcdataframe['trades']
dataframe['orderflow']
dataframe['delta']
--dl-trades
to get trade datafrom previous feedback by @xmatthias :
Some diffs are a bit strange at first glance (especially in the exchange class), so expect some questions once i look at it more carefully (might be the webUI doing the diffs oddly if 2 functions are next to each other and are similar enough though) ...
and i usually don't like random formatting changes (like - an unnecessary linebreak in a function that's not even touched otherwise) => fixed with commit 137ee07
i also assume we'll want to move some things around in the end (converter seems to become quite big now) - but i can eventually help with that.
What's new?