How to not search feature correlation with all y target? #25

Open
wanga10000 opened this issue Jun 8, 2022 · 2 comments


@wanga10000

wanga10000 commented Jun 8, 2022

Hi,
First of all, thanks for developing this tool. It's excellent for feature selection,
not only for the algorithm but also for the integration and processing of all the indicators.

Here's my situation:
I have a strategy, and I want to find features that correlate with whether it wins or loses.
That is, only a few points in my y target are "activated", rather than every point carrying an n-point return or similar.
So I tried to build the y target as follows:
Assume 10 days of OHLCV data, with the strategy activated on the third and seventh days.
y = {0,0,1,0,0,0,-1,0,0,0}, where 1 stands for a win and -1 for a loss.
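
For concreteness, a target like this could be built as follows. This is only a minimal sketch: `build_target` and `outcomes` are illustrative names, not part of the tool.

```python
def build_target(n_days, outcomes):
    """Return a list of length n_days: 0 where inactive, +1/-1 on trade days.

    `outcomes` (hypothetical input) maps a 1-based day number
    to 1 (win) or -1 (loss).
    """
    y = [0] * n_days
    for day, result in outcomes.items():
        y[day - 1] = result  # convert the 1-based day to a 0-based index
    return y


# Strategy activated on day 3 (win) and day 7 (loss) of a 10-day window.
y = build_target(10, {3: 1, 7: -1})

# Boolean flags marking which days are "activated".
activated = [value != 0 for value in y]
print(y)
```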

This is probably not a reasonable way to do it,
because the tool prints 5-6 features whose correlation with the target is over 0.9.
I realized that the features it found correlate only with whether a point is "activated", not with win or lose.
So I think it would be good if the algorithm could consider only the "activated" points and mask out the others.

Do you have any suggestions for implementing this kind of usage? Thanks!

@jmrichardson
Owner

Hi @wanga10000 ,

I am not sure I completely follow your example. It looks like the "activated" points are the third and seventh days, which are the win and the loss (1, -1). But you said:

> tool found only correlated to "activated" points instead of win or lose.

I think I would have tried the same thing as you, with the target y as you described. However, as I think you pointed out, there are indicators that are highly correlated by virtue of being close to 0, which is the value of most of your data points. So, I assume what you are looking for is a way to include a mask on the "0"s after the indicators have been calculated and only use dcor on the -1, 1 values?

That does sound like a good feature, if I understand correctly. However, at the moment I am not sure I can get to it soon, as I am very busy on another project. Perhaps you could describe how you would architect the solution? I am thinking that you may want to include a fit parameter such as "mask" that is the same size as y and could be used to filter the observations prior to dcor.
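
As a rough illustration of the masking idea, here is a minimal sketch. It is not the tool's code: `masked_distance_correlation` and its `mask` argument are hypothetical, the pure-Python `distance_correlation` (the biased sample estimate) merely stands in for whatever dcor implementation is actually used, and the data is made up.

```python
from math import sqrt


def _double_centered(v):
    """Pairwise |v_i - v_j| distances, double-centered."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # d is symmetric: row means == column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Biased sample distance correlation of two equal-length 1-D sequences."""
    n = len(x)
    if n < 2:
        return 0.0
    a, b = _double_centered(x), _double_centered(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j]
                   for i in range(n) for j in range(n)) / (n * n)

    denom = sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0:  # one of the inputs is constant
        return 0.0
    return sqrt(max(mean_product(a, b), 0.0) / denom)


def masked_distance_correlation(x, y, mask=None):
    """Hypothetical helper: drop observations where mask is falsy, then dcor.

    mask=None keeps every observation, so existing usage is unchanged.
    """
    if mask is not None:
        kept = [(xi, yi) for xi, yi, m in zip(x, y, mask) if m]
        x = [xi for xi, _ in kept]
        y = [yi for _, yi in kept]
    return distance_correlation(x, y)


# Made-up 20-day target: 0 = inactive, 1 = win, -1 = loss.
y = [0, 0, 1, 0, 0, 0, -1, 0, 0, 1, 0, 0, -1, 0, 0, 0, 1, 0, -1, 0]
mask = [v != 0 for v in y]

# A feature that only fires when the strategy is activated,
# regardless of whether the trade wins or loses.
spike = [abs(v) for v in y]

full_spike = distance_correlation(spike, y)
masked_spike = masked_distance_correlation(spike, y, mask)
print(full_spike, masked_spike)
```

On this toy data the `spike` feature scores above zero on the full series but exactly 0.0 once the zeros are masked out, because it is constant on the activated days and so says nothing about win versus loss.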

I am also happy to merge a PR if you would like to implement it yourself.

@wanga10000
Author

> I assume what you are looking for is a way to include a mask on the "0"s after the indicators have been calculated and only use dcor on the -1, 1 values?

Yes, exactly. With that, I think the tool would be more practical for algo trading, which would be really good.

> I am thinking that you may want to include a fit parameter such as "mask" that is the same size as y that could be used to filter the observations prior to dcor.

That sounds feasible. The mask input could default to all 0 so that it wouldn't affect the original usage.

Glad to see you think this is not a bad idea. I'll look forward to this feature coming online :)
