Added custom semantic segmentation trainer tutorial #1897
base: main
Conversation
@adamjstewart I think there is a bug (or at least some weird behavior) here. I can create a new class that extends SemanticSegmentationTask, then instantiate it, and everything works great. However, when I go to load it from a checkpoint file, I get an error. (@isaaccorley in case you've seen this)
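The code blocks from this comment were not captured above. A minimal, plain-Python sketch of the failure mode being described -- no Lightning or TorchGeo imports, and all class names and signatures here are illustrative stand-ins, not the real API:

```python
class BaseTask:
    """Stand-in for SemanticSegmentationTask."""

    def __init__(self, model="unet", ignore="weights"):
        # Lightning's save_hyperparameters() records *every* __init__
        # argument -- including `ignore` -- in the checkpoint's hparams.
        self.hparams = {"model": model, "ignore": ignore}


class CustomTask(BaseTask):
    """Subclass with its own signature that does not accept `ignore`."""

    def __init__(self, model="unet"):
        super().__init__(model=model)


def load_from_checkpoint(cls, saved_hparams):
    # Lightning re-instantiates the class with the saved hparams, so the
    # stale `ignore` entry is replayed into a constructor that rejects it.
    return cls(**saved_hparams)


task = CustomTask()  # direct instantiation works fine
try:
    load_from_checkpoint(CustomTask, task.hparams)
except TypeError as err:
    print(f"load failed: {err}")  # unexpected keyword argument 'ignore'
```

Direct instantiation succeeds, but the reload path fails with a `TypeError` because the stored `ignore` hparam is not in the subclass signature.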
Note that https://www.reviewnb.com/ is free for 'educational' use -- would enable previewing the notebook
# ## Test model
#
# Finally, we test the model on the test set and visualize the results.
"visualize the results" would make me expect some plots; it is a table of metrics, however.
Yep -- not finished here. Need to get some agreement on:
- How to review
- How to deal with long-running notebooks in CI. If I run this notebook for 250 epochs (7 hours on my machine) then I get 85.95 test mIoU, which is better than the results reported in https://arxiv.org/pdf/2005.02264.pdf. I think fake datasets are less interesting.
- How to deal with the issue I pointed out above, where the `ignore` parameter is being saved as an hparam, then breaking the subclass constructor.
I think it would be fine to train for 1 batch, as this notebook is about demonstrating the technique, in which case a CPU can be used.
@robmarkcole thanks for the review! Was that easy enough to do (vs reviewnb)? On my side, I just have to run
Yep
I don't see an ignore param in your custom class. Did you modify the class code after training that checkpoint? If so, one thing you can do is to just load the checkpoint, delete that param in the hparam dict, and then save the checkpoint, which would fix the error.
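A sketch of the checkpoint-repair suggestion above, operating on a plain dict; with a real checkpoint file you would round-trip it via `torch.load()` / `torch.save()` instead. `"hyper_parameters"` is the key Lightning uses for stored init args; the helper name is mine:

```python
def drop_hparam(ckpt: dict, name: str) -> dict:
    """Remove a stale entry from a checkpoint's hparam dict."""
    # Deleting the stored arg means load_from_checkpoint() will no longer
    # pass it to a subclass constructor that doesn't accept it.
    ckpt.get("hyper_parameters", {}).pop(name, None)
    return ckpt


ckpt = {
    "state_dict": {},  # model weights would live here
    "hyper_parameters": {"model": "unet", "ignore": "weights"},
}
ckpt = drop_hparam(ckpt, "ignore")
print(ckpt["hyper_parameters"])  # {'model': 'unet'}
```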
I want to be able to do something like this:
i.e. use the constructor from SemanticSegmentationTask and not have to copy-paste the args and logic from it. This works fine, but it breaks when I try to load a version of this class from a checkpoint.
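The desired pattern, sketched in plain Python: the subclass forwards everything to the parent constructor instead of copy-pasting its argument list. Names and signatures are illustrative stand-ins, not TorchGeo's real API:

```python
class SegTask:
    """Stand-in for SemanticSegmentationTask."""

    def __init__(self, model="unet", backbone="resnet50", num_classes=6):
        self.hparams = {
            "model": model,
            "backbone": backbone,
            "num_classes": num_classes,
        }


class CustomSegTask(SegTask):
    """Reuses the parent constructor via *args/**kwargs."""

    def __init__(self, *args, freeze_backbone=False, **kwargs):
        # Forward everything unchanged; only the new subclass-specific
        # option is handled here.
        super().__init__(*args, **kwargs)
        self.freeze_backbone = freeze_backbone


task = CustomSegTask(backbone="resnet18", freeze_backbone=True)
print(task.hparams["backbone"], task.freeze_backbone)  # resnet18 True
```

The catch discussed in this thread is that checkpoint loading replays the parent's saved hparams into this constructor, which is where the stray `ignore` entry causes trouble.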
One workaround that I'm checking is just adding
I think this makes the most sense since it's not an actual hparam.
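The specific addition being discussed was not captured above, but Lightning's `save_hyperparameters()` does accept an `ignore=` option that excludes named init args from the stored hparams. A stand-in sketch of that behavior (the function here is a toy, not Lightning's implementation):

```python
def save_hyperparameters(init_kwargs: dict, ignore: tuple = ()) -> dict:
    # Args listed in `ignore` are left out of the stored hparams, so they
    # are never replayed into the constructor by load_from_checkpoint().
    return {k: v for k, v in init_kwargs.items() if k not in ignore}


hparams = save_hyperparameters(
    {"model": "unet", "ignore": "weights"}, ignore=("ignore",)
)
print(hparams)  # {'model': 'unet'}
```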
The way that I've come up with to check whether an .ipynb is in sync with the corresponding .py is:
I don't know why the notebook test is being cancelled (maybe because it is trying to run the LandCoverAI split script?).
Tests being canceled means the job either ran out of time, space, or memory. Here, my guess would be space. We want to use the smallest datasets possible, such as EuroSAT100. Might be worth creating a LandCoverAI100 or something like that.
Haven't yet had time to evaluate jupytext to decide whether or not it's what we should use. @nilsleh what did you end up using for lightning-uq-box?
I really don't like the idea of making custom datasets/datamodules just to have pretty CI -- it is a large overhead for something that makes the tutorial less cool.
And I really don't like having tutorials that take 30+ minutes to download a dataset and train a model for hundreds of epochs, or tutorials that can't be tested in CI because they involve more data than our runners can store. There's always a tradeoff. You can also find a smaller dataset instead of making your own.
#
# The remainder of the tutorial is straightforward and follows the typical [PyTorch Lightning](https://lightning.ai/) training routine. We instantiate a `DataModule` for the LandCover.AI dataset, instantiate a `CustomSemanticSegmentationTask` with a U-Net and ResNet-50 backbone, then train the model using a Lightning trainer.
dm = LandCoverAIDataModule(root="data/", batch_size=64, num_workers=8, download=True)
Suggested change:
- dm = LandCoverAIDataModule(root="data/", batch_size=64, num_workers=8, download=True)
+ dm = LandCoverAIDataModule(root="data/", batch_size=2, num_workers=8, download=True)
Luckily none of that happens here ;). LandCover.ai is 1.5GB (this will take 30+ minutes to download if your download speed is < 0.83 MB/s), training happens for 1 batch (and I can reduce the batch size to make this faster). I'm saying that we shouldn't be catching bugs with overly sanitized examples -- if LandCoverAI breaks or Lightning training breaks then our other tests will break. If the example notebooks are catching bugs then we should ask ourselves why. Downloading LandCoverAI and running this now and per release doesn't seem to be a big burden.
How about this: is it possible to change the LandCoverAI datamodule to use the test data that we have already spent time creating for this notebook (with comments saying that if you actually want to play with the data, then do this other thing)?
So I tried jupytext, but for my tutorials I couldn't get the jupytext scripts to execute and be displayed in the documentation. I went back to notebooks, and the notebooks are now run when the documentation builds. However, I don't need to download any data, and model fitting is fast, since they're just toy problems.
This is also undesirable because the notebook by default will train and display predictions on random noise. I would much rather have a tiny dataset with real data. I'm happy to make this myself (maybe next week).
I don't think it needs to display predictions -- if we're only training for a batch for "making CI pretty" reasons, then it will display noise regardless.
We can train for multiple epochs in the tutorial, but use
This is a tutorial notebook that shows users how to subclass and customize a semantic segmentation task for training on LandCoverAI.