VectorDataset sampling triples between 2 identical inputs #1649
Comments
Can you upload an example shapefile and image so this is easier to reproduce?
Then modify the two relevant lines in the example code to point to the right location:

To also check the raster dataset, uncomment and move the relevant line, then compare the results between tif, sdat, and shp.
Thanks for the data, I was able to reproduce your issue with the following MRE:

```python
from torchgeo.datasets import VectorDataset

ds = VectorDataset("path/to/shp")
print(len(ds))
```

The rest of the above code isn't needed to reproduce the issue.

The cause of the problem: your …

The solution to your problem: you should really subclass VectorDataset and create a new dataset that sets `filename_glob`:

```python
from torchgeo.datasets import VectorDataset


class MyDataset(VectorDataset):
    # Only match .shp files, not the shapefile's sidecar files
    filename_glob = "*.shp"
```

Hopefully that helps, and let me know if our docs aren't clear about this.
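The following sketch illustrates the difference between a catch-all glob and `*.shp` on a folder holding a single shapefile. It assumes that VectorDataset finds files with a recursive glob of `root/**/filename_glob` and that its default `filename_glob` matches every file; the file names are hypothetical and no torchgeo code is called.

```python
import glob
import os
import tempfile

# Hypothetical component files of a single ESRI shapefile named "mask".
COMPONENTS = ["mask.shp", "mask.shx", "mask.dbf", "mask.prj", "mask.cpg"]


def count_matches(root: str, pattern: str) -> int:
    """Count paths under root matched by a recursive glob of the given pattern."""
    return len(glob.glob(os.path.join(root, "**", pattern), recursive=True))


with tempfile.TemporaryDirectory() as root:
    # Create empty placeholder files; only their names matter for globbing.
    for name in COMPONENTS:
        open(os.path.join(root, name), "w").close()

    n_all = count_matches(root, "*")      # catch-all pattern
    n_shp = count_matches(root, "*.shp")  # pattern from the suggested subclass
    print(n_all, n_shp)
```

The catch-all pattern matches all five component files, while `*.shp` matches only one. If more than one matched component can be opened as the same layer, each would become its own index entry, which would be consistent with the tripled length seen in the MRE.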
Thanks Adam, now I can clearly see this is a user mistake. Here are some thoughts about the documentation:
As a specific example, I could suggest the following to prevent users from misusing the classes:
Overall, the documentation needs to be reviewed from the reader's point of view. HTH.
Description
VectorDataset seems to produce three times as many samples when given a folder instead of the shapefile itself.
In the example code below, the same shapefile is loaded into a VectorDataset in two ways, but the number of samples is always tripled when pointing to the folder.
The expected behavior is that VectorDataset produces identical results whether it is given a folder containing the file or the file itself.
NOTE: when using a RasterDataset with tif or sdat layers, the results were as expected in size and identical between tif, sdat, folders, and files. However, the number of samples produced by the GridGeoSampler differed from the number produced when using a VectorDataset. I'm not opening an issue on that, but it is still worth comparing VectorDataset against RasterDataset as well.
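To see how duplicate index entries can inflate a grid sampler's sample count, here is a simplified sketch. It assumes that the sampler tiles every index entry independently into rows × cols chips, with rows = ceil((height - size) / stride) + 1 and likewise for cols, and skips entries smaller than one chip; this roughly mirrors how torchgeo's GridGeoSampler counts chips, but `Bounds`, `chips_per_tile`, and `grid_len` are hypothetical names and the bounds, chip size, and stride are made-up values.

```python
import math
from typing import List, NamedTuple, Tuple


class Bounds(NamedTuple):
    """Hypothetical 2D bounding box of one index entry."""
    minx: float
    maxx: float
    miny: float
    maxy: float


def chips_per_tile(b: Bounds, size: float, stride: float) -> Tuple[int, int]:
    """Number of chip rows and columns that fit in one bounding box."""
    rows = math.ceil((b.maxy - b.miny - size) / stride) + 1
    cols = math.ceil((b.maxx - b.minx - size) / stride) + 1
    return rows, cols


def grid_len(entries: List[Bounds], size: float, stride: float) -> int:
    """Total chips over all index entries, skipping entries smaller than a chip."""
    total = 0
    for b in entries:
        if b.maxx - b.minx >= size and b.maxy - b.miny >= size:
            rows, cols = chips_per_tile(b, size, stride)
            total += rows * cols
    return total


shapefile_bounds = Bounds(0, 1000, 0, 1000)
# One index entry (pointing at the .shp) vs. the same bounds indexed three times.
n_file = grid_len([shapefile_bounds], size=256, stride=256)
n_folder = grid_len([shapefile_bounds] * 3, size=256, stride=256)
print(n_file, n_folder)
```

With these values each entry yields 4 × 4 = 16 chips, so three identical entries give 48 chips, exactly triple the single-file count.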
Example code
Example output
Folder with shapefile:
Actual shapefile:
Steps to reproduce
Repeat the steps, but in step 3 point the mask to the actual shapefile.
Compare the results of step 6.
See the example code in the description.
Version
torchgeo 0.5.0, QGIS version 3.18.3-Zürich