-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Add multi-groups version of the DBSCAN #5412
base: branch-23.08
Are you sure you want to change the base?
[FEA] Add multi-groups version of the DBSCAN #5412
Conversation
Merge latest commits from rapidsai/cuml: branch-23.06
Pull requests from external contributors require approval from a |
Hi @georgeliu95, thanks for contributing this feature. Can you please provide benchmarks to demonstrate the performance of this design compared to the existing dbscan implementation? There's a lot of code in this PR taken and modified from the building blocks in RAFT and I'd like to keep the algorithms in cuml based on the reusable building blocks that have been centralized in RAFT. I think after benchmarks, a next step would be consolidating these back into RAFT. I'm also fine exposing these as multi-group building blocks if it makes sense. Another note- we should prefer using |
`raft::linalg::map_offset`
Thank you very much for your help and useful advice! @cjnolet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @georgeliu95 for this work! It is great to have an efficient implementation for performing a batch of smaller independent DBSCAN clustering!
I see that you arranged the new implementation to have a similar code structure as the existing DBSCAN code in cuML, thank you for this effort! I still think that we can sacrifice some of this similarity to simplify the code, and I have left a few comments along this line.
The main issue, as @cjnolet has already mentioned, is consolidating the computational primitives in raft (epsilon neighborhood, coalesced reduction, adj_to_csr). It is clear that we need a modified version of these primitives for the multi group case, but there is a large overlap between the existing single group version and the proposed multi-group variants. The goal is to reduce code duplication and improve maintainability of the code. These changes could be done in separate smaller PRs for raft.
Apart from these issues, the implementation looks good overall, and I am excited to have this feature added to cuML!
* @param[in] custom_workspace workspace buffer provided by user | ||
* @param[in] custom_workspace_size required size of workspace buffer provided by user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need extra workspace arg, as opposed to using a pooling allocator (like below), and keeping allocations inside? I am not opposed to the current API, just curious if we could simplify it without loosing perf.
rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource> mr(rmm::mr::get_current_device_resource(), 1<<30);
rmm::mr::set_current_device_resource(&mr);
ML::Dbscan::fit(...)
@@ -207,5 +211,223 @@ void dbscanFitImpl(const raft::handle_t& handle, | |||
metric); | |||
} | |||
|
|||
template <typename Index_ = int> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring.
this->n_groups * sizeof(Index_t), | ||
cudaMemcpyDeviceToDevice, | ||
stream)); | ||
h_adj_group_offset = new std::size_t[this->n_groups]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is recommended to manage the lifetime of such allocation with a vector object, e.g.:
h_adj_group_offset = new std::size_t[this->n_groups]; | |
h_adj_group_offset = raft::make_host_vector<size_t>(this->n_groups); |
|
||
/** | ||
* The implementation is based on | ||
* https://github.com/rapidsai/raft/blob/branch-23.06/cpp/include/raft/sparse/convert/detail/adj_to_csr.cuh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the changes here should be added to raft to minimize code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refactoring this into a gbench, and adding it to https://github.com/rapidsai/cuml/blob/branch-23.06/cpp/bench/sg/dbscan.cu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @georgeliu95 for this work! It is great to have an efficient implementation for performing a batch of smaller independent DBSCAN clustering!
I see that you arranged the new implementation to have a similar code structure as the existing DBSCAN code in cuML, thank you for this effort! I still think that we can sacrifice some of this similarity to simplify the code, and I have left a few comments along this line.
The main issue, as @cjnolet has already mentioned, is consolidating the computational primitives in raft (epsilon neighborhood, coalesced reduction, adj_to_csr). It is clear that we need a modified version of these primitives for the multi group case, but there is a large overlap between the existing single group version and the proposed multi-group variants. The goal is to reduce code duplication and improve maintainability of the code. These changes could be done in separate smaller PRs for raft.
Apart from these issues, the implementation looks good overall, and I am excited to have this feature added to cuML!
- some docstring modification - remove dispatcher for `VertexDeg` and `AdjGraph` - replace CUDA runtime API `cudaMemcpyAsync` with `raft::copy` - correct the value of `accum_est_mem` in `mgrp_dbscan_scheduler` Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>
/ok to test |
This draft PR is for the discussion on implementation details.