Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add multi-groups version of the DBSCAN #5412

Draft
wants to merge 9 commits into
base: branch-23.08
Choose a base branch
from

Conversation

georgeliu95
Copy link

This draft PR is for the discussion on implementation details.

@rapids-bot
Copy link

rapids-bot bot commented May 10, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

@cjnolet
Copy link
Member

cjnolet commented May 22, 2023

Hi @georgeliu95, thanks for contributing this feature. Can you please provide benchmarks to demonstrate the performance of this design compared to the existing dbscan implementation?

There's a lot of code in this PR taken and modified from the building blocks in RAFT and I'd like to keep the algorithms in cuml based on the reusable building blocks that have been centralized in RAFT. I think after benchmarks, a next step would be consolidating these back into RAFT. I'm also fine exposing these as multi-group building blocks if it makes sense.

Another note- we should prefer using raft::linalg::map_offset over thrust::for_each. We centralize these operations behind RAFT APIs so we have more control over future changes and optimizations.

@georgeliu95
Copy link
Author

Thank you very much for your help and useful advice! @cjnolet
I just add the benchmark for multi-group DBSCAN and using raft::linalg::map / raft::linalg::map_offset instead of thrust::for_each as you suggested.
And if directly run this benchmark with cosine metric, there will be some difference compared to the baseline due to #5360.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @georgeliu95 for this work! It is great to have an efficient implementation for performing a batch of smaller independent DBSCAN clustering!

I see that you arranged the new implementation to have a similar code structure as the existing DBSCAN code in cuML, thank you for this effort! I still think that we can sacrifice some of this similarity to simplify the code, and I have left a few comments along this line.

The main issue, as @cjnolet has already mentioned, is consolidating the computational primitives in raft (epsilon neighborhood, coalesced reduction, adj_to_csr). It is clear that we need a modified version of these primitives for the multi group case, but there is a large overlap between the existing single group version and the proposed multi-group variants. The goal is to reduce code duplication and improve maintainability of the code. These changes could be done in separate smaller PRs for raft.

Apart from these issues, the implementation looks good overall, and I am excited to have this feature added to cuML!

cpp/examples/dbscan/mgrp_dbscan_example.cpp Outdated Show resolved Hide resolved
cpp/include/cuml/cluster/dbscan.hpp Outdated Show resolved Hide resolved
Comment on lines +127 to +128
* @param[in] custom_workspace workspace buffer provided by user
* @param[in] custom_workspace_size required size of workspace buffer provided by user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need extra workspace arg, as opposed to using a pooling allocator (like below), and keeping allocations inside? I am not opposed to the current API, just curious if we could simplify it without loosing perf.

rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource> mr(rmm::mr::get_current_device_resource(), 1<<30);
rmm::mr::set_current_device_resource(&mr);
ML::Dbscan::fit(...)

cpp/include/cuml/cluster/dbscan_api.h Outdated Show resolved Hide resolved
@@ -207,5 +211,223 @@ void dbscanFitImpl(const raft::handle_t& handle,
metric);
}

template <typename Index_ = int>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring.

this->n_groups * sizeof(Index_t),
cudaMemcpyDeviceToDevice,
stream));
h_adj_group_offset = new std::size_t[this->n_groups];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to manage the lifetime of such allocation with a vector object, e.g.:

Suggested change
h_adj_group_offset = new std::size_t[this->n_groups];
h_adj_group_offset = raft::make_host_vector<size_t>(this->n_groups);

cpp/src/dbscan/multigroups/mgrp_adjgraph.cuh Outdated Show resolved Hide resolved
cpp/src/dbscan/multigroups/mgrp_adjgraph.cuh Outdated Show resolved Hide resolved

/**
* The implementation is based on
* https://github.com/rapidsai/raft/blob/branch-23.06/cpp/include/raft/sparse/convert/detail/adj_to_csr.cuh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the changes here should be added to raft to minimize code duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should refactoring this into a gbench, and adding it to https://github.com/rapidsai/cuml/blob/branch-23.06/cpp/bench/sg/dbscan.cu

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @georgeliu95 for this work! It is great to have an efficient implementation for performing a batch of smaller independent DBSCAN clustering!

I see that you arranged the new implementation to have a similar code structure as the existing DBSCAN code in cuML, thank you for this effort! I still think that we can sacrifice some of this similarity to simplify the code, and I have left a few comments along this line.

The main issue, as @cjnolet has already mentioned, is consolidating the computational primitives in raft (epsilon neighborhood, coalesced reduction, adj_to_csr). It is clear that we need a modified version of these primitives for the multi group case, but there is a large overlap between the existing single group version and the proposed multi-group variants. The goal is to reduce code duplication and improve maintainability of the code. These changes could be done in separate smaller PRs for raft.

Apart from these issues, the implementation looks good overall, and I am excited to have this feature added to cuML!

- some docstring modification
- remove dispatcher for `VertexDeg` and `AdjGraph`
- replace CUDA runtime API `cudaMemcpyAsync` with `raft::copy`
- correct the value of `accum_est_mem` in `mgrp_dbscan_scheduler`

Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>
@tfeher tfeher changed the base branch from branch-23.06 to branch-23.08 June 7, 2023 18:51
@tfeher tfeher added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 7, 2023
@dantegd
Copy link
Member

dantegd commented Jul 12, 2023

/ok to test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CUDA/C++ improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants