[Data] Ray Data continues autoscaling even when pipeline is backpressured by iteration #45331

Open
bveeramani (Member) opened this issue May 14, 2024 · 0 comments
Labels: bug (Something that is supposed to be working, but isn't), data (Ray Data-related issues), P1 (Issue that should be fixed within a few weeks)

What happened + What you expected to happen

I'm doing training, and my compute config looks like this:
[Screenshot: compute config]

My cluster autoscales CPU nodes and eventually GPU nodes to process more data, even though my trainer doesn't need more data.

Versions / Dependencies

Ray 2.21

Reproduction script

import ray
import numpy as np
import time

# Each map task produces a ~128 MiB block, so 1000 rows is roughly 128 GB of data in total.
def generate_block(row):
    return {"data": np.zeros((128 * 1024 * 1024,), dtype=np.uint8)}


ds = ray.data.range(1000, override_num_blocks=1000).map(generate_block)

# Consume batches slowly to simulate a trainer that can't keep up. This should
# backpressure the map stage rather than trigger further cluster autoscaling.
for batch in ds.iter_batches(batch_size=None):
    time.sleep(5)

Issue Severity

Medium: It is a significant difficulty but I can work around it.
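For reference, one possible mitigation, sketched here as an assumption rather than the workaround actually used, is to cap the resources the Ray Data streaming executor may consume via DataContext.execution_options.resource_limits. Capping the limits reduces how many tasks are scheduled concurrently and therefore how much demand the cluster autoscaler sees; the specific cap values below are illustrative, and this does not address the underlying backpressure behavior:

import ray

# Illustrative caps (assumed values) on the Ray Data streaming executor.
ctx = ray.data.DataContext.get_current()
ctx.execution_options.resource_limits.cpu = 8                     # assumed CPU cap
ctx.execution_options.resource_limits.object_store_memory = 4e9   # assumed cap, in bytes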

bveeramani added the bug, data, and P1 labels on May 14, 2024