Metrics metadata support #1240

Open · lutzroeder opened this issue Mar 6, 2024 · 8 comments

@lutzroeder (Owner)

Scenarios:

  1. Allowing the app to contain implementations that compute metrics. See #204.
  2. Opening existing metrics files defined by framework and format vendors. Which formats and tools exist?
  3. Supporting scripted solutions and formats from custom tools. See #1234.

Questions:

  • Are computed or loaded metrics different from metadata provided via onnx/onnx#5938 or metadata_schema.fbs?
  • Should these values show as part of the Node metadata UI or separately?
  • What API structure supports all scenarios?
@kylesayrs commented Jun 7, 2024

I'd be interested in helping with this feature. I implemented a similar feature for Neural Magic's SparseZoo (calculate_ops.py).

I propose supporting both weight sparsity metrics and operations metrics.

Counting operations depends on whether the runtime engine and hardware support sparsity, block sparsity, and quantization. The UI design should be capable of supporting these subtypes, if not now then in the future.

  1. Since onnx/onnx#5938 and metadata_schema.fbs seem to be unstructured, supporting these kinds of visualizations seems to be a separate issue.
  2. I propose showing weight sparsity within the node metadata UI alongside the weight, e.g.
name: model.3.conv.weight
category: Initializer
tensor: float32[64,32,3,3] (81% sparsity)

As for visualizing operations, I'm in favor of separating the UI from the node metadata tab so as to make it clear that these performance (operation) metrics are computed values, separate from the data embedded in the model file. For example, they could be a togglable UI element displayed to the left of a node.
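
A minimal sketch of how the sparsity figure could be computed, assuming the weight tensor has already been decoded into a flat typed array (the helper name is hypothetical):

```javascript
// Hypothetical helper: fraction of zero-valued elements in a decoded tensor.
// `data` is assumed to be a flat typed array, e.g. Float32Array.
const computeSparsity = (data) => {
    let zeros = 0;
    for (let i = 0; i < data.length; i++) {
        if (data[i] === 0) {
            zeros++;
        }
    }
    return data.length === 0 ? 0 : zeros / data.length;
};

// 8 of 10 values are zero -> '80% sparsity'
const sparsity = computeSparsity(new Float32Array([0, 0, 0, 0, 1, 0, 0, 2, 0, 0]));
console.log(`${Math.round(sparsity * 100)}% sparsity`);
```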

@kylesayrs

Another UI idea might be togglable horizontal bars that appear to the left of a node and vary in size depending on how many operations of each type are associated with it.

There should also be a UI element for the total number of operations and the overall sparsity of the model, perhaps in the bottom right corner.

@lutzroeder (Owner) commented Jun 8, 2024

@kylesayrs all great questions.

Fundamentally, there seem to be three types of data, and a question of how these are unified and at which API layer they are exposed.

  1. Metrics that are included in the file format or computed with external tools and provided via supplemental files. How do these get surfaced in the API? If metrics are included in metadata, do they surface as metadata or get filtered into metrics during loading?
  2. Metrics that require format-specific computation. Do such metrics need a format-specific implementation, and should it be in the actual model API or separate?
  3. Metrics that can be generally computed for all formats. Is this another layer that takes over if neither of the other two exists?

Since there are likely metrics at the model, graph, node, and weights level, initially exposing them as another section in the properties pages might be a good way to get started. Which data types exist for metrics in the API? For example, if sparsity is a float percentage, could such a single metric later be selected in the properties and used to augment and color the graph? See #1241.

@kylesayrs commented Jun 9, 2024

@lutzroeder Hm, we could implement two classes, NodeMetrics and GraphMetrics (sketched after the lists below).

NodeMetrics

  • Node implements a member called metrics of type onnx.NodeMetrics
  • When Node is constructed, node metrics are calculated based on the weight and bias arguments†.
  • This class has members such as
parameters: int
sparse_parameters: int
parameter_sparsity: float
operations: int
sparse_operations: int
operation_sparsity: float
  • This class hooks into the Node Properties sidebar and is shown in a Metrics section

GraphMetrics

  • GraphMetrics is instantiated as a member of Graph and depends upon _nodes, specifically each node.metrics of type NodeMetrics
  • This class aggregates metrics across nodes and includes members such as
parameters: int
sparse_parameters: int
parameter_sparsity: float
operations: int
sparse_operations: int
operation_sparsity: float
  • This class hooks into the Model Properties sidebar and is shown in a Metrics section
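
A minimal sketch of these two classes, assuming decoded weight tensors are available at construction time; operation counting is omitted and all names here are proposals, not existing API:

```javascript
// Sketch: per-node metrics computed from the node's decoded weight tensors.
class NodeMetrics {
    constructor(weights) { // array of flat typed arrays
        this.parameters = 0;
        this.sparse_parameters = 0;
        for (const data of weights) {
            this.parameters += data.length;
            for (let i = 0; i < data.length; i++) {
                if (data[i] === 0) {
                    this.sparse_parameters++;
                }
            }
        }
        this.parameter_sparsity = this.parameters === 0 ? 0 : this.sparse_parameters / this.parameters;
    }
}

// Sketch: graph-level metrics aggregated over each node's NodeMetrics.
class GraphMetrics {
    constructor(nodes) {
        this.parameters = 0;
        this.sparse_parameters = 0;
        for (const node of nodes) {
            this.parameters += node.metrics.parameters;
            this.sparse_parameters += node.metrics.sparse_parameters;
        }
        this.parameter_sparsity = this.parameters === 0 ? 0 : this.sparse_parameters / this.parameters;
    }
}
```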

To respond to the questions you posed:

  1. I'm not familiar with metric formats provided in supplemental files, but this could be supported with a ModelMetrics class instance on the Model class.
  2. Similarly to (1), these could be implemented in a separate class on the Model API.
  3. This would be implemented by the two APIs proposed above.

Let me know what you think.

†Note that in order to calculate per-node metrics for ONNX, we'll need to hard-code which arguments are weights and which arguments are biases for each op type, as sketched below.
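
For example, such a mapping might look like the following; the Conv and Gemm input indices follow the ONNX operator spec, while the overall shape of the table is just an illustration:

```javascript
// Sketch: which input indices hold weights and biases, per ONNX op type.
// Conv inputs are (X, W, B); Gemm inputs are (A, B, C); MatMul has no bias.
const weightArguments = new Map([
    ['Conv', { weights: [1], biases: [2] }],
    ['ConvTranspose', { weights: [1], biases: [2] }],
    ['Gemm', { weights: [1], biases: [2] }],
    ['MatMul', { weights: [1], biases: [] }]
]);
```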

@kylesayrs

Expanded metrics view: [screenshot]

Compact metrics view: [screenshot]

I prefer the compact view, at least for the frontend. The backend can maintain separate members such as sparsity: float, etc., to better support metric-based visualization, but I think the compact view looks nicer for users.

@lutzroeder (Owner) commented Jun 9, 2024

For weight tensors there should be a Tensor Properties view similar to #1122. This will be needed for visualizing tensor data (#65), avoids duplicating tensor information, gives each tensor a less crowded space, and solves the issue of mixing node metrics and tensor metrics. The individual metrics would be rendered similar to attributes or metadata, which hopefully results in a single mechanism across attributes, metadata, and metrics to annotate the graph.

For implementation, foo.Node::metrics and foo.Tensor::metrics would be similar to foo.Node::attributes and foo.Tensor::metadata, each returning a list of foo.Argument. This allows the mechanism to be extensible for new format-specific metrics. The initial implementation could be all wrapped in a single get metrics(). This would only be used for format-specific overrides, which are hopefully rare, as the code will add maintenance complexity and increase the file size for a feature that is likely used much less frequently.
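
A minimal sketch of that shape; the Argument constructor signature here is an assumption, not the actual API:

```javascript
// Sketch: metrics exposed as a lazily computed list of Argument values,
// mirroring how attributes and metadata are surfaced.
class Argument {
    constructor(name, value, type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

class Tensor {
    constructor(data) {
        this._data = data; // decoded flat typed array, e.g. Float32Array
    }
    get metrics() {
        if (!this._metrics) {
            let zeros = 0;
            for (let i = 0; i < this._data.length; i++) {
                if (this._data[i] === 0) {
                    zeros++;
                }
            }
            const sparsity = this._data.length === 0 ? 0 : zeros / this._data.length;
            this._metrics = [new Argument('sparsity', sparsity, 'percentage')];
        }
        return this._metrics;
    }
}
```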

lutzroeder added a commit that referenced this issue Jun 9, 2024
@kylesayrs commented Jun 9, 2024

@lutzroeder In order to analyze sparsity, foo.Tensor must decode the tensor data. AFAICT this is only implemented within view.js; some version of _decodeData's implementation should be moved to a shared helper file.
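
One possible shape for such a shared helper, sketched for just two data types; the real _decodeData handles many more:

```javascript
// Sketch: a shared tensor-decode helper factored out of view.js so that
// metrics code can reuse it. `bytes` is assumed to be a Uint8Array.
const decodeTensorData = (dataType, bytes) => {
    switch (dataType) {
        case 'float32':
            return new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength >> 2);
        case 'int64':
            return new BigInt64Array(bytes.buffer, bytes.byteOffset, bytes.byteLength >> 3);
        default:
            throw new Error(`Unsupported tensor data type '${dataType}'.`);
    }
};
```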

lutzroeder added a commit that referenced this issue Jun 9, 2024
@lutzroeder (Owner) commented Jun 9, 2024

Tensor decoding is generalized in view.Tensor. A lot of effort went into having a single view onto the many different tensor formats. Ideally, metrics should operate at that level and automatically work for all or most formats. The format-specific API is more for keeping options open. It would be interesting to discover the edge cases where format-specific metrics need tensor access; initially this is not supported.

  1. A generalized metric implementation in view.Tensor::metrics drives view.TensorSidebar and calls into xxx.Tensor::metrics to honor custom format-specific metrics or implementations if available (see the sketch after this list). This should cover most cases. Node metrics might be more interesting, as various optimizations like inlining const nodes are hidden in the general API.
  2. xxx.Tensor::metrics can provide a format-specific implementation to override the general case when needed.
  3. Model formats might store metrics as metadata or provide metrics via external files. The model loading code could detect these scenarios and expose them via xxx.Tensor::metrics. This might include additional metrics that are not known to the app, similar to metadata, which is often unstructured but might include known types of metadata.
  4. Other tools might provide a generalized metrics format that integrates at the view.Tensor::metrics or view.Node::metrics layer. Until there is more information about what these look like, this is not a main concern, but the implementation should make it possible to fork or opt in later.
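
A sketch of that layering, where the generalized implementation defers to a format-specific override when one exists; the names mirror the proposal above and nothing here is existing API:

```javascript
// Sketch: view.Tensor honors a format-specific override (xxx.Tensor::metrics)
// and otherwise falls back to generalized metrics computed from decoded values.
class ViewTensor {
    constructor(tensor) {
        this._tensor = tensor; // the underlying format-specific tensor
    }
    get metrics() {
        // 1. Format-specific metrics (e.g. loaded from metadata or
        //    supplemental files) win when available.
        if (this._tensor.metrics && this._tensor.metrics.length > 0) {
            return this._tensor.metrics;
        }
        // 2. Otherwise compute generalized metrics from the decoded values.
        const data = this._tensor.decode(); // hypothetical decode helper
        let zeros = 0;
        for (let i = 0; i < data.length; i++) {
            if (data[i] === 0) {
                zeros++;
            }
        }
        const sparsity = data.length === 0 ? 0 : zeros / data.length;
        return [{ name: 'sparsity', value: sparsity, type: 'percentage' }];
    }
}
```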

If the general metrics implementations get complex and impact load times, it might be worth considering dynamically loading a module from view.Tensor::metrics and view.Node::metrics; too early to tell.

For tensors, the challenge is that multiple changes are needed to enable #1285. Some formats have separate concepts for tensor initializer and tensor, and different ways to opt into quantization; what level of abstraction should this view operate on? view.Tensor is generated on demand, while other objects like view.Node live in the view object model to enable selection and activation. The actual tensor data access can be expensive and needs to be refactored to not happen in the constructor if those objects exist in the view object model. And how should the potentially large cached tensor data be disposed of when other objects are selected?

lutzroeder added a commit that referenced this issue Jun 9, 2024
lutzroeder added a commit that referenced this issue Jun 10, 2024