
I converted onnx to ncnn successfully, but my inference output is all nan, e.g. the output of net.extract() is all nan. #5442

Closed
Suncheng2022 opened this issue Apr 29, 2024 · 16 comments

Comments

@Suncheng2022

Suncheng2022 commented Apr 29, 2024

error log

context

Ubuntu 18.04.6 LTS
ncnn-20240410-android-shared
android-ndk-r17c

how to reproduce

1. Just run ./sc_ncnn img.jpg in an adb shell on Android.
2. I try to execute net.extract("x_lr_A", A); in C++, and A is all nan. It depressed me...

pre_process:
1. cv.imread
2. resize(512, 512)
3. hwc --> chw (maybe wrong, but it shouldn't produce nan); see the sketch after this list
4. normalize to [0, 1] (maybe not right either, but it also shouldn't produce nan)
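A minimal sketch of how this preprocessing is usually written with ncnn (not the fix for this issue): ncnn::Mat::from_pixels_resize already converts packed BGR/HWC pixels into planar CHW floats, and substract_mean_normalize handles the 1/255 scaling, so no manual HWC-->CHW loop is needed. The image path and sizes below are placeholders.

cv::Mat bgr = cv::imread("img.jpg"); // placeholder path
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB,
                                             bgr.cols, bgr.rows, 512, 512);
const float mean_vals[3] = {0.f, 0.f, 0.f};
const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
in.substract_mean_normalize(mean_vals, norm_vals); // values now in [0, 1]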
I'm new to ncnn, but curious about its extreme mobile performance, so I want to switch to ncnn right now! I will paste my code below.

more

ncnn.zip

Chinese version (translated):
With someone's help I built sc_ncnn with make on Ubuntu (I don't know much about deployment, but I really admire how far ncnn pushes performance, so I wanted to try it) and ran it on Android. The model has two outputs named x_lr_A and x_lr_b, but the values I get through net.extract() are all nan, and I don't know where it went wrong. The .param and .bin are attached; I hope someone can help, thanks!
Help, please.

#include <iostream>
#include <string>
#include <fstream>
#include <ctime>
using namespace std;

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/imgcodecs/imgcodecs.hpp"
#include "opencv2/opencv.hpp"
#include "opencv2/core.hpp"
using namespace cv;

#include <vector>
#include <omp.h>
#include <cmath>

#include "mat.h"
#include "net.h"


int main(int argc, char** argv){
    // load the model
    ncnn::Net model_ncnn;
    model_ncnn.load_param("best_sim.param");
    model_ncnn.load_model("best_sim.bin");
    model_ncnn.opt.lightmode = true;
    model_ncnn.opt.num_threads = 1;
    // model_ncnn.opt.use_fp16_storage = false;
    // model_ncnn.opt.use_fp16_arithmetic = false;

    vector<const char*> input_names = model_ncnn.input_names();
    vector<const char*> output_names = model_ncnn.output_names();
    for (const char* name : input_names){
        cout << "input: " << name << endl;
    }
    for (const char* name : output_names){
        cout << "output: " << name << endl;
    }
    vector<const char*> output_names_ = model_ncnn.output_names();
    for (const char* name : output_names_){
        cout << "output_: " << name << endl;
    }

    // prepare the input
    string img_path = argv[1];
    cv::Mat img = cv::imread(img_path);
    // x_hr
    int x_hr_size = 2048;
    cv::Mat x_hr;
    cv::resize(img, x_hr, cv::Size(x_hr_size, x_hr_size));
    // x_lr
    int x_lr_size = 512;
    cv::Mat x_lr;
    cv::resize(x_hr, x_lr, cv::Size(x_lr_size, x_lr_size));
    // cv::cvtColor(x_lr, x_lr, cv::COLOR_BGR2RGB);
    // cv::Mat x_lr_float;
    // x_lr.convertTo(x_lr_float, CV_32FC3);
    // cout << "x_lr_floag" << x_lr_float << endl;      # 即便归一化,from_pixels竟然会映射到0-255
    cout << "定义x_lr" << endl;
    // convert x_lr to CHW
    // vector<float> x_lr_chw;
    // vector<Mat> channels(3);
    // split(x_lr, channels);
    // for (auto i = 0; i < channels.size(); i++){
    //     vector<float> data = vector<float>(channels[i].reshape(1, 1));
    //     x_lr_chw.insert(x_lr_chw.end(), data.begin(), data.end());
    // }
    // cout << "x_lr转CHW" << endl;
    // convert x_lr_chw back to a cv::Mat
    // cv::Mat x_lr_mat(x_lr_size, x_lr_size, CV_32FC3);
    // int index = 0;
    // for (int y = 0; y < x_lr_size; ++y) {
    //     for (int x = 0; x < x_lr_size; ++x) {
    //         // for each pixel, copy the three channel values into the cv::Mat
    //         x_lr_mat.at<cv::Vec3f>(y, x) = cv::Vec3f(x_lr_chw[index], x_lr_chw[index + 1], x_lr_chw[index + 2]);
    //         index += 3; // advance the index to the next pixel
    //     }
    // }
    // cout << "x_lr_chw转Mat" << endl;

    // model input
    ncnn::Mat in = ncnn::Mat::from_pixels(x_lr.data, ncnn::Mat::PIXEL_BGR2RGB, x_lr.cols, x_lr.rows);
    float in_min = 256.0f;
    float in_max = 0.0f;
    for (int i = 0; i < 3 * x_lr_size * x_lr_size; i++){
        if (in[i] > in_max){
            in_max = in[i];
        } else if (in[i] < in_min){
            in_min = in[i];
        }
    }
    cout << "in 最值: "<< in_min << " " << in_max << endl;
    // for (int i = 0; i < 3 * x_lr_size * x_lr_size; i++){
    //     in[i] = (in[i] - in_min) / (in_max - in_min);
    // }
    const float mean_vals[3] = {0.f, 0.f, 0.f};
    const float norm_vals[3] = {1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0};
    in.substract_mean_normalize(mean_vals, norm_vals);
    // for (int i = 0; i < 3 * x_lr_size * x_lr_size; i++){
    //     cout << in[i] << " ";
    // }
    

    // model inference
    ncnn::Mat A;
    ncnn::Mat b;
    ncnn::Extractor ex = model_ncnn.create_extractor();
    ex.input("x_lr", in);
    ex.extract("A", A);
    ex.extract("b", b);
    cout << "推理结束" << endl;
    double A_min = 257.0;
    double A_max = 0.0;
    for (int i = 0; i < 3 * x_lr_size * x_lr_size; i++){
        if (A[i] > A_max){
            A_max = A[i];
        } else if (A[i] < A_min){
            A_min = A[i];
        }
        // cout << b[i] << " ";
    }
    cout << "A 最值: "<< A_min << " " << A_max << endl;

    // clean up the model
    // model_ncnn.clear();

    return 0;
}
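One more note (not the fix that eventually resolved this issue): ncnn reads Net::opt while load_param / load_model build the layer pipelines, so options set after loading, as in the code above, may not take effect. A minimal sketch of a common nan-debugging setup, reusing the file names above and disabling the fp16 paths (a frequent first check for nan on ARM):

ncnn::Net net;
// set options before loading so they are applied when the pipelines are created
net.opt.num_threads = 1;
net.opt.use_fp16_packed = false;
net.opt.use_fp16_storage = false;
net.opt.use_fp16_arithmetic = false;
net.load_param("best_sim.param");
net.load_model("best_sim.bin");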
@HHscut

HHscut commented May 13, 2024

What was the result when transferring the model into .param & .bin? Is some op not supported? I checked the outputs from different layers and found it prints "NAN" after some middle layers, but I can't locate which one, so an unsupported op may exist. Can you upload the original model file (e.g. the .onnx) so I can check the model structure further?

@Suncheng2022
Author

What was the result when transferring the model into .param & .bin? Is some op not supported? I checked the outputs from different layers and found it prints "NAN" after some middle layers, but I can't locate which one, so an unsupported op may exist. Can you upload the original model file (e.g. the .onnx) so I can check the model structure further?

Thank you for your reply!
I have found the reason why the model outputs nan: the original author implemented a custom LayerNorm operation. In PyTorch it looks like this:

class LayerNorm2d_Sc(nn.Module):
    """ 作者实现的自定义LayerNorm,理论上Pytorch通过调整维度是能做到的,我也验证了这一点,但是ncnn中暂无法实现 """

    def __init__(self, channels, eps=1e-6):
        super(LayerNorm2d_Sc, self).__init__()
        self.register_parameter('weight', nn.Parameter(torch.ones(channels)))
        self.register_parameter('bias', nn.Parameter(torch.zeros(channels)))
        self.eps = eps
        self.torch_layernorm = torch.nn.LayerNorm(channels, eps=eps, elementwise_affine=False)

    def forward(self, x):
        # I tried replacing it with PyTorch's own LayerNorm; the PyTorch code and the exported onnx both give correct results, but converting to ncnn fails
        # C = x.shape[1]
        # x_ = x.clone()
        # x_ = x_.permute(0, 2, 3, 1)
        # y = self.torch_layernorm(x_)
        # y = y.permute(0, 3, 1, 2)
        # # y = self.weight.view(1, C, 1, 1) * y + self.bias.view(1, C, 1, 1)
        # return y

        # The original author's custom LayerNorm. PyTorch and the exported onnx both give correct results, but inference after converting to ncnn produces an all-black image
        C = x.shape[1]
        x_ = x.clone()
        mu = x_.mean(dim=1, keepdim=True)
        var = (x_ - mu).pow(2).mean(dim=1, keepdim=True)
        y = (x_ - mu) / (var + self.eps).sqrt()
        y = self.weight.view(1, C, 1, 1) * y + self.bias.view(1, C, 1, 1)
        return y

I tried using numpy instead of PyTorch. The inference result was no longer completely black, but it was still not correct.
I saw in ncnn's wiki that layers can be custom-implemented, and I am trying to add the author's custom LayerNorm. (If I understand correctly, the layout the ncnn model processes in C++ is WHC, and the output is also WHC, but in Python the ncnn output seems to be CHW; at least I can get correct results with CHW. Of course, I care more about the results in C++.)

@HHscut

HHscut commented May 13, 2024

Hello!
1. But in my practice, the data the ncnn model processes in C++ is also CDHW, and the output is also CDHW, i.e. [Channel, Depth, Height, Width]. See the C++ code below that flattens the output:

void pretty_print(const ncnn::Mat &m, std::vector<float> &vec_heap) {
    for (int q = 0; q < m.c; q++) {
        const float *ptr = m.channel(q);
        for (int z = 0; z < m.d; z++) {
            for (int y = 0; y < m.h; y++) {
                for (int x = 0; x < m.w; x++) {
                    vec_heap.emplace_back(ptr[x]);
                }
                ptr += m.w;
            }
        }
    }
}
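Hypothetical usage of the helper above with the blob extracted in the original post (A is the ncnn::Mat obtained from ex.extract):

std::vector<float> A_flat;
pretty_print(A, A_flat); // A_flat now holds A's values flattened in C, D, H, W order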

2. Your own LayerNorm2d_Sc works the same as the original one. If your LayerNorm2d_Sc works but fails when converting to an ncnn model, maybe you can update the ncnn version and compile the layernorm operation (see #5262 (comment) for details). Could you post the error message?

@HHscut

HHscut commented May 13, 2024

And for the all-black image after converting to ncnn: maybe you need to re-normalize the output back to [0, 255] to get the final image.

@Suncheng2022
Author

What was the result when transferring the model into .param & .bin? Is some op not supported? I checked the outputs from different layers and found it prints "NAN" after some middle layers, but I can't locate which one, so an unsupported op may exist. Can you upload the original model file (e.g. the .onnx) so I can check the model structure further?

Here is the onnx exported from PyTorch, without onnxsim:
model_trace_1.4M_512.onnx.zip

@Suncheng2022
Author

Hello! 1. But in my practice, the data the ncnn model processes in C++ is also CDHW, and the output is also CDHW, i.e. [Channel, Depth, Height, Width]. See the C++ code below that flattens the output:

void pretty_print(const ncnn::Mat &m, std::vector<float> &vec_heap) {
    for (int q = 0; q < m.c; q++) {
        const float *ptr = m.channel(q);
        for (int z = 0; z < m.d; z++) {
            for (int y = 0; y < m.h; y++) {
                for (int x = 0; x < m.w; x++) {
                    vec_heap.emplace_back(ptr[x]);
                }
                ptr += m.w;
            }
        }
    }
}

2. Your own LayerNorm2d_Sc works the same as the original one. If your LayerNorm2d_Sc works but fails when converting to an ncnn model, maybe you can update the ncnn version and compile the layernorm operation (see #5262 (comment) for details). Could you post the error message?

My pipeline for getting the ncnn model is: PyTorch model --> onnxsim --> ncnn. But I got "LayerNormalization not supported yet!" when converting it to ncnn:

./onnx2ncnn model_trace_1.4M_512_sim.onnx test_ncnn.param test_ncnn.bin 
LayerNormalization not supported yet!
  # axis=-1
  # epsilon=1e-06
(the same three-line message is repeated 20 times in total, once per LayerNormalization node)

The number of errors reported seems to correspond to the number of custom LayerNorm operations.
In addition, I tried to extend LayerNorm in ncnn with the following implementation:

// modified in src/layer/layernorm.cpp
else if (affine_size == channels)
        {
            #pragma omp parallel for num_threads(opt.num_threads)
            for (int i = 0; i < size; i++)
            {
                // mean
                float sum = 0.f;
                for (int q = 0; q < channels; q++)
                {
                    sum += bottom_top_blob.channel(q)[i];
                }
                float mean = sum / channels;
                // var
                float sqsum = 0.f;
                float tmp = 0.f;
                for (int q = 0; q < channels; q++)
                {
                    tmp = bottom_top_blob.channel(q)[i] - mean;
                    sqsum += tmp * tmp;
                }
                float var = sqsum / channels;

                float a = 1.f / (sqrtf(var + eps));
                float b = -mean * a;
                for (int q = 0; q < channels; q++)
                {
                    bottom_top_blob.channel(q)[i] = bottom_top_blob.channel(q)[i] * a + b;
                }
            }
        }

And execute the following commands under ncnn/build:

cmake ..
make -j64
make install

When I converted the onnxsim model to ncnn, I got the same error as above.

Thanks again for your reply, and I believe I can figure ncnn out with your help.^_^

@HHscut

HHscut commented May 13, 2024

Haha, I got "LayerNormalization not supported yet!" when converting it to ncnn too.

@Suncheng2022
Author

Haha, I got "LayerNormalization not supported yet!" when converting it to ncnn too.

I added a LayerNorm implementation to ncnn, so why is it still not supported? It feels like the conversion tool does not go through ncnn's LayerNorm at all.

@HHscut

HHscut commented May 13, 2024

Haha, I got "LayerNormalization not supported yet!" when converting it to ncnn too.

I added a LayerNorm implementation to ncnn, so why is it still not supported? It feels like the conversion tool does not go through ncnn's LayerNorm at all.

1. I didn't try to register my own op, but I think it should be an individual .h & .cpp file declaring the class LayerNormalization, and then add ncnn_add_layer(LayerNormalization) around line 169 of /ncnn/src/CMakeLists.txt.
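For reference, the "LayerNormalization not supported yet!" message is printed by the onnx2ncnn converter itself, so rebuilding the ncnn runtime with a new layer does not by itself teach the converter about the ONNX op. Below is a minimal sketch of a runtime-registered custom layer; it is not the code from this thread, and it assumes a channel-dim LayerNorm with eps = 1e-6 and no affine weight/bias (the real model does have weight/bias, which would also need to be loaded), and the converted .param would still need a LayerNormalization layer entry to trigger it.

#include "layer.h"
#include "net.h"
#include <math.h>

class LayerNormalization : public ncnn::Layer
{
public:
    LayerNormalization()
    {
        one_blob_only = true;
        support_inplace = true;
    }

    virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& /*opt*/) const
    {
        const int channels = bottom_top_blob.c;
        const int size = bottom_top_blob.w * bottom_top_blob.h * bottom_top_blob.d;
        const float eps = 1e-6f; // assumed to match the exported model

        for (int i = 0; i < size; i++)
        {
            // mean over channels at this spatial position
            float sum = 0.f;
            for (int q = 0; q < channels; q++)
                sum += bottom_top_blob.channel(q)[i];
            const float mean = sum / channels;

            // variance over channels
            float sqsum = 0.f;
            for (int q = 0; q < channels; q++)
            {
                const float t = bottom_top_blob.channel(q)[i] - mean;
                sqsum += t * t;
            }
            const float a = 1.f / sqrtf(sqsum / channels + eps);
            const float b = -mean * a;

            // normalize in place
            for (int q = 0; q < channels; q++)
                bottom_top_blob.channel(q)[i] = bottom_top_blob.channel(q)[i] * a + b;
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(LayerNormalization)

// registered before load_param, e.g.:
// net.register_custom_layer("LayerNormalization", LayerNormalization_layer_creator);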

@Suncheng2022
Author

Suncheng2022 commented May 14, 2024

Haha, I got "LayerNormalization not supported yet!" when converting it to ncnn too.

I added a LayerNorm implementation to ncnn, so why is it still not supported? It feels like the conversion tool does not go through ncnn's LayerNorm at all.

1. I didn't try to register my own op, but I think it should be an individual .h & .cpp file declaring the class LayerNormalization, and then add ncnn_add_layer(LayerNormalization) around line 169 of /ncnn/src/CMakeLists.txt.

I have tried to supplement the LayerNorm implementation in ncnn: I added a LayerNormalization implementation following the "add custom layer" reference document and recompiled.
When converting onnx to ncnn, the error is still reported and the LayerNormalization operation is still not supported.
Did I compile it incorrectly? (The compilation prompts "Could NOT find protobuf (missing: protobuf_DIR)", but the subsequent make etc. still succeeds.)

1. LayerNorm in ncnn supports normalization over the channel dim: (screenshot)
2. Added a new LayerNormalization implementation in ncnn, but it doesn't seem to work: (screenshot)

@HHscut

HHscut commented May 14, 2024

If you edit the file layernorm.cpp, the op is still called LayerNorm, but the op the converter complains about is called LayerNormalization (per "LayerNormalization not supported yet!"), so maybe you should declare a new op class.

@Suncheng2022
Author

If you edit the file layernorm.cpp, the op is still called LayerNorm, but the op the converter complains about is called LayerNormalization (per "LayerNormalization not supported yet!"), so maybe you should declare a new op class.

I know what you mean. I wrote two files named "LayerNormalization.h" and "LayerNormalization.cpp", modified src/CMakeLists.txt with ncnn_add_layer(LayerNormalization), and compiled again. But it still doesn't seem to work.

@HHscut

HHscut commented May 14, 2024

If you edit the file layernorm.cpp, the op is still called LayerNorm, but the op the converter complains about is called LayerNormalization (per "LayerNormalization not supported yet!"), so maybe you should declare a new op class.

I know what you mean. I wrote two files named "LayerNormalization.h" and "LayerNormalization.cpp", modified src/CMakeLists.txt with ncnn_add_layer(LayerNormalization), and compiled again. But it still doesn't seem to work.

Yeah, I ran into the same situation, but I don't know why it doesn't work.

@Suncheng2022
Author

If you edit the file layernorm.cpp, the op is still called LayerNorm, but the op the converter complains about is called LayerNormalization (per "LayerNormalization not supported yet!"), so maybe you should declare a new op class.

I know what you mean. I wrote two files named "LayerNormalization.h" and "LayerNormalization.cpp", modified src/CMakeLists.txt with ncnn_add_layer(LayerNormalization), and compiled again. But it still doesn't seem to work.

Yeah, I ran into the same situation, but I don't know why it doesn't work.

Help, please. @nihui

@Suncheng2022
Author

If you edit the file layernorm.cpp, the op is still called LayerNorm, but the op the converter complains about is called LayerNormalization (per "LayerNormalization not supported yet!"), so maybe you should declare a new op class.

I know what you mean. I wrote two files named "LayerNormalization.h" and "LayerNormalization.cpp", modified src/CMakeLists.txt with ncnn_add_layer(LayerNormalization), and compiled again. But it still doesn't seem to work.

Yeah, I ran into the same situation, but I don't know why it doesn't work.

Thanks again. I won't give up and will solve this problem sooner or later. I have to go with ncnn, as it's perfect in my view.

@Suncheng2022
Author

I used PNNX to solve my problem in the end!
Thanks @nihui for PNNX!
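For anyone landing here with the same problem: as far as I understand the PNNX workflow, it converts a traced TorchScript model directly and bypasses onnx2ncnn, roughly like the command below (the file name and input shape are placeholders, not taken from this thread), and it emits *.ncnn.param / *.ncnn.bin files that load with load_param / load_model as usual.

pnnx model_traced.pt inputshape=[1,3,512,512]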
