Extremely simple Nvidia Jetson Xavier monitoring using InfluxDB, Telegraf and Grafana.

guillebot/grafana-tegrastats-telegraf

Tegrastats -> Telegraf -> InfluxDB -> Grafana

I'm developing a series of AI devices based on the Nvidia Jetson.

To monitor them remotely I use InfluxDB: Telegraf sends the metrics, and Grafana visualizes them.

For the most part I use this beautiful dashboard, but for the Jetson-specific metrics such as temperatures, GPU and the hardware encoders, I rely on NVIDIA's tegrastats tool.

So the basic workflow is:

1. Generate the logs:

```shell
# --interval is in milliseconds, so this writes one sample every 10 s
tegrastats --interval 10000 --logfile /var/log/tegrastat
```

2. Parse the logs with Telegraf's inputs.tail plugin:

Excerpt from /etc/telegraf/telegraf.conf:

```toml
[[inputs.tail]]
  ## File(s) to tail:
  files = ["/var/log/tegrastat"]
  ## Start at the end of the file instead of reading existing lines
  from_beginning = false
  data_format = "grok"

  ## Name of the measurement (what I want to see in Grafana eventually)
  name_override = "tegrastat"

  grok_patterns = ["%{CUSTOM_LOGS}"]

  grok_custom_patterns = '''
CUSTOM_LOGS %{NUMBER:ramused:int}/%{NUMBER:ramtotal:int}MB \(lfb %{NUMBER:pages:int}x4MB\) CPU \[%{NUMBER:cpu1percentaje:int}\%@%{NUMBER:cpu1freq:int},%{NUMBER:cpu2percentaje:int}\%@%{NUMBER:cpu2freq:int},%{NUMBER:cpu3percentaje:int}\%@%{NUMBER:cpu3freq:int},%{NUMBER:cpu4percentaje:int}\%@%{NUMBER:cpu4freq:int},%{NUMBER:cpu5percentaje:int}\%@%{NUMBER:cpu5freq:int},%{NUMBER:cpu6percentaje:int}\%@%{NUMBER:cpu6freq:int},%{NUMBER:cpu7percentaje:int}\%@%{NUMBER:cpu7freq:int},%{NUMBER:cpu8percentaje:int}\%@%{NUMBER:cpu8freq:int}\] EMC_FREQ %{NUMBER:emcfreqpercentaje:int}\%@%{NUMBER:emcfreq:int} GR3D_FREQ %{NUMBER:gr3dfreqpercentaje:int}\%@%{NUMBER:gr3dfreq:int} NVENC %{NUMBER:nvencfreq:int} NVENC1 %{NUMBER:nvenc1freq:int} APE %{NUMBER:ape:int} MTS fg %{NUMBER:mts_fg:int}\% bg %{NUMBER:mts_bg:int}\% AO@%{NUMBER:ao_temp:float}C GPU@%{NUMBER:gpu_temp:float}C Tboard@%{NUMBER:tboard_temp:float}C Tdiode@%{NUMBER:tdiode_temp:float}C AUX@%{NUMBER:aux_temp:float}C CPU@%{NUMBER:cpu_temp:float}C thermal@%{NUMBER:thermal_temp:float}C PMIC@100C GPU %{NUMBER:gpupowercur:int}/%{NUMBER:gpupoweravg:int} CPU %{NUMBER:cpupowercur:int}/%{NUMBER:cpupoweravg:int} SOC %{NUMBER:socpowercur:int}/%{NUMBER:socpoweravg:int} CV %{NUMBER:cvpowercur:int}/%{NUMBER:cvpoweravg:int} VDDRQ %{NUMBER:vddrqpowercur:int}/%{NUMBER:vddrqpoweravg:int} SYS5V %{NUMBER:sys5vpowercur:int}/%{NUMBER:sys5vpoweravg:int}
'''
```
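To see roughly what the grok parser does with each tegrastats line, here is a minimal Python sketch (not Telegraf's actual implementation). It expands `%{NUMBER:name:type}` tokens into named regex groups and casts the captured values; untyped fields stay strings, as in Telegraf's grok parser. The `NUMBER` regex is a simplified assumption, only `NUMBER` tokens are supported, and the sample line uses made-up values. Because the TOML `'''` string is a literal string, a pattern can be pasted into a Python raw string unchanged.

```python
import re

# Simplified stand-in for grok's NUMBER pattern (an assumption; the real
# grok definition is more elaborate).
NUMBER_RE = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"
# Matches %{NUMBER:field} and %{NUMBER:field:int} / %{NUMBER:field:float}.
TOKEN_RE = re.compile(r"%\{NUMBER:(\w+)(?::(int|float))?\}")
CASTS = {"int": int, "float": float, "string": str}


def compile_grok(pattern):
    """Expand %{NUMBER:...} tokens into named groups; return (regex, types)."""
    types = {}

    def expand(match):
        name, kind = match.group(1), match.group(2) or "string"
        types[name] = kind
        return f"(?P<{name}>{NUMBER_RE})"

    expanded = TOKEN_RE.sub(expand, pattern)
    if "%{" in expanded:
        raise ValueError("this sketch only supports %{NUMBER:...} tokens")
    return re.compile(expanded), types


def parse_line(pattern, line):
    """Return a dict of typed fields, or None if the line does not match."""
    regex, types = compile_grok(pattern)
    match = regex.search(line)
    if match is None:
        return None
    return {name: CASTS[types[name]](value)
            for name, value in match.groupdict().items()}


if __name__ == "__main__":
    # Synthetic line with made-up values, not captured tegrastats output.
    sample = "RAM 2448/15690MB (lfb 2863x4MB)"
    pattern = r"%{NUMBER:ramused:int}/%{NUMBER:ramtotal:int}MB \(lfb %{NUMBER:pages:int}x4MB\)"
    print(parse_line(pattern, sample))
    # → {'ramused': 2448, 'ramtotal': 15690, 'pages': 2863}
```

A line that does not fit the pattern returns `None`, which is roughly the situation in which Telegraf produces no metric for that line.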

In my case, an [[outputs.influxdb]] output plugin sends the data to my InfluxDB instance.
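For reference, the InfluxDB output writes each parsed line in InfluxDB line protocol. The sketch below uses a hypothetical helper, `to_line_protocol`, written only for illustration (Telegraf does this for you). It shows roughly what a `tegrastat` point looks like: `:int` fields get an `i` suffix, floats are written as-is, and tags are sorted by key. The `host` and `path` tags and all values are assumptions for the example; check what your own Telegraf actually attaches.

```python
def _escape(text):
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value):
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Hypothetical helper: render one point as an InfluxDB line-protocol line."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}"
                       for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_format_field(v)}"
                          for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Made-up tags and values, for illustration only.
    print(to_line_protocol(
        "tegrastat",
        {"path": "/var/log/tegrastat", "host": "jetson-01"},
        {"ramused": 2448, "gpu_temp": 34.5},
        1700000000000000000,
    ))
    # → tegrastat,host=jetson-01,path=/var/log/tegrastat ramused=2448i,gpu_temp=34.5 1700000000000000000
```

The `tegrastat` measurement name comes from `name_override`, and the field names are exactly the grok semantic names, which is what you select in Grafana.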

3. Then you can, of course, graph the metrics, set alerts, etc. as usual.

[Figure: Jetson Xavier Temperatures]

4. See the attached Grafana panel configuration.

[Figure: Nvidia Jetson Grafana Dashboard]

Notes:

I didn't know grok before; it is a handy regex-style line parser. Here are a couple of tools I used to assemble the custom pattern:

https://grokdebug.herokuapp.com/

http://grokconstructor.appspot.com/do/match#result
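As an offline complement to these tools, here is a hypothetical Python helper, `first_mismatch` (a sketch, not one of the tools above). It grows the pattern one space-separated token at a time and reports the first token at which a line stops matching, which helps narrow down a long pattern like `CUSTOM_LOGS`. It uses Python's `re` with a simplified `NUMBER` regex (an assumption), so it only approximates real grok behavior; the sample lines are synthetic.

```python
import re

# Simplified stand-in for grok's NUMBER pattern (an assumption, not the
# exact grok definition).
NUMBER_RE = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"
TOKEN_RE = re.compile(r"%\{NUMBER:(\w+)(?::(?:int|float))?\}")


def to_regex(pattern):
    """Replace each %{NUMBER:name[:type]} token with a named group."""
    return TOKEN_RE.sub(lambda m: f"(?P<{m.group(1)}>{NUMBER_RE})", pattern)


def first_mismatch(pattern, line):
    """Return the first space-separated pattern token at which `line` stops
    matching, or None if the whole pattern matches."""
    re.compile(to_regex(pattern))  # fail early if the full pattern is invalid
    tokens = pattern.split(" ")
    for i in range(1, len(tokens) + 1):
        prefix = " ".join(tokens[:i])
        try:
            regex = re.compile(to_regex(prefix))
        except re.error:
            continue  # this prefix cuts a group in half; try a longer one
        if regex.search(line) is None:
            return tokens[i - 1]
    return None


if __name__ == "__main__":
    pattern = (r"%{NUMBER:ramused:int}/%{NUMBER:ramtotal:int}MB "
               r"\(lfb %{NUMBER:pages:int}x4MB\)")
    # Synthetic line that lacks the "(lfb ...)" part.
    print(first_mismatch(pattern, "RAM 2448/15690MB CPU [5%@1190]"))
    # → \(lfb
```

Feeding it the full `CUSTOM_LOGS` pattern and a real line from `/var/log/tegrastat` should point at the section that needs adjusting.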