NVMe health monitoring

This post was originally published on this site

Hi,

I'm using a Samsung 1725b NVMe drive on ESXi 7.0 and wonder what people are using to:

– monitor its health (TBW, errors, temperature)

– predict failures based on this (limited) data

For a regular SSD, I get a lot of information when using:

# esxcli storage core device smart get -d ID

Parameter                          Value  Threshold  Worst  Raw
Health Status                      OK     N/A        N/A    N/A
Media Wearout Indicator            99     5          99     172
Write Error Count                  100    10         100    0
Power-on Hours                     92     0          92     151
Power Cycle Count                  99     0          99     14
Reallocated Sector Count           100    10         100    0
Drive Temperature                  69     0          63     31
Write Sectors TOT Count            99     0          99     39
Read Sectors TOT Count             99     0          99     40
Initial Bad Block Count            100    10         100    0
Program Fail Count                 100    10         100    0
Erase Fail Count                   100    10         100    0
Uncorrectable Error Count          100    0          100    0
Pending Sector Reallocation Count  100    0          100    0

For the NVMe drive, I only get this:

Parameter                 Value  Threshold  Worst  Raw
Health Status             OK     N/A        N/A    N/A
Power-on Hours            1677   N/A        N/A    N/A
Power Cycle Count         3      N/A        N/A    N/A
Reallocated Sector Count  0      90         N/A    N/A
Drive Temperature         36     79         N/A    N/A
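A minimal sketch of a scripted check for the SATA-style output above. In S.M.A.R.T., the normalized Value counts down toward the Threshold, and an attribute is considered failing once Value drops to or below it, so this flags any row within a chosen margin of its threshold. The function name `flag_smart` and the default margin of 10 are hypothetical choices for illustration. The column layout (Value, Threshold, Worst, Raw as the last four fields) is taken from the output shown here, and the sketch assumes `awk` is available wherever it runs. The NVMe rows do not appear to follow the same convention (Drive Temperature 36 against 79 reads like degrees against a limit), so the check should not be applied to them as-is.

```shell
#!/bin/sh
# flag_smart (hypothetical helper): reads `esxcli storage core device smart get`
# output on stdin and prints attributes whose normalized Value is within
# MARGIN (first argument, default 10) of their Threshold.
# Assumes Value/Threshold/Worst/Raw are the last four whitespace-separated columns.
flag_smart() {
  margin="${1:-10}"
  awk -v margin="$margin" '
    # Data rows end in four columns: Value Threshold Worst Raw
    NF >= 5 {
      value = $(NF - 3)
      thresh = $(NF - 2)
      # Compare only rows where both columns are integers; this skips
      # the header, any dashed separator line and N/A entries
      if (value ~ /^[0-9]+$/ && thresh ~ /^[0-9]+$/) {
        # Rebuild the multi-word parameter name from the leading fields
        name = $1
        for (i = 2; i <= NF - 4; i++) name = name " " $i
        if (value - thresh <= margin + 0)
          print name ": value " value ", threshold " thresh
      }
    }'
}
```

On the host, the output could be piped in directly, e.g. `esxcli storage core device smart get -d ID | flag_smart 10`. Running it periodically and keeping the Raw values would also give a history to spot trends in, rather than relying on a single reading.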

There have been some efforts to get smartctl up and running on ESXi, but none of them are official:

https://www.virten.net/2016/05/determine-tbw-from-ssds-with-s-m-a-r-t-values-in-esxi-smartctl/

Thanks for any info.

-Mark