Conversation
Problem:
In the current version, my SAS drive had empty fields:
* Drive Model
* Serial Number
* Short/Extended test duration

The script got stuck after the first SMART short test, reporting it would wait for 0 seconds.

Assumptions:
* smartctl reports different values in different fields on SAS drives (e.g. smartctl --capabilities only reports "SCSI device successfully opened")
* Every drive offers a "short" test. SAS drives do not provide information about that
* Every drive offers a "long" test. ATA drives will translate this into "extended"
* A burn-in is not time critical but must be exhaustive

Actions:
* Made grep case insensitive for Serial number on SAS vs Serial Number on ATA
* Changed "discard first two columns" to "discard everything until first colon" in get_smart_info_value
* Added colon at the end of every inquired smart info
* Changed test behavior "if success in smartctl then success; else if error in smartctl then error" to "if ATA error in smartctl then error; else if no test in progress then success" in poll_selftest_complete

Remaining problems:
* For SAS drives it will still print and log "waiting 0 seconds for test completion"

Propositions:
* Show Progress of SMART Test / remaining % instead of a fixed "waiting for ### seconds"
* Print and log actual time until completion instead of reported test duration, or maybe both
* As even SAS drives report a "long test duration", POLL_TIMEOUT_SECONDS could be set relative to that (like 1.5x), and not fixed to 4 hours.
* Maybe warn on POLL_TIMEOUT and report time tested until now, instead of abort?
* Switch to some other output form of smartctl, like json, and parse it with something like jq

INFO ABOUT RUNNING TEST:

SAS running short test
smartctl -l selftest /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
Self-test execution status:             84% of test remaining
SMART Self-test log
Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                                  number   (hours)
# 1  Background short  Self test in progress ...     -       NOW         - [-   -    -]
# 2  Background short  Completed                     -     29626         - [-   -    -]
# 3  Background short  Completed                     -     29626         - [-   -    -]
# 4  Background short  Completed                     -     29625         - [-   -    -]
# 5  Background short  Completed                     -     29625         - [-   -    -]
# 6  Background long   Completed                     -     29625         - [-   -    -]
# 7  Background short  Completed                     -     29612         - [-   -    -]
# 8  Background long   Completed                     -     29608         - [-   -    -]
Long (extended) Self-test duration: 3772 seconds [62.9 minutes]

SAS not running short test: "Self-test execution status" line is missing

ATA running extended test:
smartctl -c /dev/sdg
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x05) Offline data collection activity
        was aborted by an interrupting command from host.
        Auto Offline Data Collection: Disabled.
Self-test execution status: ( 242) Self-test routine in progress...
        20% of test remaining.
Total time to complete Offline data collection: ( 45) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new command.
        Offline surface scan supported.
        Self-test supported.
        No Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
        Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 113) minutes.
SCT capabilities: (0x003d) SCT Status supported.
        SCT Error Recovery Control supported.
        SCT Feature Control supported.
        SCT Data Table supported.

ATA drive after abort:
Self-test execution status: ( 25) The self-test routine was aborted by the host.

ATA drive after successful short test:
Self-test execution status: ( 0) The previous self-test routine completed
        without error or no self-test has ever been run.

SMART INFO ON SAS DRIVE
# echo $SMART_INFO
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor:               TOSHIBA
Product:              AL13SXB600N
Revision:             5202
Compliance:           SPC-3
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Rotation Rate:        15000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x500003975861d374
Serial number:        X6S0A03FFIYA
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sat Nov 20 14:18:01 2021 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

FYI:
# smartctl -A /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     51 C
Drive Trip Temperature:        65 C
Accumulated power on time, hours:minutes 29625:34
Manufactured in week 43 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  15
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  15
Elements in grown defect list: 0

FYI to get info like ATA "reallocated sector" or "pending sector" on SAS devices:
# smartctl -l error /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0     2567      1986      1986          0     993683.879           0
write:         0        0         0         0          0      49874.645           0
verify:        0        0         0         0          0     106222.531           0
Non-medium error count:        0
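
For readers following along, the parsing ideas from the Actions and Propositions above can be sketched in plain shell. This is a minimal sketch under stated assumptions, not the script's actual code: only the names get_smart_info_value and POLL_TIMEOUT_SECONDS come from this PR, while the function bodies and the helper names selftest_remaining_percent and long_test_timeout_seconds are hypothetical. Each function reads captured smartctl text on stdin, so it can be tried without a drive attached.

```shell
#!/bin/sh
# Sketch only: bodies and helper names are assumptions, not the PR's code.

# Print the value of the first "Label: value" line in smartctl text on stdin.
# The label is matched case-insensitively ("Serial number" vs "Serial Number")
# and everything up to the first colon is discarded.
# Note: the label is used as a grep pattern, so keep it free of regex metacharacters.
get_smart_info_value() {
  grep -i "^$1:" | head -n 1 | sed -e 's/^[^:]*:[[:space:]]*//'
}

# Print the number from the first "NN% of test remaining" line (SAS or ATA layout).
# Prints nothing when no self-test is running.
selftest_remaining_percent() {
  sed -n 's/^.*[^0-9]\([0-9][0-9]*\)% of test remaining.*$/\1/p' | head -n 1
}

# Print 1.5x the reported long self-test duration in seconds, a possible
# replacement for a fixed POLL_TIMEOUT_SECONDS. Returns 1 if the line is absent.
long_test_timeout_seconds() {
  duration=$(sed -n 's/^Long (extended) Self-test duration: *\([0-9][0-9]*\) seconds.*/\1/p' | head -n 1)
  [ -n "$duration" ] || return 1
  echo $(( duration * 3 / 2 ))
}

# Usage with lines taken from the SAS output quoted above.
sample='Serial number:        X6S0A03FFIYA
Self-test execution status:             84% of test remaining
Long (extended) Self-test duration: 3772 seconds [62.9 minutes]'

printf '%s\n' "$sample" | get_smart_info_value 'Serial Number'   # X6S0A03FFIYA
printf '%s\n' "$sample" | selftest_remaining_percent             # 84
printf '%s\n' "$sample" | long_test_timeout_seconds              # 5658
```

Reading from stdin keeps the helpers independent of how the smartctl output was obtained (a variable such as SMART_INFO, a file, or a live call). The 1.5x factor is the one floated in the Propositions and is not a tested recommendation.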
|
Excellent! I unfortunately forgot about this and no longer have SAS drives to work with, but glad to see this getting worked on. EDIT: Actually, I do have some SAS drives I'll be replacing with SATA, so I'll run this on those and see how it goes. |
|
I came here with the same need for SAS support and am testing this patch out now, but I wanted to suggest that perhaps a better way forward is to make use of the |
|
I still have one SAS drive model lying around for testing and would be open to contributing to a rewrite. @Spearfoot seems to have been inactive since 10/2021, though... |
|
@ciscam your version is working brilliantly for me. I have 5 SAS WD Ultrastars from 2018 that I got from eBay.... 50k hours, 17 power cycles. Script is working great. Shame @Spearfoot seems to no longer be maintaining this repo. |
|
We might look into forking? |
|
I apologize for letting this project wither for lack of attention... Work and Life have gotten in the way. |
|
Thanks for the kind words @gfilicetti. |
|
I have also had success with SAS drives using ciscam's PR, after the original script errored out. I have tested 2 types of drives (4 drives total):
* 2x 6 TB Seagate Exos Enterprise
* 2x 12 TB HGST WD Ultrastar DC HC520