10 10 | DeviceEnumerationPlugin | powershell -Command "(Get-WmiObject -Class Win32_Processor \| Measure-Object).Count"<br>lspci -d {vendorid_ep}: \| grep -i 'VGA\\|Display\\|3D' \| wc -l<br>powershell -Command "(wmic path win32_VideoController get name \| findstr AMD \| Measure-Object).Count"<br>lscpu<br>lshw<br>lspci -d {vendorid_ep}: \| grep -i 'Virtual Function' \| wc -l<br>powershell -Command "(Get-VMHostPartitionableGpu \| Measure-Object).Count" | **Analyzer Args:**<br>- `cpu_count`: Optional[list[int]]<br>- `gpu_count`: Optional[list[int]]<br>- `vf_count`: Optional[list[int]] | [DeviceEnumerationDataModel](#DeviceEnumerationDataModel-Model) | [DeviceEnumerationCollector](#Collector-Class-DeviceEnumerationCollector) | [DeviceEnumerationAnalyzer](#Data-Analyzer-Class-DeviceEnumerationAnalyzer) |
11 11 | DimmPlugin | sh -c 'dmidecode -t 17 \| tr -s " " \| grep -v "Volatile\\|None\\|Module" \| grep Size' 2>/dev/null<br>dmidecode<br>wmic memorychip get Capacity | - | [DimmDataModel](#DimmDataModel-Model) | [DimmCollector](#Collector-Class-DimmCollector) | - |
12 12 | DkmsPlugin | dkms status<br>dkms --version | **Analyzer Args:**<br>- `dkms_status`: Union[str, list]<br>- `dkms_version`: Union[str, list]<br>- `regex_match`: bool | [DkmsDataModel](#DkmsDataModel-Model) | [DkmsCollector](#Collector-Class-DkmsCollector) | [DkmsAnalyzer](#Data-Analyzer-Class-DkmsAnalyzer) |
13- | DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)\|(aer_status.*)`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? 
(\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
13+ | DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error Status: `(pcieport [\w:.]+: AER: aer_status:[^\n]*(?:\n[...`<br>- PCIe AER Correctable Error Status: `(.*aer_cor_status: 0x[0-9a-fA-F]+, aer_cor_mask...`<br>- PCIe AER Uncorrectable Error Status: `(.*aer_uncor_status: 0x[0-9a-fA-F]+, aer_uncor_...`<br>- PCIe AER Uncorrectable Error Severity with TLP Header: `(.*aer_uncor_severity: 0x[0-9a-fA-F]+.*)(\n.*TL...`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS 
Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? (\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? 
(...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- MMP Error: `Failed to load MMP firmware qat_4xxx_mmp.bin`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- RAS Poison Consumed: `amdgpu[ 0-9a-fA-F:.]+:(?:\s*amdgpu:)?\s+(?:{\d+...`<br>- RAS Poison created: `amdgpu[ 0-9a-fA-F:.]+:(?:\s*amdgpu:)?\s+(?:{\d+...`<br>- Bad page threshold exceeded: `(amdgpu: Saved bad pages (\d+) reaches threshol...`<br>- RAS Hardware Error: `Hardware error from APEI Generic Hardware Error...`<br>- Error Address: `Error Address.*(?:\s.*)`<br>- RAS EDR Event: `EDR: EDR event received`<br>- DPC Event: `DPC: .*`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
14 14 | FabricsPlugin | ibstat<br>ibv_devinfo<br>ls -l /sys/class/infiniband/*/device/net<br>mst start<br>mst status -v<br>ofed_info -s<br>rdma dev<br>rdma link | - | [FabricsDataModel](#FabricsDataModel-Model) | [FabricsCollector](#Collector-Class-FabricsCollector) | - |
15 15 | JournalPlugin | journalctl --no-pager --system --output=short-iso<br>journalctl --no-pager --system --output=json | **Analyzer Args:**<br>- `check_priority`: Optional[int]<br>- `group`: bool | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | [JournalAnalyzer](#Data-Analyzer-Class-JournalAnalyzer) |
16 16 | KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
@@ -1172,7 +1172,10 @@ Check dmesg for errors
1172 1172 regex=re.compile('(amdgpu \\w{4}:\\w{2}:\\w{2}\\.\\w:\\s+amdgpu:\\s+\\[\\S+\\]\\s*(?:retry|no-retry)? page fault[^\\n]*)(?:\\n[^\\n]*(amdgpu \\w{4}:\\w{2}:\\w{2}\\.\\w:\\s+amdgpu:[^\\n]*))?(?:\\n[^\\n]*(amdgpu \\w{4}:, re.MULTILINE) message='amdgpu Page Fault' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.ERROR: 3>,
1173 1173 regex=re.compile('page fault for address.*') message='Page Fault' event_category=<EventCategory.OS: 'OS'> event_priority=<EventPriority.ERROR: 3>,
1174 1174 regex=re.compile('(?:amdgpu)(.*Fatal error during GPU init)|(Fatal error during GPU init)') message='Fatal error during GPU init' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.ERROR: 3>,
1175- regex=re.compile('(?:pcieport )(.*AER: aer_status.*)|(aer_status.*)') message='PCIe AER Error' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.ERROR: 3>,
1175+ regex=re.compile('(pcieport [\\w:.]+: AER: aer_status:[^\\n]*(?:\\n[^\\n]*){0,32}?pcieport [\\w:.]+: AER: aer_layer=[^\\n]*)', re.MULTILINE) message='PCIe AER Error Status' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.ERROR: 3>,
1176+ regex=re.compile('(.*aer_cor_status: 0x[0-9a-fA-F]+, aer_cor_mask: 0x[0-9a-fA-F]+.*)') message='PCIe AER Correctable Error Status' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.ERROR: 3>,
1177+ regex=re.compile('(.*aer_uncor_status: 0x[0-9a-fA-F]+, aer_uncor_mask: 0x[0-9a-fA-F]+.*)') message='PCIe AER Uncorrectable Error Status' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.ERROR: 3>,
1178+ regex=re.compile('(.*aer_uncor_severity: 0x[0-9a-fA-F]+.*)(\\n.*TLP Header: (?:0x)?[0-9a-fA-F]+(?: (?:0x)?[0-9a-fA-F]+){3}.*)', re.MULTILINE) message='PCIe AER Uncorrectable Error Severity with TLP Header' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.ERROR: 3>,
1176 1179 regex=re.compile('Failed to read journal file.*') message='Failed to read journal file' event_category=<EventCategory.OS: 'OS'> event_priority=<EventPriority.WARNING: 2>,
1177 1180 regex=re.compile('journal corrupted or uncleanly shut down.*') message='Journal file corrupted or uncleanly shut down' event_category=<EventCategory.OS: 'OS'> event_priority=<EventPriority.WARNING: 2>,
1178 1181 regex=re.compile('ACPI BIOS Error') message='ACPI BIOS Error' event_category=<EventCategory.BIOS: 'BIOS'> event_priority=<EventPriority.ERROR: 3>,
@@ -1194,15 +1197,23 @@ Check dmesg for errors
1194 1197 regex=re.compile('(?:\\d{4}-\\d+-\\d+T\\d+:\\d+:\\d+,\\d+[+-]\\d+:\\d+)? (.*Mode2 reset failed.*)') message='Mode 2 Reset Failed' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1195 1198 regex=re.compile('(?:\\d{4}-\\d+-\\d+T\\d+:\\d+:\\d+,\\d+[+-]\\d+:\\d+)?(.*\\[Hardware Error\\]: Corrected error.*)') message='RAS Corrected Error' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1196 1199 regex=re.compile('x86/cpu: SGX disabled by BIOS') message='SGX Error' event_category=<EventCategory.BIOS: 'BIOS'> event_priority=<EventPriority.WARNING: 2>,
1200+ regex=re.compile('Failed to load MMP firmware qat_4xxx_mmp.bin') message='MMP Error' event_category=<EventCategory.BIOS: 'BIOS'> event_priority=<EventPriority.WARNING: 2>,
1197 1201 regex=re.compile('amdgpu \\w{4}:\\w{2}:\\w{2}.\\w: amdgpu: WARN: GPU is throttled.*') message='GPU Throttled' event_category=<EventCategory.SW_DRIVER: 'SW_DRIVER'> event_priority=<EventPriority.WARNING: 2>,
1202+ regex=re.compile('amdgpu[ 0-9a-fA-F:.]+:(?:\\s*amdgpu:)?\\s+(?:{\\d+})?poison is consumed by client \\d+, kick off gpu reset flow') message='RAS Poison Consumed' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1203+ regex=re.compile('amdgpu[ 0-9a-fA-F:.]+:(?:\\s*amdgpu:)?\\s+(?:{\\d+})?Poison is created') message='RAS Poison created' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1204+ regex=re.compile('(amdgpu: Saved bad pages (\\d+) reaches threshold value 128)') message='Bad page threshold exceeded' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1205+ regex=re.compile('Hardware error from APEI Generic Hardware Error Source:.*(?:\\n.*){0,14}') message='RAS Hardware Error' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1206+ regex=re.compile('Error Address.*(?:\\s.*)') message='Error Address' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1207+ regex=re.compile('EDR: EDR event received') message='RAS EDR Event' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1208+ regex=re.compile('DPC: .*') message='DPC Event' event_category=<EventCategory.RAS: 'RAS'> event_priority=<EventPriority.ERROR: 3>,
1198 1209 regex=re.compile('(?:\\[[^\\]]+\\]\\s*)?LNetError:.*ko2iblnd:\\s*No matching interfaces', re.IGNORECASE) message='LNet: ko2iblnd has no matching interfaces' event_category=<EventCategory.IO: 'IO'> event_priority=<EventPriority.WARNING: 2>,
1199 1210 regex=re.compile('(?:\\[[^\\]]+\\]\\s*)?LNetError:\\s*.*Error\\s*-?\\d+\\s+starting up LNI\\s+\\w+', re.IGNORECASE) message='LNet: Error starting up LNI' event_category=<EventCategory.IO: 'IO'> event_priority=<EventPriority.WARNING: 2>,
1200 1211 regex=re.compile('LustreError:.*ptlrpc_init_portals\\(\\).*network initiali[sz]ation failed', re.IGNORECASE) message='Lustre: network initialisation failed' event_category=<EventCategory.IO: 'IO'> event_priority=<EventPriority.WARNING: 2>
1201 1212 ]`
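
The new `PCIe AER Error Status` entry replaces a single-line pattern with a multi-line one, so a short sketch may help reviewers see what it captures. The pattern below is copied from the added line in this hunk; the sample log text is hypothetical and only meant to resemble kernel AER output, and the analyzer's own matching code is not shown in this diff.

```python
import re

# Pattern as listed in the added 'PCIe AER Error Status' line of this diff.
AER_STATUS = re.compile(
    r"(pcieport [\w:.]+: AER: aer_status:[^\n]*"
    r"(?:\n[^\n]*){0,32}?"  # lazily skip up to 32 intermediate lines
    r"pcieport [\w:.]+: AER: aer_layer=[^\n]*)",
    re.MULTILINE,
)

# Hypothetical dmesg excerpt, for illustration only.
SAMPLE = "\n".join([
    "pcieport 0000:00:01.1: AER: aer_status: 0x00000040, aer_mask: 0x00000000",
    "pcieport 0000:00:01.1: AER:    [ 6] BadTLP",
    "pcieport 0000:00:01.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID",
    "unrelated line",
])

match = AER_STATUS.search(SAMPLE)
block = match.group(1) if match else ""
lines = block.splitlines()

# The captured block runs from the aer_status line through the aer_layer line.
print(len(lines))
```

Because the `{0,32}?` quantifier is lazy, the capture stops at the first `aer_layer=` line after `aer_status:`, so the trailing unrelated line is not part of the event.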
1202 1213
1203 1214 ### Regex Patterns
1204 1215
1205- *46 items defined*
1216+ *57 items defined*
1206 1217
1207 1218 - **Built-in Regexes:**
1208 1219 - - Out of memory error: `(?:oom_kill_process.*)|(?:Out of memory.*)`
@@ -1224,7 +1235,10 @@ Check dmesg for errors
1224 1235 - - amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`
1225 1236 - - Page Fault: `page fault for address.*`
1226 1237 - - Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)|(Fata...`
1227- - - PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)|(aer_status.*)`
1238+ - - PCIe AER Error Status: `(pcieport [\w:.]+: AER: aer_status:[^\n]*(?:\n[...`
1239+ - - PCIe AER Correctable Error Status: `(.*aer_cor_status: 0x[0-9a-fA-F]+, aer_cor_mask...`
1240+ - - PCIe AER Uncorrectable Error Status: `(.*aer_uncor_status: 0x[0-9a-fA-F]+, aer_uncor_...`
1241+ - - PCIe AER Uncorrectable Error Severity with TLP Header: `(.*aer_uncor_severity: 0x[0-9a-fA-F]+.*)(\n.*TL...`
1228 1242 - - Failed to read journal file: `Failed to read journal file.*`
1229 1243 - - Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`
1230 1244 - - ACPI BIOS Error: `ACPI BIOS Error`
@@ -1246,7 +1260,15 @@ Check dmesg for errors
1246 1260 - - Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`
1247 1261 - - RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`
1248 1262 - - SGX Error: `x86/cpu: SGX disabled by BIOS`
1263+ - - MMP Error: `Failed to load MMP firmware qat_4xxx_mmp.bin`
1249 1264 - - GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`
1265+ - - RAS Poison Consumed: `amdgpu[ 0-9a-fA-F:.]+:(?:\s*amdgpu:)?\s+(?:{\d+...`
1266+ - - RAS Poison created: `amdgpu[ 0-9a-fA-F:.]+:(?:\s*amdgpu:)?\s+(?:{\d+...`
1267+ - - Bad page threshold exceeded: `(amdgpu: Saved bad pages (\d+) reaches threshol...`
1268+ - - RAS Hardware Error: `Hardware error from APEI Generic Hardware Error...`
1269+ - - Error Address: `Error Address.*(?:\s.*)`
1270+ - - RAS EDR Event: `EDR: EDR event received`
1271+ - - DPC Event: `DPC: .*`
1250 1272 - - LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`
1251 1273 - - LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`
1252 1274 - - Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...`
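
As a quick sanity check on the newly added single-line entries, here is a minimal sketch that runs four of them against sample log lines. The patterns and messages are copied from this diff; the `classify` helper, the `PATTERNS` dictionary, and every sample line are hypothetical and are not part of the plugin.

```python
import re

# Message -> pattern, copied from the added entries in this diff.
PATTERNS = {
    "MMP Error": re.compile(r"Failed to load MMP firmware qat_4xxx_mmp.bin"),
    "RAS EDR Event": re.compile(r"EDR: EDR event received"),
    "DPC Event": re.compile(r"DPC: .*"),
    "Bad page threshold exceeded": re.compile(
        r"(amdgpu: Saved bad pages (\d+) reaches threshold value 128)"
    ),
}


def classify(line: str) -> list[str]:
    """Return the messages whose pattern matches anywhere in the line."""
    return [msg for msg, rx in PATTERNS.items() if rx.search(line)]


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    "pcieport 0000:00:01.1: EDR: EDR event received",
    "pcieport 0000:00:01.1: DPC: containment event, status:0x1f01 source:0x0000",
    "amdgpu 0000:03:00.0: amdgpu: Saved bad pages 128 reaches threshold value 128",
    "usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]

results = {line: classify(line) for line in SAMPLE_LINES}
for line, messages in results.items():
    print(messages, "<-", line)
```

Two things stand out when reading the patterns this way: `Bad page threshold exceeded` hard-codes the threshold value 128, so a log line reporting a different threshold would not match, and `DPC: .*` is broad enough to flag any line containing `DPC: `.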