this post was submitted on 07 Mar 2024
35 points (88.9% liked)

Linux

48323 readers
648 users here now

From Wikipedia, the free encyclopedia

Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).

Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.

Rules

Related Communities

Community icon by Alpár-Etele Méder, licensed under CC BY 3.0

founded 5 years ago
MODERATORS
 

My laptop is working just fine. It's from 2018 and it has an NVME drive.

It has an EFI boot partition and other partition with LUKS and LVM on top of that.

Since this week I see these logs from time to time:

Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6:   device [8086:34b6] error status/mask=00000001/00002000
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6:    [ 0] RxErr                  (First)
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: AER:   Error of this Agent is reported first
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0:   device [8086:0975] error status/mask=00000001/00002000
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0:    [ 0] RxErr                  (First)

The devices are:

$ lspci -vv | grep 1d.6
00:1d.6 PCI bridge: Intel Corporation Device 34b6 (rev 30) (prog-if 00 [Normal decode])

$ lspci -vv | grep 02:00.0
02:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier] (prog-if 02 [NVM Express])

The laptop works like always, but I have the impression that the NVME drive is telling me something bad.

It happens from time to time:

$ journalctl --since yesterday | grep -c "nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical"
9

Do you know what does it mean?

you are viewing a single comment's thread
view the rest of the comments
[–] bizdelnick@lemmy.ml 6 points 8 months ago (1 children)

Check also sudo smartctl -a /dev/nvme0

[–] vsis@feddit.cl 1 points 8 months ago

sudo smartctl -a /dev/nvme0

$ sudo smartctl -a /dev/nvme0
[sudo] password for ****:
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.7.6-arch1-2] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL HBRPEKNX0202A
Serial Number:                      BTTE95101RQM512B-1
Firmware Version:                   G002
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Fri Mar  8 12:09:53 2024 CET
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.00W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     2000    5000
 4 -   0.0040W       -        -    4  4  4  4     5000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    32%
Data Units Read:                    6,877,173 [3.52 TB]
Data Units Written:                 9,397,485 [4.81 TB]
Host Read Commands:                 54,359,124
Host Write Commands:                239,213,047
Controller Busy Time:               2,412
Power Cycles:                       536
Power On Hours:                     6,350
Unsafe Shutdowns:                   62
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Extended          Completed without error                6334            -     -   -   -    -
 1   Short             Completed without error                6334            -     -   -   -    -