FAILED SMART self-check. BACK UP DATA NOW!

The HDD in my home server raised a SMART warning (Spin_Retry_Count is in trouble), so I looked into it.

# smartctl -a /dev/hda
:
(omitted)
:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME              FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate         0x000f 100   100   046    Pre-fail Always  -           258501
  2 Throughput_Performance      0x0005 100   100   030    Pre-fail Offline -           0
  3 Spin_Up_Time                0x0003 100   100   025    Pre-fail Always  -           25602
  4 Start_Stop_Count            0x0032 097   097   000    Old_age  Always  -           1666
  5 Reallocated_Sector_Ct       0x0033 099   099   024    Pre-fail Always  -           3
  7 Seek_Error_Rate             0x000f 100   089   047    Pre-fail Always  -           1655
  8 Seek_Time_Performance       0x0005 100   100   019    Pre-fail Offline -           0
  9 Power_On_Seconds            0x0032 006   006   000    Old_age  Always  -           14228h+09m+46s
 10 Spin_Retry_Count            0x0013 001   001   020    Pre-fail Always  FAILING_NOW 21
 12 Power_Cycle_Count           0x0032 091   091   000    Old_age  Always  -           1355
192 Emergency_Retract_Cycle_Ct  0x0032 099   099   000    Old_age  Always  -           19
193 Load_Cycle_Count            0x0032 001   001   000    Old_age  Always  -           474964
194 Temperature_Celsius         0x0022 100   090   000    Old_age  Always  -           51 (Lifetime Min/Max 14/57)
195 Hardware_ECC_Recovered      0x001a 100   100   000    Old_age  Always  -           21048
196 Reallocated_Event_Count     0x0032 099   099   000    Old_age  Always  -           2
197 Current_Pending_Sector      0x0012 100   100   000    Old_age  Always  -           0
198 Off-line_Scan_UNC_Sector_Ct 0x0010 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count        0x003e 200   200   000    Old_age  Always  -           0
200 Write_Error_Count           0x000f 100   100   060    Pre-fail Always  -           5193
203 Run_Out_Cancel              0x0002 099   099   000    Old_age  Always  -           1528978538545
:
(omitted)
:

Spin_Retry_Count is "the number of retries the drive needed to spin up to its rated speed", so basically the disk is wearing out, I guess. It is a second-hand drive, after all.
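
As a side note, the FAILING_NOW rows can be picked out mechanically: an attribute counts as failed when its normalized VALUE is at or below THRESH. Below is a rough Python sketch of that check. The function name failing_attributes and the regular expression are my own assumptions for illustration, not part of smartmontools, and the sample rows are just copied from the output above.

```python
import re

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, TYPE.
# The \s* before the flag tolerates long names that run into the FLAG column.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+?)\s*(0x[0-9a-fA-F]{4})\s+(\d{3})\s+(\d{3})\s+(\d{3})\s+(\S+)"
)

# A few rows copied from the smartctl -a output in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 099 099 024 Pre-fail Always - 3
10 Spin_Retry_Count 0x0013 001 001 020 Pre-fail Always FAILING_NOW 21
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 474964
"""


def failing_attributes(text):
    """Return (id, name, value, thresh) for rows whose VALUE is at or below THRESH."""
    failing = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or unrelated line
        attr_id, name = int(m.group(1)), m.group(2)
        value, thresh = int(m.group(4)), int(m.group(6))
        # A threshold of 000 can never be reached, so such rows are skipped.
        if thresh > 0 and value <= thresh:
            failing.append((attr_id, name, value, thresh))
    return failing


if __name__ == "__main__":
    print(failing_attributes(SAMPLE))  # [(10, 'Spin_Retry_Count', 1, 20)]
```

Load_Cycle_Count also sits at 001, but its threshold is 000, so it is not reported as failing; only Spin_Retry_Count (001 against a threshold of 020) trips.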

Self-Monitoring, Analysis and Reporting Technology (Wikipedia)
List of SMART information obtainable with smartmontools (じーなか.com)

So, just to be sure, I ran a (long) self-test.

# smartctl -t long /dev/hda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 42 minutes for test to complete.
Test will complete after [estimated completion time]
Use smartctl -X to abort test.

Checking the self-test log and the health assessment

# smartctl -l selftest /dev/hda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: servo/seek failure        90%            14228  -
# smartctl -H /dev/hda
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME              FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
 10 Spin_Retry_Count            0x0013 001   001   020    Pre-fail Always  FAILING_NOW 21
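
If you would rather have a script notice this than read the output by eye, smartctl also reports the result through its exit status, which is a bitmask. Here is a minimal Python sketch that decodes it. The bit descriptions are my paraphrase of the "return values" section of the smartctl man page, the function names are made up for illustration, and I did not record the actual exit status for this drive, so the value 24 below is only an example.

```python
import subprocess

# Meaning of each bit in smartctl's exit status (paraphrased from smartctl(8)).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or returned bad data",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_exit_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [msg for bit, msg in sorted(EXIT_BITS.items()) if status & (1 << bit)]


def run_health_check(device):
    """Run `smartctl -H` on the device and decode its exit status (needs root)."""
    result = subprocess.run(["smartctl", "-H", device])
    return decode_exit_status(result.returncode)


if __name__ == "__main__":
    # Example value only: bits 3 and 4 set (8 + 16 = 24).
    for reason in decode_exit_status(24):
        print(reason)
```

On a real machine, run_health_check("/dev/hda") would run the same command as above; an empty list means smartctl had nothing to complain about.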

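If it were just bad sectors, something like a low-level format might buy the drive some extra life, but this one looks bad.
I'm curious how long it will take to die completely, but this isn't a throwaway machine, so I plan to replace the drive.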

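The stock drive is a FUJITSU MHS2030AT (30GB), and shopping around for a replacement, a 2.5inch, 80GB, 9.5mm, ATA6 drive goes for about 5,000 yen. So cheap.
The chipset is on the old side, so I decided not to consider a large-capacity drive. Even something like 320GB costs less than 10,000 yen. I want one.
While I'm at it, maybe I'll take this opportunity to try out VMware ESXi too.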