The error message on the HDFS web UI clearly identifies the failed DataNode and the location of the affected data.
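Besides the web UI, the failure can be cross-checked from the shell. A minimal sketch, assuming the log directory is /usr/local/hadoop/logs and the failed mount is /disk/sata12 (both assumptions; adjust to your environment):

# Search the DataNode log for volume-failure messages mentioning the bad mount
grep -i 'sata12' /usr/local/hadoop/logs/hadoop-*-datanode-*.log | tail -n 20
# Confirm the disk itself is unhealthy at the OS level
dmesg | grep -i -E 'sata12|I/O error' | tail -n 20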
Following the disk hot-swap procedure in the official documentation:
First, log in to the host where the target DataNode runs (192.168.34.30) and edit the hdfs-site.xml configuration file:
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Find the dfs.datanode.data.dir property (it may still use the deprecated name dfs.data.dir) and remove the failed directory from its value:
<property>
    <name>dfs.data.dir</name>
-   <value>/disk/sata1/hdfs/data,/disk/sata2/hdfs/data,/disk/sata3/hdfs/data,/disk/sata4/hdfs/data,/disk/sata5/hdfs/data,/disk/sata6/hdfs/data,/disk/sata7/hdfs/data,/disk/sata8/hdfs/data,/disk/sata9/hdfs/data,/disk/sata10/hdfs/data,/disk/sata11/hdfs/data,/disk/sata12/hdfs/data</value>
+   <value>/disk/sata1/hdfs/data,/disk/sata2/hdfs/data,/disk/sata3/hdfs/data,/disk/sata4/hdfs/data,/disk/sata5/hdfs/data,/disk/sata6/hdfs/data,/disk/sata7/hdfs/data,/disk/sata8/hdfs/data,/disk/sata9/hdfs/data,/disk/sata10/hdfs/data,/disk/sata11/hdfs/data</value>
</property>
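A quick sanity check that the edited file is still well-formed XML, assuming xmllint is available on the host (an assumption; any XML validator works):

xmllint --noout /usr/local/hadoop/etc/hadoop/hdfs-site.xml && echo OK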
Then run the admin command /usr/local/hadoop/bin/hdfs from any node in the Hadoop cluster:
# Reload the configuration
hdfs dfsadmin -reconfig datanode 192.168.34.30:50020 start
# Check the reconfiguration status
hdfs dfsadmin -reconfig datanode 192.168.34.30:50020 status
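Note that 50020 here is the DataNode IPC port (dfs.datanode.ipc.address, which defaults to 0.0.0.0:50020 in Hadoop 2.x), not the HTTP port. If unsure, it can be read from the live configuration:

# Print the DataNode IPC address the client config resolves to
hdfs getconf -confKey dfs.datanode.ipc.address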
Output like the following indicates the change was applied successfully:
SUCCESS: Change property dfs.datanode.data.dir
From: "[DISK]file:/disk/sata1/hdfs/data/,[DISK]file:/disk/sata2/hdfs/data/,[DISK]file:/disk/sata3/hdfs/data/,[DISK]file:/disk/sata4/hdfs/data/,[DISK]file:/disk/sata5/hdfs/data/,[DISK]file:/disk/sata6/hdfs/data/,[DISK]file:/disk/sata7/hdfs/data/,[DISK]file:/disk/sata8/hdfs/data/,[DISK]file:/disk/sata9/hdfs/data/,[DISK]file:/disk/sata10/hdfs/data/,[DISK]file:/disk/sata11/hdfs/data/"
To: "/disk/sata1/hdfs/data,/disk/sata2/hdfs/data,/disk/sata3/hdfs/data,/disk/sata4/hdfs/data,/disk/sata5/hdfs/data,/disk/sata6/hdfs/data,/disk/sata7/hdfs/data,/disk/sata8/hdfs/data,/disk/sata9/hdfs/data,/disk/sata10/hdfs/data,/disk/sata11/hdfs/data"
At this point the error on the HDFS web UI will not go away yet, but it does not affect the subsequent steps.
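While waiting for the replacement disk, you can check that the blocks previously stored on the failed volume are being re-replicated elsewhere (exact fsck output varies slightly across Hadoop versions):

# Overall health report; watch "Under-replicated blocks" trend toward 0
hdfs fsck /
# List any files with missing or corrupt blocks
hdfs fsck / -list-corruptfileblocks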
After ops has installed a replacement disk, repeat the same procedure: edit the configuration file to add the new disk back, then run reconfig again.
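A minimal sketch of the add-back step, assuming the replacement disk is mounted at /disk/sata12 and the DataNode runs as user hdfs in group hadoop (both assumptions; adjust to your environment):

# Prepare the data directory on the new mount with the right ownership
mkdir -p /disk/sata12/hdfs/data
chown -R hdfs:hadoop /disk/sata12/hdfs/data
# Add /disk/sata12/hdfs/data back to dfs.datanode.data.dir in hdfs-site.xml,
# then trigger and monitor the reconfiguration as before
hdfs dfsadmin -reconfig datanode 192.168.34.30:50020 start
hdfs dfsadmin -reconfig datanode 192.168.34.30:50020 status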
If everything goes smoothly, the Failed Volumes error disappears.