
Linux Tips: The Fastest Way to Delete a Million Files at Once

Initial Benchmark

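Yesterday I came across a very interesting way to delete a huge number of files in a single directory. It comes from Zhenyu Lee at http://www.quora.com/How-can-someone-rapidly-delete-400-000-files. Instead of find or xargs, he makes creative use of rsync: running rsync --delete replaces the contents of the target directory with those of an empty directory. Afterwards I ran an experiment to compare the various methods. To my surprise, Lee's method was much faster than the others. My benchmark follows. Environment: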
  • CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
  • MEM: 4G
  • HD: ST3250318AS: 250G/7200RPM
Method                                       # of Files   Deletion Time
rsync -a --delete empty/ s1/                 1000000      6m50.638s
find s2/ -type f -delete                     1000000      87m38.826s
find s3/ -type f | xargs -L 100 rm           1000000      83m36.851s
find s4/ -type f | xargs -L 100 -P 100 rm    1000000      78m4.658s
rm -rf s5                                    1000000      80m33.434s
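Combining --delete with --exclude lets you delete selectively: files that match an --exclude pattern are left alone and everything else is removed. This method is also the ideal choice when you need to keep the directory itself for other uses.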

Revised Benchmark

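A few days ago, Keith-Winstein replied to the Quora thread saying that my earlier benchmark could not be reproduced because the operations took far too long. To clarify: the numbers may have been so large because my computer had done a lot of work over the past few years and the file system may have contained some errors, though I am not sure that is the cause. I now have a newer computer, so I ran the benchmark again. This time I used /usr/bin/time, which provides more detailed information. The new results are below. (Each run deleted 1000000 files, and every file was 0 bytes in size.)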
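For readers who want to repeat the measurement, here is a rough sketch of the setup at reduced scale. The file count N and the directory name b are assumptions for illustration (the benchmark above used 1000000 files), and the script only calls /usr/bin/time -v when a version that accepts -v is present.

```shell
#!/bin/sh
# Sketch of the benchmark setup at reduced scale.
# N is an illustrative assumption; the article used 1000000 files.
set -eu

N=2000
work=$(mktemp -d)
mkdir "$work/b"

# Create N zero-byte files.
(cd "$work/b" && seq 1 "$N" | xargs touch)
created=$(find "$work/b" -type f | wc -l | tr -d ' ')

# /usr/bin/time -v writes its report to stderr. Where that binary is missing
# or does not accept -v, fall back to an untimed run.
if /usr/bin/time -v true >/dev/null 2>&1; then
    /usr/bin/time -v find "$work/b/" -type f -delete 2>"$work/report.txt"
    # Show only the fields summarized in the results table.
    grep -E 'Elapsed|System time|Percent of CPU|context switches' "$work/report.txt" || true
else
    find "$work/b/" -type f -delete
fi

remaining=$(find "$work/b" -type f | wc -l | tr -d ' ')
echo "created=$created remaining=$remaining"
rm -rf "$work"
```

With only a few thousand files the timings are too small to compare methods; raise N toward the article's figure on a scratch file system to see meaningful differences.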
Command                                     Elapsed (s)   System Time (s)   %CPU   Context Switches (Vol/Invol)
rsync -a --delete empty/ a                  12.42         10.60             95     106/22
find b/ -type f -delete                     28.51         14.46             52     14849/11
find c/ -type f | xargs -L 100 rm           41.69         20.60             54     37048/15074
find d/ -type f | xargs -L 100 -P 100 rm    34.32         27.82             89     929897/21720
rm -rf f                                    31.29         14.80             47     15134/11
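/usr/bin/time measures a single command, so the two find | xargs pipelines have to be wrapped in a script before they can be timed. The source does not show that wrapper, so the function below is a hypothetical stand-in. It also swaps in -print0, -0 and -n 100 for the original -L 100, so that file names containing spaces or newlines are handled correctly.

```shell
#!/bin/sh
# Hypothetical stand-in for a wrapper script around the find | xargs pipeline.
set -eu

# Remove regular files under $1 in batches of 100, running up to $2 rm
# processes at the same time.
delete_batched() {
    find "$1" -type f -print0 | xargs -0 -n 100 -P "$2" rm -f
}

work=$(mktemp -d)
mkdir "$work/c" "$work/d"
for d in c d; do
    (cd "$work/$d" && seq 1 1000 | xargs touch)
done

delete_batched "$work/c" 1      # sequential batches, similar to method 3
delete_batched "$work/d" 100    # up to 100 parallel rm processes, similar to method 4

left=$(find "$work/c" "$work/d" -type f | wc -l | tr -d ' ')
echo "files left: $left"
rm -rf "$work"
```

Each batch starts a new rm process, which is one plausible reason these two methods show far more context switches than find -delete or rm -rf in the table above.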

Raw Output

# method 1
~/test $ /usr/bin/time -v rsync -a --delete empty/ a/
Command being timed: "rsync -a --delete empty/ a/"
User time (seconds): 1.31
System time (seconds): 10.60
Percent of CPU this job got: 95%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.42
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 24378
Voluntary context switches: 106
Involuntary context switches: 22
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

# method 2
Command being timed: "find b/ -type f -delete"
User time (seconds): 0.41
System time (seconds): 14.46
Percent of CPU this job got: 52%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.51
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 11749
Voluntary context switches: 14849
Involuntary context switches: 11
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

# method 3
find c/ -type f | xargs -L 100 rm
~/test $ /usr/bin/time -v ./delete.sh
Command being timed: "./delete.sh"
User time (seconds): 2.06
System time (seconds): 20.60
Percent of CPU this job got: 54%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.69
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1764225
Voluntary context switches: 37048
Involuntary context switches: 15074
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

# method 4
find d/ -type f | xargs -L 100 -P 100 rm
~/test $ /usr/bin/time -v ./delete.sh
Command being timed: "./delete.sh"
User time (seconds): 2.86
System time (seconds): 27.82
Percent of CPU this job got: 89%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:34.32
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1764278
Voluntary context switches: 929897
Involuntary context switches: 21720
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

# method 5
~/test $ /usr/bin/time -v rm -rf f
Command being timed: "rm -rf f"
User time (seconds): 0.20
System time (seconds): 14.80
Percent of CPU this job got: 47%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.29
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 176
Voluntary context switches: 15134
Involuntary context switches: 11
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

I am really curious why Lee's method is so much faster than the others, faster even than rm -rf. If anyone knows, please leave a comment below. Thank you very much.

[English original: A faster way to delete millions of files in a directory]