I have a small 🙂 NAS box with 6 x 1TB HDDs in RAID 6 and 4 x 500GB HDDs in RAID 5. Recently, thanks to the arrival of my baby girl and an HD handycam, I was running out of space fast on the array, so when I saw a decent deal for 1TB drives on Newegg I picked up a couple to add to the RAID 6 volume.
Growing a RAID array in Linux using mdadm is easy. I made sure I used fdisk to create a single large partition on each drive and mark the partition type fd (Linux raid autodetect) before adding it to the array.
Device Boot      Start         End      Blocks   Id  System
/dev/sdm1            1      121601   976760001   fd  Linux raid autodetect
Adding the drives to the array is straightforward:
mdadm --add /dev/md1 /dev/sdl1
mdadm --add /dev/md1 /dev/sdm1
The devices get added as spares in the array; now grow the array to include them. Remember, earlier I had 6 drives in it, and now I am increasing it to 8.
mdadm --grow /dev/md1 --raid-devices=8
Now I have to wait for the RAID to resync across the new drives. You can watch the progress by running:
watch cat /proc/mdstat
Except I was pissed when I saw this:
[>......................] reshape = 0.0% (7892/976759936) finish=2804.9min speed=5463K/sec
This meant the array would take about 47hrs to rebuild at ~5MB/sec. I was pissed but not worried, because I had run into this issue before, and all I had to do was increase the minimum and maximum speed limits for the RAID device from seriously conservative defaults to reasonable limits.
So I went ahead and did the following
echo 50000 > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max
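These /proc writes don't survive a reboot. The same knobs are exposed as the sysctl keys dev.raid.speed_limit_min and dev.raid.speed_limit_max, so one way to make them stick is a couple of lines in /etc/sysctl.conf:

```
# /etc/sysctl.conf fragment -- raise the md resync speed floor/ceiling
dev.raid.speed_limit_min = 50000
dev.raid.speed_limit_max = 200000
```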
See here for some additional details on these commands.
I kept watching /proc/mdstat; imagine my surprise when the speed improved from 5463K/sec only to 5863K/sec and settled there. So I got aggressive and doubled the numbers to 100000 for min and 400000 for max. The speed moved from 5863K/sec to 6028K/sec and settled around there. Now I was worried I was running into some sort of bus contention or driver issue.
And after about 30mins of furious googling, I found clues that increasing stripe_cache_size improves the resync/rebuild speed as well. Mine was set to the default of 256:
cat /sys/block/md1/md/stripe_cache_size
256
I changed that to 1024:
echo 1024 > /sys/block/md1/md/stripe_cache_size
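Rather than eyeballing the watch output between changes, a tiny helper can pull the speed figure out of /proc/mdstat (this is my own sketch, not anything shipped with mdadm):

```shell
# Extract the 'speed=NNNNK/sec' figure from an mdstat progress line
mdstat_speed() {
    grep -o 'speed=[0-9]*K/sec' | sed 's/speed=//;s|K/sec||'
}

# On the live box you would run: mdstat_speed < /proc/mdstat
# Demonstrated here on a sample line from the resync:
line='[>....] reshape = 0.1% (70372/976759936) finish=569.3min speed=28773K/sec'
echo "$line" | mdstat_speed    # prints 28773
```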
The speed increased from 6028K/sec to a nice 15763K/sec.
Setting the cache size to 4096 brought it up to a cool 22844K/sec. Bumping it to 8192 took it to 27169K/sec; at this point I sensed I was reaching the point of diminishing returns. So, as any tech guy would, I plowed through to 16384, and the speed increased to 30029K/sec for a few seconds but came down and settled at 29100K/sec. On to 32768, which actually made things worse: it went down to 27970K/sec. Here is a quick table of the improvement per cache size:
stripe_cache_size    speed (K/sec)    finish (min)
256                  6028             2581.8
1024                 15763            1003.3
4096                 22844            703.7
8192                 27169            604.5
16384                30029            553.5
32768                27970            601.6
In the end I moved it back to 16384, and the speed settled into a range between 28000 and 30000K/sec, with a rebuild finish time of just under 570mins.
[>......................] reshape = 0.1% (70372/976759936) finish=569.3min speed=28773K/sec
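The finish= figure is essentially just the remaining kilobytes divided by the current speed. A quick back-of-the-envelope check, using the numbers from the mdstat line above (the result drifts a few minutes from the kernel's own estimate because the speed fluctuates):

```shell
# ETA in minutes = (total_kb - done_kb) / speed_kb_per_sec / 60
eta_minutes() {
    echo $(( ($1 - $2) / $3 / 60 ))
}

eta_minutes 976759936 70372 28773    # prints 565, close to the 569.3min reported
```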
An improvement from 47hrs to 9.5hrs; I think I can live with that 🙂