Thanks for the comment. Yes, the queue management issues for VVOLs are sort of an internal storage subsystem problem, but they could present some interesting impacts to the unwary. Rgds, Ray
Brook, Thanks for the info. I suppose CME events are happening more often than I imagined. The fact that we (the Earth) were on the missing side last time was lucky. But the odds seem to be correct and take into account that there's only a (small) volume of space impacted by any one CME. Existential risks are real; we can ignore them or try to consider them in planning our future. I'm all for recognizing they are real and trying to plan to mitigate them as best we can. There's obviously an economic (opportunity) cost to dealing with them, but hopefully we can minimize the pain.
Vic, No, I have not been monitoring the litigation. Would be interested to hear about HyperVault...
Roberto, The disk industry continues to increase density and performance the only way they know how, and TDMR represents just one of the possible paths ahead. I thought it was an interesting approach inasmuch as they use more (2X) signal to deal with the increased noise coming off the media. And why stop there: if 2 is good, 3's better. The problem with disk density growth is that the easy engineering tricks to get more data on/off magnetic media seem to be going away. Each new trick seems to last less time than the one before. GMR heads lasted almost 2 decades, but helium, HAMR, TDMR, & SMR seem to be shorter-term fixes, likely to last only 1/4th as long... Which makes R&D more expensive, which makes disk capacity more expensive. We seem to be hurtling towards a marginal utilization wall; I just don't know where/when it will go vertical. Probably one reason we are down to 3 rotating disk suppliers...
Hans, I couldn't agree more, but it still scares me... Maybe I'm a bit too paranoid...
Joe, I guess I don't see anywhere on Netlist's website that they have a similar product. That being said, it's often not the first organization to come out with a technology that wins, but the one that generates the most adoption for it.
Thanks for the comment. I would have to say that ViPR has a ways to go to prove its overall value; it's just not there today. From my perspective, how well they play in the data plane will determine how useful it is in the long run. But so far they have gone out of their way to leave the data plane alone as much as they can.
Rob - Sorry, I had no intention of ignoring Isilon, and ViPR does support Isilon from the start. My mistake. I will update the post.
Anderson, Thanks for your comment. Yes, it should be very easy to randomize the predictive failure processing to disperse wear-out failures. I'm not sure anyone is currently doing this, but it's one of the many things I suggested in my post. And yes, the wear-out level for EFDs is pretty high, which some customers may never approach in their storage's lifetime. Most enterprise storage systems using SSDs probably also have other ERP capabilities beyond those outlined in my post to make sure data loss doesn't happen. But DIYers need to be more aware of what they are getting when they place SSDs in RAID groups.
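To make the randomization idea concrete, here's a minimal sketch of what dispersed predictive swap-outs could look like. All the names and numbers (rated P/E cycles, thresholds, jitter) are illustrative assumptions of mine, not anything a vendor actually ships:

```python
import random

# Hypothetical sketch: instead of swapping every SSD at one fixed wear
# threshold (say 90% of rated P/E cycles), give each drive a slightly
# jittered threshold. Drives in a RAID group written at the same rate
# then get replaced weeks apart instead of all at once, dispersing
# wear-out failures in time.

RATED_PE_CYCLES = 30_000  # illustrative rating for an MLC SSD
BASE_THRESHOLD = 0.90     # swap point as a fraction of rated life
JITTER = 0.05             # +/- spread applied per drive

def swap_threshold(rng: random.Random) -> float:
    """Per-drive wear fraction at which to schedule replacement."""
    return BASE_THRESHOLD + rng.uniform(-JITTER, JITTER)

def should_swap(pe_cycles_used: int, threshold: float) -> bool:
    """True once a drive's wear crosses its own jittered threshold."""
    return pe_cycles_used >= threshold * RATED_PE_CYCLES

rng = random.Random(42)
raid_group = [swap_threshold(rng) for _ in range(8)]  # 8-drive group
# Thresholds now fall roughly in the 85%-95% range, so predictive
# replacements are staggered rather than clustered.
```

The same jitter could equally be applied to the spare-out schedule rather than the threshold itself; the point is just to break the synchronization that uniform wear leveling creates.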
Dave, Thanks for your comment. While I agree with most of what you have to say, I don't agree with it all.
1) Wear leveling with defect skipping is not the same as defect skipping alone. The latter should exhibit more randomized wear failures than the former, which is what I am getting at. Yes, there are many different types of drive and SSD failures out there, but given a "mature" development process (which may never happen in my lifetime), failure rates should be governed more by attributes of the componentry and technology than by code. But my main concern is with the variance of the failure rate. I have yet to see any statistics showing the variance of disk or SSD failure rates, which has a distinct bearing on the discussion.
2) Yes, SSDs read much faster than disks, and as such may have a faster rebuild time. But not all SSDs write (sequentially) faster than disks, which may slow rebuilds down. I have no stats on SSD RAID group rebuild times to know if they are significantly faster than disk, but I am guessing that as MLC SSD capacities go up and MLC write times slow down, someday MLC SSD RAID group rebuilds won't be much faster than 15Krpm disks of comparable capacity.
3) I am not as worried about all-flash arrays or even hybrid disk-flash arrays; they all should understand these issues much better than I do. However, like you say, DIYers don't necessarily share this knowledge and need to be aware of these concerns.
4) Like I say in the post, a more randomized approach to predictive maintenance is one of the things that can help. Having a more conservative approach to when to swap a drive out doesn't necessarily give them a wider distribution of failure times.
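To put rough numbers on the rebuild-time point in (2), here's a back-of-the-envelope sketch. It assumes the rebuild is bottlenecked by the spare's sequential write speed, and every capacity and throughput figure below is an illustrative assumption, not a measurement:

```python
# Back-of-the-envelope rebuild-time estimate, treating rebuild of a
# failed drive as bottlenecked by the spare's sequential write rate.
# All figures are illustrative assumptions, not measurements.

def rebuild_hours(capacity_gb: float, seq_write_mb_s: float) -> float:
    """Hours to write a full drive's worth of data at a given rate."""
    return capacity_gb * 1024 / seq_write_mb_s / 3600

# A 600GB 15Krpm disk sustaining ~150MB/s sequential writes:
disk = rebuild_hours(600, 150)      # ~1.1 hours
# A 600GB MLC SSD sustaining ~250MB/s sequential writes:
ssd = rebuild_hours(600, 250)       # ~0.7 hours
# As MLC capacities grow while write speeds stagnate, the gap flips:
big_ssd = rebuild_hours(2000, 250)  # ~2.3 hours, longer than the disk
```

The exact figures don't matter; the shape of the arithmetic does: rebuild time scales linearly with capacity, so growing SSD capacities with flat write speeds erode any rebuild-time advantage.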
My original goal in writing the post was to show how antifragility can be applied to RAID groups, both disk and SSD, and what we can do to make SSDs better RAID group participants. I had no intention of belittling the performance and other advantages that come with SSDs, which we all know so well.