45Doll

Flaky Computer Was The SSD

I thought I'd share this little troubleshooting story, since the culprit was the last thing on my list of possible suspects.

I have two identical Dell PCs running Windows 10 Pro, one for my daily use and one for R&D. Been using them for over five years.

Monday, suddenly, my personal one starts beeping out notification tones periodically, as if a notification had popped up or new mail had arrived. Can't find or see why. The next day, in addition to beeping, the PC intermittently freezes: mouse cursor frozen, screen activity frozen, everything, for 10 to 30 seconds. Then the Num Lock light blinks, the PC beeps and everything moves again. Goes on all day. I had just updated to the 2004 release 10 days earlier. Is that the reason?

Then by Wednesday, when trying to run backups to cover myself (I found the scheduled ones had failed), I'm getting the DPC Watchdog Violation BSOD. Research that, and there are lots of possibilities, but most likely it's disk related. Do all the file system checking, DISM, SFC, etc. No errors found. Finally I get a clean backup to complete, so I can start looking deeper. On my personal PC the power indicator is dark when the PC is on, instead of glowing white, so I suspect some hardware issue.

I have an iSCSI drive attached to a server, used for File History. I check that, and there's an old target in the initiator still trying to connect, plus a lot of iSCSI warnings in the logs. Is that the problem? The odds say the DPC violation is a disk error, and to the PC the iSCSI drive is a disk. Delete the obsolete target. Nope, still BSOD'ing.

I'll skip the next several hours of BS and cut to the chase. I move the system SSD over to my R&D PC, and the problem moves with it. So it's not the PC chassis. Order a new SSD, copy the image to it, install and boot from it, and the problem is gone. The OS image is OK. The Samsung EVO SSD was apparently burping. It's just over four years old.

I put in SSDs first and foremost for performance, but also because, unlike classic disks, which are one of the top two causes of PC failures, they aren't subject to mechanical problems. Now I'm a witness to their electronic failure. And not a hard failure, mind you, just enough to piss you off!

I've had two SSD failures. One was flaky, just random nonsense. That was a 120GB SanDisk SSD, and I stopped buying those after that (probably 6 years ago). The other was an M.2 Samsung 860 EVO: I could see it in the BIOS, but couldn't see it to install the OS. I've installed hundreds of SSDs at this point, and everything in the last 3 years has been solid.

34 minutes ago, voyager9 said:

We avoid SSDs at work because the MTBF is significantly lower than spinners'. The last time we did a hardware refresh was years ago, so maybe they're better now.

I've got two piles of dead drives at work: hard disks on the right, SSDs on the left. The hard disks are actually 3 piles, each containing 12-15, and next to them is a smaller pile of a dozen or so laptop disks.

On the left there are two SSDs.

I will continue to install only SSDs in working machines unless the ratio starts to change.

I do install spinning platters in servers, SANs and NVRs, just because of the space requirements. Even our AS/400, though, has stacks of SSDs as its working storage, but it archives out to platters during the backup cycles.

I'll have to look into this a little bit. I don't do this for a living any more but I'm still interested in it.

The performance gain on a PC is so large I can't ignore it. Whatever the comparative MTBF turns out to be I'll live with it. In any case I would retain my frequent backup schedules and multiple paths. I've seen what having no backups can do to someone's life. Or business!!!

I did find this article on the topic. Not sure if it's definitive, but it's a place to start. My SSDs are Samsung, and starting with the 860 series, Samsung Magician will provide information on performance and health. I just installed that yesterday.
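Samsung Magician is a GUI; for a scriptable health check, smartmontools' `smartctl` can read the drive's SMART attributes directly. Below is a minimal Python sketch, assuming smartmontools 7.0 or later (for `-j` JSON output) and a SATA drive. The JSON keys used (`ata_smart_attributes`, `table`, `raw`) are what smartctl emits for ATA devices as far as I know; NVMe drives report through a different section. The 512-byte unit for attribute 241 is an assumption that holds for Samsung SATA drives and should be checked for others. `summarize` and `WATCHED` are illustrative names, not part of any tool.

```python
import json
import subprocess
import sys

# Assumption: Samsung SATA SSDs count Total_LBAs_Written (ID 241) in
# 512-byte sectors; check your vendor's documentation for other drives.
SECTOR_BYTES = 512

# SMART attribute IDs worth watching on an SSD (labels are our own choice).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    177: "Wear_Leveling_Count",
    241: "Total_LBAs_Written",
}


def read_smart_json(device: str) -> dict:
    """Run smartctl (smartmontools 7.0+) and parse its JSON attribute report."""
    # smartctl's exit status is a bitmask of warnings, so a non-zero code
    # is not necessarily fatal; parse whatever JSON it printed.
    result = subprocess.run(
        ["smartctl", "-j", "-A", device], capture_output=True, text=True
    )
    return json.loads(result.stdout)


def summarize(report: dict) -> dict:
    """Extract the watched attributes and convert LBAs written to terabytes."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    summary = {}
    for attr in table:
        label = WATCHED.get(attr["id"])
        if label is None:
            continue
        # "value" is the normalized (vendor-scaled) value; "raw" holds the counter.
        summary[label] = {"normalized": attr["value"], "raw": attr["raw"]["value"]}
    lbas = summary.get("Total_LBAs_Written")
    if lbas is not None:
        summary["TB_written"] = lbas["raw"] * SECTOR_BYTES / 1e12
    return summary


if __name__ == "__main__":
    # Device naming depends on the OS, e.g. /dev/sda on Linux.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/sda"
    print(summarize(read_smart_json(device)))
```

Logging these values periodically gives you a trend to compare against. It complements backups; it doesn't replace them.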

Interesting. YMMV. My example is purely anecdotal. Most of the system is “diskless” anyway: net boot backed by iSCSI or HANFS.

The geek at the repair desk who restored Windows 10 for me said SSD storage was light years faster than a traditional disk. I always kind of avoided them while shopping because the capacities sounded too small. I guess I was jaded from way back when I used to build desktops from parts bought at those big expo center shows, like the ones at Raritan Center, when bigger was always better.

You have a 486SX? Bitch, I just got a 486DX!! :p

I still hate Windows 10 because Microsoft drops an update on you whenever they decide to, and then you're stuck for a couple of hours while the thing is force-fed. Now my drive just spins at 100% most of the time because of some mystery apps Microsoft dropped on it. I can't even stop them in Task Manager without getting threats that doing so will cause loss of files, loss of life, a plague of locusts, and possibly an asteroid crushing my house.

I have no idea how many personal spinning HDDs I've had fail on me. I run WAY more hardware than the average home user, admittedly, but still, a LOT of failures: Maxtor, WD, Seagate, IBM Deathstar... guessing at least 12-15. I held off on SSDs for a while out of paranoia about total bytes written, and admittedly, on tiny SSDs the TBW ratings are kinda low. On modern, larger-capacity drives, as well as MLC drives, it seems to be much less of an issue (at higher cost).

Went to a 120GB SanDisk as my first SSD 6 years ago. It lasted all of 3 months: sudden, instant, complete death; it doesn't even show up in the BIOS. Was pissed. Went to a Samsung 850 Pro 256GB, which is now a backup drive in my current PC. It has about 7TB written on it with zero issues so far. My current PC has a 512GB 860 Pro SSD, a 1024GB 860 EVO SSD, and a few spinner drives in addition to the 850. A good spinner like a WD Black is actually reasonably fast for a 7200 RPM HDD, but nothing like the SSDs. I've moved to an SSD as the primary drive on most of my modern PCs.
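The point about tiny SSDs having low TBW ratings follows from how endurance is usually specified: a TBW figure over a warranty period is equivalent to a number of full drive writes per day (DWPD), so at the same DWPD a smaller drive gets a proportionally smaller TBW. Here is a small sketch of that arithmetic; the 0.3 DWPD and 5-year figures in the example are hypothetical, not any particular drive's spec.

```python
DAYS_PER_YEAR = 365


def dwpd_from_tbw(tbw_tb: float, capacity_gb: float, warranty_years: float) -> float:
    """Full drive writes per day implied by a TBW rating over the warranty period."""
    return (tbw_tb * 1000) / (capacity_gb * DAYS_PER_YEAR * warranty_years)


def tbw_from_dwpd(dwpd: float, capacity_gb: float, warranty_years: float) -> float:
    """TBW (decimal TB) implied by a DWPD rating; scales linearly with capacity."""
    return dwpd * capacity_gb * DAYS_PER_YEAR * warranty_years / 1000


if __name__ == "__main__":
    # Hypothetical rating: 0.3 drive writes per day for 5 years.
    for capacity in (120, 256, 512, 1024):
        print(f"{capacity:>5} GB -> {tbw_from_dwpd(0.3, capacity, 5):7.1f} TBW")
```

At that made-up rating, the 120GB drive works out to roughly 66 TBW and the 1024GB drive to roughly 561 TBW, which is the "bigger drive, less worry" effect described above.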

Tech Report ran a non-scientific stress test on a small sample of SSDs, and a couple managed to last into the petabyte range of bytes written: https://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/

I'll see your DX and raise you an Am486 DX2-80MHz with a 40MHz bus.  Was faster than any other 486 I ever used.

So I'll keep using SSDs for everyday performance, and make sure my backup schedule is robust in case my natural life exceeds the lifespan of my SSDs.

You could of course treat SSDs like your car batteries. Know what the warranty period is and replace them just before it ends. You know the failure is coming, because the manufacturer told you.
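The car-battery approach can be sketched as a date calculation. SSD warranties (Samsung's, for example) are typically stated as a number of years or a TBW figure, whichever comes first, so the replace-by date is the earlier of the two limits, minus a safety margin. This is a minimal sketch assuming a constant write rate; `replace_by`, `add_years` and their parameters are hypothetical names, and the example figures are made up.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Same calendar date `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def replace_by(installed: date, warranty_years: int, tbw_tb: float,
               tb_per_year: float, margin_days: int = 30) -> date:
    """Earlier of warranty expiry and projected TBW exhaustion, minus a margin.

    Assumes data is written at a constant rate of `tb_per_year`.
    """
    deadline = add_years(installed, warranty_years)
    if tb_per_year > 0:
        years_to_tbw = tbw_tb / tb_per_year
        if years_to_tbw < warranty_years:
            # date + timedelta uses whole days only; fractional days are dropped.
            deadline = installed + timedelta(days=years_to_tbw * 365.25)
    return deadline - timedelta(days=margin_days)


if __name__ == "__main__":
    # Hypothetical drive: installed mid-2016, 5-year / 150 TBW warranty.
    print(replace_by(date(2016, 6, 1), 5, 150, tb_per_year=10))   # light use
    print(replace_by(date(2016, 6, 1), 5, 150, tb_per_year=100))  # heavy use
```

For the light-use case the warranty term wins and the sketch returns 2021-05-02. At 100 TB/year the TBW limit would be reached after 1.5 years instead, so the date moves up to late 2017.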

I have an _EXTREMELY_ disk-intensive process for press-ready RIP files. Each imprint file is 27-35MB, and each run is 5k to 250k imprint files. It runs through an ONYX RIP on a very fast machine with Samsung M.2 Pro drives, and ripping can take 8-12 hours for the big jobs. The RIP internally probably creates 2 temp files for each imprint, although they may get created on the OS drive rather than the working drive. Granted, it doesn't run big jobs every day, but 3-4 months a year it's chugging at least 8-12 hours a day.

After about 18 months, I checked it, thinking I should consider replacing it due to the volume. It had only written 65TB. I believe the warranty on those is 800TB written, so under 10% usage. The only reason I replaced it last year was that the motherboard failed and I upgraded the whole box. I put the M.2 in a desktop, and while I don't remember what the actual written value was, I remember not being alarmed by it, so I put it out to pasture where it can lazily provide maybe another 1TB for the rest of its days.
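To put those numbers in perspective, here is the back-of-the-envelope arithmetic as a small Python sketch, using only figures from the post (27-35MB per imprint, up to 250k imprints per run, 65TB written in about 18 months, a believed 800TB rating) and decimal units. The helper names are illustrative, and the projection assumes the write rate stays constant.

```python
MB = 10**6
TB = 10**12


def job_output_tb(files: int, mb_per_file: float) -> float:
    """Imprint output written by one RIP run (excludes any temp files)."""
    return files * mb_per_file * MB / TB


def endurance_projection(written_tb: float, months: float, rating_tb: float) -> dict:
    """Fraction of rated endurance used so far and total years to reach the rating."""
    tb_per_year = written_tb / (months / 12)
    return {
        "fraction_used": written_tb / rating_tb,
        "tb_per_year": tb_per_year,
        "years_to_rating": rating_tb / tb_per_year,
    }


if __name__ == "__main__":
    print(f"largest job: {job_output_tb(250_000, 35):.2f} TB")
    p = endurance_projection(65, 18, 800)
    print(f"used {p['fraction_used']:.1%}, {p['tb_per_year']:.1f} TB/yr, "
          f"{p['years_to_rating']:.1f} years to 800 TB")
```

The largest job writes about 8.75TB of output on its own, while 65TB in 18 months is a little over 8% of an 800TB rating. At that pace it would take roughly 18.5 years in total to reach the rating.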

Well I had to look up ONYX RIP. Interesting. I spent a couple years in the production end of newspaper publishing, and needless to say back then things were much more primitive.

I think my last word on this subject will be to reference this article. I know we can all do research, but this one gets right to my main interest, and concludes that MTBF may not really be a useful measure to compare SSDs to hard disks. There's other good info in there too.
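One way to see why MTBF is awkward for comparing drives: a spec-sheet MTBF is normally turned into an annualized failure rate (AFR) by assuming a constant failure rate, i.e. an exponential lifetime model, which says nothing about wear-out (TBW) or early failures. Here is a sketch of that conversion; the 1.5 million hour figure is a hypothetical example, not a quote from any datasheet.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumes the drive is powered 24x7


def afr_from_mtbf(mtbf_hours: float, hours_per_year: float = HOURS_PER_YEAR) -> float:
    """Annualized failure rate under a constant-failure-rate (exponential) model."""
    return 1 - math.exp(-hours_per_year / mtbf_hours)


def expected_failures(fleet_size: int, mtbf_hours: float, years: float = 1.0) -> float:
    """Expected failed drives in a fleet after `years`, same model, no replacement."""
    return fleet_size * (1 - math.exp(-years * HOURS_PER_YEAR / mtbf_hours))


if __name__ == "__main__":
    mtbf = 1_500_000  # hypothetical spec-sheet MTBF in hours
    print(f"MTBF as years: {mtbf / HOURS_PER_YEAR:.0f}")
    print(f"AFR: {afr_from_mtbf(mtbf):.2%}")
    print(f"expected failures in 100 drives over 5 years: {expected_failures(100, mtbf, 5):.1f}")
```

A 1.5 million hour MTBF reads as about 171 years, which nobody expects a single drive to last. Under the constant-rate assumption it really describes a population-level failure chance of roughly 0.6% per year, which is why MTBF alone doesn't say much about when a particular SSD will wear out.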

I just bought another Dell refurbished OptiPlex in light of this little expedition, to either upgrade my personal machine or serve as a backup box. Not sure yet. But the new one came in configured as UEFI instead of legacy BIOS boot. Never configured a PC that way before, so I've got my next continuing education topic to bone up on.

When I was in commercial land, I pitched myself as a just-in-time consultant. I'd learn what I needed to know when I needed to know it. Somewhere in the late '80s I realized I was wasting my time learning everything down to the lowest levels, because by the time I got there everything had changed again. That worked well. But now that I'm retired, there's a lot less I want to learn!

There is an epilogue to the story.

The new Dell I bought came with an SSD. Fine. So, to play with UEFI and a new Windows installation, I grabbed a Seagate Barracuda 7200 hard drive from my parts pile and installed it in the Dell.

After the fresh Windows 10 install, the first boot attempt resulted in a hardware message: 'No boot device found, hit F1 to continue or F2 to enter configuration'. If I hit F1, the PC booted as expected with the new installation on the hard disk, and everything worked properly. So why did it fail on the first attempt? If there really were no known boot device, the PC would never boot.

I got this same symptom while playing with it for an entire day. I tried everything I could think of but could not get the drive to boot on the first attempt. Then I found the culprit: the hard drive I pulled out of the pile. I don't know why. I even re-initialized the drive as GPT before another fresh install and still had the same symptom. Meanwhile, two other drives from my pile booted as expected after a new install. I'm left thinking it's some kind of timing problem, and that this particular drive has trouble responding in time during the boot process. The other drives that worked OK were also Seagate 7200s, with all other variables held constant.

Such is the price of progress.
