How to Check Your SSD's Health: SMART Data, Write Endurance, and When to Replace
Your SSD is not immortal. Every write operation degrades the NAND flash cells inside it by a measurable, predictable amount. The good news: modern SSDs ship with extensive self-monitoring capabilities called SMART (Self-Monitoring, Analysis and Reporting Technology) that track exactly how much life your drive has consumed and how much remains. The bad news: almost nobody checks this data until the drive is already failing.
Unlike mechanical hard drives that often fail with audible clicks and grinding noises, SSDs die silently. One day your drive responds to every read and write; the next day it doesn't. There's no warning sound. But there is warning data — buried in SMART attributes that most users never look at. This guide explains every SMART metric that matters for SSDs, how to read them, and the actual thresholds where you should start planning a replacement.
What SMART Data Actually Tells You
SMART is an interface built into every modern storage drive — SSDs, NVMe drives, and traditional HDDs alike. It continuously records internal health metrics: how many bytes have been written, how many error corrections have occurred, how much spare capacity remains, and how hot the controller is running. Think of it as your drive's medical record.
For SSDs specifically, SMART data answers the three questions that matter most: How much write endurance have I consumed? Are any cells failing? Is the controller operating within safe parameters? The challenge is that SMART attributes are reported as numeric IDs with cryptic names, and different manufacturers use slightly different attribute sets. NVMe drives standardize this better than SATA SSDs, but you still need to know what to look for.
The SMART Attributes That Matter for SSDs
Not all SMART attributes are equally important. Some are diagnostic curiosities; others are the difference between catching a failing drive and losing your data. Here are the attributes you should actually monitor:
| SMART Attribute | What It Means | What to Watch For |
|---|---|---|
| Percentage Used (NVMe) / Wear Leveling Count (SATA) | How much of the drive's rated endurance has been consumed, expressed as a percentage | Below 80% used = healthy (some SATA drives report a normalized value that counts down from 100 instead) |
| Available Spare | Percentage of spare NAND blocks remaining for replacing failed cells | Below 10% = plan replacement |
| Available Spare Threshold | Manufacturer-defined minimum spare level before the drive is considered at risk | At or below threshold = replace now |
| Data Units Written | Total data written to the drive, reported in thousands of 512-byte units, so each unit is 512,000 bytes (multiply by 0.000512 for GB) | Compare against drive's TBW rating |
| Media and Data Integrity Errors | Number of uncorrectable data errors detected by the controller | Any value above 0 = investigate immediately |
| Critical Warning | Bit flags indicating spare space depletion, temperature exceedance, or reliability degradation | Any non-zero value = urgent |
| Temperature | Current drive temperature (NVMe stores it in Kelvin; most tools display Celsius) | Sustained above 70°C = throttling risk |
| Power On Hours | Total hours the drive has been powered on since manufacture | Context for wear rate calculation |
| Unsafe Shutdowns | Number of times the drive lost power without a proper shutdown command | High counts raise the risk of metadata corruption |
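As a rough sketch of the Data Units Written arithmetic: the NVMe specification reports this counter in thousands of 512-byte units, so each unit is 512,000 bytes. The function names and the example counter value below are illustrative assumptions, not output from a real drive.

```python
# Each NVMe "data unit" is 1,000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000


def data_units_to_tb(data_units: int) -> float:
    """Convert a Data Units Written counter to decimal terabytes."""
    return data_units * BYTES_PER_DATA_UNIT / 1e12


def tbw_consumed_fraction(data_units: int, rated_tbw: float) -> float:
    """Fraction of the rated TBW already written (0.5 means 50%)."""
    return data_units_to_tb(data_units) / rated_tbw


if __name__ == "__main__":
    units = 100_000_000  # hypothetical counter value
    print(f"{data_units_to_tb(units):.1f} TB written")
    print(f"{tbw_consumed_fraction(units, 600) * 100:.1f}% of a 600 TBW rating")
```

With the hypothetical counter of 100,000,000, this works out to 51.2 TB, or about 8.5% of a 600 TBW rating.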
NVMe vs. SATA SMART Differences
NVMe drives use a standardized SMART/Health Information Log defined by the NVM Express specification, making their health data consistent across manufacturers. SATA SSDs use the older ATA SMART framework where attribute IDs and meanings vary between Samsung, Crucial, Western Digital, and others. Always check your specific manufacturer's documentation for SATA drive attribute definitions.
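Because the NVMe log is standardized, the Critical Warning field can be decoded the same way on any drive. The sketch below maps the low five bits to their meanings as defined in the NVMe SMART / Health Information log. The dictionary labels are paraphrased, the function name is an assumption, and newer specification revisions define further bits that this sketch ignores.

```python
# Bit positions in the NVMe Critical Warning byte (labels paraphrased).
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "reliability degraded",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return the label of every warning bit set in the raw value."""
    return [
        label
        for bit, label in CRITICAL_WARNING_BITS.items()
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # 0b101 has bits 0 and 2 set: low spare plus degraded reliability.
    for warning in decode_critical_warning(0b101):
        print(warning)
```

A value of 0 returns an empty list, which is the only state the table above treats as non-urgent.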
Understanding TBW: Write Endurance Explained
Every SSD has a TBW (Terabytes Written) rating — the total amount of data the manufacturer guarantees can be written before wear becomes a reliability concern. This rating is directly tied to the type of NAND flash used in the drive and its capacity.
NAND Types and Their Endurance
The endurance differences between NAND types are substantial. Each cell stores data by trapping electrons in a floating gate or charge-trap layer, and each program/erase cycle damages the surrounding oxide slightly. More bits per cell means more voltage levels to distinguish, which means tighter tolerances and faster wear:
| NAND Type | Bits Per Cell | Typical P/E Cycles | Endurance Class |
|---|---|---|---|
| SLC (Single-Level Cell) | 1 | 50,000 - 100,000 | Enterprise / Industrial |
| MLC (Multi-Level Cell) | 2 | 3,000 - 10,000 | High-endurance consumer |
| TLC (Triple-Level Cell) | 3 | 1,000 - 3,000 | Standard consumer |
| QLC (Quad-Level Cell) | 4 | 500 - 1,000 | Budget / read-heavy workloads |
For context, a typical 1TB TLC SSD like the Samsung 990 Pro is rated at 600 TBW. If you write 50 GB per day, which is significantly above average for most desktop users, that's 18.25 TB per year, giving you approximately 33 years before hitting the TBW limit. Even heavy workstation use at 100 GB/day yields about 16 years. The TBW rating is not the bottleneck for most users; controller failure and firmware bugs are generally more likely to end your drive's life than NAND wear.
QLC drives tell a different story. A 1TB QLC drive might carry a 200 TBW rating — one-third the endurance of its TLC equivalent. At the same 50 GB/day write rate, that's roughly 11 years. Still plenty for a typical desktop user, but if you're running database workloads, video editing scratch disks, or heavy virtual machine operations, QLC endurance can become a legitimate concern within 3-5 years.
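The lifespan figures above follow from one division: rated TBW over yearly writes. Here is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB) and a constant daily write rate. The function name is illustrative.

```python
def years_until_tbw(rated_tbw: float, gb_per_day: float) -> float:
    """Years of writing at a constant daily rate before reaching rated TBW."""
    if gb_per_day <= 0:
        raise ValueError("gb_per_day must be positive")
    tb_per_year = gb_per_day * 365 / 1000
    return rated_tbw / tb_per_year


if __name__ == "__main__":
    for label, tbw, rate in [
        ("1TB TLC, 50 GB/day", 600, 50),
        ("1TB TLC, 100 GB/day", 600, 100),
        ("1TB QLC, 50 GB/day", 200, 50),
    ]:
        print(f"{label}: {years_until_tbw(tbw, rate):.1f} years")
```

This is a projection of host writes against the rating only; it says nothing about controller or firmware failures, which are not tied to write volume.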
Write Amplification Multiplies Your Actual Writes
The data you write to your SSD is not the only data the controller writes to NAND. Garbage collection, wear leveling, and over-provisioning management cause the controller to write additional data internally. This write amplification factor (WAF) typically ranges from 1.1x to 3x depending on workload patterns and drive fullness. A drive that's 95% full has significantly higher write amplification than one at 50% capacity because the controller has fewer free blocks to work with. Keep at least 10-20% of your SSD free to minimize write amplification.
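To make the multiplier concrete, the sketch below computes NAND writes from host writes and a given WAF, and the WAF itself from two counters. It assumes you have both a host-write and a NAND-write figure available; whether a drive exposes a NAND-write counter at all is vendor-specific, and the function names are hypothetical.

```python
def nand_writes_gb(host_writes_gb: float, waf: float) -> float:
    """Data physically written to NAND for a given amount of host writes."""
    if waf < 1.0:
        raise ValueError("this sketch assumes a WAF of at least 1.0")
    return host_writes_gb * waf


def write_amplification(host_writes_gb: float, nand_writes_gb_total: float) -> float:
    """Observed WAF: NAND writes divided by host writes."""
    if host_writes_gb <= 0:
        raise ValueError("host_writes_gb must be positive")
    return nand_writes_gb_total / host_writes_gb


if __name__ == "__main__":
    # 50 GB/day of host writes at the low and high ends of the typical range.
    print(nand_writes_gb(50, 1.1), nand_writes_gb(50, 3.0))
    print(write_amplification(100, 250))
```

At 50 GB/day of host writes, the 1.1x to 3x range translates to roughly 55 to 150 GB/day actually hitting the flash.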
How to Read SSD Health Data on Windows
CrystalDiskInfo (Free, Quick Check)
CrystalDiskInfo is the most widely used free tool for reading SSD SMART data on Windows. It displays a simple health status indicator (Good, Caution, Bad) along with the raw SMART attribute table. For NVMe drives, it reads the standard health log and shows Percentage Used, Temperature, and Data Units Written in a human-readable format.
The limitation of CrystalDiskInfo is that it's a point-in-time snapshot. You open it, see the current values, and close it. There's no historical tracking, no trend analysis, and no alerting. You have to remember to check it manually — which means most users check it exactly once and then forget about it until their drive exhibits problems.
STX.1 System Monitor (Continuous Monitoring)
STX.1 integrates SSD health monitoring into its unified system dashboard alongside CPU, GPU, and memory metrics. Rather than requiring you to open a separate tool and manually inspect SMART tables, STX.1 surfaces the critical SSD health indicators — Percentage Used, temperature, and available spare — in your daily monitoring view. The drive temperature is tracked over time alongside your other thermal data, so you can correlate SSD temperature spikes with heavy workloads and identify whether your drive needs better airflow.
Where STX.1 adds real value over snapshot tools is its historical data. By recording drive health metrics over time, you can observe the rate of wear, not just the current value. A drive that's at 15% used after 3 years is wearing at a fundamentally different rate than one that hit 15% in 6 months. The trend tells you when to plan your replacement purchase — the snapshot only tells you it's fine right now.
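The wear-rate idea can be sketched as a linear extrapolation from two Percentage Used readings. This is an illustration of the arithmetic only, not how STX.1 or any other tool computes it; the function name is hypothetical, and it assumes wear continues at the same rate.

```python
from typing import Optional


def projected_years_remaining(
    pct_start: float, pct_end: float, days_between: float
) -> Optional[float]:
    """Years until Percentage Used reaches 100, extrapolated linearly.

    Returns None when no wear was observed between the two readings.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    wear = pct_end - pct_start
    if wear <= 0:
        return None  # counter did not move; no rate to extrapolate
    pct_per_day = wear / days_between
    return (100 - pct_end) / pct_per_day / 365


if __name__ == "__main__":
    # The two drives from the paragraph above, both at 15% used.
    print(f"{projected_years_remaining(0, 15, 3 * 365):.1f} years")   # 3 years in
    print(f"{projected_years_remaining(0, 15, 182.5):.1f} years")     # 6 months in
```

Under this assumption the slow-wearing drive has about 17 years left, while the one that reached 15% in six months has under 3.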
Detecting Performance Degradation Before Failure
Many SSDs slow down before they fail. As NAND cells wear, the controller has to work harder to maintain data integrity through increased error correction, and read and write latencies rise. This degradation is gradual and often goes unnoticed until it becomes severe, but you can detect it early with benchmarking.
Baseline Your Drive When New
Run a benchmark like CrystalDiskMark when your SSD is new and save the results. Record the sequential read/write speeds and, more importantly, the random 4K read/write IOPS. These random access numbers are the most sensitive indicator of controller health and NAND degradation. Repeat the benchmark every 6-12 months and compare.
What Degradation Looks Like
- Sequential speeds drop 10-20%: Could indicate thermal throttling or a nearly-full drive (keep 10-20% free). Check temperature during the benchmark.
- Random 4K write IOPS drop 30%+: This is the earliest sign of meaningful NAND wear. The controller is spending more time on error correction and garbage collection.
- Intermittent latency spikes: Brief pauses or freezes during normal use that weren't present before. Often caused by the controller relocating data from failing cells to spare blocks.
- SMART "Reallocated Sector Count" increasing: The drive is actively mapping out bad cells. Small increases over years are normal; rapid increases over weeks are not.
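The first two checks in the list above can be sketched as a comparison between a saved baseline and a fresh benchmark run. The dictionary keys (`seq_mbps`, `rand4k_write_iops`) and the sample numbers are assumptions chosen for illustration; CrystalDiskMark does not export results in this shape, so you would enter the values by hand.

```python
def percent_drop(baseline: float, current: float) -> float:
    """Percentage by which current falls short of baseline."""
    return (baseline - current) / baseline * 100


def assess_benchmark(baseline: dict, current: dict) -> list[str]:
    """Apply the 10% sequential and 30% random-write thresholds from the text."""
    notes = []
    seq = percent_drop(baseline["seq_mbps"], current["seq_mbps"])
    rnd = percent_drop(baseline["rand4k_write_iops"], current["rand4k_write_iops"])
    if seq >= 10:
        notes.append(f"sequential speed down {seq:.0f}%: check temperature and free space")
    if rnd >= 30:
        notes.append(f"random 4K write IOPS down {rnd:.0f}%: possible NAND wear")
    return notes


if __name__ == "__main__":
    baseline = {"seq_mbps": 7000, "rand4k_write_iops": 100_000}  # hypothetical
    current = {"seq_mbps": 6800, "rand4k_write_iops": 65_000}    # hypothetical
    for note in assess_benchmark(baseline, current):
        print(note)
```

Latency spikes and reallocation counts are not covered here because they need different inputs than a benchmark summary.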
The "Nearly Full" Performance Trap
Before assuming your SSD is degrading, check how full it is. SSDs perform significantly worse when filled above 80-90% capacity because the controller runs out of free blocks for efficient garbage collection. A drive that feels slow at 95% full may return to full speed after freeing up space. Always rule out capacity issues before suspecting NAND wear.
When to Actually Replace Your SSD
The internet is full of premature SSD replacement advice driven by misunderstood SMART data and worst-case-scenario thinking. Here are the actual thresholds where replacement becomes a data protection decision, not paranoia:
Replace Now (Urgent)
- Media and Data Integrity Errors > 0: This means the drive has encountered data corruption it could not correct. Back up immediately and replace.
- Critical Warning flags set: The drive's own firmware is telling you it's unreliable. This is the most definitive failure indicator.
- Available Spare at or below threshold: The drive has exhausted its reserve capacity for replacing worn cells. Every additional write increases the risk of data loss.
- Consistent read errors or system freezes: If your system hangs briefly during disk access, your controller may be struggling with error correction on heavily worn cells.
Plan Replacement (1-3 Months)
- Percentage Used above 90%: You've consumed the vast majority of the drive's rated endurance. The drive may continue working well past 100%, but beyond that point you're outside the manufacturer's endurance rating and its reliability guarantee.
- Available Spare below 10%: The drive's reserve pool is nearly depleted. It still functions, but its ability to gracefully handle future cell failures is compromised.
- Random 4K write performance degraded 50%+ from baseline: Significant performance degradation indicates the controller is under strain from accumulating NAND wear.
Monitor Closely (No Immediate Action)
- Percentage Used 50-90%: You've used a meaningful portion of endurance but still have substantial life remaining. Increase monitoring frequency to quarterly benchmark checks.
- Temperature consistently above 60°C: Not an immediate failure risk, but sustained heat accelerates NAND wear. Improve airflow or add a heatsink.
- Unsafe Shutdown count climbing: Frequent power losses stress the drive's firmware and can cause metadata corruption. Address the root cause (UPS, power supply stability) rather than replacing the drive.
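The SMART-based thresholds in the three tiers above can be sketched as a single triage function. The `NvmeHealth` class and its field names are hypothetical, and the sketch covers only the numeric SMART criteria, not benchmark results, read errors, or shutdown counts. It treats anything from 50% used up to the 90% planning threshold as "monitor closely".

```python
from dataclasses import dataclass


@dataclass
class NvmeHealth:
    """Hypothetical container for a few NVMe health-log values."""
    percentage_used: int
    available_spare: int
    available_spare_threshold: int
    media_errors: int
    critical_warning: int
    temperature_c: int


def triage(h: NvmeHealth) -> str:
    """Map health values to the most urgent tier that applies."""
    if (
        h.media_errors > 0
        or h.critical_warning != 0
        or h.available_spare <= h.available_spare_threshold
    ):
        return "replace now"
    if h.percentage_used > 90 or h.available_spare < 10:
        return "plan replacement"
    if h.percentage_used >= 50 or h.temperature_c > 60:
        return "monitor closely"
    return "healthy"


if __name__ == "__main__":
    drive = NvmeHealth(
        percentage_used=12, available_spare=100, available_spare_threshold=10,
        media_errors=0, critical_warning=0, temperature_c=45,
    )
    print(triage(drive))
```

The checks run from most to least urgent, so a drive with a media error is flagged for replacement even if every other value looks fine.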
Percentage Used Can Exceed 100%
The "Percentage Used" indicator reaching 100% does not mean your drive will immediately fail. It means you've consumed the warranted endurance. Many SSDs continue operating normally at 200-300% of their rated endurance, according to long-term endurance testing by Tech Report. However, you're in uncharted territory with no manufacturer support. Use a drive past 100% only with current, verified backups.
SSD Health Monitoring Checklist
Use this schedule to stay ahead of SSD failures without obsessing over SMART data daily:
| Frequency | Action | What to Look For |
|---|---|---|
| Continuous (via STX.1) | Monitor drive temperature alongside CPU/GPU temps | Sustained temps above 70°C under load |
| Monthly | Check Percentage Used and Available Spare in SMART data | Any change greater than 2-3% per month |
| Quarterly | Run CrystalDiskMark and compare to baseline | Random 4K write regression exceeding 20% |
| Annually | Calculate projected remaining lifespan based on wear rate | Less than 2 years of endurance remaining at current rate |
| Immediately | Check SMART if you experience system freezes, BSODs, or file corruption | Media errors, critical warnings, or rapid spare depletion |
Your SSD is a consumable component with a quantifiable lifespan, but that lifespan is almost certainly longer than you think. For the average Windows user writing 20-30 GB per day, even a budget QLC drive will outlast the useful life of the rest of the system. The real value of monitoring isn't preventing unexpected failure — it's having the data to make a calm, informed replacement decision instead of a panicked one.
Track Your SSD Health Alongside Everything Else
STX.1 System Monitor surfaces your SSD's critical health metrics in the same dashboard where you track CPU temps, GPU load, and memory usage. No separate tools, no manual SMART table inspection. Drive temperature trends, real-time readings, and historical data give you the full picture of your storage health — before degradation becomes data loss.
-Rocky