ASM Health Checker Found 1 New Failures
| Area | Requirement |
|-------|--------------|
| Performance | Comparison must complete in < 100ms |
| State persistence | Store previous health state (e.g., Redis, SQLite, S3) |
| Idempotency | Rerunning the same check should not trigger duplicate alerts |
| Configurability | Ability to ignore certain failure types from “new failure” detection |
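The requirements in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `detect_new_failures`, the `health_state` table, and the `(failure_id, failure_type)` tuple shape are all hypothetical, and SQLite is used simply because the table lists it as one storage option.

```python
import sqlite3


def detect_new_failures(conn, current, ignored_types=frozenset()):
    """Return failures present now but absent from the previously stored state.

    `current` is an iterable of (failure_id, failure_type) tuples (a hypothetical
    shape). Failure types listed in `ignored_types` never count as new.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS health_state ("
        "failure_id TEXT PRIMARY KEY, failure_type TEXT)"
    )
    previous = {row[0] for row in conn.execute("SELECT failure_id FROM health_state")}
    current = dict(current)  # de-duplicates by failure_id, keeps insertion order
    new = [
        (fid, ftype)
        for fid, ftype in current.items()
        if fid not in previous and ftype not in ignored_types
    ]
    # Replace the stored state with this run's state in a single transaction,
    # so rerunning the same check reports nothing new.
    with conn:
        conn.execute("DELETE FROM health_state")
        conn.executemany("INSERT INTO health_state VALUES (?, ?)", current.items())
    return new
```

Calling the function twice with the same input returns an empty list the second time, which is the idempotency row of the table. The performance target is not measured here and would need checking against the real storage backend.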
Some common causes for ASM Health Checker failures include:
ASM Health Checker detected one new failure in your environment. This post explains what that means, likely causes, immediate checks you should run, step-by-step troubleshooting, and recommended fixes to restore full health.
The "ASM health checker found 1 new failures" alert is Oracle’s way of saying: "Something in your storage stack is not perfect." In most cases, it's a disk path issue or an offline disk that can be fixed with a few commands. However, ignoring even one failure invites gradual degradation.
By following the diagnostic steps, applying the appropriate fix, and implementing preventive measures like ASMFD and regular scrubbing, you can ensure that this alert remains a minor reminder—not a major incident.
Remember: A healthy ASM is the silent foundation of a robust Oracle database. Listen to its health checker, and your database will thank you with uptime.
Have you encountered an unusual "1 new failure" not covered here? Share your specific error message—solutions vary by Oracle version (11.2 to 23ai) and storage vendor.
Here’s a structured feature implementation for “ASM Health Checker found 1 new failure” — suitable for a monitoring or alerting system.
If you manage Oracle Grid Infrastructure (GI) or a standalone Automatic Storage Management (ASM) instance, one notification can send a chill down your spine: "ASM health checker found 1 new failures."
This message, often found in your alert log, crsd.log, or email alerts from Enterprise Manager (EM12c/13c), indicates that the automated ASM Health Checker has detected a new issue affecting the integrity, availability, or performance of your ASM environment. Ignoring it is not an option; unresolved failures can lead to disk group mount issues, I/O latency, or even database crashes.
This article provides a 360-degree breakdown of this alert: what triggers it, how to diagnose the root cause, step-by-step repair procedures, and long-term prevention strategies.
ASM Health Checker Found 1 New Failure: What It Means and How to Resolve It
The Automatic Storage Management (ASM) health checker is a crucial tool in Oracle databases that monitors the health and integrity of the storage infrastructure. When the ASM health checker reports a new failure, it's essential to understand the implications and take corrective actions to prevent data loss or system downtime. In this blog post, we'll discuss what an ASM health checker failure means, how to investigate the issue, and steps to resolve it.
What does an ASM health checker failure mean?
When the ASM health checker detects a problem, it logs an error message indicating that a failure has been detected. The message may look like this:
"ASM health checker found 1 new failure"
This message indicates that the ASM health checker has detected a single failure in the storage system. The failure could be related to various issues, such as:
Investigating the ASM health checker failure
To investigate the failure, follow these steps:
Resolving the ASM health checker failure
Once you've identified the root cause of the failure, take corrective actions to resolve the issue:
Best practices to prevent ASM health checker failures
To minimize the likelihood of ASM health checker failures:
By understanding the causes of ASM health checker failures and taking proactive steps to prevent them, you can ensure the reliability and performance of your Oracle database storage infrastructure.
The message "ASM health checker found 1 new failures" typically appears in environments using Oracle Automatic Storage Management (ASM) when an automated health check tool (like Oracle ORAchk or Oracle EXAchk) identifies a configuration issue or a hardware fault that doesn't match the established "best practices" or the previous healthy state.
What This Usually Means
When this alert is triggered, it indicates that a recent scan has detected a deviation in your ASM environment. Common causes for a single new failure include:
Disk Path Issues: A single disk path has become unavailable, even if the disk is still accessible via a redundant path.
Disk Group Redundancy: One of the disks in a "Normal Redundancy" disk group has failed, putting the group in a "degraded" state.
Parameter Mismatches: An ASM instance parameter (like ASM_POWER_LIMIT) has been changed and no longer aligns with recommended settings.
Offline Disks: A disk has been taken offline due to I/O errors but has not yet been dropped from the disk group.
Recommended Steps to Investigate
Check the Health Check Report: The tool that generated this message (likely ORAchk or EXAchk) will have created an HTML report. Locate this report to see the specifics and description of the failure.
Verify ASM Disk Status: Use the ASMCMD utility to check the status of your disks and disk groups:
asmcmd lsdsk -t
asmcmd lsdg
Look for disks whose status indicates a problem.
Inspect the ASM Alert Log: Review the ASM alert log file (usually found in the ADR home) for specific ORA- errors or messages about disk evictions.
Validate Path Visibility: Ensure the OS can still see all physical devices associated with the ASM disks.
For more detailed troubleshooting, you can refer to the Oracle Automatic Storage Management documentation or check for tool-specific errors on the Oracle Support portal.
Troubleshooting the "ASM Health Checker Found 1 New Failures" Alert
If you are managing an Oracle Database environment using Automatic Storage Management (ASM), encountering the alert "ASM health checker found 1 new failures" can be a jarring experience. This message is usually triggered by the Oracle Health Monitor (HM), a framework designed to detect and analyze failures in components of the database and ASM instances.
When this alert surfaces in your alert log or monitoring dashboard (like Enterprise Manager), it means ASM has identified a specific issue that could potentially impact the availability or performance of your storage layer.
Here is a deep dive into what this error means, how to diagnose it, and the steps to resolve it.
1. Understanding the ASM Health Checker
The ASM Health Checker is part of the broader Oracle Health Monitor. It runs periodic checks, and can also be triggered manually, to assess the integrity of:
ASM Metadata (Disk headers, File Directory, Alias Directory)
Disk Group health
Process responsiveness
When a "new failure" is reported, Oracle has logged a diagnostic entry into its ADR (Automatic Diagnostic Repository). The alert doesn't tell you the problem directly; it tells you that a report is waiting for your review.
2. Immediate Diagnostic Steps
To fix the failure, you first have to identify it. You can do this from the command line using ADRCI.
Step A: Access ADRCI
Log in to your Grid Infrastructure server and run:
adrci
Step B: Set the Home Path
Check which home is reporting the error (usually the ASM home):
show homes
set homepath diag/asm/+asm/+asm1
(Adjust the path based on your SID.)
Step C: List the Failures
ADRCI itself has no list failure command; LIST FAILURE belongs to RMAN's Data Recovery Advisor. Within ADRCI, list the Health Monitor runs and the recorded problems and incidents instead:
show hm_run
show problem
show incident
If the failure also affects a database that RMAN can connect to, running LIST FAILURE in RMAN provides a failure ID, a priority (CRITICAL, HIGH, or LOW), and a brief summary of what went wrong.
3. Common Causes for ASM Failures
While the "1 new failure" could technically be anything, it usually falls into one of these three categories:
A. Disk Corruption or Metadata Inconsistency
The most common cause is an inconsistency in the ASM metadata. This can happen due to an unexpected power loss, a bug in the storage firmware, or "lost writes."
The Fix: Run an internal ASM check using an ALTER DISKGROUP statement.
B. Offline Disks or Path Issues
If a path to a physical disk is lost (due to HBA failure or cable issues), ASM might mark the disk as "OFFLINE." If the diskgroup is still mounted but missing a member, the Health Checker will flag it.
The Fix: Check v$asm_disk to ensure all disks are ONLINE and HEADER_STATUS is MEMBER.
C. Resource Exhaustion
Sometimes the failure is not about the disks themselves, but about the ASM instance’s ability to manage them, such as running out of processes or memory in the SGA.
4. How to Resolve the Failure
Once you’ve identified the failure, you can ask Oracle for repair advice. ADVISE FAILURE and REPAIR FAILURE are RMAN Data Recovery Advisor commands, so run them from an RMAN session connected to the affected database rather than from ADRCI.
Advise on Failure:
advise failure;
This will generate a report explaining the impact and recommending a script or manual action to fix it.
Execute Repair: If Oracle provides a repair script, you can run:
repair failure;
Note: Always back up your metadata and ensure you have a valid backup before running automated repair scripts on production storage.
5. Clearing the Alert
After the underlying issue is resolved (e.g., the disk is back online or the metadata is repaired), you need to "close" the failure in the ADR so the health checker stops reporting it. Inside ADRCI:
set homepath
The "ASM health checker found 1 new failures" alert is a call to action to check your storage integrity. By using ADRCI to drill down into the specific failure ID, you can move from a vague warning to a concrete resolution plan.
Pro Tip: Regularly monitor your v$asm_operation view. If you see long-running "REBAL" (rebalance) operations following a failure, ensure your ASM_POWER_LIMIT is set high enough to complete the recovery quickly without impacting database I/O.
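As a rough illustration of the tip above, the rows returned by a query on v$asm_operation can be screened for slow rebalances. The sketch assumes the rows have already been fetched into dictionaries keyed by column name (OPERATION and EST_MINUTES are columns of that view); the function name and the 30-minute threshold are arbitrary choices for the example, not Oracle recommendations.

```python
def slow_rebalances(operations, max_minutes=30):
    """Return REBAL rows whose estimated remaining minutes exceed max_minutes.

    `operations` is a list of dicts keyed by v$asm_operation column names.
    A missing or NULL EST_MINUTES is treated as 0.
    """
    return [
        op
        for op in operations
        if op.get("OPERATION") == "REBAL"
        and (op.get("EST_MINUTES") or 0) > max_minutes
    ]
```

Any row this returns is a candidate for reviewing ASM_POWER_LIMIT, keeping in mind the trade-off with database I/O mentioned in the tip.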
ASM Health Checker Found 1 New Failure: What It Means and How to Resolve It
If you're a database administrator or a system administrator working with Oracle databases, you're likely familiar with the Automatic Storage Management (ASM) system. ASM is a storage management system that provides a simple and efficient way to manage storage for Oracle databases. One of the tools used to monitor and maintain ASM is the ASM Health Checker, which periodically checks the health of the ASM infrastructure and reports any issues or failures.
Recently, you may have encountered an alert or message indicating that the "ASM health checker found 1 new failure." This message can be concerning, especially if you're not familiar with what it means or how to resolve it. In this article, we'll explore what this message means, the possible causes, and step-by-step instructions on how to resolve the issue.
What Does the ASM Health Checker Do?
The ASM Health Checker is a background process that periodically checks the health of the ASM infrastructure. It monitors various aspects of ASM, including:
The ASM Health Checker runs automatically and reports any issues or failures it detects.
What Does "ASM Health Checker Found 1 New Failure" Mean?
When the ASM Health Checker detects a new failure, it reports the issue and provides information about the failure. The message "ASM health checker found 1 new failure" indicates that the checker has detected a problem with the ASM infrastructure that requires attention.
The failure can be related to various aspects of ASM, such as:
Possible Causes of the Failure
There are several possible causes for the ASM Health Checker to report a new failure. Some common causes include:
How to Resolve the Issue
To resolve the issue, follow these step-by-step instructions:
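SELECT run_id, name, check_name, status, start_time
FROM v$hm_run
ORDER BY start_time DESC;
SELECT run_id, priority, status, description
FROM v$hm_finding
ORDER BY time_detected DESC;
These queries against the Health Monitor views list recent checker runs and their findings, which provides more detailed information about the failure where the views are populated.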
SELECT * FROM V$ASM_DISKGROUP;
SELECT * FROM V$ASM_DISK;
SELECT * FROM V$INSTANCE;
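The rows returned by the V$ASM_DISK query above can be screened programmatically. The following is a minimal sketch, assuming the rows have already been fetched into dictionaries keyed by column name and filtered to disks that belong to a disk group (candidate disks legitimately report other statuses). The function name `unhealthy_disks` is hypothetical; MODE_STATUS, HEADER_STATUS, and MOUNT_STATUS are columns of V$ASM_DISK.

```python
def unhealthy_disks(rows):
    """Return (disk_name, reasons) pairs for disks that look unhealthy.

    A disk is flagged when it is not ONLINE, its header is not MEMBER,
    or its mount status is CLOSED or MISSING.
    """
    problems = []
    for row in rows:
        reasons = []
        if row.get("MODE_STATUS") != "ONLINE":
            reasons.append(f"MODE_STATUS={row.get('MODE_STATUS')}")
        if row.get("HEADER_STATUS") != "MEMBER":
            reasons.append(f"HEADER_STATUS={row.get('HEADER_STATUS')}")
        if row.get("MOUNT_STATUS") in ("CLOSED", "MISSING"):
            reasons.append(f"MOUNT_STATUS={row.get('MOUNT_STATUS')}")
        if reasons:
            problems.append((row.get("NAME"), reasons))
    return problems
```

An empty result means none of the three conditions fired. It does not prove the disk group is healthy, so the alert log and the health check report still need to be reviewed.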
Best Practices to Avoid Future Failures
To avoid future failures and ensure the health of your ASM infrastructure, follow these best practices:
By following these best practices and resolving the issue reported by the ASM Health Checker, you can ensure the health and performance of your ASM infrastructure and prevent future failures.
The message "ASM Health Checker found 1 new failures" is a critical warning often found in Oracle Automatic Storage Management (ASM) alert logs. It typically signals that the system has detected a significant issue—such as disk corruption or a communication breakdown—that could lead to a diskgroup being forcibly dismounted.
Here is a story of a "typical" Friday night in the life of a Database Administrator (DBA) facing this error.
The Friday Night Ghost in the Machine
It was 4:45 PM on a Friday. The office was thinning out, and Leo was already thinking about his weekend plans when his terminal began to scroll with red text. The monitoring system had just spat out a single, chilling line:
ASM Health Checker found 1 new failures
Leo’s heart sank. In the world of Oracle ASM, "1 new failure" is rarely just one thing; it's the tip of an iceberg.
The Investigation Begins
He dove into the alert logs. Just seconds before the health checker tripped, he saw a flurry of ORA-15130 errors: diskgroup "DATA" is being dismounted. This was the DBA equivalent of a ship taking on water.
He checked the shared storage. "It's always the hardware," he muttered. But the storage arrays looked green. He then checked the ASM Filter Driver, remembering a bug involving 4k sector drives that had caused similar headaches for peers in the past.
The Discovery
Leo ran a quick check of the diskgroup status:
Diskgroup: DATA
Status: DISMOUNTED
Cause: "Insufficient number of disks discovered"
It turned out a routine disk add operation from earlier that morning had gone sideways. A subtle corruption on metadata block 40 had been lying in wait. When the ASM rebalance operation hit that specific block, the Health Checker—a silent guardian that usually stays in the background—spotted the anomaly and pulled the emergency brake to prevent further data loss.
The Resolution
The "1 new failure" wasn't a death sentence, but it required surgery. Leo had to:
The message "ASM Health Checker found 1 new failures" is a critical alert in the Oracle Automatic Storage Management (ASM) alert log indicating that the internal Health Monitor has detected a significant operational issue. This failure often precedes or accompanies a diskgroup being forced to dismount or an instance entering a "dirty detach" state.
Common Root Causes
Storage I/O Failures: The most frequent cause is a physical I/O error or a "Write Failed" event on an ASM disk.
Diskgroup Corruption: Metadata inconsistencies or corrupted blocks within a diskgroup can trigger health check failures.
Connectivity Issues: Loss of access to a voting file or a failure to refresh voting files on a specific diskgroup.
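Resource Exhaustion: The ASM instance can run short of resources such as processes or SGA memory. (F5 BIG-IP ASM, where failures often stem from disk space limits in the /var partition or database table row limits, is an unrelated product that shares the acronym.)
Immediate Diagnostic Steps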
Check the Alert Log: Locate the exact timestamp of the error in the ASM alert log. Look for preceding errors like ORA-15130 (diskgroup being dismounted) or specific path-related I/O errors.
Verify Disk Status: Use the following SQL command in the ASM instance to identify disks with red flags (e.g., OFFLINE, CLOSED, or MISSING):
SELECT name, path, mode_status, mount_status, header_status, state FROM v$asm_disk;
Inspect Diskgroup Consistency: If the diskgroup is still mounted, run a metadata check with an ALTER DISKGROUP statement to find internal inconsistencies.
Use ADRCI: Leverage the Automatic Diagnostic Repository Command-Line (ADRCI) utility to view detailed incident reports associated with the health check failure.
Recommended Remediation
Restore Disk Connectivity: If the failure was due to a missing device, re-label and scan for the disk using Oracle ASM Command-line Tool (ASMCMD) or oracleasm scandisks.
Trigger a Rebalance: If a disk failed but redundancy allowed the group to stay online, add a replacement disk to trigger an automatic rebalance.
Recreate the Diskgroup: In cases of severe block corruption where the database cannot be recovered via standard means, you may need to recreate the diskgroup and restore from backup.
Autonomous Health Framework (AHF): For long-term monitoring, use the Oracle Autonomous Health Framework to proactively identify issues before they lead to health checker failures.
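The alert-log check described under the diagnostic steps can be approximated with a short Python sketch: find each occurrence of the health checker message and collect the ORA- codes that appear in the lines just before it. The function name `scan_alert_log` and the 20-line context window are assumptions made for illustration, and the exact wording and layout of the alert log may differ between Oracle versions.

```python
import re
from collections import deque

# Message text as quoted in this article; matched case-insensitively.
ALERT = re.compile(r"ASM Health Checker found (\d+) new failures?", re.IGNORECASE)
ORA_CODE = re.compile(r"ORA-\d{5}")


def scan_alert_log(lines, context=20):
    """Report each health checker alert with ORA- codes from the preceding lines.

    `lines` is any iterable of log lines (for example, an open file object).
    `context` is how many preceding lines to search for ORA- codes.
    """
    recent = deque(maxlen=context)
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ALERT.search(line)
        if match:
            codes = sorted({code for prev in recent for code in ORA_CODE.findall(prev)})
            hits.append(
                {"line": number, "count": int(match.group(1)), "ora_codes": codes}
            )
        recent.append(line)
    return hits
```

Passing an open file handle for the ASM alert log returns one dictionary per alert, which narrows down where to start reading before moving on to ADRCI.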