NetApp cluster rescue after a power outage

Initial situation

To run an experimental platform, I was handed a pre-configured NetApp, with no support whatsoever for it. I had never worked with NetApp or any other SAN before. The main consumer of this NetApp is an HPE DL380 G10, which boots ESXi from a LUN that lives on the NetApp. Suddenly nothing worked any more; the NetApp was off. First thought: power outage? So, let's boot it. I connected to both nodes via micro-USB cables and accessed COM20 and COM21 with PuTTY at 115200 baud.
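For reference, the same serial sessions can also be opened with plink, PuTTY's command-line sibling (a sketch; the COM ports and baud rate are the ones from above, the 8N1/no-flow-control serial settings are my assumption of the usual defaults):

plink -serial COM20 -sercfg 115200,8,n,1,N
plink -serial COM21 -sercfg 115200,8,n,1,N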

Rescuing the cluster

COM20

LOADER-A> boot_ontap

[...blabla...]
May 23 14:03:39 [netapp05-02:mgr.boot.unequalDist:error]: Warning: Unequal number of disks will be used for auto-partitioning of the root aggregate on the local system and HA partner. The local system will use 8 disks but the HA partner will use 6 disks. To correct this situation, boot both controllers into maintenance mode and remove the ownership of all disks.
May 23 14:03:39 [netapp05-02:fmmb.disk.notAccsble:notice]: All Local mailbox disks are inaccessible.
May 23 14:03:39 [netapp05-02:fmmb.disk.notAccsble:notice]: All Partner mailbox disks are inaccessible.
May 23 14:03:39 [netapp05-02:raid.assim.disk.brokenPreAssim:error]: Broken Disk 0b.05.9P2 Shelf 5 Bay 9 [NETAPP   X427_HCBFE1T8A10 NA06] S/N [08HJ1LJANP002] UID [6000CCA0:2C558DC0:500A0981:00000002:00000000:00000000:00000000:00000000:00000000:00000000] detected prior to assimilation.
May 23 14:03:39 [netapp05-02:kern.syslog.msg:notice]: FAILOVER: fmrsrc_startSecondary() - TakeOver for fmdisk_reserve done in 20 msecs (Since TO started: 20)

May 23 14:03:39 [netapp05-01:raid.assim.disk.brokenPreAssim:error]: Broken Disk 0a.05.6P1 Shelf 5 Bay 6 [NETAPP   X427_HCBFE1T8A10 NA06] S/N [08HJ5RVANP001] UID [6000CCA0:2C55CBE8:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] detected prior to assimilation.
May 23 14:03:39 [netapp05-01:raid.assim.disk.brokenPreAssim:error]: Broken Disk 0b.05.9P1 Shelf 5 Bay 9 [NETAPP   X427_HCBFE1T8A10 NA06] S/N [08HJ1LJANP001] UID [6000CCA0:2C558DC0:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] detected prior to assimilation.
May 23 14:03:39 [netapp05-01:raid.assim.disk.brokenPreAssim:error]: Broken Disk 0a.05.6P2 Shelf 5 Bay 6 [NETAPP   X427_HCBFE1T8A10 NA06] S/N [08HJ5RVANP002] UID [6000CCA0:2C55CBE8:500A0981:00000002:00000000:00000000:00000000:00000000:00000000:00000000] detected prior to assimilation.
May 23 14:03:40 [netapp05-02:kern.syslog.msg:notice]: FAILOVER: fmrsrc_startSecondary() - TakeOver for raid done in 274 msecs (Since TO started: 294)
[...]

May 23 14:03:42 [netapp05-02:LUN.nvfail.vol.proc.complete:error]: LUNs in volume IIL_4 (DSID 1314) have been brought offline because an inconsistency was detected in the nvlog during boot or takeover.
May 23 14:03:42 [netapp05-02:kern.syslog.msg:notice]: The system was down for 73786 seconds
May 23 14:03:42 [netapp05-02:cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of netapp05-02 by netapp05-01 disabled (Already in takeover mode).
May 23 14:03:42 [netapp05-02:cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
[...]

May 23 14:03:42 [netapp05-01:lmgr.sf.up.ready:notice]: Lock manager allowed high availability module to transition to the up state for the following reason: Partner down.
[...]

May 23 14:04:00 [netapp05-02:monitor.globalStatus.critical:EMERGENCY]: This node has taken over netapp05-01. Disk on adapter 0b, shelf 5, bay 9, failed.
May 23 14:04:18 [netapp05-02:callhome.root.vol.recovery.reqd:EMERGENCY]: Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED.

COM21

LOADER-B> boot_ontap

[...blabla...]
May 23 14:11:46 [netapp05-01:disk.init.failureBytes:error]: Failed disk 0b.05.12 detected during disk initialization.
Reservation conflict found on this node's disks!
[...]

Waiting for giveback...(Press Ctrl-C to abort wait)
This node was previously declared dead.
Pausing to check HA partner status ...
partner is operational and in takeover mode.

You must initiate a giveback or shutdown on the HA
partner in order to bring this node online.


The HA partner is currently operational and in takeover mode.This node cannot continue unless you initiate a giveback on the partner.
Once this is done this node will reboot automatically.

waiting for giveback...

Uh oh, that does not look healthy.
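In a clean HA situation, the way out of this state would be to check the failover status on the node that performed the takeover (netapp05-02 according to the logs) and hand the storage back (a sketch; whether a giveback is safe at this point is exactly what is unclear here):

netapp05::> storage failover show
netapp05::> storage failover giveback -ofnode netapp05-01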

Fixing

So, log in on node A:

login: 
Password:
******************************************************
* This is a serial console session. Output from this *
* session is mirrored on the SP console session.     *
******************************************************
***********************
**  SYSTEM MESSAGES  **
***********************

Internal error. Cannot open corrupt replicated database. Automatic recovery
attempt has failed or is disabled. Check the event logs for details. This node
is not fully operational. Contact support personnel for the root volume recovery
procedures.

Meanwhile, node B has finished booting:

Partner has released takeover lock.
Continuing boot...
[...]
May 23 14:21:51 [netapp05-01:disk.dynamicqual.fail.parse:error]: Device qualification information file (/etc/qual_devices) is invalid. The following error, " Unsupported File version detected.
" has been detected. For further information about correcting the problem, search the knowledgebase of the NetApp technical support support web site for the "[disk.dynamicqual.fail.parse]" keyword.
[...]
May 23 14:21:51 [netapp05-01:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of netapp05-02 disabled (unsynchronized log).
May 23 14:21:52 [netapp05-01:raid.fdr.reminder:error]: Failed Disk 0a.05.6 Shelf 5 Bay 6 [NETAPP   X427_HCBFE1T8A10 NA06] S/N [08HJ5RVA] UID [5000CCA0:2C55CBE8:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] is still present in the system and should be removed.
May 23 14:21:52 [netapp05-01:raid.fdr.reminder:error]: Failed Disk 0b.05.9 Shelf 5 Bay 9 [NETAPP   X427_HCBFE1T8A10 NA06] S/N [08HJ1LJA] UID [5000CCA0:2C558DC0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] is still present in the system and should be removed.

And yes, some disks are obviously broken!

At this point node B appears to be in better shape. On node A:

netapp05::> cluster show
Error: "show" is not a recognized command

On node B:

netapp05::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
netapp05-01           true    true
netapp05-02           false   true
2 entries were displayed.

Let's attempt a recovery:

netapp05::*> system node show
Node      Health Eligibility Uptime        Model       Owner    Location
--------- ------ ----------- ------------- ----------- -------- ---------------
""        -      -                       - -           -        -
netapp05-02
          -      -                00:37:41 FAS2650

Warning: Cluster HA has not been configured.  Cluster HA must be configured on
         a two-node cluster to ensure data access availability in the event of
         storage failover. Use the "cluster ha modify -configured true" command
         to configure cluster HA.
2 entries were displayed.

netapp05::*> system configuration backup show
Node       Backup Name                               Time               Size
---------  ----------------------------------------- ------------------ -----
netapp05-02
           netapp05.8hour.2025-05-12.18_15_03.7z     05/12 19:15:03     76.00MB
netapp05-02
           netapp05.8hour.2025-05-13.02_15_03.7z     05/13 03:15:03     76.65MB
netapp05-02
           netapp05.daily.2025-05-12.00_10_03.7z     05/12 01:10:03     76.90MB
netapp05-02
           netapp05.daily.2025-05-13.00_10_03.7z     05/13 01:10:03     76.25MB
netapp05-02
           netapp05.weekly.2025-05-04.00_15_03.7z    05/04 01:15:03     77.49MB
netapp05-02
           netapp05.weekly.2025-05-11.00_15_03.7z    05/11 01:15:03     77.75MB
6 entries were displayed.

netapp05::*> system configuration recovery node restore -backup  netapp05.8hour.2025-05-13.02_15_03.7z

Warning: This command overwrites local configuration files with files contained
         in the specified backup file. Use this command only to recover from a
         disaster that resulted in the loss of the local configuration files.
         The node will reboot after restoring the local configuration.
Do you want to continue? {y|n}: y
Verifying that the node is offline in the cluster.
Verifying that the backup tarball exists.
Extracting the backup tarball.
Verifying that software and hardware of the node match with the backup.
Stopping cluster applications.  

After the reboot, unfortunately, everything was still the same. So I try an older backup.
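The procedure is the same as before, just with an older backup from the list, for example the most recent weekly one (a sketch):

netapp05::*> system configuration recovery node restore -backup netapp05.weekly.2025-05-11.00_15_03.7z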

That left me with the following leads in the console output:

varfs_backup_restore: bootarg.abandon_varfs is set! Skipping /var backup.

That one is probably just a side effect of the backup restore. But what about this one?

*********************************************
* ALERT: SHA256 checksum failure detected   *
*        in boot device                     *
*                                           *
* Contact technical support for assistance. *
*********************************************
ERROR: netapp_varfs: SHA256 checksum failure detected in boot device. Contact technical support for assistance.
[...]
May 26 07:56:34 [netapp05-02:callhome.root.vol.recovery.reqd:EMERGENCY]: Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED.

This leads me to the following NetApp KB article: https://kb.netapp.com/on-prem/ontap/OHW/OHW-KBs/System_does_not_start_after_reboot_due_to_Unable_to_recover_the_local_database_of_Data_Replication_Module. Of the three environment variables it mentions, however, two are unknown on my system...
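The KB wants the boot-recovery bootargs cleared at the LOADER prompt. To see which of them are actually set on a node, printenv lists the entire firmware environment (sketch):

LOADER-A> printenv

Then clear whatever is there: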

LOADER-A> unsetenv bootarg.rdb_corrupt
LOADER-A> unsetenv bootarg.init.boot_recovery
Could not delete environment variable 'bootarg.init.boot_recovery': Environment variable not found
*** command status = Environment variable not found(-9)
LOADER-A> unsetenv bootarg.rdb_corrupt.mgwd
Could not delete environment variable 'bootarg.rdb_corrupt.mgwd': Environment variable not found
*** command status = Environment variable not found(-9)
LOADER-A> saveenv
LOADER-A> bye

EUREKA!

netapp05::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
netapp05-01           true    true
netapp05-02           true    true
2 entries were displayed.
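With both nodes reporting healthy again, it is worth double-checking that the replicated databases are in quorum and that storage failover is enabled, especially after the earlier "Cluster HA has not been configured" warning (a sketch; cluster ring show needs advanced privilege):

netapp05::> set -privilege advanced
netapp05::*> cluster ring show
netapp05::*> storage failover show
netapp05::*> cluster ha show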

Reviving the aggregate

Meanwhile, three SAS disks in this somewhat aged NetApp are dying. Is it too late?

netapp05::> storage aggregate show


Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
n01_SAS         0B        0B    0% failed       0 netapp05-01      raid_dp,
                                                                   partial
n01_root   368.4GB   17.85GB   95% online       1 netapp05-01      raid_dp,
                                                                   normal
n02_SSD    18.86TB   15.17TB   20% online      17 netapp05-02      raid_dp,
                                                                   normal
n02_root   368.4GB   17.85GB   95% online       1 netapp05-02      raid_dp,
                                                                   normal
4 entries were displayed.

netapp05::> storage disk show
                     Usable           Disk    Container   Container
Disk                   Size Shelf Bay Type    Type        Name      Owner
---------------- ---------- ----- --- ------- ----------- --------- --------

Info: This cluster has partitioned disks. To get a complete list of spare disk
      capacity use "storage aggregate show-spare-disks".
1.5.0                     -     5   0 unknown unsupported -         -
1.5.1                1.63TB     5   1 SAS     shared      n01_SAS   netapp05-02
1.5.2                1.63TB     5   2 SAS     shared      n01_SAS, n01_root
                                                                    netapp05-01
1.5.3                1.63TB     5   3 SAS     shared      n01_SAS, n02_root
                                                                    netapp05-02
1.5.4                1.63TB     5   4 SAS     shared      n01_SAS, n01_root
                                                                    netapp05-01
1.5.5                1.63TB     5   5 SAS     shared      n01_SAS, n02_root
                                                                    netapp05-02
1.5.6                1.63TB     5   6 SAS     broken      -         netapp05-01
1.5.7                1.63TB     5   7 SAS     shared      n01_SAS, n02_root
                                                                    netapp05-02
1.5.8                1.63TB     5   8 SAS     shared      n01_SAS, n01_root
                                                                    netapp05-01
1.5.9                1.63TB     5   9 SAS     broken      -         netapp05-02
1.5.10               1.63TB     5  10 SAS     shared      n01_SAS, n01_root
                                                                    netapp05-01
1.5.11               1.63TB     5  11 SAS     shared      n01_SAS, n02_root
                                                                    netapp05-02
1.5.12                    -     5  12 SAS     broken      -         -
1.5.13               1.63TB     5  13 SAS     shared      n01_SAS, n02_root
                                                                    netapp05-02
1.5.14               1.63TB     5  14 SAS     shared      n01_SAS, n01_root
                                                                    netapp05-01
1.5.15               3.49TB     5  15 SSD     aggregate   n02_SSD   netapp05-02
1.5.16               3.49TB     5  16 SSD     aggregate   n02_SSD   netapp05-02
1.5.17               3.49TB     5  17 SSD     aggregate   n02_SSD   netapp05-02
1.5.18               3.49TB     5  18 SSD     aggregate   n02_SSD   netapp05-02
1.5.19               3.49TB     5  19 SSD     aggregate   n02_SSD   netapp05-02
1.5.20               3.49TB     5  20 SSD     aggregate   n02_SSD   netapp05-02
1.5.21               3.49TB     5  21 SSD     aggregate   n02_SSD   netapp05-02
1.5.22               3.49TB     5  22 SSD     aggregate   n02_SSD   netapp05-02
1.5.23               3.49TB     5  23 SSD     spare       Pool0     netapp05-02
24 entries were displayed.

Putting the spare disk to work

A spare disk is available (I am surprised it is not pulled into the aggregate automatically). First I try to replace one of the broken disks with the spare. Without success.

netapp05::> storage disk replace -disk 1.5.6 -replacement 1.5.1 -action start
Error: command failed: Disk "1.5.6" is not in present state.
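To see which disks ONTAP currently considers broken, and hence which ones the replace command refuses to touch, the container type can be filtered (sketch):

netapp05::> storage disk show -container-type broken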

OK, let's try to temporarily reactivate the disks:

netapp05::> set advanced
netapp05::*> disk unfail -disk 1.5.6
netapp05::*> disk unfail -disk 1.5.9
netapp05::*> disk unfail -disk 1.5.12

netapp05::*> aggr show-status

Owner Node: netapp05-01
 Aggregate: n01_SAS (online, raid_dp, reconstruct, degraded) (block checksums)
  Plex: /n01_SAS/plex0 (online, normal, active, pool0)
   RAID Group /n01_SAS/plex0/rg0 (reconstruction 0% completed, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     shared   1.5.10                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.3                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.5                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.7                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.9                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.11                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.1                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.13                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.14                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.4                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.6                        0   SAS    10000   1.49TB   1.64TB (reconstruction 0% completed)
     shared   1.5.8                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.2                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   FAILED                       -   -          -   1.49TB       0B (failed)

So 1.5.9 is running again for now (though I have little faith in it), 1.5.12 stays dead, and 1.5.6 is doing something...
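The reconstruction onto 1.5.6 can be followed from the same view; re-running it from time to time shows the progress percentage (sketch):

netapp05::*> storage aggregate show-status -aggregate n01_SAS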

Moving disks over from an old NetApp

To replace the broken disks, I use drives from another NetApp. The disks are identical in build, with the same product number. However, this disk was not cleanly removed from its old system and currently cannot simply be read:

netapp05::*> storage disk show
                     Usable           Disk    Container   Container
Disk                   Size Shelf Bay Type    Type        Name      Owner
---------------- ---------- ----- --- ------- ----------- --------- --------
1.5.0                     -     5   0 unknown unsupported -         -
1.5.1                1.63TB     5   1 SAS     shared      n01_SAS   netapp05-02
1.5.2                1.63TB     5   2 SAS     shared      n01_SAS, n01_root
[...]

netapp05::*> storage disk show -disk 1.5.0
                  Disk: 1.5.0
        Container Type: unsupported
            Owner/Home: -  / -
               DR Home: -
    Stack ID/Shelf/Bay: 1  / 5  / 0
                   LUN: 0
                 Array: N/A
                Vendor: NETAPP
                 Model: X427_HCBFE1T8A10
         Serial Number: -
                   UID: 5000CCA0:2C55A2F0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000
                   BPS: 520
         Physical Size: 0B
              Position: present
Checksum Compatibility: block
             Aggregate: -
                  Plex: -
Paths:
                                LUN  Initiator Side        Target Side                                                                              Link
Controller         Initiator     ID  Switch Port           Switch Port           Acc Use  Target Port              TPGN    Speed      I/O KB/s          IOPS
------------------ ---------  -----  --------------------  --------------------  --- ---  -----------------------  ------  -------  ------------  ------------
netapp05-02        0a             0  N/A                   N/A                   AO  INU  5000cca02c55a2f2             86  12 Gb/S             0             0
netapp05-02        0b             0  N/A                   N/A                   AO  RDY  5000cca02c55a2f1             55  12 Gb/S             0             0
netapp05-01        0a             0  N/A                   N/A                   AO  INU  5000cca02c55a2f1             55  12 Gb/S             0             0
netapp05-01        0b             0  N/A                   N/A                   AO  RDY  5000cca02c55a2f2             86  12 Gb/S             0             0

Errors:
The node is configured with All-Flash Optimized personality and this disk is not an SSD. The disk needs to be removed from the system.

After a fairly long search it turned out that this display points to self-encrypting drives (SED). Luckily I still had access to the old cluster and could re-insert the disk there. To be on the safe side, I first moved the two volumes still living on those disks to the remaining aggregates with volume move (they will most likely never be needed again, but you never know; see the sketch after the command listing below) and then deleted the old SAS aggregate. After persistent trial and error, the following procedure finally did the trick:

set d
node run netapp-master-01 -command disk remove_ownership 0a.00.0P1
node run netapp-master-01 -command disk remove_ownership 0a.00.0P2
node run netapp-master-01 -command disk remove_ownership 0a.00.0

system node run -node netapp-master-01 disk unpartition 0a.00.0

storage encryption disk modify -disk 1.0.0 -fips-key-id 0x0
storage encryption disk modify -disk 1.0.0 -data-key-id  0x0
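For completeness, the volume evacuation and aggregate removal on the old cluster looked roughly like this (a sketch; the vserver, volume and aggregate names are placeholders, not the real ones):

netapp01::> volume move start -vserver svm_old -volume vol_data1 -destination-aggregate sas_aggr_new
netapp01::> volume move show
netapp01::> storage aggregate delete -aggregate sas_aggr_old

Back on topic: after the procedure above, the disk looks sane again on the old cluster: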

netapp01::> storage disk show
                     Usable           Disk    Container   Container
Disk                   Size Shelf Bay Type    Type        Name      Owner
---------------- ---------- ----- --- ------- ----------- --------- --------

Info: This cluster has partitioned disks. To get a complete list of spare disk
      capacity use "storage aggregate show-spare-disks".
1.0.0                1.63TB     0   0 SAS     spare       Pool0     netapp-master-01


netapp01::> storage encryption disk show
Disk     Mode Data Key ID
-------- ---- ----------------------------------------------------------------
1.0.0    data 000000000000000002000000000001000B8C0C4412BBFE9EDB2951E40BE463E6

Now the disk was finally reported as spare and as open, and the new cluster finally recognized it as well.

storage disk assign -disk 1.5.2 -owner netapp05-01 -data

netapp05::storage disk*> storage disk show
                     Usable           Disk    Container   Container
Disk                   Size Shelf Bay Type    Type        Name      Owner
---------------- ---------- ----- --- ------- ----------- --------- --------
1.5.1                1.63TB     5   1 SAS     shared      n01_SAS   netapp05-02
1.5.2                1.63TB     5   2 SAS     shared      -         netapp05-01

Now this disk just has to replace the FAILED member of the aggregate. But that appears to happen automatically:

netapp05::storage disk*> storage aggregate show-status

Owner Node: netapp05-01
 Aggregate: n01_SAS (online, raid_dp, reconstruct, degraded) (block checksums)
  Plex: /n01_SAS/plex0 (online, normal, active, pool0)
   RAID Group /n01_SAS/plex0/rg0 (reconstruction 0% completed, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     shared   1.5.10                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.3                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.5                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.7                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.9                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.11                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.1                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.13                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.14                       0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.4                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.6                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.8                        0   SAS    10000   1.49TB   1.64TB (normal)
     shared   1.5.2                        0   SAS    10000   1.49TB   1.64TB (reconstruction 0% completed)
     shared   FAILED                       -   -          -   1.49TB       0B (failed)

Fix LUN

Initial situation

With the cluster and the aggregates back up, the servers attached via iSCSI simply refused to boot, while the SMB share was reachable just fine... interesting.
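A quick first check on the iSCSI side is whether the target service is running and the LUNs are still mapped to their initiator groups (a sketch):

netapp05::> vserver iscsi show
netapp05::> lun mapping show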

Bring Online

In the NetApp web GUI, IOPS peaks were visible every 5 seconds, each time a server tried to boot and then reported "no bootable device". Under the LUN actions I found the option "Bring Online", which immediately returned an alert: The volume is in nvfailed state.

After a short search I then found this: https://kb.netapp.com/on-prem/ontap/OHW/OHW-KBs/lun_online_fails_with_Error_The_volume_is_in_nvfailed_state

netapp05::> ucadmin show
                       Current  Current    Pending  Pending    Admin
Node          Adapter  Mode     Type       Mode     Type       Status
------------  -------  -------  ---------  -------  ---------  -----------
netapp05-01   0c       cna      target     -        -          online
netapp05-01   0d       cna      target     -        -          online
netapp05-01   0e       cna      target     -        -          online
netapp05-01   0f       cna      target     -        -          online
netapp05-02   0c       cna      target     -        -          online
netapp05-02   0d       cna      target     -        -          online
netapp05-02   0e       cna      target     -        -          online
netapp05-02   0f       cna      target     -        -          online
8 entries were displayed.

netapp05::> network interface show
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Cluster
            netapp05-01_clus1
                         up/up    169.254.214.208/16 netapp05-01   e0a     true
            netapp05-01_clus2
                         up/up    169.254.52.115/16  netapp05-01   e0b     true
            netapp05-02_clus1
                         up/up    169.254.159.191/16 netapp05-02   e0a     true
            netapp05-02_clus2
                         up/up    169.254.244.129/16 netapp05-02   e0b     true
netapp05
            bkup-lif_1   up/up    172.16.19.222/24   netapp05-01   a0a-1619
                                                                           true
            bkup-lif_2   up/up    172.16.19.223/24   netapp05-02   a0a-1619
                                                                           true
            cluster_mgmt up/up    172.16.17.221/24   netapp05-01   e0M     true
[...]
39 entries were displayed.

Everything is up/up... But look at this:

netapp05::> lun show
Vserver   Path                            State   Mapped   Type        Size
--------- ------------------------------- ------- -------- -------- --------
svm10     /vol/IIL_Insight/IIL_Insight    nvfail  mapped   vmware     1.95TB
svm11     /vol/IIL_1/IIL_1                nvfail  mapped   vmware     1.95TB
svm11     /vol/IIL_1_clone_300/IIL_1      nvfail  unmapped vmware     1.95TB
svm11     /vol/IIL_1_clone_371/IIL_1      nvfail  unmapped vmware     1.95TB
svm12     /vol/IIL_2/IIL_2                nvfail  mapped   vmware     1.95TB
svm13     /vol/IIL_3/IIL_3                nvfail  mapped   vmware     1.95TB
svm14     /vol/IIL_4/IIL_4                nvfail  mapped   vmware     1.95TB

netapp05::> lun online -vserver svm11 -path /vol/IIL_1/IIL_1

Error: command failed: The volume is in nvfailed state

Not good...
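According to the KB article, the nvfailed flag has to be cleared on the volume itself at advanced privilege. To see which volumes carry the flag (a sketch; I am assuming the state is queryable via -fields):

netapp05::*> volume show -fields in-nvfailed-state

Then clear it: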

netapp05::*> volume modify  -vserver svm11 -volume IIL_1 -in-nvfailed-state false
Volume modify successful on volume IIL_1 of Vserver svm11.

netapp05::*> lun online -vserver svm11 -path /vol/IIL_1/IIL_1
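This time there was no error. The same two commands apply to the other nvfailed volumes and LUNs from the lun show output above, for example (sketch):

netapp05::*> volume modify -vserver svm14 -volume IIL_4 -in-nvfailed-state false
netapp05::*> lun online -vserver svm14 -path /vol/IIL_4/IIL_4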

That was refreshingly simple...