“No valid disk found containing disk group” message. vxdisk -o alldgs list shows all disks but you can’t import it – what could be the issue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Possible causes are:
1. Check the udid of the disk as seen by Veritas (vxdisk list fabric_0 | grep udid) and compare it with the actual udid on the array. If they differ, reboot the system to pick up the new disks.
2. Check the number of enabled configs on each disk in the disk group. If none of the disks has config state=enabled, the disk group has no valid configuration to import. Set nconfig=all on the disk group.
3. Try importing by clearing the lock
# vxdg -C -f import testdg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When vxdisk list shows dgdisabled and there are other disks in the same diskgroup which are not imported – How to resolve this?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Unmount the file systems within this DG, then deport and import the disk group. If this doesn’t work, the only option is to reboot the system.
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
fabric_6 auto:sliced c90t53d3 dg_test1 online dgdisabled
fabric_7 auto:sliced - (dg_test1) online
# vxdg deport dg_test1
# vxdisk -o alldgs list
fabric_6 auto - - error
fabric_7 auto:sliced - (dg_test1) online
# vxdg import dg_test1
VxVM vxdg ERROR V-5-1-10978 Disk group dg_test1: import failed:
No valid disk found containing disk group
# reboot -- -r
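The listing above can be checked mechanically. The following is a minimal sketch, not a Veritas tool: dg_disk_states is a hypothetical helper name, and the column positions (DEVICE TYPE DISK GROUP STATUS) are assumed from the sample output shown above. It only reads text, so it is safe to try on saved output.

```shell
#!/bin/sh
# Sketch: classify the disks of one disk group from `vxdisk -o alldgs list`
# output. dg_disk_states is a hypothetical helper; the GROUP column is
# assumed to be field 4, as in the listing above.
dg_disk_states() {
    awk -v dg="$1" '
        $4 == dg && /dgdisabled/ { print $1, "imported-dgdisabled"; next }
        $4 == dg                 { print $1, "imported"; next }
        $4 == "(" dg ")"         { print $1, "not-imported" }
    '
}

# Sample input copied from the listing above. On a live system:
#   vxdisk -o alldgs list | dg_disk_states dg_test1
dg_disk_states dg_test1 <<'EOF'
DEVICE TYPE DISK GROUP STATUS
fabric_6 auto:sliced c90t53d3 dg_test1 online dgdisabled
fabric_7 auto:sliced - (dg_test1) online
EOF
```

A mix of "imported-dgdisabled" and "not-imported" lines for the same group is the situation described in this section.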
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to remove the ghost entry of a removed disk “failed was:c1t1d1s2”?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the disk has recovered (for example, it has been re-presented from the array), you can recover it using “vxreattach Disk_0”.
If the disk doesn’t get attached and reports a Serial Split Brain condition, advising you to run with -o overridessb, then it is a bad case. The only way to recover is to remove the disk and add it back as below:
# /etc/vx/bin/vxreattach -b fabric_6
VxVM vxdg ERROR V-5-1-10127 associating disk-media c90t10d4 with fabric_6:
Serial Split Brain detected. Use -ooverridessb to reattach the disk/site or run vxsplitlines to import
the diskgroup
Remove the subdisks/plexes from disk and remove the disk from dg.
Disassociate the disabled plex
# vxplex -g dg_smsprd1 dis smsprd1_log
Remove the plex and subdisks
# vxedit -g dg_smsprd1 -rf rm smsprd1_log_4-02
Remove the disk from diskgroup
# vxdg -g dg_smsprd1 rmdisk c90t60d1
Initialise the disk
# vxdisk -f init fabric_4 privoffset=1 privlen=81663 puboffset=0 publen=10354688 format=sliced
Add the disk back into diskgroup
# vxdg -g dg_smsprd1 adddisk c90t60d1=fabric_4
OR
If the disk has failed completely and you have removed it – Remove the disk from DG. If there are no objects, it should succeed.
OR
Find the list of disks which needs to be removed:
# vxprint -g dg_name -d -F "%{name} %{assoc}"
c90t10d1 -
c90t50d2 c3t2d10s2
c90t60d2 c3t2d11s2
c90t70d1 -
Remove the disks which have no access name (a "-" in the second column: c90t10d1 and c90t70d1 in the listing above)
# vxdg -g dg_name -o override rmdisk
If it says it has the volumes associated, run above command with -k option:
# vxdg -g dg_name -o override -k rmdisk
After this, vxdisk list will show them “removed was:c1t1d1s2”
Remove it now using vxdiskadm, option 3.
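To pick out the disk media names that have lost their access name, the vxprint listing above can be filtered. This is a small sketch only; no_access_disks is a hypothetical helper name, and it assumes the two-column "name assoc" format produced by the -F string used above.

```shell
#!/bin/sh
# Sketch: print disk media names whose access name (second column) is "-".
# no_access_disks is a hypothetical helper, not a VxVM command.
no_access_disks() {
    awk '$2 == "-" { print $1 }'
}

# Sample input copied from the listing above. On a live system:
#   vxprint -g dg_name -d -F "%{name} %{assoc}" | no_access_disks
no_access_disks <<'EOF'
c90t10d1 -
c90t50d2 c3t2d10s2
c90t60d2 c3t2d11s2
c90t70d1 -
EOF
```

The names it prints are the candidates for the rmdisk step; review them by hand before removing anything.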
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to disable boot from vxvm and start it manually?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Boot into single user mode
2. Edit /etc/system. Comment out the vx parameters as follows:
*rootdev:/pseudo/vxio@0:0
*set vxio:vol_rootdev_is_volume=1
3. cd /etc/vx/reconfig.d/state.d/; rm *; touch install-db
(This should remove root-done; and prevent vxvm from starting)
4. cp -p /etc/vfstab /etc/vfstab.bak; cp -p /etc/vfstab.prevm /etc/vfstab
(back up the current vfstab under any spare name, then restore the original vfstab)
5. init 6
6. After the system is up, start the Volume Manager service manually as follows
# vxiod set 10
# ps -ef | grep vxconfigd (if vxconfigd is not running, run "/usr/sbin/vxconfigd -m disable")
# vxdctl mode (should report that it is in disabled mode)
# vxdctl init
# vxdctl enable
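Step 2 above (commenting out the two vx lines in /etc/system) can be rehearsed on a scratch copy before touching the real file. The sketch below is an illustration under assumptions: comment_out_vx_root is a hypothetical helper, the "set maxusers=512" line is made-up sample content, and the result is printed rather than written back.

```shell
#!/bin/sh
# Sketch: print FILE with the two VxVM root-device lines commented out.
# "*" starts a comment in /etc/system. comment_out_vx_root is a
# hypothetical helper; it never modifies the file it reads.
comment_out_vx_root() {
    sed -e 's|^rootdev:/pseudo/vxio@0:0|*&|' \
        -e 's|^set vxio:vol_rootdev_is_volume=1|*&|' "$1"
}

# Demonstration on a scratch file, not on the live /etc/system.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
set maxusers=512
rootdev:/pseudo/vxio@0:0
set vxio:vol_rootdev_is_volume=1
EOF
comment_out_vx_root "$tmp"
rm -f "$tmp"
```

Lines that are already commented out are left alone, so running it twice gives the same result.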
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to recreate diskgroup info?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. # vxprint -mpvsh -g DISKGROUP >DISKGROUP.out
2. Destroy the diskgroup
3. Create the diskgroup with the same disk names
4. Edit DISKGROUP.out and change the disknames manually if needed
5. # vxmake -g DGNAME -d DISKGROUP.out (to rebuild the config in one go, using the file saved in step 1)
6. All the volumes should now be defined and in DISABLED/EMPTY state; plexes should be in DISABLED/EMPTY state; subdisks should be in ENABLED/ACTIVE state
7. Init and start the volume as below:
# vxvol -g dg_dodgeprd4 init active dodgeprd4_data_1
This command will init the volume to active (start the plexes and volumes)
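When the disk group holds many volumes, step 7 has to be repeated per volume. A minimal sketch of generating those commands follows; print_init_cmds is a hypothetical helper, dodgeprd4_data_2 is a made-up second volume name, and the commands are only printed so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: read volume names on stdin and print (not run) one
# "vxvol init active" command per volume for disk group $1.
# print_init_cmds is a hypothetical helper.
print_init_cmds() {
    while read -r vol; do
        if [ -n "$vol" ]; then
            echo "vxvol -g $1 init active $vol"
        fi
    done
}

# Sample volume names. On a live system the list could come from:
#   vxprint -g dg_dodgeprd4 -v -F "%{name}" | print_init_cmds dg_dodgeprd4
printf '%s\n' dodgeprd4_data_1 dodgeprd4_data_2 | print_init_cmds dg_dodgeprd4
```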
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Metasave and file system corruption
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DISCLAIMER: I might have copied the steps mentioned below from some site while googling a long time ago. I don’t want to take any credit for them.
When file system complains of corruption, try the following steps.
1. Run the file system check utility once on the affected file system and verify whether that resolves the issue:
# fsck -F vxfs /dev/vx/rdsk/
If the above command doesn’t resolve the issue, try:
# fsck -F vxfs -y -o full,nolog /dev/vx/rdsk/
2. Unmount and mount the file system again and verify.
3. Verify that you can see the VxFS file system header using:
# /opt/VRTS/bin/fstyp -v /dev/vx/rdsk/
4. Verify if you can see the “lost+found” folder and its content as expected:
# cd /
# cd lost+found
# ls -l
5. When a file system becomes corrupted and the reasons for corruption are unknown, collect a metadata image of a corrupted file system to investigate why corruption happened. The metadata can be captured using a tool called metasave. Metasave is included in the VRTSspt package, which comes with the product CDs and is also available from ftp.veritas.com. The /opt/VRTSspt/FS/MetaSave directory may contain more than one metasave binary, depending on the operating system. For example, on Solaris there are:
metasave_5.8
metasave_5.9
metasave_5.10
To save metadata from a file system, the corrupted file system needs to be unmounted (if it is still mounted). Run the appropriate metasave binary, such as on Sun Solaris 10 systems:
# metasave_5.10 -f
The file created by this command,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to recover from splitbrain error while trying to import?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Depending on the configuration, one, many, or all disks in a dg store the diskgroup configuration. When these disks hold different configurations, a splitbrain situation arises on import. Try the following steps:
– Decide the disk with valid config. If you can’t decide now, you can decide after running vxsplitlines using different diskids
– Run vxsplitlines -g DG to find out the problem
– Run vxdisk list on good disk and note down its Disk ID
– Run vxsplitlines -g DG -c DISKID to get the exact mismatch
– Import the diskgroup with
# vxdg -o overridessb -o selectcp=DISKID import DG
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to fix a volume that has plex in DISABLED/RECOVER state?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the plexes is in DISABLED/RECOVER state and the other one is ENABLED/ACTIVE.
v testvol - ENABLED ACTIVE 32768000 SELECT - gen
pl testvol-01 testvol ENABLED ACTIVE 32768000 CONCAT - RW
sd c92t52d1-01 testvol-01 c92t52d1 0 32768000 0 Disk_17 ENA
pl testvol-02 testvol DISABLED RECOVER 32768000 CONCAT - RW
sd c92t55d1-57 testvol-02 c92t55d1 1707622400 32768000 0 Disk_12 ENA
Force the plex into OFFLINE state:
# vxmend -g testdg -o force off testvol-02 (DISABLED/OFFLINE)
v testvol - ENABLED ACTIVE 32768000 SELECT - gen
pl testvol-01 testvol ENABLED ACTIVE 32768000 CONCAT - RW
sd c92t52d1-01 testvol-01 c92t52d1 0 32768000 0 Disk_17 ENA
pl testvol-02 testvol DISABLED OFFLINE 32768000 CONCAT - RW
sd c92t55d1-57 testvol-02 c92t55d1 1707622400 32768000 0 Disk_12 ENA
Place into STALE state:
# vxmend -g testdg on testvol-02 (DISABLED/STALE)
v testvol - ENABLED ACTIVE 32768000 SELECT - gen
pl testvol-01 testvol ENABLED ACTIVE 32768000 CONCAT - RW
sd c92t52d1-01 testvol-01 c92t52d1 0 32768000 0 Disk_17 ENA
pl testvol-02 testvol DISABLED STALE 32768000 CONCAT - RW
sd c92t55d1-57 testvol-02 c92t55d1 1707622400 32768000 0 Disk_12 ENA
If there are other ACTIVE or CLEAN plexes in the volume, reattach the stale plex to the volume (even though it is already associated with it). If the volume is already ENABLED, resynchronisation of the plex starts immediately, but the command does not return until the plex has synchronised completely.
# vxplex -g testdg att testvol testvol-02
# vxprint testvol
v testvol gen ENABLED 32768000 - ACTIVE - -
pl testvol-01 testvol ENABLED 32768000 - ACTIVE - -
sd c92t52d1-01 testvol-01 ENABLED 32768000 0 - - -
pl testvol-02 testvol ENABLED 32768000 - ACTIVE - -
sd c92t55d1-57 testvol-02 ENABLED 32768000 0 - - -
If there are no other ACTIVE or CLEAN plexes in the volume, make the plex CLEAN
# vxmend -g testdg fix clean testvol-02 (DISABLED/CLEAN)
If the volume is not ENABLED, use the following command to start it and perform any resynchronisation of the plexes in the background:
# vxvol -g testdg -o bg start testvol
(If the data in the plex was corrupted, and the volume has no ACTIVE or CLEAN redundant plexes from which its contents can be resynchronized, it must be restored from a backup or from a snapshot image)
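Before running the vxmend steps it helps to list which plexes actually need attention. The sketch below filters a listing in the layout shown at the top of this section (kernel state in field 4, state in field 5); unhealthy_plexes is a hypothetical helper, and the assumption that this layout comes from something like vxprint -ht is mine, since the notes do not say which options produced it.

```shell
#!/bin/sh
# Sketch: print every plex that is not ENABLED/ACTIVE, with its state.
# Assumes "pl NAME VOLUME KSTATE STATE ..." lines as in the listing above.
# unhealthy_plexes is a hypothetical helper, not a VxVM command.
unhealthy_plexes() {
    awk '$1 == "pl" && ($4 != "ENABLED" || $5 != "ACTIVE") { print $2, $4 "/" $5 }'
}

# Sample input copied from the first listing in this section.
unhealthy_plexes <<'EOF'
v testvol - ENABLED ACTIVE 32768000 SELECT - gen
pl testvol-01 testvol ENABLED ACTIVE 32768000 CONCAT - RW
sd c92t52d1-01 testvol-01 c92t52d1 0 32768000 0 Disk_17 ENA
pl testvol-02 testvol DISABLED RECOVER 32768000 CONCAT - RW
sd c92t55d1-57 testvol-02 c92t55d1 1707622400 32768000 0 Disk_12 ENA
EOF
```

For the sample it reports testvol-02 as DISABLED/RECOVER, which is the plex the steps above work on.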
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to get a volume working if it is in “DETACHED DETACH” state?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It went into this state because the underlying plexes went offline, causing the volume to go into maintenance mode (no I/Os). This gives you a chance to “enable active” an individual plex to figure out which plex is clean. If you know for sure which plex is clean, you can recover using “vxvol start”.
# vxvol -g testdg start testvol
Different scenarios where volumes were in different states before they were recovered using vxvol start:
Scenario 1
# vxprint testvol
v testvol fsgen DETACHED 1048444928 - DETACH - -
pl testvol-01 testvol ENABLED 1048444928 - ACTIVE - -
sd c92t72d1-01 testvol-01 ENABLED 1048444928 0 - - -
pl testvol-02 testvol DISABLED 1048444928 - IOFAIL - -
sd c90t70d1-01 testvol-02 ENABLED 1048444928 0 RELOCATE - -
# vxplex -g testdg dis testvol-02
# vxvol -g testdg start testvol
# vxprint testvol
v testvol fsgen ENABLED 1048444928 - ACTIVE - -
pl testvol-01 testvol ENABLED 1048444928 - ACTIVE - -
sd c92t72d1-01 testvol-01 ENABLED 1048444928 0 - - -
Now attach the plex back to volume. It should start synchronising again.
Scenario 2
# vxprint testvol
v testvol gen DETACHED 409600 - DETACH - -
pl testvol-01 testvol DISABLED 409600 - RECOVER - -
sd c92t58d1-25 testvol-01 ENABLED 409600 0 - - -
pl testvol-02 testvol ENABLED 409600 - ACTIVE - -
sd c92t52d1-66 testvol-02 ENABLED 409600 0 - - -
# vxvol -g testdg start testvol
# vxprint testvol
v testvol gen ENABLED 409600 - ACTIVE - -
pl testvol-01 testvol ENABLED 409600 - ACTIVE - -
sd c92t58d1-25 testvol-01 ENABLED 409600 0 - - -
pl testvol-02 testvol ENABLED 409600 - ACTIVE - -
sd c92t52d1-66 testvol-02 ENABLED 409600 0 - - -
Scenario 3
# vxprint testvol_3
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
v testvol_3 gen DETACHED 409600 - DETACH - -
pl testvol_3-01 testvol_3 DISABLED 409600 - IOFAIL - -
sd c92t58d1-27 testvol_3-01 ENABLED 409600 0 - - -
pl testvol_3-02 testvol_3 ENABLED 409600 - ACTIVE - -
sd c92t52d1-68 testvol_3-02 ENABLED 409600 0 - - -
# vxvol -g testdg start testvol_3
# vxprint testvol_3
v testvol_3 gen ENABLED 409600 - ACTIVE - -
pl testvol_3-01 testvol_3 ENABLED 409600 - ACTIVE - -
sd c92t58d1-27 testvol_3-01 ENABLED 409600 0 - - -
pl testvol_3-02 testvol_3 ENABLED 409600 - ACTIVE - -
sd c92t52d1-68 testvol_3-02 ENABLED 409600 0 - - -
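The three scenarios differ only in the state of the non-active plex, so a quick summary of a vxprint listing shows which scenario applies. This is a sketch under assumptions: detached_report is a hypothetical helper, and it relies on the default column layout shown in Scenario 3 (TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0).

```shell
#!/bin/sh
# Sketch: report DETACHED volumes and any plex whose STATE is not ACTIVE.
# Field 4 is KSTATE and field 7 is STATE in the default vxprint layout.
# detached_report is a hypothetical helper, not a VxVM command.
detached_report() {
    awk '
        $1 == "v"  && $4 == "DETACHED" { print "volume", $2, $4 "/" $7 }
        $1 == "pl" && $7 != "ACTIVE"   { print "plex", $2, $4 "/" $7 }
    '
}

# Sample input copied from Scenario 1 above. On a live system:
#   vxprint testvol | detached_report
detached_report <<'EOF'
v testvol fsgen DETACHED 1048444928 - DETACH - -
pl testvol-01 testvol ENABLED 1048444928 - ACTIVE - -
sd c92t72d1-01 testvol-01 ENABLED 1048444928 0 - - -
pl testvol-02 testvol DISABLED 1048444928 - IOFAIL - -
sd c90t70d1-01 testvol-02 ENABLED 1048444928 0 RELOCATE - -
EOF
```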
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A volume with 2 plexes – one plex in RECOVER state and the other in STALE state. How do you recover?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Plex P1 in RECOVER state indicates that it was in the ACTIVE state prior to the failure.
Plex P2 in STALE state indicates that it was not participating in I/Os and had stale data.
Run following commands:
# vxmend fix stale P1
# vxmend fix stale P2
# vxmend fix clean P1
# vxrecover -s V1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A Volume is disabled and not startable. No CLEAN plexes. Good Plex is not known. How do you recover?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Take all but one plex offline and set that plex to CLEAN
* Run vxrecover -s
* Verify data on the volume
* Run vxvol stop
* Repeat this for all plexes until you identify the plex with good data.
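~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to remove disabled paths from Veritas?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~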
Run vxdctl enable to make sure Veritas has released its grip on the device.
# vxdctl enable
Make sure the device is offlined from Solaris’s view.
# luxadm -e offline /dev/rdsk/c2t5006048452A83978d206s2
Clear out the device from Solaris’s view.
# cfgadm -o unusable_FCP_dev -c unconfigure c2::5006048452a83978
# devfsadm -Cv
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Boot time related issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Problem: License keys are corrupted, missing, or expired
Causes: The /etc/vx/licenses/lic files have become corrupted, or an evaluation license was installed and not updated to a full license.
Solution:
Save /etc/vx/licenses/lic/* to a backup device. If the license files are removed or corrupted, you can copy the files back.
# vxlicinst (install a new license)
# vxiod set 10 (start i/o daemons)
# vxconfigd (start config daemon)
Problem: Boot device can’t be opened
Causes:
Boot disk is not powered on
Boot disk has failed
SCSI bus is not terminated
Controller failure has occurred
Disk is failing and locking the bus
Solution:
Check scsi bus connections: probe-scsi-all
Boot from alternate boot disk
Problem: VxVM startup scripts exit without initialisation
Causes:
/etc/vx/reconfig.d/state.d/install-db exists: this indicates that the VxVM software packages have been installed, but VxVM has not been initialised with vxinstall. Therefore vxconfigd is not started.
/VXVM#.#.#-UPGRADE/.start_runed exists: this indicates that a VxVM upgrade has been started but not completed. Therefore vxconfigd is not started.
Solution:
Remove the files and take appropriate actions
Problem: A conflicting host ID exists in /etc/vx/volboot file
The volboot file contains the host ID that was on the system when vxvm was installed.
Solution:
Change the host ID in the volboot file: vxdctl hostid
Recreate the volboot file: vxdctl init
Problem: /var/vxvm/tempdb directory is missing, misnamed, or corrupted
It stores configuration information about imported disk groups. The contents are recreated after a reboot.
Causes: Directory is missing, misnamed, or corrupted
Solution:
To remove and recreate this directory:
# vxconfigd -k -x cleartempdir
How to run vxconfigd in debug mode?
# vxconfigd -k -m enable -x debug_level
(debug_level is a number from 0, no debugging, to 9, highest debugging)
-x log – log all console output to the /var/vxvm/vxconfigd.log file
-x logfile=name – use the specified log file instead
-x syslog – Direct all console output through the syslog interface
-x timestamp – Attach a date and time-of-day timestamp to all messages
-x tracefile=name – log all possible tracing information in the given file
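Since the debug level must be a single digit, a small wrapper can catch a bad value before vxconfigd is restarted. The sketch below only builds and prints the command line; vxconfigd_debug_cmd is a hypothetical helper, and combining -x log with -x timestamp in one invocation is an assumption based on the option list above, so check the vxconfigd man page for your release.

```shell
#!/bin/sh
# Sketch: validate the debug level (0-9) and print the vxconfigd command
# line. Nothing is executed. vxconfigd_debug_cmd is a hypothetical helper;
# the chosen -x options are taken from the list above.
vxconfigd_debug_cmd() {
    case "$1" in
        [0-9]) ;;
        *) echo "debug level must be a single digit 0-9" >&2; return 1 ;;
    esac
    echo "vxconfigd -k -m enable -x $1 -x log -x timestamp"
}

vxconfigd_debug_cmd 9
```

Printing instead of running keeps the restart of vxconfigd a deliberate, separate step.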
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Useful commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To remove “online failing” in vxdisk list output for a good disk
# vxedit -g dgname set failing=off diskname
To check volume sizes
# vxprint -q -v -g <DG> -F "%{name} %{len}" | awk '{printf "%s %d\n", $1,$2/2048}'
To find out the volumes on a particular disk
# vxprint -g datadg -e 'any v_plex.pl_sd.sd_disk="datadisk01"'
To find out the volumes in DISABLED state
# vxprint -g dg_dodgepre5 -e 'v_kstate!=V_ENABLED'
To find out the plexes in DISABLED state
# vxprint -g dg_dodgepre5 -e 'pl_kstate!=PL_ENABLED'
To remove multiple plexes which are DISABLED and have no devices
# vxprint -g $dg -p -e 'pl_kstate!=PL_ENABLED' -F '%{name}' | while read i; do
vxplex -g $dg dis $i; vxedit -g $dg -fr rm $i;
done
To remove the license
Remove the files in /etc/vx/licenses/lic and run vxdctl license init to pick up the new license
To change disk to “sliced”?
EVA80003_10 auto:none - - online invalid
# /etc/vx/bin/vxdisksetup -i EVA80003_10 format=sliced
EVA80003_10 auto:sliced - - online
To start volume without recovery?
# vxrecover -sn
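A note on the volume-size one-liner above: it divides the length by 2048, which gives megabytes only if vxprint reports lengths in 512-byte sectors (2048 sectors = 1 MB). That unit is an assumption here, so verify it on your platform. A worked example using the 32768000-sector testvol from earlier, with sectors_to_mb as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: convert "name length-in-sectors" lines to "name size-in-MB",
# assuming 512-byte sectors. sectors_to_mb is a hypothetical helper that
# wraps the same awk expression as the one-liner above.
sectors_to_mb() {
    awk '{ printf "%s %d\n", $1, $2 / 2048 }'
}

# 32768000 sectors / 2048 = 16000 MB
echo "testvol 32768000" | sectors_to_mb
```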
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Useful links
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Removing thin provisioned disk from DG:
http://www.symantec.com/connect/articles/automating-thin-storage-reclamation-veritas-storage-foundation
Veritas MAN pages
http://sfdoccentral.symantec.com/index.html