IBM TSM - How to remove a tape that contains corrupt files
Errors in activity log
If a volume is reporting read errors in the activity log similar to those shown below, audit the volume to determine whether any files on the tape are damaged.
06/10/2009 12:47:27 ANR8944E Hardware or media error on drive LTO2 (/dev/IBMtape1) with volume VOL02(OP=READ, Error Number= 5, CC=0, KEY=03, ASC=11, ASCQ=00, SENSE=F0.00.03.00.04.00.00.1C.00.00.00.00.11.00.36.00.70- .60.00.00.00.03.43.47.43.30.32.37.4C.10.00.00.00.00.00.0- 0, Description=An undetermined error has occurred). Refer to Appendix D in the 'Messages' manual for recommended action. (SESSION: 323460, PROCESS: 3014)
06/10/2009 12:47:27 ANR8359E Media fault detected on LTO volume VOL02 in drive LTO2 (/dev/IBMtape1) of library 3583LIBR. (SESSION: 323460, PROCESS: 3014)
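The KEY, ASC and ASCQ fields in the ANR8944E message are SCSI sense data, and decoding them indicates whether the drive or the media is the more likely culprit. The following is a minimal Python sketch, not part of TSM. The function name decode_sense and the regular expression are assumptions based on the single sample message above, the lookup tables are deliberately partial, and the message is assumed to arrive on one line.

```python
import re

# Partial lookup tables from the SCSI standard; only a few codes are listed.
SENSE_KEYS = {0x03: "MEDIUM ERROR", 0x04: "HARDWARE ERROR"}
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}

# Pattern derived from the sample ANR8944E message (an assumption).
SENSE_FIELDS = re.compile(
    r"volume\s+(?P<volume>\w+)\s*\(OP=(?P<op>\w+).*?"
    r"KEY=(?P<key>[0-9A-Fa-f]{2}),\s*ASC=(?P<asc>[0-9A-Fa-f]{2}),\s*"
    r"ASCQ=(?P<ascq>[0-9A-Fa-f]{2})"
)

def decode_sense(line):
    """Return the volume, operation and decoded sense data, or None if absent."""
    match = SENSE_FIELDS.search(line)
    if match is None:
        return None
    key = int(match.group("key"), 16)
    asc = int(match.group("asc"), 16)
    ascq = int(match.group("ascq"), 16)
    return {
        "volume": match.group("volume"),
        "operation": match.group("op"),
        "sense_key": SENSE_KEYS.get(key, "unknown"),
        "additional_sense": ASC_ASCQ.get((asc, ascq), "unknown"),
    }

sample = (
    "06/10/2009 12:47:27 ANR8944E Hardware or media error on drive LTO2 "
    "(/dev/IBMtape1) with volume VOL02(OP=READ, Error Number= 5, CC=0, "
    "KEY=03, ASC=11, ASCQ=00, SENSE=F0.00.03.00)"
)
print(decode_sense(sample))
```

For the sample message, sense key 03 with ASC 11 and ASCQ 00 decodes to a medium error with an unrecovered read error, which points at the tape rather than the drive.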
Audit tape
To audit the tape, run the following command.
audit vol volume_name
Where volume_name should be replaced by the name of your volume.
Audit tape errors
Check the activity log. If errors similar to the following are listed, the media is faulty and should be removed from service and discarded once its data has been restored.
06/10/2009 12:47:27 ANR2335W Audit Volume has encountered an I/O error for volume VOL02 while attempting to read: Node NODE_1, Type Backup (Active), Filespace /u01, fsId 1, File Name // RFDGT_16ke3sao_269350. (SESSION: 323460, PROCESS: 3014)
06/10/2009 12:47:27 ANR2317W Audit Volume found damaged file on volume VOL02: Node NODE_1, Type Backup (Active), File space /u01, fsId 1, File name // RFDGT_16ke3sao_269350 is number 1 of 1 versions. (SESSION: 323460, PROCESS: 3014)
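When an audit reports many damaged files, it can help to collect them into a list. The following Python sketch does that for ANR2317W messages. The function name damaged_files and the pattern are assumptions drawn from the sample message above, and each message is assumed to be on a single line.

```python
import re

# Pattern derived from the sample ANR2317W message (an assumption).
DAMAGED = re.compile(
    r"ANR2317W Audit Volume found damaged file on volume (?P<volume>[^:\s]+): "
    r"Node (?P<node>[^,\s]+), Type (?P<type>.+?), "
    r"File space (?P<filespace>[^,]+), fsId (?P<fsid>\d+), "
    r"File name (?P<file>.+?) is number"
)

def damaged_files(log_text):
    """Return one dict per damaged file reported by the audit."""
    return [match.groupdict() for match in DAMAGED.finditer(log_text)]

sample = (
    "06/10/2009 12:47:27 ANR2317W Audit Volume found damaged file on volume "
    "VOL02: Node NODE_1, Type Backup (Active), File space /u01, fsId 1, "
    "File name // RFDGT_16ke3sao_269350 is number 1 of 1 versions. "
    "(SESSION: 323460, PROCESS: 3014)"
)
for entry in damaged_files(sample):
    print(entry["volume"], entry["node"], entry["filespace"], entry["file"])
```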
Get tapes for restore
To allow for the disposal of the faulty volume we will need to restore the data that is contained on the faulty tape from offsite storage.
To list the required offsite volumes run the following command.
restore volume VOL02 preview=yes
Where VOL02 is the name of the faulty tape.
You should get output similar to the following if the data on the faulty tape is stored on offsite storage.
06/11/2009 10:39:07 ANR1255W Files on volume VOL05 cannot be restored - access mode is "unavailable" or "offsite". (SESSION: 337242, PROCESS: 3116)
There may be more than one tape listed if the data is spread across more than one offsite tape.
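If the preview lists many tapes, the volume names can be pulled out of the ANR1255W messages to build the recall list. This is a minimal Python sketch. The function name offsite_volumes is hypothetical, the pattern is based only on the sample message above, and each message is assumed to be on a single line.

```python
import re

# Pattern derived from the sample ANR1255W message (an assumption).
OFFSITE = re.compile(r"ANR1255W Files on volume (\S+) cannot be restored")

def offsite_volumes(log_text):
    """Return the distinct volumes named by ANR1255W, in first-seen order."""
    names = []
    for name in OFFSITE.findall(log_text):
        if name not in names:
            names.append(name)
    return names

sample = (
    '06/11/2009 10:39:07 ANR1255W Files on volume VOL05 cannot be restored - '
    'access mode is "unavailable" or "offsite". (SESSION: 337242, PROCESS: 3116)'
)
print(offsite_volumes(sample))
```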
Get these tapes back onsite from offsite storage. The data can then be restored and the faulty tape destroyed.
Whilst you are waiting for the tapes to be returned from offsite storage, ensure the offsite tapes are set to unavailable as follows.
upd vol VOL05 access=unavailable
This will prevent the data on the offsite tapes from changing due to reclamation.
Also set the access status of the faulty volume to read only as follows.
upd vol VOL02 access=readonly
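With several offsite tapes, typing each update by hand invites mistakes in the volume names. The following Python sketch builds the same upd vol commands shown in this section as plain strings. The function name hold_commands is hypothetical, and the sketch only prints the commands; it does not contact the TSM server.

```python
def hold_commands(faulty_volume, offsite_volumes):
    """Build the 'upd vol' commands from this section as plain strings."""
    commands = [f"upd vol {name} access=unavailable" for name in offsite_volumes]
    commands.append(f"upd vol {faulty_volume} access=readonly")
    return commands

for command in hold_commands("VOL02", ["VOL05"]):
    print(command)
```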
Restore
Once the tapes are back onsite and loaded into the hopper (bulk I/O) check them in as follows.
checkin libv library_name search=bulk status=private checkl=barcode
Where library_name is the name of your library.
Then query the request as follows.
q req
Then reply to the request as follows.
reply request_id
Where request_id is the request ID displayed by the q req command.
As soon as the tapes are checked into the library, make them read only as follows.
upd vol vol_name access=readonly
Where vol_name is the name of the tape you are making read only.
We then need to start the restore as follows.
restore vol VOL02
Where VOL02 is the faulty tape.
This will cause TSM to restore from the secondary copy (the offsite tapes that have been checked in) and put the data onto other tapes already in the storage pool, or onto scratch tapes if extra tapes are required.
The access mode for the faulty tape (in this case VOL02) will get updated to destroyed.
Type q proc to see the restore processes running. Once the restore is complete, check the activity log for errors.
Look for an entry similar to the following.
24-12-2009 15:08:38 ANR0986I Process 9559 for RESTORE VOLUME running in the BACKGROUND processed 988889 items for a total of 397,375,066,708 bytes with a completion state of SUCCESS at 15:08:38. (SESSION: 870606, PROCESS: 9559)
24-12-2009 15:08:38 ANR1240I Restore of volumes in primary storage pool PRIMARY has ended. Files Restored: 988889, Bytes Restored: 397375066708, Unreadable Files: 0, Unreadable Bytes: 0. (SESSION: 870606)
If there are no errors and the restore completed successfully, the faulty tape VOL02 will be set to scratch and can be checked out and thrown away.
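Before discarding the faulty tape, it is worth confirming both the completion state and the unreadable file count. The following Python sketch checks the ANR0986I and ANR1240I messages for this. The function name restore_succeeded and both patterns are assumptions based on the sample messages above, and each message is assumed to be on a single line.

```python
import re

# Patterns derived from the sample messages above (assumptions).
COMPLETION = re.compile(
    r"ANR0986I Process \d+ for RESTORE VOLUME .*?"
    r"completion state of (?P<state>\w+)"
)
TOTALS = re.compile(
    r"ANR1240I .*?Files Restored: (?P<files>\d+), "
    r"Bytes Restored: (?P<bytes>\d+), "
    r"Unreadable Files: (?P<unreadable_files>\d+), "
    r"Unreadable Bytes: (?P<unreadable_bytes>\d+)"
)

def restore_succeeded(log_text):
    """True only if the process reports SUCCESS and no files were unreadable."""
    state = COMPLETION.search(log_text)
    totals = TOTALS.search(log_text)
    if state is None or totals is None:
        return False
    unreadable = int(totals.group("unreadable_files"))
    return state.group("state") == "SUCCESS" and unreadable == 0

sample = (
    "24-12-2009 15:08:38 ANR0986I Process 9559 for RESTORE VOLUME running in "
    "the BACKGROUND processed 988889 items for a total of 397,375,066,708 "
    "bytes with a completion state of SUCCESS at 15:08:38.\n"
    "24-12-2009 15:08:38 ANR1240I Restore of volumes in primary storage pool "
    "PRIMARY has ended. Files Restored: 988889, Bytes Restored: 397375066708, "
    "Unreadable Files: 0, Unreadable Bytes: 0."
)
print(restore_succeeded(sample))
```

The sketch returns False when either message is missing, so an incomplete log is treated as a failed restore rather than a successful one.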
Return restore tapes
The offsite tapes can be sent back offsite again. Before they are moved offsite their access status needs to be set to offsite as follows.
upd vol vol_name access=offsite
The tapes then need to be checked out as follows.
checkout libv lib_name vol_name checkl=no
Where lib_name is the name of your library and vol_name is the name of the tape to go offsite.