Computers-it. IBM TSM - How to remove a tape that contains corrupt files

Errors in activity log

If a volume reports read errors in the activity log similar to those shown below, audit the volume to find out whether any files on the tape are damaged.

06/10/2009 12:47:27 	ANR8944E Hardware or media error on drive LTO2
			(/dev/IBMtape1) with volume VOL02(OP=READ, Error
			Number= 5, CC=0, KEY=03, ASC=11, ASCQ=00,
			SENSE=F0.00.03.00.04.00.00.1C.00.00.00.00.11.00.36.00.70-
			.60.00.00.00.03.43.47.43.30.32.37.4C.10.00.00.00.00.00.0-
			0, Description=An undetermined error has occurred). Refer
			to Appendix D in the 'Messages' manual for recommended
			action. (SESSION: 323460, PROCESS: 3014)

06/10/2009 12:47:27 	ANR8359E Media fault detected on LTO volume VOL02 in
			drive LTO2 (/dev/IBMtape1) of library 3583LIBR. (SESSION:
			323460, PROCESS: 3014)
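When the activity log is long, it can help to pull out the affected volume names automatically. The following is a minimal Python sketch, not part of TSM itself. It assumes the activity log has been saved as plain text and that the two messages keep the wording shown in the excerpt above; the function name `volumes_with_media_errors` is made up for this illustration.

```python
import re

# Patterns follow the wording of the two messages in the excerpt above.
# Other server versions may word these messages differently.
PATTERNS = [
    re.compile(r"ANR8944E Hardware or media error on drive \S+ \(\S+\) with volume (\w+)"),
    re.compile(r"ANR8359E Media fault detected on \w+ volume (\w+)"),
]

def volumes_with_media_errors(actlog_text):
    """Return the sorted volume names named in media-error messages."""
    # Activity log entries wrap across lines, so collapse all whitespace first.
    flat = " ".join(actlog_text.split())
    volumes = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(flat):
            volumes.add(match.group(1))
    return sorted(volumes)

sample = """
06/10/2009 12:47:27 ANR8944E Hardware or media error on drive LTO2
    (/dev/IBMtape1) with volume VOL02(OP=READ, Error
    Number= 5, CC=0, KEY=03, ASC=11, ASCQ=00)
06/10/2009 12:47:27 ANR8359E Media fault detected on LTO volume VOL02 in
    drive LTO2 (/dev/IBMtape1) of library 3583LIBR.
"""
print(volumes_with_media_errors(sample))
```

Each volume the sketch reports is a candidate for the audit described in the next section.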

Audit tape

To audit the tape, type the following.

audit vol volume_name

Where volume_name should be replaced by the name of your volume.

Audit tape errors

Check the activity log. If errors similar to the following are listed, the media is faulty and should be removed and thrown away.

06/10/2009 12:47:27 	ANR2335W Audit Volume has encountered an I/O error for
			volume VOL02 while attempting to read: Node
			NODE_1, Type Backup (Active), Filespace /u01, fsId
			1, File Name // RFDGT_16ke3sao_269350. (SESSION:
			323460, PROCESS: 3014)

06/10/2009 12:47:27 	ANR2317W Audit Volume found damaged file on volume
			VOL02: Node NODE_1, Type Backup (Active), File
			space /u01, fsId 1, File name // RFDGT_16ke3sao_269350
			is number 1 of 1 versions. (SESSION: 323460, PROCESS:
			3014)
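To see at a glance which files the audit flagged, the ANR2317W entries can be extracted from a saved copy of the activity log. This is a rough Python sketch under the assumption that the message layout matches the sample above; the function name `damaged_files` and the dictionary keys are illustrative only.

```python
import re

# Layout copied from the ANR2317W sample above. "File ?space" tolerates both
# "Filespace" and the "File space" spelling produced by line wrapping.
DAMAGED = re.compile(
    r"ANR2317W Audit Volume found damaged file on volume (?P<volume>\w+): "
    r"Node (?P<node>\S+), Type (?P<type>[^,]+), "
    r"File ?space (?P<filespace>\S+), fsId (?P<fsid>\d+), "
    r"File name (?P<name>.+?) is number"
)

def damaged_files(actlog_text):
    """Return one dict per damaged file reported by the audit."""
    # Collapse the wrapped log lines into a single line before matching.
    flat = " ".join(actlog_text.split())
    return [match.groupdict() for match in DAMAGED.finditer(flat)]

sample = """
06/10/2009 12:47:27 ANR2317W Audit Volume found damaged file on volume
    VOL02: Node NODE_1, Type Backup (Active), File
    space /u01, fsId 1, File name // RFDGT_16ke3sao_269350
    is number 1 of 1 versions. (SESSION: 323460, PROCESS:
    3014)
"""
for entry in damaged_files(sample):
    print(entry["volume"], entry["node"], entry["filespace"], entry["name"])
```

An empty result only means that no ANR2317W entry matched the assumed layout, so the log should still be checked by eye.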

Get tapes for restore

Before the faulty volume can be disposed of, the data it contains must be restored from offsite storage.

To list the required offsite volumes, run the following command.

restore volume VOL02 preview=yes

Where VOL02 is the name of the faulty tape.

If the data on the faulty tape is also held in offsite storage, you should get output similar to the following.

06/11/2009 10:39:07      ANR1255W Files on volume VOL05 cannot be restored -
                          access mode is "unavailable" or "offsite". (SESSION:
                          337242, PROCESS: 3116)

There may be more than one tape listed if the data is spread across more than one offsite tape.
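When several offsite tapes are involved, the list to recall can be built from the preview output. The sketch below is a Python illustration that assumes the ANR1255W wording shown above and a saved text copy of the activity log; `offsite_volumes` is a hypothetical helper name.

```python
import re

# Wording taken from the ANR1255W sample above.
OFFSITE = re.compile(r"ANR1255W Files on volume (\w+) cannot be restored")

def offsite_volumes(actlog_text):
    """Return the sorted, de-duplicated volumes named in ANR1255W messages."""
    # Collapse wrapped log lines so each message sits on one line.
    flat = " ".join(actlog_text.split())
    return sorted(set(OFFSITE.findall(flat)))

sample = """
06/11/2009 10:39:07      ANR1255W Files on volume VOL05 cannot be restored -
                          access mode is "unavailable" or "offsite". (SESSION:
                          337242, PROCESS: 3116)
"""
print(offsite_volumes(sample))
```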

Get these tapes back onsite from offsite storage. The data can then be restored and the faulty tape destroyed.

Whilst you are waiting for the tapes to be returned from offsite storage, ensure the offsite tapes are set to unavailable as follows.

upd vol VOL05 access=unavailable

This will prevent the data on the offsite tapes from changing due to reclamation.

Also set the access status of the faulty volume to read only as follows.

upd vol VOL02 access=readonly
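With more than one offsite tape, typing each update by hand invites volume-name typos. A small Python sketch such as the following can print the commands for pasting into the administrative client. It only formats the two command forms used above; `prepare_commands` is an illustrative name, and nothing here talks to the TSM server.

```python
def prepare_commands(faulty_volume, offsite_volumes):
    """Build the update commands to issue while the offsite tapes are in transit."""
    # One "unavailable" update per offsite tape, in the order given.
    commands = [f"upd vol {name} access=unavailable" for name in offsite_volumes]
    # The faulty tape itself is made read only.
    commands.append(f"upd vol {faulty_volume} access=readonly")
    return commands

for line in prepare_commands("VOL02", ["VOL05"]):
    print(line)
```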

Restore

Once the tapes are back onsite and loaded into the hopper (bulk I/O), check them in as follows.

checkin libv library_name search=bulk status=private checkl=barcode

Where library_name is the name of your library.

Then query the request as follows.

q req

Then reply to the request as follows.

reply request_id

Where request_id is the request ID displayed by the q req command.

As soon as the tapes are checked into the library, make them read only as follows.

upd vol vol_name access=readonly

Where vol_name is the name of the tape you are making read only.

We then need to start the restore as follows.

restore vol VOL02

Where VOL02 is the faulty tape.

This causes TSM to restore the data from the secondary copy (the offsite tapes that have been checked in) onto other tapes already in the storage pool, or onto scratch tapes if extra tapes are required.

The access mode of the faulty tape (in this case VOL02) will be updated to destroyed.

Type q proc to see the restore processes running. Once they are complete, check the activity log for errors.

Look for an entry similar to the following.

 24-12-2009 15:08:38      ANR0986I Process 9559 for RESTORE VOLUME running in the
                          BACKGROUND processed 988889 items for a total of
                          397,375,066,708 bytes with a completion state of SUCCESS
                          at 15:08:38. (SESSION: 870606, PROCESS: 9559)
 24-12-2009 15:08:38      ANR1240I Restore of volumes in primary storage pool
                          PRIMARY has ended.  Files Restored: 988889, Bytes
                          Restored: 397375066708, Unreadable Files: 0, Unreadable
                          Bytes: 0. (SESSION: 870606)
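The two entries above can also be checked mechanically, which is handy when the log is busy. The following Python sketch assumes the ANR0986I and ANR1240I wording shown in the sample and a saved text copy of the activity log; the function names and dictionary keys are invented for this example.

```python
import re

# Both patterns follow the sample entries above.
STATE = re.compile(
    r"ANR0986I Process \d+ for RESTORE VOLUME .*? completion state of (\w+)"
)
TOTALS = re.compile(
    r"ANR1240I Restore of volumes in primary storage pool (?P<pool>\S+) has ended\. "
    r"Files Restored: (?P<files>\d+), Bytes Restored: (?P<bytes>\d+), "
    r"Unreadable Files: (?P<bad_files>\d+), Unreadable Bytes: (?P<bad_bytes>\d+)"
)

def restore_summary(actlog_text):
    """Return the restore totals as a dict, or None if either entry is missing."""
    # Collapse wrapped lines and repeated spaces before matching.
    flat = " ".join(actlog_text.split())
    state = STATE.search(flat)
    totals = TOTALS.search(flat)
    if not state or not totals:
        return None
    return {
        "state": state.group(1),
        "files_restored": int(totals.group("files")),
        "unreadable_files": int(totals.group("bad_files")),
        "unreadable_bytes": int(totals.group("bad_bytes")),
    }

def restore_is_clean(actlog_text):
    """True only if the restore succeeded and nothing was unreadable."""
    summary = restore_summary(actlog_text)
    return (
        summary is not None
        and summary["state"] == "SUCCESS"
        and summary["unreadable_files"] == 0
        and summary["unreadable_bytes"] == 0
    )
```

Calling `restore_is_clean` on the saved log text returns False when either entry is missing, so an incomplete log is never mistaken for a successful restore.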

If there are no errors and the restore completed successfully, the faulty tape VOL02 will be set to scratch and can be checked out and thrown away.

Return restore tapes

The offsite tapes can now be sent back offsite. Before they are moved, their access status needs to be set to offsite as follows.

upd vol vol_name access=offsite

The tapes then need to be checked out as follows.

checkout libv lib_name vol_name checkl=no

Where lib_name is the name of your library and vol_name is the name of the tape to go offsite.