Block and record terminology is rather confused, and it is also confusing to the expert reader. On the other hand, readers who are new to the field have a fresh mind, and they may safely skip the next two paragraphs, as the remainder of this manual uses those two terms in a quite consistent way.
John Gilmore, the writer of the public domain tar from which GNU tar was originally derived, wrote (June 1995):
The nomenclature of tape drives comes from IBM, where I believe they were invented for the IBM 650 or so. On IBM mainframes, what is recorded on tape are tape blocks. The logical organization of data is into records. There are various ways of putting records into blocks, including F (fixed sized records), V (variable sized records), FB (fixed blocked: fixed size records, n to a block), VB (variable size records, n to a block), VSB (variable spanned blocked: variable sized records that can occupy more than one block), etc. The JCL ‘DD RECFORM=’ parameter specified this to the operating system.
The Unix man page on tar was totally confused about this. When I wrote PD TAR, I used the historically correct terminology (tar writes data records, which are grouped into blocks). It appears that the bogus terminology made it into POSIX (no surprise here), and now François has migrated that terminology back into the source code too.
The term physical block means the basic transfer chunk from or to a device, after which reading or writing may stop without anything being lost. In this manual, the term block usually refers to a disk physical block, assuming that each disk block is 512 bytes in length. It is true that some disk devices have different physical blocks, but tar ignores these differences in its own format, which is meant to be portable, so a tar block is always 512 bytes in length, and block always means a tar block.
The term logical block often represents the basic chunk of allocation of many disk blocks as a single entity, which the operating system treats somewhat atomically; this concept is only barely used in GNU tar.
The term physical record is another way to speak of a physical block; the two terms are somewhat interchangeable. In this manual, the term record usually refers to a tape physical block, assuming that the tar archive is kept on magnetic tape. It is true that archives may be put on disk or used with pipes, but nevertheless, tar tries to read and write the archive one record at a time, whatever the medium in use. One record is made up of an integral number of blocks, and this operation of putting many disk blocks into a single tape block is called reblocking, or more simply, blocking. The term logical record refers to the logical organization of many characters into something meaningful to the application. The term unit record describes a small set of characters which are transmitted whole to or by the application, and often refers to a line of text. These last two terms are unrelated to what we call a record in GNU tar.
When writing to tapes, tar writes the contents of the archive in chunks known as records. To change the default blocking factor, use the ‘--blocking-factor=512-size’ (‘-b 512-size’) option. Each record will then be composed of 512-size blocks. (Each tar block is 512 bytes. See section Basic Tar Format.) Each file written to the archive uses at least one full record. As a result, using a larger record size can result in more wasted space for small files. On the other hand, a larger record size can often be read and written much more efficiently.
Further complicating the problem is that some tape drives ignore the blocking entirely. For these, a larger record size can still improve performance (because the software layers above the tape drive still honor the blocking), but not as dramatically as on tape drives that honor blocking.
When reading an archive, tar can usually figure out the record size by itself. When this is the case, and a non-standard record size was used when the archive was created, tar will print a message about a non-standard blocking factor, and then operate normally. On some tape devices, however, tar cannot figure out the record size itself. On most of those, you can specify a blocking factor (with ‘--blocking-factor’) larger than the actual blocking factor, and then use the ‘--read-full-records’ (‘-B’) option. (If you specify a blocking factor with ‘--blocking-factor’ and don’t use the ‘--read-full-records’ option, then tar will not attempt to figure out the record size itself.) On some devices, you must always specify the record size exactly with ‘--blocking-factor’ when reading, because tar cannot figure it out. In any case, use ‘--list’ (‘-t’) before doing any extractions to see whether tar is reading the archive correctly.
tar blocks are all fixed size (512 bytes), and its scheme for putting them into records is to put a whole number of them (one or more) into each record. tar records are all the same size; at the end of the file there’s a block containing all zeros, which is how you tell that the remainder of the last record(s) are garbage.
In a standard tar file (no options), the block size is 512 and the record size is 10240, for a blocking factor of 20. The ‘--blocking-factor’ option sets the blocking factor, changing the record size while leaving the block size at 512 bytes. 20 was fine for ancient 800 or 1600 bpi reel-to-reel tape drives; most tape drives these days prefer much bigger records in order to stream and not waste tape. When writing tapes for themselves, some people tend to use a factor on the order of 2048, say, giving a record size of around one megabyte.
If you use a blocking factor larger than 20, older tar programs might not be able to read the archive, so we recommend this as a limit to use in practice. GNU tar, however, will support arbitrarily large record sizes, limited only by the amount of virtual memory or the physical characteristics of the tape device.
9.4.1 Format Variations
9.4.2 The Blocking Factor of an Archive
Format parameters specify how an archive is written on the archive media. The best choice of format parameters will vary depending on the type and number of files being archived, and on the media used to store the archive.
To specify format parameters when accessing or creating an archive, you can use the options described in the following sections. If you do not specify any format parameters, tar uses default parameters. You cannot modify a compressed archive. If you create an archive with the ‘--blocking-factor’ option specified (see section The Blocking Factor of an Archive), you must specify that blocking factor when operating on the archive. See section Controlling the Archive Format, for other examples of format parameter considerations.
The data in an archive is grouped into blocks, which are 512 bytes. Blocks are read and written in whole number multiples called records. The number of blocks in a record (i.e., the size of a record in units of 512 bytes) is called the blocking factor. The ‘--blocking-factor=512-size’ (‘-b 512-size’) option specifies the blocking factor of an archive. The default blocking factor is typically 20 (i.e., 10240 bytes), but can be specified at installation. To find out the blocking factor of an existing archive, use ‘tar --list --file=archive-name’. This may not work on some devices.
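The relationship between blocking factor and record size is plain arithmetic. The following is a minimal sketch in Python; the helper name `record_size` is chosen here for illustration and is not part of tar.

```python
BLOCK_SIZE = 512  # a tar block is always 512 bytes


def record_size(blocking_factor: int) -> int:
    """Return the record size in bytes for a given blocking factor."""
    if blocking_factor < 1:
        raise ValueError("blocking factor must be at least 1")
    return blocking_factor * BLOCK_SIZE


# The typical default factor of 20 gives 10240-byte records.
print(record_size(20))
# A factor of 2048 gives records of one mebibyte (1048576 bytes).
print(record_size(2048))
```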
Records are separated by gaps, which waste space on the archive media. If you are archiving on magnetic tape, using a larger blocking factor (and therefore larger records) provides faster throughput and allows you to fit more data on a tape (because there are fewer gaps). If you are archiving on cartridge, a very large blocking factor (say 126 or more) greatly increases performance. A smaller blocking factor, on the other hand, may be useful when archiving small files, to avoid archiving lots of nulls as tar fills out the archive to the end of the record. In general, the ideal record size depends on the size of the inter-record gaps on the tape you are using, and the average size of the files you are archiving. See section How to Create Archives, for information on writing archives.
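To get a feel for how many nulls are written when tar fills out the archive to the end of the record, the padding can be estimated as below. This is a rough sketch, not tar's own code: `padding_bytes` is a hypothetical helper, and `archive_bytes` is assumed to be the size of the archive before the final record is padded.

```python
BLOCK_SIZE = 512  # a tar block is always 512 bytes


def padding_bytes(archive_bytes: int, blocking_factor: int) -> int:
    """Null bytes needed to extend the archive to the next record boundary."""
    record = blocking_factor * BLOCK_SIZE
    # Python's % always yields a non-negative result for a positive modulus,
    # so this is 0 when the archive already ends on a record boundary.
    return (-archive_bytes) % record


# A hypothetical 2048-byte archive, padded with two different blocking factors.
print(padding_bytes(2048, 20))
print(padding_bytes(2048, 2048))
```

With a small archive, nearly the whole record is padding, which is why a smaller blocking factor may be preferable when the archive is small.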
Archives with blocking factors larger than 20 cannot be read by very old versions of tar, or by some newer versions of tar running on old machines with small address spaces. With GNU tar, the blocking factor of an archive is limited only by the maximum record size of the device containing the archive, or by the amount of available virtual memory.
Also, on some systems, not using adequate blocking factors, as sometimes imposed by the device drivers, may yield unexpected diagnostics. For example, this has been reported:
Cannot write to /dev/dlt: Invalid argument
In such cases, it sometimes happens that the tar bundled by the system is aware of block size idiosyncrasies, while GNU tar requires an explicit specification for the block size, which it cannot guess. This leads some people to conclude that GNU tar is misbehaving, because by comparison, the bundled tar works OK. Adding ‘-b 256’, for example, might resolve the problem.
If you use a non-default blocking factor when you create an archive, you must specify the same blocking factor when you modify that archive. Some archive devices will also require you to specify the blocking factor when reading that archive; however, this is not typically the case. Usually, you can use ‘--list’ (‘-t’) without specifying a blocking factor; tar reports a non-default record size and then lists the archive members as it would normally. To extract files from an archive with a non-standard blocking factor (particularly if you’re not sure what the blocking factor is), you can usually use the ‘--read-full-records’ (‘-B’) option while specifying a blocking factor larger than the blocking factor of the archive (e.g., ‘tar --extract --read-full-records --blocking-factor=300’). See section How to List Archives, for more information on the ‘--list’ (‘-t’) operation. See section Options to Help Read Archives, for a more detailed explanation of the ‘--read-full-records’ option.
‘--blocking-factor=512-size’ (‘-b 512-size’)
Specifies the blocking factor of an archive. Can be used with any operation, but is usually not necessary with ‘--list’ (‘-t’).
Device blocking
‘--blocking-factor=blocks’ (‘-b blocks’)
Set record size to blocks*512 bytes.
This option is used to specify a blocking factor for the archive. When reading or writing the archive, tar will do reads and writes of the archive in records of blocks*512 bytes. This is true even when the archive is compressed. Some devices require that all write operations be a multiple of a certain size, and so tar pads the archive out to the next record boundary.
The default blocking factor is set when tar is compiled, and is typically 20. Blocking factors larger than 20 cannot be read by very old versions of tar, or by some newer versions of tar running on old machines with small address spaces.
With a magnetic tape, larger records give faster throughput and fit more data on a tape (because there are fewer inter-record gaps). If the archive is in a disk file or a pipe, you may want to specify a smaller blocking factor, since a large one will result in a large number of null bytes at the end of the archive.
When writing cartridge or other streaming tapes, a much larger blocking factor (say 126 or more) will greatly increase performance. However, you must specify the same blocking factor when reading or updating the archive.
Apparently, Exabyte drives have a physical block size of 8K bytes. If we choose our blocksize as a multiple of 8k bytes, then the problem seems to disappear. Id est, we are using block size of 112 right now, and we haven’t had the problem since we switched…
With GNU tar the blocking factor is limited only by the maximum record size of the device containing the archive, or by the amount of available virtual memory.
However, deblocking or reblocking is virtually avoided in a special case which often occurs in practice, but which requires all the following conditions to be simultaneously true:
tar
invocation.
If the output goes directly to a local disk, and not through stdout, then the last write is not extended to a full record size. Otherwise, reblocking occurs. Here are a few other remarks on this topic:
gzip will complain about trailing garbage if asked to uncompress a compressed archive on tape. There is an option to turn the message off, but it breaks the regularity of simply having to use ‘prog -d’ for decompression. It would be nice if gzip silently ignored any number of trailing zeros. I’ll ask Jean-loup Gailly, by sending a copy of this message to him.
compress does not show this problem, but as Jean-loup pointed out to Michael, ‘compress -d’ silently adds garbage after the result of decompression, which tar ignores because it already recognized its end-of-file indicator. So this bug may be safely ignored.
tar might ignore the exit status returned, but I hate doing that, as it weakens the protection tar offers users against other possible problems at decompression time. If gzip silently skipped trailing zeros and also avoided setting the exit status in this innocuous case, that would solve this situation.
tar should become more robust and not stop reading a pipe at the first null block encountered. This inelegantly breaks the pipe. tar should rather drain the pipe before exiting itself.
‘--ignore-zeros’ (‘-i’)
Ignore blocks of zeros in archive (means EOF).
The ‘--ignore-zeros’ (‘-i’) option causes tar to ignore blocks of zeros in the archive. Normally a block of zeros indicates the end of the archive, but when reading a damaged archive, or one which was created by concatenating several archives together, this option allows tar to read the entire archive. This option is not on by default because many versions of tar write garbage after the zeroed blocks.
Note that this option causes tar to read to the end of the archive file, which may sometimes avoid problems when multiple files are stored on a single physical tape.
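The effect of skipping zero blocks on concatenated archives can be illustrated without a tape drive. The sketch below does not run GNU tar; it uses the `ignore_zeros` argument of Python's standard `tarfile` module, which behaves analogously for this purpose. The helper names `make_archive` and `member_names` are ours.

```python
import io
import tarfile


def make_archive(name: str, payload: bytes) -> bytes:
    """Build a one-member tar archive in memory and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_names(data: bytes, ignore_zeros: bool) -> list:
    """List member names, optionally skipping blocks of zeros."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


# Two complete archives glued together, as with `cat a.tar b.tar`.
combined = make_archive("a.txt", b"first\n") + make_archive("b.txt", b"second\n")

print(member_names(combined, ignore_zeros=False))  # stops at the first zero block
print(member_names(combined, ignore_zeros=True))   # reads past it
```

Without the flag, reading stops at the zero blocks that end the first archive, so only its member is listed; with the flag, members of both archives are found.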
‘--read-full-records’ (‘-B’)
Reblock as we read (for reading 4.2BSD pipes).
If ‘--read-full-records’ is used, tar will not panic if an attempt to read a record from the archive does not return a full record. Instead, tar will keep reading until it has obtained a full record.
This option is turned on by default when tar is reading an archive from standard input, or from a remote machine. This is because on BSD Unix systems, a read of a pipe will return however much happens to be in the pipe, even if it is less than tar requested. If this option were not used, tar would fail as soon as it read an incomplete record from the pipe.
This option is also useful with the commands for updating an archive.
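The idea of reading until a full record has been obtained can be sketched as a simple loop. This is an illustration in Python, not tar's actual implementation; `read_full_record` and the `ShortReader` class (a stand-in for a pipe that returns short reads) are hypothetical names.

```python
import io


def read_full_record(stream, record_size: int) -> bytes:
    """Keep reading until record_size bytes arrive or the stream ends."""
    chunks = []
    remaining = record_size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of stream: return whatever was collected
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class ShortReader:
    """Hypothetical stream returning at most `limit` bytes per read, like a pipe."""

    def __init__(self, data: bytes, limit: int):
        self._buf = io.BytesIO(data)
        self._limit = limit

    def read(self, n: int) -> bytes:
        return self._buf.read(min(n, self._limit))


# A single read of a 10240-byte record comes back short...
print(len(ShortReader(bytes(10240), 4096).read(10240)))
# ...while the loop collects the whole record.
print(len(read_full_record(ShortReader(bytes(10240), 4096), 10240)))
```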
Tape blocking
When handling various tapes or cartridges, you have to take care of selecting a proper blocking, that is, the number of disk blocks you put together as a single tape block on the tape, without intervening tape gaps. A tape gap is a small landing area on the tape with no information on it, used for decelerating the tape to a full stop, and for later regaining the reading or writing speed. When the tape driver starts reading a record, the record has to be read whole without stopping, as a tape gap is needed to stop the tape motion without losing information.
Using higher blocking (putting more disk blocks per tape block) will use the tape more efficiently as there will be fewer tape gaps. But reading such tapes may be more difficult for the system, as more memory will be required to receive the whole record at once. Further, if there is a reading error on a huge record, it is less likely that the system will succeed in recovering the information. So, blocking should not be too low, nor should it be too high. tar uses by default a blocking of 20 for historical reasons, and it does not really matter when reading or writing to disk. Current tape technology would easily accommodate higher blockings. Sun recommends a blocking of 126 for Exabytes and 96 for DATs. We were told that for some DLT drives, the blocking should be a multiple of 4 KB, preferably 64 KB (‘-b 128’) or 256 for decent performance. Other manufacturers may use different recommendations for the same tapes. This might also depend on the buffering techniques used inside modern tape controllers. Some impose a minimum blocking, or a maximum blocking. Others require blocking to be some power of two.
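Constraints of this kind are easy to check before writing a tape. The following Python sketch tests a candidate blocking factor against two of the rules mentioned in this section (a record size that is a multiple of some device chunk, and a power-of-two blocking). The function names are hypothetical, and the chunk sizes are only the figures quoted above, not authoritative device specifications.

```python
BLOCK_SIZE = 512  # a tar block is always 512 bytes


def is_multiple_of(blocking_factor: int, chunk_bytes: int) -> bool:
    """True if the record size is a whole number of chunk_bytes chunks."""
    return (blocking_factor * BLOCK_SIZE) % chunk_bytes == 0


def is_power_of_two(blocking_factor: int) -> bool:
    """True if the blocking factor is 1, 2, 4, 8, and so on."""
    return blocking_factor > 0 and (blocking_factor & (blocking_factor - 1)) == 0


# -b 128 gives 64 KB records, a multiple of 4 KB and a power of two.
print(is_multiple_of(128, 4096), is_power_of_two(128))
# -b 112 gives 56 KB records, a multiple of 8 KB but not a power of two.
print(is_multiple_of(112, 8192), is_power_of_two(112))
# -b 126 gives 63 KB records, which is not a multiple of 8 KB.
print(is_multiple_of(126, 8192), is_power_of_two(126))
```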
So, there is no fixed rule for blocking. But blocking at read time should ideally be the same as blocking used at write time. At one place I know, with a wide variety of equipment, they found it best to use a blocking of 32 to guarantee that their tapes are fully interchangeable.
I was also told that, for recycled tapes, prior erasure (by the same drive unit that will be used to create the archives) sometimes lowers the error rates observed at rewriting time.
I might also use ‘--number-blocks’ instead of ‘--block-number’, so ‘--block’ will then expand to ‘--blocking-factor’ unambiguously.
This document was generated on August 23, 2023 using texi2html 5.0.