Discussion:
[Bacula-users] ERROR Spooling/Backups with large amounts of data from windows server 2012
Hans Thueminger
2013-11-20 15:58:34 UTC
Dear Bacula Users!

I'm a newbie and have been trying to configure a Bacula backup solution
for our institute for two months. I'm excited about this software (thanks
to the developers for that great work) and I absolutely want to use it
:-) Unfortunately, at the moment I can't use it the way I would like.

We have to back up a file server (Windows Server 2012, Bacula client
5.2.10) with 400TB of data to an autochanger (IBM TS3500) with two
drives (TS1140). We operate a dedicated backup server (CentOS 6.4) with
Bacula (5.2.13), which is connected to the file server via a dedicated
1 Gbit interface. The two drives of the autochanger are directly connected
to a QLogic QLE2562 dual-port Fibre Channel HBA on the backup server.
To ensure the drives operate in streaming mode, the backup server has a
35TB spool area. Several filesystems are attached to the file server:
3x 120TB
4x 40TB

The first thing we have to do is split the 3x120TB filesystems into
6x60TB, since Microsoft does not support VSS for filesystems larger than
(64TB-8GB), not even with the latest release (Server 2012 R2). This is
an undocumented "design" decision, and we got this information from the
Microsoft support team. To re-emphasize this: in Windows Server 2012 it
is not a problem to create and use a filesystem >64TB, but it is not
possible to back up any file of any size on such a filesystem, not even
with the Windows Server Backup program included in the operating
system :-(

I am now able to make backups and restores of files on our 4x40TB
filesystems with Bacula if they are not too big (< 5TB). That works
fine. If they are too big (> 15TB) I always get an error after the
second or third or sometimes a later spool file is created (Error:
lib/bsock.c...). Never during the first spool file! I've tried several
spool sizes (from 500GB to 16TB) and different network settings. In the
attachment (bacula_mailing_list_some_error_logs.txt) you can find some
logs from when the error occurred. What I have also tried:
- using different network interfaces (at the moment an Ethernet cable is
connected directly (no switch) between the file server and the
backup server, and this connection is used for the backups (checked with
netstat))
- heartbeat: enabled on the SD (60 seconds), and
net.ipv4.tcp_keepalive_time also set to 60
- AllowCompression Yes/No
- many, many hours of trials

Is there anybody out there who has a similar environment that is
working, or who can point me in the right direction? I'd be happy to
answer any further questions or hear suggestions, since at the moment I'm at a loss...

Cheers and thanks in advance
Hans

PS: As attachments you can also find all of the config files, one
complete job listing which was OK, and another one with an error.
PPS: This morning I tried to sign up for a new account at
http://bugs.bacula.org/signup_page.php for the bug database and have not
yet received a confirmation e-mail (***@ipf.tuwien.ac.at)
(username hthuemin)
Simone Caronni
2013-11-20 16:48:05 UTC
Hello,
Post by Hans Thueminger
The first thing we have to do is to split the 3x120TB in 6x60TB since
Microsoft does not support VSS for filesystems >(64TB-8GB) not even with
the latest release (Server-2012-R2). This is a not documented "design" and
we got this information from the Microsoft support team. To re-emphasize
this, in Windows-Server-2012 it's not a problem to create and use a
filesystem >64TB but it's not possible to make a backup of any file of any
size in such a filesystem; not even with the in the operating system
included Windows-Server-Backup program :-(
I never stop learning... that's pretty ridiculous! Have you ever considered
switching to Samba 3.6/4 on an XFS filesystem?

We have a similar setup here, but not with these amounts of data and
not with Windows Server 2012; I'm sorry.

Regards,
--Simone
--
You cannot discover new oceans unless you have the courage to lose sight of
the shore (R. W. Emerson).

http://xkcd.com/229/
http://negativo17.org/
Hans Thueminger
2013-11-21 14:50:57 UTC
Hello,
Post by Simone Caronni
Hello,
The first thing we have to do is to split the 3x120TB in 6x60TB
since Microsoft does not support VSS for filesystems >(64TB-8GB)
not even with the latest release (Server-2012-R2). This is a not
documented "design" and we got this information from the Microsoft
support team. To re-emphasize this, in Windows-Server-2012 it's
not a problem to create and use a filesystem >64TB but it's not
possible to make a backup of any file of any size in such a
filesystem; not even with the in the operating system included
Windows-Server-Backup program :-(
I never stop learning... that's pretty ridiculous! Have you ever
considered switching to Samba 3.6/4 on an XFS filesystem?
Unfortunately we can't any longer, since this file server is already in
production :-(
Post by Simone Caronni
We have a similar setup here, but not with these amounts of data and
no Windows Server 2012; I'm sorry.
Thank you anyway

Cheers
Hans
l***@kwsoft.de
2013-11-21 09:07:34 UTC
Post by Hans Thueminger
Dear Bacula Users!
I'm a newbie and have been trying to configure a Bacula backup
solution for our Institute for two months. I'm excited of this
Software (thanks to the developers for that great work) and I
absolutely want to use it :-) At the moment, unfortunately I can't
use it as I would like.
We have to backup a Fileserver (Windows-Server-2012, Bacula client
5.2.10) with 400TB of Data with an Autochanger (IBM-TS3500) with two
Drives (TS1140). We operate a dedicated Backupserver (CentOS 6.4)
with Bacula (5.2.13) which is connectetd with a dedicated 1 GBit
Interface to the Fileserver. The two drives of the autochanger are
directly connected to a QLogic QLE2562 Dual Port Fibre Channel HBA
(on the backupserver). To be ensured, the drives are operated in
streaming mode, the backupserver has a 35TB spool area. There are
3x120TB
4x40TB
Looks like a decent investment in hardware ;-)
Post by Hans Thueminger
The first thing we have to do is to split the 3x120TB in 6x60TB
since Microsoft does not support VSS for filesystems >(64TB-8GB) not
even with the latest release (Server-2012-R2). This is a not
documented "design" and we got this information from the Microsoft
support team. To re-emphasize this, in Windows-Server-2012 it's not
a problem to create and use a filesystem >64TB but it's not possible
to make a backup of any file of any size in such a filesystem; not
even with the in the operating system included Windows-Server-Backup
program :-(
Up to Windows 2008 R2 the supported volume size was 16TB; with
Windows 2012 it is 64TB. Note that there are other constraints with
VSS when it is used with volumes containing many files or under heavy
load while doing snapshots. That said, you should always be able to back
up without VSS, but open files will get you in trouble in that case.
Post by Hans Thueminger
Now I'm glad to make backups and restores of files of our 4x40TB
Filesystems with Bacula if they are not too big (< 5TB). That works
fine. If they are too big (> 15TB) I always get an error after
creating the second or third or sometimes subsequent spoolfile
(Error: lib/bsock.c...). Never for the first spoolfile! I've tried
several spool sizes (from 500GB to 16TB) and different network
settings. As attachment (bacula_mailing_list_some_error_logs.txt)
you can find some logs, when the error occurred. What I have also
- using different Networkinterfaces (at the moment an ethernet cable
is directly connected (no switch) between the fileserver and the
backupserver and this connection is used for the backups (checked
with netstat))
- heartbeat: enabling on the SD (60 seconds) and
net.ipv4.tcp_keepalive_time also set to 60
- AllowCompression Yes/No
- many, many hours for trials
So you really have files 15TB in size? The following would be worth a
try:

- Increase the "maximum file size" for the tape drive. The default is
1G and it limits how big the "blocks" on tape are between EOF markers.
Maybe the counter per file is an int and you therefore have trouble
with 15TB files?

- You should also increase the default block size written to tape by
setting "maximum block size" to, for example, 2M. Warning: you cannot
read already-written tapes with a non-matching block size.

- The spool area doesn't need to be that big, but it needs to be really
fast to saturate the tape drives and keep them streaming. Recommended is
something like fast SSDs or similar. (A rough sketch of the relevant
Device directives follows below.)
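
Putting the tape-related suggestions together, the relevant part of a
bacula-sd.conf Device resource could look roughly like the sketch below.
This is only an illustration: the resource name, media type, device node,
spool path and sizes are made-up examples, not values from your setup,
and you would need one such resource per TS1140 drive.

Device {
  Name = "TS1140-Drive-0"                 # example name
  Media Type = "3592"                     # example; must match your pools/volumes
  Archive Device = /dev/nst0              # example device node
  AutoChanger = yes
  Maximum File Size = 32G                 # fewer EOF markers during long despools
  Maximum Block Size = 2M                 # larger blocks; tapes already written
                                          # with another block size become unreadable
  Spool Directory = /bacula/spool/drive0  # put this on fast (SSD) storage
  Maximum Spool Size = 1000G              # example size
}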
Post by Hans Thueminger
Is there anybody out there who has a similar environment which is
working or can point me in the right direction? I'd behappy for
anyfurther questions or suggestions since at the moment I'm at a loss...
We are way smaller in data size, so take the above as theoretical advice...

Regards

Andreas
Hans Thueminger
2013-11-21 14:13:46 UTC
Post by l***@kwsoft.de
[...]
With up to Windows 2008 R2 the supported volume size was 16TB with
Windows 2012 it is 64TB. Note that their are other constraints with
VSS when used with volumes containing many files or having heavy load
while doing snapshots. That said you should always be able to backup
without VSS, but open files get you in trouble in this case.
I didn't find a way to make a backup without VSS, neither with the
Windows Server Backup program included in the operating system nor with
Bacula. But while writing this sentence I had an idea: what happens if I
back up a mount point instead of a drive letter? I've just mounted the
120TB filesystem as a mount point under C: (which is a 300GB filesystem),
and look at this:

21-Nov 12:49 bacula-sd JobId 292: Spooling data ...
21-Nov 12:49 fs2-fd JobId 292: Generate VSS snapshots. Driver="Win64
VSS", Drive(s)="C"

and the status says:

JobId 292 Job fs2-PHOTO.2013-11-21_12.46.00_11 is running.
VSS Full Backup Job started: 21-Nov-13 12:47
Files=31,223 Bytes=308,963,855,104 Bytes/sec=89,192,798 Errors=0
Files Examined=31,223
Processing file:
C:/PHOTO/Projects/PHOTO-Projects/09_ALS_Kaernten/GailtalLatschur/Dif-Gailtal-Latschur/s577_s592_p02.tif

Projects is the mountpoint for the 120TB filesystem!
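
(For reference, the FileSet for this test looks roughly like the sketch
below. The resource name is made up for illustration and only the path
comes from the job above; the onefs option is my guess at what lets the
FD descend into the mounted volume.)

FileSet {
  Name = "fs2-photo-via-mountpoint"   # illustrative name only
  Enable VSS = yes                    # the snapshot is taken for C:
  Include {
    Options {
      signature = MD5
      onefs = no                      # may be needed so the FD crosses into
                                      # the 120TB volume mounted under C:
    }
    File = "C:/PHOTO/Projects"        # mount point instead of drive letter G:
  }
}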

Until now, when trying to back up G:/PHOTO-Projects/ (which is the same
120TB filesystem as above), I always received the following error:

17-Sep 15:54 fs2-fd JobId 36: Generate VSS snapshots. Driver="Win64
VSS", Drive(s)="G"
17-Sep 15:55 fs2-fd JobId 36: Fatal error: CreateSGenerate VSS snapshots
failed. ERR=The operation completed successfully.

It seems that this is a way to trick Bacula and Windows :-) Of course
with this workaround we still have the problem with open files, but the
actual problem I want to discuss with you is the error I receive after
the second or third or sometimes a later spool file, as quoted below.
Post by l***@kwsoft.de
Post by Hans Thueminger
Now I'm glad to make backups and restores of files of our 4x40TB
Filesystems with Bacula if they are not too big (< 5TB). That works
fine. If they are too big (> 15TB) I always get an error after
creating the second or third or sometimes subsequent spoolfile
(Error: lib/bsock.c...). Never for the first spoolfile! I've tried
several spool sizes (from 500GB to 16TB) and different network
settings. As attachment (bacula_mailing_list_some_error_logs.txt)
you can find some logs, when the error occurred. What I have also
- using different Networkinterfaces (at the moment an ethernet cable
is directly connected (no switch) between the fileserver and the
backupserver and this connection is used for the backups (checked
with netstat))
- heartbeat: enabling on the SD (60 seconds) and
net.ipv4.tcp_keepalive_time also set to 60
- AllowCompression Yes/No
- many, many hours for trials
So you really have files with 15TB in size? What would be worth a try
Sorry, that was badly written of me. It's not the size of one file, it's
the total size of all files. What I wanted to write was: "if the amount
of data to be backed up is not too large (< 5TB) it works. If the amount
of data to be backed up is larger than 15TB, it always fails!"
Post by l***@kwsoft.de
- Increase the "maximum file size" for the tape drive. The default is
1G and it limits how big the "blocks" on tape are between EOF markers.
Maybe the counter per file is an int and you therefore have trouble
with 15TB files?
I guess the spool file is written as one file to the tape, which would
mean that only one EOF marker is written per spool file? Can you confirm
that, or am I wrong? What would you suggest setting for "maximum file
size"?
Post by l***@kwsoft.de
- You should increase the default block size written to tape with
"maximum block size" set to for example 2M. Warning: You could not
read already written tapes with non matching block sizes.
Ok and thank you for the warning, that's rather good to know!
Post by l***@kwsoft.de
- The spool area doesn't need to be that big, but really fast to
saturate the tape drive and keep them streaming. Recommended is
something like fast SSD or similar.
The tapes we are using have a native capacity of 4TB. I thought that
should be the minimum spool size, to prevent start and stop operations
of the drives. With hardware compression, sometimes more than 9TB is
written to a tape, so I concluded the ideal spool size is about 16TB?!
So where is my error in reasoning?
Post by l***@kwsoft.de
Post by Hans Thueminger
Is there anybody out there who has a similar environment which is
working or can point me in the right direction? I'd behappy for
anyfurther questions or suggestions since at the moment I'm at a loss...
We are way smaller in file size, so use the suggested as theoretical advice...
Thank you very much!

Cheers
Hans
l***@kwsoft.de
2013-11-21 15:58:22 UTC
Post by Hans Thueminger
Post by l***@kwsoft.de
[...]
With up to Windows 2008 R2 the supported volume size was 16TB with
Windows 2012 it is 64TB. Note that their are other constraints with
VSS when used with volumes containing many files or having heavy load
while doing snapshots. That said you should always be able to backup
without VSS, but open files get you in trouble in this case.
I didn't find a way to make a backup without VSS, neither with the
in the operating system included Windows-Server-Backup program nor
with Bacula. But while writing this sentence I had an idea: What is
happening, if backing up a mount point instead of a drive letter?
21-Nov 12:49 bacula-sd JobId 292: Spooling data ...
21-Nov 12:49 fs2-fd JobId 292: Generate VSS snapshots. Driver="Win64
VSS", Drive(s)="C"
JobId 292 Job fs2-PHOTO.2013-11-21_12.46.00_11 is running.
VSS Full Backup Job started: 21-Nov-13 12:47
Files=31,223 Bytes=308,963,855,104 Bytes/sec=89,192,798 Errors=0
Files Examined=31,223
C:/PHOTO/Projects/PHOTO-Projects/09_ALS_Kaernten/GailtalLatschur/Dif-Gailtal-Latschur/s577_s592_p02.tif
Projects is the mountpoint for the 120TB filesystem!
By now trying to backup G:/PHOTO-Projects/ (which is the same 120TB
17-Sep 15:54 fs2-fd JobId 36: Generate VSS snapshots. Driver="Win64
VSS", Drive(s)="G"
17-Sep 15:55 fs2-fd JobId 36: Fatal error: CreateSGenerate VSS
snapshots failed. ERR=The operation completed successfully.
It seems, that this is a way to trick bacula and windows :-) Of
course with this workaround we still have the problems with open
files, but the actual problem which I want to discuss with you is
the error I receive after creating the second or third or sometimes
Uhm, no. The idea was to set "Enable VSS = no", but as said, this only
works, more or less, if the volume in question does not have open files.
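In FileSet terms that would be something like the sketch below (the
resource name is a placeholder, and the path is just taken from your
earlier attempt):

FileSet {
  Name = "fs2-photo-no-vss"        # placeholder name
  Enable VSS = no                  # back up the live volume, no snapshot;
                                   # open files will be skipped or fail
  Include {
    Options { signature = MD5 }
    File = "G:/PHOTO-Projects"     # path from your earlier drive-letter job
  }
}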
Post by Hans Thueminger
Post by l***@kwsoft.de
Post by Hans Thueminger
Now I'm glad to make backups and restores of files of our 4x40TB
Filesystems with Bacula if they are not too big (< 5TB). That works
fine. If they are too big (> 15TB) I always get an error after
creating the second or third or sometimes subsequent spoolfile
(Error: lib/bsock.c...). Never for the first spoolfile! I've tried
several spool sizes (from 500GB to 16TB) and different network
settings. As attachment (bacula_mailing_list_some_error_logs.txt)
you can find some logs, when the error occurred. What I have also
- using different Networkinterfaces (at the moment an ethernet cable
is directly connected (no switch) between the fileserver and the
backupserver and this connection is used for the backups (checked
with netstat))
- heartbeat: enabling on the SD (60 seconds) and
net.ipv4.tcp_keepalive_time also set to 60
- AllowCompression Yes/No
- many, many hours for trials
So you really have files with 15TB in size? What would be worth a try
Sorry, that was badly written of me. It's not the size of one file,
it's the size of all files. So what I wanted to write was: "if the
amount of files to be backuped is not too large (< 5TB) it works. If
the amount of the files to be backuped ist larger than 15TB it fails
always!"
This might point to a different problem. We had a similar issue on our
main filer: backups up to around 2TB succeeded, but anything above had a
~50% chance of failing with network errors. We switched the NIC to an
Intel plug-in card and the problem went away.
Post by Hans Thueminger
Post by l***@kwsoft.de
- Increase the "maximum file size" for the tape drive. The default is
1G and it limits how big the "blocks" on tape are between EOF markers.
Maybe the counter per file is an int and you therefore have trouble
with 15TB files?
I guess the spool file is written as one file to the tape which
would mean, that for every spool file only one EOF marker would be
written? Can you confirm that, or are I'm wrong? What would you
suggest to set for "maximum file size"?
No, the spool file only decouples the streaming to tape from network
and client speed and delays. There is not much buffering when writing to
tape, so the spool area needs to be able to constantly deliver data
faster than the tape can consume it. The data is still written to tape
with "maximum block size" per transfer and an EOF marker every "maximum
file size".
Post by Hans Thueminger
Post by l***@kwsoft.de
- You should increase the default block size written to tape with
"maximum block size" set to for example 2M. Warning: You could not
read already written tapes with non matching block sizes.
Ok and thank you for the warning, that's rather good to know!
Post by l***@kwsoft.de
- The spool area doesn't need to be that big, but really fast to
saturate the tape drive and keep them streaming. Recommended is
something like fast SSD or similar.
The tapes we are using have a native capacity of 4TB. I thought that
should be the lowest size of the spool size to prevent start and
stop operations of the drives. With the hardware compression
sometimes more than 9TB are written on a tape, so I decided the
ideal spool size is about 16TB?! So where is my error in reasoning?
There is no problem in cutting the job into pieces of some GB with the
tape being idle in between. The problem arises when you can deliver
data, but not fast enough. The tape drive then tries to reach its maximum
speed, tries to settle to a lower speed if the data comes in too slowly,
and stops and rewinds if nothing helps. To prevent this you need data
chunks that can be delivered (much) faster than the tape can handle, so
that within a chunk you never have an "underrun" of data. IMHO a smaller,
faster spool is always better than a bigger, slower spool.

Regards

Andreas
Hans Thueminger
2013-11-21 17:28:54 UTC
Post by l***@kwsoft.de
Post by Hans Thueminger
Post by l***@kwsoft.de
[...]
With up to Windows 2008 R2 the supported volume size was 16TB with
Windows 2012 it is 64TB. Note that their are other constraints with
VSS when used with volumes containing many files or having heavy load
while doing snapshots. That said you should always be able to backup
without VSS, but open files get you in trouble in this case.
I didn't find a way to make a backup without VSS, neither with the
in the operating system included Windows-Server-Backup program nor
with Bacula. But while writing this sentence I had an idea: What is
happening, if backing up a mount point instead of a drive letter?
21-Nov 12:49 bacula-sd JobId 292: Spooling data ...
21-Nov 12:49 fs2-fd JobId 292: Generate VSS snapshots. Driver="Win64
VSS", Drive(s)="C"
JobId 292 Job fs2-PHOTO.2013-11-21_12.46.00_11 is running.
VSS Full Backup Job started: 21-Nov-13 12:47
Files=31,223 Bytes=308,963,855,104 Bytes/sec=89,192,798 Errors=0
Files Examined=31,223
C:/PHOTO/Projects/PHOTO-Projects/09_ALS_Kaernten/GailtalLatschur/Dif-Gailtal-Latschur/s577_s592_p02.tif
Projects is the mountpoint for the 120TB filesystem!
By now trying to backup G:/PHOTO-Projects/ (which is the same 120TB
17-Sep 15:54 fs2-fd JobId 36: Generate VSS snapshots. Driver="Win64
VSS", Drive(s)="G"
17-Sep 15:55 fs2-fd JobId 36: Fatal error: CreateSGenerate VSS
snapshots failed. ERR=The operation completed successfully.
It seems, that this is a way to trick bacula and windows :-) Of
course with this workaround we still have the problems with open
files, but the actual problem which I want to discuss with you is
the error I receive after creating the second or third or sometimes
Uhm, no. The idea was to set "enable vss = no", but as said this only
works some sort of if the volume in question does not have open files.
OK, now it's clear why it makes no difference whether Enable VSS is set
to yes or no: there are always open files on these filesystems (they are
already in use)...
Post by l***@kwsoft.de
Post by Hans Thueminger
Post by l***@kwsoft.de
Post by Hans Thueminger
Now I'm glad to make backups and restores of files of our 4x40TB
Filesystems with Bacula if they are not too big (< 5TB). That works
fine. If they are too big (> 15TB) I always get an error after
creating the second or third or sometimes subsequent spoolfile
(Error: lib/bsock.c...). Never for the first spoolfile! I've tried
several spool sizes (from 500GB to 16TB) and different network
settings. As attachment (bacula_mailing_list_some_error_logs.txt)
you can find some logs, when the error occurred. What I have also
- using different Networkinterfaces (at the moment an ethernet cable
is directly connected (no switch) between the fileserver and the
backupserver and this connection is used for the backups (checked
with netstat))
- heartbeat: enabling on the SD (60 seconds) and
net.ipv4.tcp_keepalive_time also set to 60
- AllowCompression Yes/No
- many, many hours for trials
So you really have files with 15TB in size? What would be worth a try
Sorry, that was badly written of me. It's not the size of one file,
it's the size of all files. So what I wanted to write was: "if the
amount of files to be backuped is not too large (< 5TB) it works. If
the amount of the files to be backuped ist larger than 15TB it fails
always!"
This might point to another problem case. We had a similar problem on
our main filer with backup until around 2TB succeeded, anything above
had a ~50% to fail with network errors. We switched the NIC and use
some Intel PlugIn card and the problem went away.
I thought I could exclude such hardware problems, because I have already
tried different network interfaces. But it's good to know that such an
error can come from a NIC too!
Post by l***@kwsoft.de
Post by Hans Thueminger
Post by l***@kwsoft.de
- Increase the "maximum file size" for the tape drive. The default is
1G and it limits how big the "blocks" on tape are between EOF markers.
Maybe the counter per file is an int and you therefore have trouble
with 15TB files?
I guess the spool file is written as one file to the tape which
would mean, that for every spool file only one EOF marker would be
written? Can you confirm that, or are I'm wrong? What would you
suggest to set for "maximum file size"?
No, the spool file is only to decouple the streaming to tape from
network and client speed and delays. There is not much buffering when
writing to tape so the spool area need to be able to constantly
deliever data faster tahn the tape can consume. It is still written
with the "maximum block size" per transfer and a EOF marker every
"maximum file size" to tape.
Post by Hans Thueminger
Post by l***@kwsoft.de
- You should increase the default block size written to tape with
"maximum block size" set to for example 2M. Warning: You could not
read already written tapes with non matching block sizes.
Ok and thank you for the warning, that's rather good to know!
Post by l***@kwsoft.de
- The spool area doesn't need to be that big, but really fast to
saturate the tape drive and keep them streaming. Recommended is
something like fast SSD or similar.
The tapes we are using have a native capacity of 4TB. I thought that
should be the lowest size of the spool size to prevent start and
stop operations of the drives. With the hardware compression
sometimes more than 9TB are written on a tape, so I decided the
ideal spool size is about 16TB?! So where is my error in reasoning?
There is no problem in cutting the job in pieces of some GB with the
tape being idle in between. The problem arise if you can deliever
data, but not fast enough. So the tape drive try to reach its max.
speed try to settle down if data are to slow and stop and rewind if
nothing helps. To prevent this you need data chunks delieverable with
(much) more than the tape can handle, so for this chunk you never has
a "underun" of data. IMHO a smaller faster spool is always better than
a bigger slower spool.
Thank you for this explanation. Then we will rebuild the backup server,
remove some of the HDDs and put some SSDs into the server...


Regards

Hans
Thomas Lohman
2013-11-21 15:13:37 UTC
Post by Hans Thueminger
- heartbeat: enabling on the SD (60 seconds) and
net.ipv4.tcp_keepalive_time also set to 60
Glancing at your error (Connection reset by peer) and your config
files, I didn't see the Heartbeat Interval setting in all the places
where it may need to be. Make sure it is in all of the following locations:

Director definition for the server Director daemon.
Storage definition for the server Storage daemon.
FileDaemon definition for the Client File daemon

That error typically means the network/socket connection between the
file daemon and the storage daemon was closed unexpectedly at one end,
or by something in between blocking/dropping it. I have also seen that
error suddenly pop up on Windows clients for no obvious reason, and a
reboot of the Windows box fixed it.


--tom
Hans Thueminger
2013-11-21 16:02:33 UTC
Post by Thomas Lohman
Post by Hans Thueminger
- heartbeat: enabling on the SD (60 seconds) and
net.ipv4.tcp_keepalive_time also set to 60
In glancing at your error (Connection reset by peer) and your config
files, I didn't see the Heartbeat Interval setting in all the places
Director definition for the server Director daemon.
in bacula-dir.conf I added the last line:
Director {
Name = sd1-dir
DIRport = 9101
QueryFile = "/usr/libexec/bacula/query.sql"
WorkingDirectory = "/var/spool/bacula"
PidDirectory = "/var/run"
Maximum Concurrent Jobs = 5
Password = "secret"
Messages = Daemon
Heartbeat Interval = 60        # <- the added line
}

in the Client definition (bacula-dir.conf) I added the last line:
Client {
Name = fs2-fd
Address = 192.168.72.84
FDPort = 9102
Catalog = MyCatalog
Password = "secret"
File Retention = 30 days
Job Retention = 6 months
AutoPrune = yes
Heartbeat Interval = 60        # <- the added line
}
Post by Thomas Lohman
Storage definition for the server Storage daemon.
in bacula-sd.conf:
Storage {
Name = bacula-sd
SDPort = 9103
WorkingDirectory = "/var/spool/bacula"
Pid Directory = "/var/run"
Maximum Concurrent Jobs = 20
Heartbeat Interval = 60
}
Post by Thomas Lohman
FileDaemon definition for the Client File daemon
in bacula-fd.conf:
FileDaemon {
Name = sd1-fd
FDport = 9102
WorkingDirectory = /var/spool/bacula
Pid Directory = /var/run
Maximum Concurrent Jobs = 20
Heartbeat Interval = 60
}

Did I get it right for this job?
Post by Thomas Lohman
That error typically means the network/socket connection between the
file daemon and the storage daemon was closed unexpectedly at one end
or by something in between blocking/dropping it.
That's what I thought, but I wasn't sure about the connections. The next
thing I wanted to do was a trial run with the Windows Firewall disabled.
Post by Thomas Lohman
I have also seen that error suddenly pop up on Windows clients for no
obvious reason but a reboot of the Windows box has fixed it.
Not the best prospect for a file server :-( I've been getting these
errors for months and have rebooted the Windows server from time to time.
The next reboot will be on Saturday (23rd of November), and after that I
will start the next big attempt with the new heartbeat settings above. If
the error still occurs I will try disabling the firewall...

Thank you very much!

Cheers
Hans
l***@kwsoft.de
2014-01-24 20:44:17 UTC
Post by Hans Thueminger
Dear Bacula Users!
Post by Hans Thueminger
Dear Bacula Users!
I'm a newbie and have been trying to configure a Bacula backup
solution for our Institute for two months. I'm excited of this
Software (thanks to the developers for that great work) and I
absolutely want to use it :-) At the moment, unfortunately I can't
use it as I would like.
At the moment it looks very good :-)
Post by Hans Thueminger
We have to backup a Fileserver (Windows-Server-2012, Bacula client
5.2.10) with 400TB of Data with an Autochanger (IBM-TS3500) with
two Drives (TS1140). We operate a dedicated Backupserver (CentOS
6.4) with Bacula (5.2.13) which is connectetd with a dedicated 1
GBit Interface to the Fileserver. The two drives of the autochanger
are directly connected to a QLogic QLE2562 Dual Port Fibre Channel
HBA (on the backupserver). To be ensured, the drives are operated
in streaming mode, the backupserver has a 35TB spool area. There
3x120TB
4x40TB
In the meantime we have made some changes to the backup server: each
drive now has a dedicated 1TB spool drive (two SSDs in RAID 0) available.
Post by Hans Thueminger
The first thing we have to do is to split the 3x120TB in 6x60TB
since Microsoft does not support VSS for filesystems >(64TB-8GB)
not even with the latest release (Server-2012-R2). This is a not
documented "design" and we got this information from the Microsoft
support team. To re-emphasize this, in Windows-Server-2012 it's not
a problem to create and use a filesystem >64TB but it's not
possible to make a backup of any file of any size in such a
filesystem; not even with the in the operating system included
Windows-Server-Backup program :-(
With the trick of using a mount point instead of a drive letter, it is
no problem to start a backup of the 3x120TB filesystems (the mount point
is located on a smaller disk drive, and for this disk drive VSS is not a
problem). Of course, no snapshot is created for the open files of the
mounted filesystem. But better a backup with some missing files than no
backup at all!
Post by Hans Thueminger
Now I'm glad to make backups and restores of files of our 4x40TB
Filesystems with Bacula if they are not too big (< 5TB). That works
fine. If they are too big (> 15TB) I always get an error after
creating the second or third or sometimes subsequent spoolfile
(Error: lib/bsock.c...). Never for the first spoolfile! I've tried
several spool sizes (from 500GB to 16TB) and different network
settings. As attachment (bacula_mailing_list_some_error_logs.txt)
you can find some logs, when the error occurred. What I have also
- using different Networkinterfaces (at the moment an ethernet
cable is directly connected (no switch) between the fileserver and
the backupserver and this connection is used for the backups
(checked with netstat))
- heartbeat: enabling on the SD (60 seconds) and
net.ipv4.tcp_keepalive_time also set to 60
- AllowCompression Yes/No
- many, many hours for trials
I also fixed the configuration of the heartbeat and checked it with
Wireshark, and now I'm sure it works. What else did I do? After running a
backup for 6 days I realized that there is a watchdog counter which stops
the backup after that time :-(
After that reconfiguration and a new compilation I started a backup of an
86TB filesystem, and now I'm quite proud to tell you that it worked :-)
After 21 days of running I have this data on 14 tapes. I also restored
some of the directories, and everything worked fine.
It would be interesting to hear what speed you get when despooling to
tape with the TS1140 drives. Maybe you should get a 10 Gbit path between
your file server and the backup server, but the biggest "problem" is that
Windows support for concurrent jobs isn't that great, so your spooling
stops while you are despooling to tape :-(

But it's good that you finally got it running; the long-lasting,
data-heavy TCP connections used by Bacula reveal problems in corners
where you no longer expect them (NICs/switches/packet filters).

Regards

Andreas
