> It looks like we are backing up the correct data for RIS. It is only the
> "raw data".
>
> I had a quick look at the raw data, and I think we can save a *lot* of
> space. A comparison tells me that if we convert from gzip to bzip2, we
> would save something like 25% of the space. This is only 100 Gbyte or
> so, but nice to have.
I think it's a good idea. We can convert from gzip to bzip2. A few
applications that use the raw data will need to be changed to read bzip2.
If you agree to use bzip2, I can proceed and make the change.
I guess we need to announce this to our users.
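As a rough sketch of what the conversion could look like for a single
file: the function name gzip_to_bzip2 and the keep_original flag are made
up for illustration and are not part of any existing RIS tooling. It
recompresses the file, checks that the new file decompresses to the same
bytes, and only then optionally removes the original.

```python
import bz2
import gzip
import hashlib
import shutil
from pathlib import Path


def _digest(opener, path: Path) -> str:
    """SHA-256 of the decompressed contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with opener(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def gzip_to_bzip2(src: Path, keep_original: bool = True) -> Path:
    """Recompress one .gz file as .bz2 and return the new path.

    Hypothetical helper, for illustration only.
    """
    if src.suffix != ".gz":
        raise ValueError(f"not a gzip file: {src}")
    dst = src.with_suffix(".bz2")
    # Stream the data so large raw-data files never sit fully in memory.
    with gzip.open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Verify the contents match before touching the original.
    if _digest(gzip.open, src) != _digest(bz2.open, dst):
        dst.unlink()
        raise OSError(f"verification failed for {src}")
    if not keep_original:
        src.unlink()
    return dst
```

Keeping the original by default means a first pass over the raw data can
be checked before any gzip file is deleted.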
Arife
>
> In fact, perhaps more generally we should look at our various compressed
> files and convert from gzip to bzip2. We already did this for the Whois
> database logs a few years ago, and saw large savings.
>
> Andrei Robachevsky wrote:
>
> >Shane,
> >
> >FYI. This may help in the RIS backup design.
> >
> >Andrei
> >
> >-------- Original Message --------
> >Subject: backup bottlenecks and possible improvements
> >Date: Tue, 9 Aug 2005 16:23:59 +0200 (CEST)
> >From: gerard(a)ripe.net (Gerard Leurs)
> >To: andrei(a)ripe.net, gerard(a)ripe.net, ruud(a)ripe.net
> >
> >Andrei + Ruud.
> >
> >I promised to write something about our current backup.
> >
> >Pls have a read, and when there are questions/suggestions
> >I'm happy to answer them.
> >
> > Gerard.
> >
> >
> >++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >Backup system.
> >
> >Short description.
> >
> >We back up (or rather synchronise) unique filesystems from servers
> >on a frequent basis to a huge EMC diskvolume.
> >At least twice a day, but for some server filesystems up to every
> >half hour.
> >
> >We keep the backups for a period, so we can restore from online-backup.
> >The retentionperiod depends a little bit on the server, but in general
> >we have snapshots for the last 14 days + 1 week + 1 month. This provides
> >for 'online' restoring files of the last 7 weeks.
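One way to read the 7-week figure is as the sum of the three retention
windows. This is only a sketch of that arithmetic. Counting "1 month" as
four weeks is my assumption, not something stated in the description.

```python
from datetime import timedelta

# Retention windows as described above. Treating "1 month" as 4 weeks
# is an assumption made only for this estimate.
DAILY_WINDOW = timedelta(days=14)
WEEKLY_WINDOW = timedelta(weeks=1)
MONTHLY_WINDOW = timedelta(weeks=4)


def restore_horizon() -> timedelta:
    """How far back an online restore can reach under this reading."""
    return DAILY_WINDOW + WEEKLY_WINDOW + MONTHLY_WINDOW


print(restore_horizon().days // 7, "weeks")  # prints "7 weeks"
```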
> >
> >This strategy proves to be sufficient for almost all cases we have
> >wrt restoring files, or installing and synchronisation of a replacement
> >server.
> >
> >When we purchased our current hardware we already anticipated possible
> >future growth, and the possible addition of servers and services.
> >
> >
> >Current status.
> >
> >Although we drastically reduced the number of snapshots we keep, we
> >did not gain much diskspace. Most data on any filesystem does not
> >change over time, and only the changed data requires extra diskspace
> >on our backupserver.
> >
> >There is still enough EMC-diskspace for the serverbackup.
> >We can even add some new servers, without hassle.
> >But for the filer backups it is getting awkward.
> >
> >                               size    used    available
> >emc1 (serverbackup)            1.6 TB  1.4 TB  247 GB
> >emc2 (filers + RIS + DNSMon)   1.6 TB  1.5 TB  184 GB
> >
> >
> >We did not anticipate that the diskusage for the former NP-services
> >would have such an enormous impact on the EMC. Approx. 1 TB is solely
> >taken by the full backup for RIS, TTM and dnsmon.
> >
> >Some rough estimates of diskusage are:
> >For EMC2:
> >- filer2 189 GB
> > 80 GB homedirs
> > 46 GB groupdirs
> > 32 GB roledirs
> > 21 GB ticketDB
> >- filer3 183 GB
> > 95 GB datadirs
> > 39 GB web dirs
> > 12 GB FTP dirs
> >- weesp
> > 236, 111, 2 GB for RIS
> >- dolphin
> > 94, 102 GB for dnsmon
> >
> >For EMC1:
> >- ceiba
> > 34, 34, 34, 33, 11, 132 GB, all for TTM data
> >- too many other servers to report
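Adding up the figures listed above for the former NP-services gives a
feel for their share of the EMC. The numbers are copied from the list.
The dictionary layout and the totals() helper are only for illustration.

```python
# Per-filesystem estimates in GB, copied from the list above.
np_services_gb = {
    "RIS (weesp)": [236, 111, 2],
    "dnsmon (dolphin)": [94, 102],
    "TTM (ceiba)": [34, 34, 34, 33, 11, 132],
}


def totals(usage: dict[str, list[int]]) -> dict[str, int]:
    """Sum the estimates per service and add an overall total."""
    per_service = {name: sum(sizes) for name, sizes in usage.items()}
    per_service["total"] = sum(per_service.values())
    return per_service


for name, gb in totals(np_services_gb).items():
    print(f"{name:18s} {gb:4d} GB")
# RIS 349 GB, dnsmon 196 GB, TTM 278 GB, total 823 GB
```

So these three services together account for roughly 0.8 TB of the
estimated usage across the two EMC volumes.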
> >
> >
> >Future plans (2006).
> >
> >- Replacing our current filers with more powerful filers, with more
> > diskspace, increases the demand for diskspace.
> >- Adding services on the TTM-cloud, and collecting this data on a
> > central server, also requires extra diskspace.
> >- Adding backup of laptops also requires extra diskspace.
> >- "Natural growth" of diskusage requires extra diskspace.
> >
> >
> >How can we be pro-active for possible diskspace problems?
> >
> >- Reduce the number of snapshots to the absolute minimum.
> > This will however not free half of the current EMC-diskspace,
> > but possibly only 10%.
> > And the number of snapshots is not that high anymore.
> > I do not think this is the option that solves our disk greed for years.
> >
> >- Change backup strategy.
> > + Expire all files older than 1 year. And do not sync them onto the
> > EMC.
> > pro: saves diskspace
> > con: hard and time-consuming to restore a filesystem/server
> > from backups
> > hard to implement
> > will not save time for the sync-operation
> > admin. overhead increases several-fold
> > I would not consider this a good option.
> >
> > + Staff removes files older than x months/years.
> > pro: saves diskspace
> > con: time consuming for everyone
> > hard to convince everyone
> > will not work for dirs of websites, ftpsite,
> > ticketDB and lots of other dirs.
> > I would not consider this a good option.
> >
> > + Do not add new servers, filesystems and laptops to the backup
> > Obviously one of the worst options possible.
> >
> >- Buy an expansion diskarray, similar to the current diskarray.
> > This gives us 3 TB of extra diskspace.
> > pro: no administrative workload
> > no reduced quality of service
> > get dedicated EMC-filesystem for filers
> > be able to back up the new filers (larger diskspace)
> > anticipates additional data of services on TBs
> > possibility to add more filesystems, and servers to backup
> > con: invest 17.150 euro
> > spend time on configuration (and on data-migration)
> >
> >++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
--
Arife Vural
SED, RIPE NCC