Re: [Fwd: backup bottlenecks and possible improvements]

Dear all,
space. A comparison tells me that if we convert from gzip to bzip2, we would save something like 25% of the space. This is only 100 Gbyte or so, but nice to have.
Nice idea, but saving only 100 GB is not much on a 1.6 TB filesystem, and it is not a real solution to the disk space problem I was talking about. One extra concern: if you do so, we will have 300 GB of gzipped files on the backup server and 200 GB of bzip2 files, and as a result we will not have any space left on the backup server. That is exactly the problem that ops and I are trying to solve. Please do not do this on your own, and consult ops before actually planning it!

Gerard.

space. A comparison tells me that if we convert from gzip to bzip2, we would save something like 25% of the space. This is only 100 Gbyte or so, but nice to have.
Nice idea, but saving only 100 GB is not much on a 1.6 TB filesystem, and it is not a real solution to the disk space problem I was talking about.
One extra concern: if you do so, we will have 300 GB of gzipped files on the backup server and 200 GB of bzip2 files, and as a result we will not have any space left on the backup server. That is exactly the problem that ops and I are trying to solve. Please do not do this on your own, and consult ops before actually planning it!
Gerard.

Sure, I will coordinate with you guys. I do not want to mess up your backup schedule.

--
Arife Vural
SED, RIPE NCC
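For illustration only, a minimal sketch in Python of the gzip-to-bzip2 recompression being discussed; the /backup/ris path is a placeholder rather than the real layout, and each original is removed only once its bzip2 copy is on disk, so at most one file exists in both forms at any time:

import bz2
import gzip
from pathlib import Path

# Placeholder location for the gzipped archives; the real layout may differ.
ARCHIVE_ROOT = Path("/backup/ris")

def recompress(gz_path: Path) -> None:
    """Stream one .gz file into a .bz2 file, then remove the original."""
    bz2_path = gz_path.with_suffix(".bz2")
    with gzip.open(gz_path, "rb") as src, bz2.open(bz2_path, "wb", compresslevel=9) as dst:
        while chunk := src.read(1 << 20):  # 1 MiB chunks keep memory use flat
            dst.write(chunk)
    gz_path.unlink()  # drop the gzip copy only once the bzip2 copy exists

for path in sorted(ARCHIVE_ROOT.rglob("*.gz")):
    recompress(path)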

[Sorry to be repetitive at the frequency of 1/year]

This is experimental data which is immutable once acquired; read: it does not change anymore, ever. It is also not read very often, and less so the older it gets. It also needs no high-speed access; we have copies of it in databases for that.

The costs of keeping it around are roughly: equipment + ops time.

It appears that the particular equipment (NetApp filer), while easy to operate, is becoming too expensive. It might be a good idea to invest a little ops time in cheap storage boxes for stuff that does not need filer-quality storage.

ATA disks are available at about half a Euro per Gigabyte in units of 300 GB. Thus it is possible to put about a Terabyte of storage into any simple Wintel box: slightly more without redundancy, slightly less with software RAID. Any (old) Wintel box will be fine. The equipment cost will be negligible: EUR 600/terabyte if you use old Wintel boxes, otherwise add the cost of the simplest Wintel box. When building a couple of them and operating them all the same, the ops cost will not be too high either. One does not even need RAID: just build two of them and have a cron job rsyncing between them for full hot redundancy. Name them cheapfiler-1 and cheapfiler-1-copy, and make the copy read-only to users. Make as many as we need. Spread them around for physical redundancy.

Not rocket science. What's the problem?

Daniel
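A sketch of the mirroring job described above, written as a small Python wrapper around rsync; the host names cheapfiler-1 and cheapfiler-1-copy come from the mail, while the /data path and the rsync options are assumptions:

import subprocess
import sys

# Only the host name below comes from the mail; the /data path is an assumption.
SOURCE = "/data/"                  # local tree on cheapfiler-1
DEST = "cheapfiler-1-copy:/data/"  # the read-only mirror box

def mirror() -> int:
    """Run one rsync pass from the primary box to its copy."""
    cmd = [
        "rsync",
        "--archive",  # preserve times, permissions and ownership
        "--delete",   # keep the copy an exact mirror of the primary
        "--quiet",
        SOURCE,
        DEST,
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    sys.exit(mirror())

This would be run from cron on cheapfiler-1 (nightly, say); making the copy read-only to users is then just a matter of how its filesystem is exported.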

On 07.09 09:05, Daniel Karrenberg wrote:
Not rocket science. What's the problem?
Forgot to say: I speak from experience. This is how I do file storage at home:

/dev/hda1  7,4G  5,3G  1,8G  76%  /
/dev/hdc1  150G  111G   39G  75%  /silo1
/dev/hdd4  145G  106G   40G  73%  /silo2
/dev/hda3   67G   24G   43G  37%  /silo3
/dev/hdb3   67G   53G   15G  79%  /silo4
/dev/hdb1  7,4G  5,2G  2,0G  73%  /root2

Whenever the family has collected too much, I just replace the oldest disk with the newest for more space. Important data is rsynced between spindles. More important data is rsynced to a different box. All this runs on an old PII 350 MHz with 256 MB which is also print server, VoIP phone switch, router, NAT and firewall.

The key is to use a good case with extra fans that keeps everything cool, including the disks:

Sep  7 09:23:08 houser hddtemp[4332]: /dev/hdc: SAMSUNG SV1604N: 33 C
Sep  7 09:23:08 houser hddtemp[4332]: /dev/hdd: SAMSUNG SP1604N: 35 C

I have never had one break before I swapped it because it had become too small.

Daniel

Daniel Karrenberg wrote:
[Sorry to be repetitive at the frequency of 1/year]
This is experimental data which is immutable once acquired; read: it does not change anymore, ever. It is also not read very often, and less so the older it gets. It also needs no high-speed access; we have copies of it in databases for that.
The costs of keeping it around are roughly: equipment + ops time.
It appears that the particular equipment (NetApp filer), while easy to operate, is becoming too expensive. It might be a good idea to invest a little ops time in cheap storage boxes for stuff that does not need filer-quality storage.
ATA disks are available at about half a Euro per Gigabyte in units of 300 GB. Thus it is possible to put about a Terabyte of storage into any simple Wintel box: slightly more without redundancy, slightly less with software RAID. Any (old) Wintel box will be fine. The equipment cost will be negligible: EUR 600/terabyte if you use old Wintel boxes, otherwise add the cost of the simplest Wintel box. When building a couple of them and operating them all the same, the ops cost will not be too high either. One does not even need RAID: just build two of them and have a cron job rsyncing between them for full hot redundancy. Name them cheapfiler-1 and cheapfiler-1-copy, and make the copy read-only to users. Make as many as we need. Spread them around for physical redundancy.
Not rocket science. What's the problem?
I more-or-less agree. For 12000 Euros you can put 11 TB in a rack-mounted box (prices from alternate.nl, a month or so ago):

Procase C4EE case, 24x 3.5" drives, 950 Watt PSU
    1 x EUR 2.499,00 = EUR 2.499,00
Hitachi Deskstar 7K500 SATA hard disk, 500 GB, 8,5 ms, 16 MB, 7200 RPM
    22 x EUR 359,00 = EUR 7.898,00
Promise SATA II 150 SX8 SATA controller, PCI-X 64-bit 133 MHz, 8x SATA, 150 MB/s
    2 x EUR 189,00 = EUR 378,00
Tyan Thunder K8S Pro (S2882G3NR), 2x Opteron, PCI-X, 8x DDR-SDRAM, ATI Rage XL, 2x Gigabit LAN
    1 x EUR 459,00 = EUR 459,00
AMD Opteron 244 1.8 GHz CPU
    2 x EUR 199,00 = EUR 398,00
Kingston DIMM 2 GB 333 MHz DDR (PC2700)
    1 x EUR 549,00 = EUR 549,00
Total: EUR 12.181,00

The bottleneck here will probably be the dual Gigabit Ethernet controllers rather than the disks, if we are using it for backup.

This is the *worst-case* cost, mind you. If we were to build a system to back up RIS raw data, it could be as simple as putting a USB drive on my desk next to someone's workstation (I vote for Arife, since her workstation is in the corner and not likely to get bumped by someone walking by) - total cost, 359 Euros:

http://www.alternate.nl/html/shop/productDetails.html?artno=A9UE03&

Which is a bit extreme, but it shows what we could do if we decided to optimise for cost/performance instead of risk- and work-aversion.

--
Shane

p.s. With 11 TB I could achieve my dream of putting all of the RIS data on-line. :)
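To make the arithmetic explicit, a few lines of Python reproducing Shane's total and the resulting cost per raw terabyte, using only the quantities and prices quoted above:

# (quantity, unit price in EUR), taken from the parts list above
parts = {
    "Procase C4EE case":            (1, 2499.00),
    "Hitachi Deskstar 7K500 500GB": (22, 359.00),
    "Promise SATA II 150 SX8":      (2, 189.00),
    "Tyan Thunder K8S Pro":         (1, 459.00),
    "AMD Opteron 244":              (2, 199.00),
    "Kingston 2GB DDR DIMM":        (1, 549.00),
}

total = sum(qty * price for qty, price in parts.values())
raw_tb = 22 * 0.5  # 22 drives of 500 GB each, before any RAID overhead

print(f"total:           EUR {total:,.2f}")            # EUR 12,181.00
print(f"raw capacity:    {raw_tb:.0f} TB")             # 11 TB
print(f"cost per raw TB: EUR {total / raw_tb:,.2f}")   # about EUR 1,107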
participants (4)
- arife@ripe.net
- Daniel Karrenberg
- Gerard Leurs
- Shane Kerr