Welcome to the dBforums forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact support.

If you prefer not to see double-underlined words and corresponding ads, place your cursor
here for ContentLink opt out.

Go Back  dBforums > Database Server Software > PostgreSQL > Larg(ish)-scale pgsql?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 06-23-03, 21:50
Flatlander Flatlander is offline
Registered User
 
Join Date: Jun 2003
Posts: 4
Larg(ish)-scale pgsql?

Anybody have any experience deploying PostgreSQL in a fairly sizable (but not enormous) data warehousing application - say half a terabyte or so?

Just wondering if it's remotely possible to scale pgsql to that sort of degree, - or if I'm dreaming, and should go back to the world of high-priced proprietary DBMSs.

Any personal stories, links to case studies, etc. helpful.
Reply With Quote
  #2 (permalink)  
Old 06-26-03, 17:47
eperich eperich is offline
Registered User
 
Join Date: Sep 2001
Location: Vienna
Posts: 398
data warehousing

what do you want to hear?

we have been in greece to train there some person in postgresql
they have about 1 billion data every day (loggiong data from a telco)

Second german data warehousing with blobs
2 Terrabyte

the only problem with such enormous data is that you have enogh space on the hardware

this is the only issue you have
__________________
http://www.postgresql.org
--Postgresql is the only kind of thing--
Reply With Quote
  #3 (permalink)  
Old 06-27-03, 10:43
Flatlander Flatlander is offline
Registered User
 
Join Date: Jun 2003
Posts: 4
Thanks for the reply, it's encouraging to know that others have had some success with larger pgsql databases - in addition to this example, I've come across one or two other similar instances in the PostgreSQL mailing list archives.

Still, the overwhelming majority of pgsql deployments seem to be quite a bit smaller than what I am considering.

I'm sorry that perhaps I didn't spell out my question explicitly enough, but I'm concerned about PostgreSQL's ability to handle to usual sticking points in warehousing applications: the effeciency of bulk load and backup/restore operations, and query performance against very large, denormalized datasets.
Reply With Quote
  #4 (permalink)  
Old 06-29-03, 08:06
eperich eperich is offline
Registered User
 
Join Date: Sep 2001
Location: Vienna
Posts: 398
data warehousing

ok let's see what I can do

maybe you take a look at

EFEU

this is an OLAP like tool you can use with postgres.
It's gpl.

We will work on an real OLAP Application in the near future.

We were working with the austrian socaial data and I think in version 7.4 the performance is increased dramatically for such things as querys on denormalized datasets

Bulk load you can handler with the COPY command
The performacnde depends on hardware and so on
but I think it will be better you test the copy command with generated data and that meet your needs and then test it

Backup/restoring
use pg_dump -> consistent backup (transaction mode is serializable)
you can make a binary backup of your database.

if you want a commercial solution netvault made recently a postgresql backup solution.

hope that halps a bit
__________________
http://www.postgresql.org
--Postgresql is the only kind of thing--
Reply With Quote
  #5 (permalink)  
Old 07-13-03, 14:49
LayOut LayOut is offline
Registered User
 
Join Date: Mar 2003
Posts: 5
Unhappy

I'm having some issues with a very large postgres db on Linux. Several tables with over 10 million records. One of these tables is difficult to vacuum - requires several hours and that i stop some critical processes. If I don't vacuum for more than a day, there is easily over a million deletions. Right now I think the server its running on is a dual 1.8G processor with 2G ram... We're in the process of upgrading, but seems like its a hardware limitation problem we're having. Any thoughts?
Reply With Quote
  #6 (permalink)  
Old 07-13-03, 15:12
eperich eperich is offline
Registered User
 
Join Date: Sep 2001
Location: Vienna
Posts: 398
what version

what version of postgres are you using?
__________________
http://www.postgresql.org
--Postgresql is the only kind of thing--
Reply With Quote
  #7 (permalink)  
Old 07-13-03, 15:46
LayOut LayOut is offline
Registered User
 
Join Date: Mar 2003
Posts: 5
7.3

and some other possible info...
seems that when the database is almost done with a very large query or vacuum process, the load spikes on the database machine.

Last edited by LayOut : 07-14-03 at 01:27.
Reply With Quote
  #8 (permalink)  
Old 07-15-03, 11:32
LayOut LayOut is offline
Registered User
 
Join Date: Mar 2003
Posts: 5
I completed the vacuum on that heavily used table last night -
took 4 hours.

I have this monster and another that takes 3 hours
and one that takes over 90 min.

I guess it's all about scheduling for me now. I have a process which queries the db and runs for 2 to 5 hours depending on system load. The initial query for this process runs about 20-30 min. It needs to run constantly during the day, and if the user load is low (night time) I'll run multiples of this to maximize my output.

It seems as though I have hit the ceiling for this system.
Any other experiences you all may have had would be good to hear about.

Thanks.
LayOut
Reply With Quote
  #9 (permalink)  
Old 07-15-03, 11:39
eperich eperich is offline
Registered User
 
Join Date: Sep 2001
Location: Vienna
Posts: 398
questions

A few questions about this

how big is the database?

how much rows are in the problem zone?

what does this process?
(select,insert,copy update)
__________________
http://www.postgresql.org
--Postgresql is the only kind of thing--
Reply With Quote
  #10 (permalink)  
Old 07-16-03, 13:17
LayOut LayOut is offline
Registered User
 
Join Date: Mar 2003
Posts: 5
The database is approximately 60 Gig

The table to be vacuumed has 8.7 Million rows, 7 cols

The process that runs does a SELECT from this table and takes about 20-30min. As it loops through the records, it will either update or delete the record (modifying about 200K records per 3-5 hour run). This process runs constant through the day.

One more daily process also modifies that table, adding roughly 1 Million records.

It is possible that the size of the database could be reduced in the future.
Reply With Quote
  #11 (permalink)  
Old 07-16-03, 13:24
eperich eperich is offline
Registered User
 
Join Date: Sep 2001
Location: Vienna
Posts: 398
problem

how do you insert the 1 millions rows daily

INSERT OR COPY???

If you do it with insert you will have to vacuum the database after this step

this is the first thing

the process runs all day long and is updateing deleting rows

that explains why the vacuum takes so long
there is much transaction overhead then

maybe your only chance is a automatic vacuum behind but thats not implemented yet

I thinnk there is a vacuuming daemon on gborg.postgresql.org
take a look at that maybe it helps
__________________
http://www.postgresql.org
--Postgresql is the only kind of thing--
Reply With Quote
  #12 (permalink)  
Old 07-16-03, 13:59
LayOut LayOut is offline
Registered User
 
Join Date: Mar 2003
Posts: 5
The 1 million updates are inserts. I've thought about vacuuming after every process run, but seems to be more time efficient to do it once a day.

I couldn't find the vacuum daemon you spoke of at gborg.

Would you say this is a limitation of postgres compared to one of the alternative expensive databases?

Are they working on an auto vacuum now? Is there discussion on when it may be accomplished?
Reply With Quote
  #13 (permalink)  
Old 07-16-03, 16:02
eperich eperich is offline
Registered User
 
Join Date: Sep 2001
Location: Vienna
Posts: 398
vacuum

you have huge inserts in the database
and that all time long
what you can do is to switch to another transactino mode

i think this is a 24/7 system.

in the 7.4 version there is something called read transactions
only for selects

and try to switch to serializable mode.
it should be faster then but I don't know what are you doing excatly with the transactions



you can also try
http://gborg.postgresql.org/project/...rojdisplay.php

this is what i meant
http://gborg.postgresql.org/project/...rojdisplay.php


there will be an autovacuuming in the next releases
maybe 7.5

but don't count on that

use this vacuuming daemon which is i think table based
__________________
http://www.postgresql.org
--Postgresql is the only kind of thing--
Reply With Quote
  #14 (permalink)  
Old 07-17-03, 08:38
eperich eperich is offline
Registered User
 
Join Date: Sep 2001
Location: Vienna
Posts: 398
thats just come in

maybe helpful to you:

ANNOUNCEMENT: Availability of TelegraphCQ v0.2 (BETA release)
-------------------------------------------------------------

The Telegraph Team at UC Berkeley is pleased to announce the immediate
availability of TelegraphCQ v0.2 (a BETA release). TelegraphCQ is a
system for processing long-running continuous queries over data
streams. TelegraphCQ is implemented as a significant modification to
the PostgreSQL open-source database system. Specifically, TelegraphCQ
v0.2 is based on the PostregreSQL 7.3.2 code-base.

For more information on the Telegraph project please visit:

http://telegraph.cs.berkeley.edu

To download TelegraphCQ and for more information, please visit:

http://telegraph.cs.berkeley.edu/telegraphcq

Here is the README.TelegraphCQ from the source distribution:

----------------------------------------------------------------------
This file contains information on TelegraphCQ 0.2 (a Beta releae).

What's new (since TelegraphCQ 0.1)
---------------------------------

- Better stability: Lots of bugs have been squashed

- CSV Wrapper: An easy to use wrapper that accepts simple comma
separated text as input and supports all PostgreSQL and user-defined
types. There is now no need to write C code to develop a new wrapper
if your data source produces textual data.

- Pull Wrappers: If you want to pull (lookup) data from external
sources (such as web pages) based on query predicates, you can use the
new infrastructure for the Pull Wrapper.

- Introspective Query Processing: The dynamic catalog is a new feature
in TelegraphCQ. It lets users query three special purpose internal
data streams to get a continuous view of how the TelegraphCQ system is
behaving.

- Applications: Included with this distribution are a few applications
that we demonstrated at the ACM FCRC/SIGMOD 2003 Conference in San
Diego. We have integrated data from CalTrans (California Department of
Transportation), CHP (California Highway Patrol) and a local network
interface (using the tcpdump utility). We have also included graphical
visualizers that run some interesting queries over data streams from
these sources.

Request
-------

If you are using TelegraphCQ for any project (research or otherwise),
we'd like to know !

- If you are making any performance measurements of TelegraphCQ and
are publishing it or disseminating it in some way, please do let us
know.

- We welcome bug fixes and code contributions of any kind. Please
send all patches to tcqfeedback@db.cs.berkeley.edu

Caveat Emptor
-------------

The current version of TelegraphCQ (0.2) is still a Beta release and
is very much research code. We haven't optimized it for
performance. Also, you will need to create a fresh database to use it
as it will not work against an existing PostgreSQL database. You are
free to use pgdump to export and then import your data from an
existing PostgreSQL database.

- Archived streams: Note that although TelegraphCQ 0.2 supports
archived streams, we currently do not support access of historical
data. This will be fixed in our next release.

For more information
--------------------

Getting Started:

A TelegraphCQ Getting Started guide can be found in
docs/TelegraphCQ/gettingstarted.html

The Getting Started guide also has links to other information such
as DDL/DML syntax, examples etc. Note that the documentation
installed in your TelegraphCQ installation directory contains
documentation for the new syntax changes (CREATE/ALTER/DROP STREAM,
the WINDOW clause in DML) integrated into the existing PostgreSQL
documentation.

The Telegraph Project:

http://telegraph.cs.berkeley.edu

Supported Platforms
-------------------

TelegraphCQ is supported on Linux x86 (various distributions including
Red Hat Linux 8,9 and Debian) and MacOS X 10.2, the platforms on which
it was developed. In addition, it has been minimally tested on the
SPARC Solaris platform.

In general, TelegraphCQ should run anywhere PostgreSQL runs. However,
there are significant differences in how we use shared memory and
synchronization primitives such as semaphores. This is likely to cause
some hiccups on platforms where we have not tested it yet. We welcome
patches that help in porting TelegraphCQ.


Please send all comments (and patches) to tcqfeedback@db.cs.berkeley.edu

----------------------------------------------------------------------
__________________
http://www.postgresql.org
--Postgresql is the only kind of thing--
Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On