Under the covers, Postgres is, in simplified terms, one giant append-only log. When you insert a new record it gets appended, and the same happens for deletes and updates. For a delete, the record is merely flagged as invisible; it's not actually removed from disk immediately. For an update, the old record is flagged as invisible and a new record is written. Later, Postgres comes through, looks at all the records flagged invisible, and actually frees up the disk storage. This process is known as vacuum.
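You can watch this happen yourself. A minimal sketch, assuming a throwaway table named bloat_demo (the stats collector updates asynchronously, so the dead-tuple count may take a moment to appear):

CREATE TABLE bloat_demo (id int, payload text);
INSERT INTO bloat_demo SELECT n, 'x' FROM generate_series(1, 100000) n;

-- every update writes a new row version and flags the old one invisible
UPDATE bloat_demo SET payload = 'y';

-- roughly 100000 dead tuples now sit alongside the live ones
SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'bloat_demo';

-- vacuum marks the dead row versions as reusable space
VACUUM bloat_demo;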
There are a couple of key variants of VACUUM within Postgres:

VACUUM ANALYZE – This one is commonly run when you've recently loaded data into your database and want Postgres to update its statistics about the data.

VACUUM FULL – This will take a lock during the operation, but will scan the full table and reclaim all the space it can from dead tuples.
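Both are single statements. A minimal sketch, using the github_events table from the sample output further down as a stand-in for any table you've just bulk-loaded or heavily churned:

-- refresh planner statistics after a large data load
VACUUM ANALYZE github_events;

-- rewrite the table to reclaim all dead-tuple space
-- (takes an exclusive lock, so avoid on a busy production table)
VACUUM FULL github_events;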
Postgres also has a mechanism for regularly freeing up unused space, known as autovacuum, which we've discussed in an earlier post. We highly recommend you tune autovacuum appropriately to keep bloat from ballooning. But a good question remains: do I have bloat in my Postgres database?
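Tuning can happen globally in postgresql.conf, or per table for known hot spots. A sketch of a per-table override (the thresholds here are illustrative, not a recommendation; the table name is from the sample output below):

-- vacuum once ~1% of rows are dead (plus 1000), instead of the 20% default
ALTER TABLE github_events SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);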
The following query is useful for periodically monitoring your bloat. How often you should check depends on your application, but every month or two is a reasonable cadence.
WITH constants AS (
    -- define some constants for sizes of things
    -- for reference down the query and easy maintenance
    SELECT current_setting('block_size')::numeric AS bs, 23 AS hdr, 8 AS ma
),
no_stats AS (
    -- screen out tables that have attributes
    -- which don't have stats, such as JSON
    SELECT table_schema, table_name,
        n_live_tup::numeric as est_rows,
        pg_table_size(relid)::numeric as table_size
    FROM information_schema.columns
        JOIN pg_stat_user_tables as psut
            ON table_schema = psut.schemaname
            AND table_name = psut.relname
        LEFT OUTER JOIN pg_stats
            ON table_schema = pg_stats.schemaname
            AND table_name = pg_stats.tablename
            AND column_name = attname
    WHERE attname IS NULL
        AND table_schema NOT IN ('pg_catalog', 'information_schema')
    GROUP BY table_schema, table_name, relid, n_live_tup
),
null_headers AS (
    -- calculate null header sizes
    -- omitting tables which don't have complete stats
    -- and attributes which aren't visible
    SELECT
        hdr + 1 + (sum(case when null_frac <> 0 THEN 1 else 0 END) / 8) as nullhdr,
        SUM((1 - null_frac) * avg_width) as datawidth,
        MAX(null_frac) as maxfracsum,
        schemaname,
        tablename,
        hdr, ma, bs
    FROM pg_stats CROSS JOIN constants
        LEFT OUTER JOIN no_stats
            ON schemaname = no_stats.table_schema
            AND tablename = no_stats.table_name
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        AND no_stats.table_name IS NULL
        AND EXISTS (
            SELECT 1
            FROM information_schema.columns
            WHERE schemaname = columns.table_schema
                AND tablename = columns.table_name )
    GROUP BY schemaname, tablename, hdr, ma, bs
),
data_headers AS (
    -- estimate header and row size
    SELECT
        ma, bs, hdr, schemaname, tablename,
        (datawidth + (hdr + ma - (case when hdr % ma = 0 THEN ma ELSE hdr % ma END)))::numeric AS datahdr,
        (maxfracsum * (nullhdr + ma - (case when nullhdr % ma = 0 THEN ma ELSE nullhdr % ma END))) AS nullhdr2
    FROM null_headers
),
table_estimates AS (
    -- make estimates of how large the table should be
    -- based on row and page size
    SELECT schemaname, tablename, bs,
        reltuples::numeric as est_rows,
        relpages * bs as table_bytes,
        CEIL((reltuples *
            (datahdr + nullhdr2 + 4 + ma -
                (CASE WHEN datahdr % ma = 0 THEN ma ELSE datahdr % ma END)
            ) / (bs - 20))) * bs AS expected_bytes,
        reltoastrelid
    FROM data_headers
        JOIN pg_class ON tablename = relname
        JOIN pg_namespace ON relnamespace = pg_namespace.oid
            AND schemaname = nspname
    WHERE pg_class.relkind = 'r'
),
estimates_with_toast AS (
    -- add in estimated TOAST table sizes
    -- estimate based on 4 toast tuples per page because we don't have
    -- anything better; also append the no_stats tables
    SELECT schemaname, tablename,
        TRUE as can_estimate,
        est_rows,
        table_bytes + ( coalesce(toast.relpages, 0) * bs ) as table_bytes,
        expected_bytes + ( ceil( coalesce(toast.reltuples, 0) / 4 ) * bs ) as expected_bytes
    FROM table_estimates
        LEFT OUTER JOIN pg_class as toast
            ON table_estimates.reltoastrelid = toast.oid
            AND toast.relkind = 't'
),
table_estimates_plus AS (
    -- add some extra metadata to the table data
    -- and calculations to be reused
    -- including whether we can't estimate it
    -- or whether we think it might be compressed
    SELECT current_database() as databasename,
        schemaname, tablename, can_estimate,
        est_rows,
        CASE WHEN table_bytes > 0
            THEN table_bytes::NUMERIC
            ELSE NULL::NUMERIC END AS table_bytes,
        CASE WHEN expected_bytes > 0
            THEN expected_bytes::NUMERIC
            ELSE NULL::NUMERIC END AS expected_bytes,
        CASE WHEN expected_bytes > 0 AND table_bytes > 0
                AND expected_bytes <= table_bytes
            THEN (table_bytes - expected_bytes)::NUMERIC
            ELSE 0::NUMERIC END AS bloat_bytes
    FROM estimates_with_toast
    UNION ALL
    SELECT current_database() as databasename,
        table_schema, table_name, FALSE,
        est_rows, table_size,
        NULL::NUMERIC, NULL::NUMERIC
    FROM no_stats
),
bloat_data AS (
    -- do final math calculations and formatting
    SELECT current_database() as databasename,
        schemaname, tablename, can_estimate,
        table_bytes, round(table_bytes / (1024 ^ 2)::NUMERIC, 3) as table_mb,
        expected_bytes, round(expected_bytes / (1024 ^ 2)::NUMERIC, 3) as expected_mb,
        round(bloat_bytes * 100 / table_bytes) as pct_bloat,
        round(bloat_bytes / (1024::NUMERIC ^ 2), 2) as mb_bloat,
        table_bytes, expected_bytes, est_rows
    FROM table_estimates_plus
)
-- filter output for bloated tables
SELECT databasename, schemaname, tablename,
    can_estimate,
    est_rows,
    pct_bloat, mb_bloat,
    table_mb
FROM bloat_data
-- this WHERE clause defines which tables actually appear
-- in the bloat report
-- the example below filters for tables which are either 50%
-- bloated and more than 20MB in size, or more than 25%
-- bloated and more than 1GB in size
WHERE ( pct_bloat >= 50 AND mb_bloat >= 20 )
    OR ( pct_bloat >= 25 AND mb_bloat >= 1000 )
ORDER BY pct_bloat DESC;
The above query will give you something that looks like this:
 databasename  | schemaname |   tablename   | can_estimate | est_rows | pct_bloat | mb_bloat | table_mb
---------------+------------+---------------+--------------+----------+-----------+----------+----------
 djsdcv27d8mqq | public     | github_events | t            |  1820360 |        29 |   983.65 | 3397.547
 djsdcv27d8mqq | public     | github_users  | t            |   952920 |        28 |   475.80 | 1710.852
(2 rows)
The amount of bloat you have is relative to the size of your database. If you're running a smaller database with only 20 GB of data total, this amount may be noticeable and worth cleaning up, but you may also be able to skate by for a while. If you're storing several hundred GB and 100 GB or more of it is bloat, a significant percentage of your database, then it can really be worth cleaning up.
If you do find yourself with significant bloat, you have a few options: the VACUUM FULL described above reclaims all the space but holds an exclusive lock while it runs, while an online rebuild tool such as pg_repack can do the same work with only brief locking.
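A quick sketch of both options (the table and database names are illustrative, and pg_repack requires installing its extension in the target database first):

-- option 1: rewrite the table in place, under an exclusive lock
VACUUM FULL github_events;

# option 2: rebuild the table online with minimal locking, from the shell
$> pg_repack --table public.github_events mydb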
Fans of Rapid Application Development (RAD!) are probably already familiar with Docker, but what does that have to do with Postgres? Database-driven applications are a dime a dozen these days, and a good RAD environment is something of a Holy Grail to coders and QA departments alike. Docker lets us spin up a Postgres instance in seconds, and discard it with a clean conscience.
There have even been some noises within certain circles about using it in a production context. Can we do something like that responsibly? Docker containers are practically guaranteed to be ephemeral, while production data most decidedly isn’t. The answer to this is ultimately complex, and something we’ll be exploring over the next several weeks.
Let’s get started.
Since Docker itself is a commercial product, getting the latest community packages (Docker CE) means making a free account on their website and downloading the packages for your preferred platform. For slightly older versions, RHEL and CentOS variants can install EPEL, where the docker-io package resides. Debian variants such as Ubuntu can simply install the docker.io package.
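A rough sketch of the package-manager route (package names vary by distribution release, so treat these as illustrative):

# RHEL / CentOS: docker-io lives in EPEL
$> sudo yum install epel-release
$> sudo yum install docker-io

# Debian / Ubuntu
$> sudo apt-get install docker.io

# then make sure the daemon is running
$> sudo systemctl start docker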
Once we have Docker itself, we can actually start playing with it right away. We should start by grabbing the latest version of Postgres from the official Docker Hub image:
$> docker pull postgres:latest

latest: Pulling from library/postgres
3e17c6eae66c: Pull complete
3d89ae9a47e4: Pull complete
f7726fda7efe: Pull complete
d1838499bd8f: Pull complete
a5ec5aa60735: Pull complete
1571d7170291: Pull complete
0d6e41e13732: Pull complete
787e3c45a9bb: Pull complete
7b234cf83b22: Pull complete
3a8ad2440289: Pull complete
9351993374c0: Pull complete
a8f3575e09a1: Pull complete
a4c4b2ff0c3a: Pull complete
Digest: sha256:73a1c4e98fb961bb4a5c55ad6428470a3303bd3966abc442fe937814f6bbc002
Status: Downloaded newer image for postgres:latest

$> docker images

REPOSITORY   TAG      IMAGE ID       CREATED        SIZE
postgres     latest   b106d5a0dc75   22 hours ago   287.2 MB
Since Docker images can build on top of each other, we can see that Postgres is composed of a long chain of dependencies. Despite that, the image itself isn't very large; 287MB for an entire operating system and database server is pretty good. There are even smaller versions if we're willing to forgo certain UNIX tools. The postgres:10-alpine image, for instance, is just under 40MB.
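Pulling the slimmer variant works the same way; only the tag changes:

$> docker pull postgres:10-alpine
$> docker images postgres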
Just having an image is pretty pointless, of course. Let's start a container and see what's inside. This is done with the docker run command. We'll pass it a few parameters to make things easier to use later as well. Setting the name, for example, lets us manipulate the container later without having to use an ugly MD5 hash.

Since Docker containers run on a separate network layer, we want to publish the default Postgres port 5432 as 5555 to prevent conflicts with existing applications on the host system. And finally, we want to detach from the terminal so we can use it for other things, and that's also probably how we'll be using the container in any reasonable configuration.
$> docker run --name pg-test -p 5555:5432 -d postgres

69f21490ff415e485501cd9c3ed04336d7c4147f77025b7de8dbf565b11fdf52

$> docker ps

CONTAINER ID  IMAGE     COMMAND                 CREATED         STATUS         PORTS                   NAMES
69f21490ff41  postgres  "docker-entrypoint.sh"  40 seconds ago  Up 39 seconds  0.0.0.0:5555->5432/tcp  pg-test

$> psql -p 5555 -U postgres -h localhost

Timing is on.
Pager usage is off.
psql (10.0)
Type "help" for help.

postgres=# SELECT version();

                                             version
--------------------------------------------------------------------------------------------------
 PostgreSQL 10.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit
(1 row)
Run and done. We can connect to the container on port 5555 as expected, we have the latest Postgres 10, and Postgres confirms its full version and build environment. Let's try to make some tables.
We've used the sensor_log table a lot in the past. A good reason for that is the sheer size of the data we normally insert. There's no reason to stop now, is there? Let's keep our connection and do a couple of basic timings:
\timing on

CREATE TABLE sensor_log (
  sensor_log_id  SERIAL PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
);

Time: 7.945 ms

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, round(random() * 100),
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10)::TEXT || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);

Time: 30981.570 ms (00:30.982)

CREATE INDEX idx_sensor_log_location ON sensor_log (location);

Time: 9339.747 ms (00:09.340)

CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

Time: 3689.406 ms (00:03.689)

ANALYZE sensor_log;

Time: 247.532 ms

SELECT count(*) FROM sensor_log;

  count
---------
 5000000

Time: 518.908 ms

SELECT count(*) FROM sensor_log
 WHERE location = '50';

 count
-------
  5000

Time: 8.243 ms

SELECT count(*) FROM sensor_log
 WHERE reading_date BETWEEN '2016-01-01' AND '2016-06-01';

 count
--------
 680001
(1 row)

Time: 139.237 ms
One weakness of this approach is that the Postgres data exists entirely within this single container. Since containers are intended to be temporary, that’s not great if we want to use the container as a somewhat bulky linked binary. We want to be able to kill—and even erase—our container in some cases, yet retain the data.
How do we manage that?
One way to do this with Docker is volumes. We can build a volume and start another container that uses it like this:
$> docker volume create --name pgdata

$> docker run --name pg-mount -p 5600:5432 -d \
   --volume pgdata:/var/lib/postgresql/data postgres
We created a pgdata volume, then told Docker that the volume should be mounted at /var/lib/postgresql/data. This is the default data location used by the maintainers of the official image, so the new container will use the external volume instead of the built-in storage.
Let’s connect to this new container, make a very small proof-of-concept table, then stop and erase the container. Then we can create a new container that uses the same volume, and see if the table is still there.
$> psql -p 5600 -U postgres -h localhost \
   -c "CREATE TABLE foo (id SERIAL);"

$> docker stop pg-mount
$> docker rm pg-mount

$> docker run --name pg-new-mount -p 5600:5432 -d \
   --volume pgdata:/var/lib/postgresql/data postgres

$> psql -p 5600 -U postgres -h localhost -c "\d foo"

                            Table "public.foo"
 Column |  Type   | Collation | Nullable |             Default
--------+---------+-----------+----------+---------------------------------
 id     | integer |           | not null | nextval('foo_id_seq'::regclass)
Nice, eh? Now we can test various container variants, or image builds, against the same data volume.
But what about performance? Docker containers utilize copy-on-write (CoW) semantics on top of a layered filesystem. Is it possible that this abstraction layer could be interfering with our desired use of the hardware?
One of the handy things Docker can do is mount locations on the host machine. Let’s start another image by mapping to a drive on the Docker host, and see if the timings change at all.
We start by launching a new Docker container with the desired mount. The cool thing about the --volume flag is that it will create the directory for us if it doesn't already exist. It'll even give it the correct permissions, since Docker itself runs as a management daemon under the root user, and not the postgres user we're probably accustomed to.
$> docker run --name pg-local -p 5700:5432 -d \
   --volume /db/docker:/var/lib/postgresql/data postgres

$> ls -l /db

drwx------ 19 vboxadd root 4096 Oct 13 17:09 docker
Now let's connect to port 5700 and repeat our earlier timings:
\timing on

CREATE TABLE sensor_log (
  sensor_log_id  SERIAL PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
);

Time: 7.977 ms

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, round(random() * 100),
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10)::TEXT || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);

Time: 30180.467 ms (00:30.180)

CREATE INDEX idx_sensor_log_location ON sensor_log (location);

Time: 9279.165 ms (00:09.279)

CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

Time: 3268.765 ms (00:03.269)

ANALYZE sensor_log;

Time: 205.313 ms

SELECT count(*) FROM sensor_log;

  count
---------
 5000000

Time: 483.312 ms

SELECT count(*) FROM sensor_log
 WHERE location = '50';

 count
-------
  5000

Time: 7.234 ms

SELECT count(*) FROM sensor_log
 WHERE reading_date BETWEEN '2016-01-01' AND '2016-06-01';

 count
--------
 680001
(1 row)

Time: 141.441 ms
It turns out the filesystem mapping is comparable to Docker’s own filesystem, as the differences are statistically insignificant. Of course, now instead of using its own internal filesystem layer, Docker must maintain a mapping to the local filesystem, so it’s hard to say what’s actually going on behind the scenes. We’re also operating in a relatively puny virtual environment with its own storage performance limitations.
Still, a commendable result, as it leaves us the option of using standard filesystem paths. LVM? Sure. XFS? Why not. An external SAN device comprised entirely of SSDs? Of course. There are interesting implications here.
So are containers really just a bloated linked Postgres binary, since we could have just managed a local directory with a standard Postgres install? Yes and no. We’ll definitely explore more of these concepts in later articles, but it’s important to note that containers are meant for chaining and automation.
A standard Postgres install carries a certain amount of administration: maybe it's just the configuration file, or user creation, or package maintenance, but there's a lot to building a Postgres server. Since a container is itself a tiny, preconfigured Postgres server, we don't really need to do any of that in simplified contexts.
Did the official Postgres package see an update? We can just update to the latest image, stop the existing container, and start a new one. The old container is still there, so reverting is trivial.
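A minimal sketch of that cycle, assuming the pg-mount style container from above so the data lives on the pgdata volume rather than inside the container (and assuming a minor-version update; a major-version jump would still need pg_upgrade or a dump and restore):

$> docker pull postgres:latest
$> docker stop pg-mount
$> docker rename pg-mount pg-mount-old

$> docker run --name pg-mount -p 5600:5432 -d \
   --volume pgdata:/var/lib/postgresql/data postgres

# something wrong? revert:
$> docker stop pg-mount && docker rm pg-mount
$> docker rename pg-mount-old pg-mount && docker start pg-mount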
We can make our own image with contrib, and plpython, and pglogical, and launch that anywhere and everywhere. We can hand it to developers that don’t know anything about Postgres, and they can have their own database server that mirrors the production environment. And we can do it without an army of resource-intensive and difficult to provision VMs.
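A sketch of such a custom image; the package name here is illustrative and depends on what the base image's apt repositories provide (contrib already ships in the official image, and pglogical would need the 2ndQuadrant repository added first):

# Dockerfile
FROM postgres:10

# add PL/Python as an example extra extension
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql-plpython3-10 \
    && rm -rf /var/lib/apt/lists/*

Build and run it like any other image:

$> docker build -t pg-custom .
$> docker run --name pg-dev -p 5555:5432 -d pg-custom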
This is just the very tiniest tip of the proverbial iceberg. We’ll continue to build our toolkit in future articles, and maybe end up with something that isn’t arguably worse than a standard server environment, just different.