psql.md

# `psql`: **PostgreSQL**'s shell client
------------------------------------------------------------------------------
### **§** General Notes

* We will most often abbreviate “PostgreSQL” as “Postgres,”
  simply because it is easier to say and read.  
  ([**Postgres** _vs_ **PostgreSQL** — _What is the difference?_ | **TablePlus**][PGres])
* “PRISM” is the nickname of EECS's network and collection of computers.
* The class's **Postgres(ql)** server
	* runs on _host_ machine `db.eecs.yorku.ca`
	  and
	* is version `11.1`.
* All machines on the PRISM LAN (_local area network_) are hidden
  behind a firewall.
  You must go through the _gateway_ machine
  `red.eecs.yorku.ca`
  to access any of them.
* **\*nix** is the general term for any _operating system_
  based in principle on **unix**.
  This covers, of course,
  any of the variants of **unix**,
  **linux**,
  and **macOS**
  (for which the core is a **unix** _kernel_ called **darwin**).

[PGres]: https://tableplus.com/blog/2018/10/postgres-vs-postgresql.html (Postgres vs PostgreSQL - What is the difference? | TablePlus)

------------------------------------------------------------------------------
### **§** How to “talk” to a database

You need a _program_ to set up a connection to the _database server_
—
and specify the _database_
that the server manages
to which you want to talk
—
to be able to “interact” with the data
(in that database that that database server is managing).
As we shall discover in the course,
the _only_ way to interact with a _relational_ database
is via the “query language”.
(This is by design!
And we shall be learning why.)
Nowadays
for relational databases,
the standard _query language_ is **SQL**.

Later in the course,
we learn how to write programs
(e.g., in **Java** using **JDBC**)
that can communicate (via **SQL**!)
with a database managed by a database server.
In the case of **Java**,
the **JDBC** library lays out how we do this.

But it is useful to be able to connect _directly_
to a database as a user
to issue (**SQL**) commands and queries “by hand.”
In _operating systems_,
a general _command-line_ program
where the user can issue commands directly
is called a _shell_.
The term _shell_ is used more generally
for any interactive, command-line program
interface to a system.
The `psql` shell is the main tool
for interacting with a database
managed by a **PostgreSQL** server.

Most database systems are a _client-server_ configuration:
the **DBMS** (_data-base management system_) runs as a _server_;
and applications that use and manipulate the data
managed by the **DBMS**
are _clients_.
Most often,
a database client does not run on the same machine as the server.
(A **DBMS** server may have many, many active clients at any given time.)
This will commonly be the case
for `psql` clients.

Are there non-shell clients for **Postgres**,
like pretty GUI (_graphical user interface_) ones?
Yes, of course.
But we do not really need them here.

------------------------------------------------------------------------------
### **§** The **EECS-3421** **Postgres** set up

For the class,
we use a **Postgres** server running on `db.eecs.yorku.ca`.
Let us nickname this database server **DB**.
An **RDBMS** can manage many different _databases_.
In the class,
we talk about a database as a logically organized, coherent
_schema_ that models a _domain_
and the data that populates the schema.
On the system side of the fence,
a database for the **RDBMS** 
is a data (and schema) _container_
with its own authorizations and permissions
(and resource allocations).

Each student is assigned their own database on **DB**
for use in the course.
The database is named the same as their EECS account name;
e.g.,
EECS account “godfrey” has
a database on **DB** named “godfrey”.

------------------------------------------------------------------------------
### **§** The **PSQL** shell client

All the machines in PRISM have the `psql` client installed.
To connect to your (or some) _database_ hosted by **DB**,
open a shell on your favourite PRISM machine
and type, say,

```
% psql -h db -U godfrey -d godfrey
```

Well, replace “godfrey” with your account (and database) name.
The `-h` argument (“host”) specifies the _host_ machine
where the database server runs.
A `-p` argument (“port”) would specify the _port_ (e.g., `5432`);
more than one database system _could_ be running on the _same_ host.
This value has a default, which is correct here.
The `-U` argument (“user”) specifies the _database user account_.
**DB** has been set up so this mirrors your EECS account.
We can leave out the `-U`;
it will default to be your “EECS account”.
And finally,
the `-d` argument (“database”) says which _database_
you are connecting to.
Usually,
this will be _your_ database,
which has the same name as your EECS account,
as talked about above.
So,

```
% psql -h db -d godfrey
```

Again,
replace “godfrey” with _your_ account name.
You will be prompted for a _password_.
We have initialized this to be your York _student number_.
Once you have correctly provided your password,
you will be in the `psql` shell.

Whoa!
What are the commands?!
How do you navigate?
We leave that for you to learn!
Reference
[**PostgreSQL**: _Documentation_: _11_: _psql_][psql-11]
to get started.
Note that the shell has a really good built-in _help_ facility.
(To quit,
type “quit” or “^D”.)
And, of course,
we will have to learn **SQL** to make really full use of `psql`!

[psql-11]: https://www.postgresql.org/docs/11/app-psql.html (PostgreSQL: Documentation: 11: psql)

------------------------------------------------------------------------------
### **§** Setting up drop-through authentication

You can live with having to type in your password
each time you start a `psql`,
of course.
And most people will just deal with it.

But do you _really_ not like having to type in your **Postgres** password
every time you invoke the `psql` shell?
If so,
and if you are so inclined,
here is a way to set up _drop-through authentication_.

#### **¶** the `.pgpass` file

By putting a file named “.pgpass” in your home directory
on PRISM
with perms `600`
(e.g., `chmod u=+rw-x,g=-rwx,o=-rwx .pgpass`)
—
thus,
“on the machine” where your client will run
—
will give you drop-through authentication
to the `psql` shell
without having to provide your passwd.

The format of `.pgpass` is as follows:

```
HOST:PORT:DATABASE:USER:PASSWD
```

E.g.,

```
db:5432:godfrey:godfrey:LIES
```

In that line,
“db” is the host machine,
where the **Postgres** server is,
“5432” is the connection port
(on which clients call the server),
the first “godfrey”
is the _database_ hosted by the **Postgres** server
to which we are connecting,
the second “godfrey”
is the _database account_
that we are using,
and “LIES”
is the account's _password_.

In the standard set up for **Postgres**,
the server listens on port `5432` for client connections.
Of course,
in the set up,
one could use a different port for this.
(The server on `db` is using port `5432`.)

You can put in wildcards with “*”.
**PSQL** is matching into the file.
E.g.,

```
db:*:godfrey:godfrey:LIES
```

The command-line call still needs you to provide the HOST,
regardless,
it seems.
E.g., 

```
% psql -h db
```

#### **¶** sources  

* [**PostgreSQL**: Use of psql, .pgpass file and PGPASSFILE to store passwords][PGPASS]
* Can be found in the Postgresql documentation too, of course.

[PGPASS]: https://newfivefour.com/postgresql-pgpass-password-file.html (PostgreSQL: Use of psql, .pgpass file and PGPASSFILE to store passwords)

------------------------------------------------------------------------------
### Using your _own_ **PSQL** client

Want to run your _own_ `psql` client
on your _own_ computer and
connect to the class's **PostgreSQL** server?
_Why?!_
Well,
that would certainly be overkill
for what you need
for your classwork
in this course.
But we _are_ training data scientists here,
and such skills are more than useful.

#### **¶** install the **PostgreSQL** client

So, if you are so inclined and want to do this,
follow along.
First,
install the **PostgreSQL** _client_ on your computer.
The class's server running at `db.eecs.yorku.ca` is 
version `11.1`,
so install the latest **PostgreSQL 11** client.
(This was `11.8` when I did.)

#### **¶** set up an **SSH** tunnel

```
% ssh -L 63333:db.eecs.yorku.ca:5432 godfrey@red.eecs.yorku.ca
```

This opens a shell on `red`.
And this has set port forwarding from `localhost:63333` to `db:5432`
(the connection port for clients for **Postgres**)
from your machine through `red`.

Replace `godfrey` above with _your_ EECS account name,
of course.
Port `63333` is the port on your computer
to be used
by your local `psql` via which to connect.
You can use whatever port number you want
(which is not already in use by something else);
IANA reserves ports `49152` through `65535`
on **\*nix** machines
for “private” use.
And port `5432` _on the other side of the tunnel_
is the standard port that a **PostgreSQL** server
listens on for clients.
(**SSH** is binding your local port `63333` to port `5432` at `db`.)

Why are we wanting to tunnel _through_ `red`?
We cannot connect to `db` _directly_ from outside
of the PRISM local area network (LAN).
The machine `red.eecs.yorku.ca`
is the _gateway_ computer through
which machines outside of PRISM can connect.

#### **¶** and connect your **PSQL** client to EECS's Postgres server

Then, in a second local shell,
we can say

```
% psql -h localhost -p 63333 -d godfrey
```

Replace `godfrey` with _your_ database name.
(By convention,
for **EECS-3421**,
we make each person a _database_
of the _same_ name as their EECS account.)

And this has connected your _local client_ `psql`
to `db`'s **Postgres** server!
Your connection from your computer (`localhost`) to `red`
is on an encrypted channel,
via the ssh tunnel.
The connection from `red` to `db` is not in this ssh tunnel.
**Postgres** on `db` sees your client
as if it were on `red`.

Note you are asked for your _passwd_ for `db`'s **Postgres**,
even if you set up `.pgpass` in your home directory on PRISM
correctly.
Why?
Your `psql` client is running locally,
on your machine!

If you are on a __\*nix__ machine (_linux_, _unix_, _macOS_),
you could set up a `.pgpass` in your home directory
on your local computer to allow for pass-through authentication.
Your local `psql` client sees this connection
as if it were to a server running locally (`localhost`).
It is the magic of the ssh tunnel we have running that
tunnels this to the **Postgres** server on `db.eecs.yorku.ca`!
So,
our local `.pgpass` would need to look, for example, like

```
localhost:*:godfrey:godfrey:LIES
```

Note that,
if you are running a local **Postgres** server of your own,
this could get in the way of your connecting to your local server!
So configure things sensibly.

#### **¶** sources

For this,
I consulted and adapted
[PostgreSQL: Documentation: 9.6: Secure TCP/IP Connections with SSH Tunnels][Tunnel].

#### **¶** Windows OS

[Tunnel]: https://www.postgresql.org/docs/9.6/ssh-tunnels.html (PostgreSQL: Documentation: 9.6: Secure TCP/IP Connections with SSH Tunnels)

If you are on a _windows_ machine,
then … good luck!
There is no doubt a way to set this up,
but I have not investigated how yet.

------------------------------------------------------------------------------
### **§** Using your _own_ **PostgreSQL** server (and **PSQL** client)

**Postgresql** is not too difficult an install
(compared to, say, **IBM DB2** or **Oracle**).
So,
if you are really interested,
have a decent machine,
and are so inclined,
go for it!

------------------------------------------------------------------------------
### **§** Using a different RDBMS

**Postgres** is not the only _relational database management system_.
Everyone has likely heard of **Oracle**, **IBM DB2**,
and Microsoft's **SQL Server**.
And there are many others.

Projects in the class are on a **PostgreSQL** database system,
however.
And while **SQL** is **SQL** is **SQL**,
there are minor differences from system to system,
enough so to be annoying.
So we do insist that you do your work for this class in **Postgres**.
It is your responsibility to ensure
that what you turn in
works properly on PRISM's **Postgres** (**DB**).

That is not to say you shouldn't practise **SQL**
on another database system if you have one readily available.
It's all good.

------------------------------------------------------------------------------

|                     |                       |
|--------------------:|:----------------------|
| **initial author**: | _parke godfrey_       |
|        **created**: | _2020 August 7_       |
|   **last revised**: | _2020 August 13_      |

[PG]: mailto:godfrey@yorku.ca (Mail to "Parke Godfrey" )

------------------------------------------------------------------------------