# `psql`: **PostgreSQL**'s shell client ------------------------------------------------------------------------------ ### **§** General Notes * We will most often abbreviate “PostgreSQL” as “Postgres,” simply because it is easier to say and read. ([**Postgres** _vs_ **PostgreSQL** — _What is the difference?_ | **TablePlus**][PGres]) * “PRISM” is the nickname of EECS's network and collection of computers. * The class's **Postgres(ql)** server * runs on _host_ machine `db.eecs.yorku.ca` and * is version `11.1`. * All machines on the PRISM LAN (_local area network_) are hidden behind a firewall. You must go through the _gateway_ machine `red.eecs.yorku.ca` to access any of them. * **\*nix** is the general term for any _operating system_ based in principle on **unix**. This covers, of course, any of the variants of **unix**, **linux**, and **macOS** (for which the core is a **unix** _kernel_ called **darwin**). [PGres]: https://tableplus.com/blog/2018/10/postgres-vs-postgresql.html (Postgres vs PostgreSQL - What is the difference? | TablePlus) ------------------------------------------------------------------------------ ### **§** How to “talk” to a database You need a _program_ to set up a connection to the _database server_ — and specify the _database_ that the server manages to which you want to talk — to be able to “interact” with the data (in that database that that database server is managing). As we shall discover in the course, the _only_ way to interact with a _relational_ database is via the “query language”. (This is by design! And we shall be learning why.) Nowadays for relational databases, the standard _query language_ is **SQL**. Later in the course, we learn how to write programs (e.g., in **Java** using **JDBC**) that can communicate (via **SQL**!) with a database managed by a database server. In the case of **Java**, the **JDBC** library lays out how we do this. But it is useful to be able to connect _directly_ to a database as a user to issue (**SQL**) commands and queries “by hand.” In _operating systems_, a general _command-line_ program where the user can issue commands directly is called a _shell_. The term _shell_ is used more generally for any interactive, command-line program interface to a system. The `psql` shell is the main tool for interacting with a database managed by a **PostgreSQL** server. Most database systems are a _client-server_ configuration: the **DBMS** (_data-base management system_) runs as a _server_; and applications that use and manipulate the data managed by the **DBMS** are _clients_. Most often, a database client does not run on the same machine as the server. (A **DBMS** server may have many, many active clients at any given time.) This will commonly be the case for `psql` clients. Are there non-shell clients for **Postgres**, like pretty GUI (_graphical user interface_) ones? Yes, of course. But we do not really need them here. ------------------------------------------------------------------------------ ### **§** The **EECS-3421** **Postgres** set up For the class, we use a **Postgres** server running on `db.eecs.yorku.ca`. Let us nickname this database server **DB**. An **RDBMS** can manage many different _databases_. In the class, we talk about a database as a logically organized, coherent _schema_ that models a _domain_ and the data that populates the schema. On the system side of the fence, a database for the **RDBMS** is a data (and schema) _container_ with its own authorizations and permissions (and resource allocations). Each student is assigned their own database on **DB** for use in the course. The database is named the same as their EECS account name; e.g., EECS account “godfrey” has a database on **DB** named “godfrey”. ------------------------------------------------------------------------------ ### **§** The **PSQL** shell client All the machines in PRISM have the `psql` client installed. To connect to your (or some) _database_ hosted by **DB**, open a shell on your favourite PRISM machine and type, say, ``` % psql -h db -U godfrey -d godfrey ``` Well, replace “godfrey” with your account (and database) name. The `-h` argument (“host”) specifies the _host_ machine where the database server runs. A `-p` argument (“port”) would specify the _port_ (e.g., `5432`); more than one database system _could_ be running on the _same_ host. This value has a default, which is correct here. The `-U` argument (“user”) specifies the _database user account_. **DB** has been set up so this mirrors your EECS account. We can leave out the `-U`; it will default to be your “EECS account”. And finally, the `-d` argument (“database”) says which _database_ you are connecting to. Usually, this will be _your_ database, which has the same name as your EECS account, as talked about above. So, ``` % psql -h db -d godfrey ``` Again, replace “godfrey” with _your_ account name. You will be prompted for a _password_. We have initialized this to be your York _student number_. Once you have correctly provided your password, you will be in the `psql` shell. Whoa! What are the commands?! How do you navigate? We leave that for you to learn! Reference [**PostgreSQL**: _Documentation_: _11_: _psql_][psql-11] to get started. Note that the shell has a really good built-in _help_ facility. (To quit, type “quit” or “^D”.) And, of course, we will have to learn **SQL** to make really full use of `psql`! [psql-11]: https://www.postgresql.org/docs/11/app-psql.html (PostgreSQL: Documentation: 11: psql) ------------------------------------------------------------------------------ ### **§** Setting up drop-through authentication You can live with having to type in your password each time you start a `psql`, of course. And most people will just deal with it. But do you _really_ not like having to type in your **Postgres** password every time you invoke the `psql` shell? If so, and if you are so inclined, here is a way to set up _drop-through authentication_. #### **¶** the `.pgpass` file By putting a file named “.pgpass” in your home directory on PRISM with perms `600` (e.g., `chmod u=+rw-x,g=-rwx,o=-rwx .pgpass`) — thus, “on the machine” where your client will run — will give you drop-through authentication to the `psql` shell without having to provide your passwd. The format of `.pgpass` is as follows: ``` HOST:PORT:DATABASE:USER:PASSWD ``` E.g., ``` db:5432:godfrey:godfrey:LIES ``` In that line, “db” is the host machine, where the **Postgres** server is, “5432” is the connection port (on which clients call the server), the first “godfrey” is the _database_ hosted by the **Postgres** server to which we are connecting, the second “godfrey” is the _database account_ that we are using, and “LIES” is the account's _password_. In the standard set up for **Postgres**, the server listens on port `5432` for client connections. Of course, in the set up, one could use a different port for this. (The server on `db` is using port `5432`.) You can put in wildcards with “*”. **PSQL** is matching into the file. E.g., ``` db:*:godfrey:godfrey:LIES ``` The command-line call still needs you to provide the HOST, regardless, it seems. E.g., ``` % psql -h db ``` #### **¶** sources * [**PostgreSQL**: Use of psql, .pgpass file and PGPASSFILE to store passwords][PGPASS] * Can be found in the Postgresql documentation too, of course. [PGPASS]: https://newfivefour.com/postgresql-pgpass-password-file.html (PostgreSQL: Use of psql, .pgpass file and PGPASSFILE to store passwords) ------------------------------------------------------------------------------ ### Using your _own_ **PSQL** client Want to run your _own_ `psql` client on your _own_ computer and connect to the class's **PostgreSQL** server? _Why?!_ Well, that would certainly be overkill for what you need for your classwork in this course. But we _are_ training data scientists here, and such skills are more than useful. #### **¶** install the **PostgreSQL** client So, if you are so inclined and want to do this, follow along. First, install the **PostgreSQL** _client_ on your computer. The class's server running at `db.eecs.yorku.ca` is version `11.1`, so install the latest **PostgreSQL 11** client. (This was `11.8` when I did.) #### **¶** set up an **SSH** tunnel ``` % ssh -L 63333:db.eecs.yorku.ca:5432 godfrey@red.eecs.yorku.ca ``` This opens a shell on `red`. And this has set port forwarding from `localhost:63333` to `db:5432` (the connection port for clients for **Postgres**) from your machine through `red`. Replace `godfrey` above with _your_ EECS account name, of course. Port `63333` is the port on your computer to be used by your local `psql` via which to connect. You can use whatever port number you want (which is not already in use by something else); IANA reserves ports `49152` through `65535` on **\*nix** machines for “private” use. And port `5432` _on the other side of the tunnel_ is the standard port that a **PostgreSQL** server listens on for clients. (**SSH** is binding your local port `63333` to port `5432` at `db`.) Why are we wanting to tunnel _through_ `red`? We cannot connect to `db` _directly_ from outside of the PRISM local area network (LAN). The machine `red.eecs.yorku.ca` is the _gateway_ computer through which machines outside of PRISM can connect. #### **¶** and connect your **PSQL** client to EECS's Postgres server Then, in a second local shell, we can say ``` % psql -h localhost -p 63333 -d godfrey ``` Replace `godfrey` with _your_ database name. (By convention, for **EECS-3421**, we make each person a _database_ of the _same_ name as their EECS account.) And this has connected your _local client_ `psql` to `db`'s **Postgres** server! Your connection from your computer (`localhost`) to `red` is on an encrypted channel, via the ssh tunnel. The connection from `red` to `db` is not in this ssh tunnel. **Postgres** on `db` sees your client as if it were on `red`. Note you are asked for your _passwd_ for `db`'s **Postgres**, even if you set up `.pgpass` in your home directory on PRISM correctly. Why? Your `psql` client is running locally, on your machine! If you are on a __\*nix__ machine (_linux_, _unix_, _macOS_), you could set up a `.pgpass` in your home directory on your local computer to allow for pass-through authentication. Your local `psql` client sees this connection as if it were to a server running locally (`localhost`). It is the magic of the ssh tunnel we have running that tunnels this to the **Postgres** server on `db.eecs.yorku.ca`! So, our local `.pgpass` would need to look, for example, like ``` localhost:*:godfrey:godfrey:LIES ``` Note that, if you are running a local **Postgres** server of your own, this could get in the way of your connecting to your local server! So configure things sensibly. #### **¶** sources For this, I consulted and adapted [PostgreSQL: Documentation: 9.6: Secure TCP/IP Connections with SSH Tunnels][Tunnel]. #### **¶** Windows OS [Tunnel]: https://www.postgresql.org/docs/9.6/ssh-tunnels.html (PostgreSQL: Documentation: 9.6: Secure TCP/IP Connections with SSH Tunnels) If you are on a _windows_ machine, then … good luck! There is no doubt a way to set this up, but I have not investigated how yet. ------------------------------------------------------------------------------ ### **§** Using your _own_ **PostgreSQL** server (and **PSQL** client) **Postgresql** is not too difficult an install (compared to, say, **IBM DB2** or **Oracle**). So, if you are really interested, have a decent machine, and are so inclined, go for it! ------------------------------------------------------------------------------ ### **§** Using a different RDBMS **Postgres** is not the only _relational database management system_. Everyone has likely heard of **Oracle**, **IBM DB2**, and Microsoft's **SQL Server**. And there are many others. Projects in the class are on a **PostgreSQL** database system, however. And while **SQL** is **SQL** is **SQL**, there are minor differences from system to system, enough so to be annoying. So we do insist that you do your work for this class in **Postgres**. It is your responsibility to ensure that what you turn in works properly on PRISM's **Postgres** (**DB**). That is not to say you shouldn't practise **SQL** on another database system if you have one readily available. It's all good. ------------------------------------------------------------------------------ | | | |--------------------:|:----------------------| | **initial author**: | _parke godfrey_ | | **created**: | _2020 August 7_ | | **last revised**: | _2020 August 13_ | [PG]: mailto:godfrey@yorku.ca (Mail to "Parke Godfrey") ------------------------------------------------------------------------------