...

Users are assigned cloud-based workstations to perform research and analysis on the datasets. This section
provides a description of how to launch and use these workstations.

...

  1. To see your assigned workstations, click WORKSTATIONS in the top
    menu. By default, all workstations are in a stopped (inactive) state.

  2. Click Start to start the workstation.

  3. The workstation should become available within five minutes; you may not see any change immediately. A message appears once the workstation has started successfully.

  4. Click Launch for the workstation.

  5. The workstation opens in the browser. It may take a few minutes to initialize. When initialization completes, a login screen appears and prompts you to re-enter your SDC username and password.

Installed Software

...

By default, users will have the following installed on their workstations:

  • Python 3.8

  • Cyberduck

  • 7-zip

  • Javelin PDF Reader

  • Notepad++

  • Git

  • Git Extensions

  • Meld

  • Google Chrome

  • JDK (Java)

  • LibreOffice

  • PuTTY

  • R

  • RStudio Desktop

  • SQL Workbench

  • Anaconda3

  • KeePass

Connecting to the Data Warehouse

...

Connecting to Waze Data in Redshift Using SQL Workbench

Launch SQL Workbench by finding it on the Start menu.

...

If no Redshift connection profile exists yet, create one to connect to Waze data:

  1. Create a new connection profile by selecting the top left corner icon on the “Select
    Connection Profile” window.

  2. Select “Amazon Redshift Driver” from the Driver drop-down.

  3. Update the URL section with the Redshift URL provided in the email from the
    support desk detailing Redshift login credentials.

  4. Provide your username and password received in the welcome email.

  5. Click on the Test button at the bottom to test the connection. A pop-up dialog will
    appear confirming a successful or failed connection.
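
As a sanity check for step 3, the sketch below shows the general shape of an Amazon Redshift JDBC URL. The host and database values are hypothetical placeholders, and the function name is invented for illustration. Always use the exact URL from the support desk email. Port 5439 is the Redshift default and may differ in your environment.

```python
def redshift_jdbc_url(host, database, port=5439):
    """Build a JDBC URL in the form the Amazon Redshift driver expects.

    host and database are placeholders; 5439 is the Redshift default port.
    """
    return f"jdbc:redshift://{host}:{port}/{database}"


# Hypothetical values, for illustration only.
print(redshift_jdbc_url("<redshiftHost>", "<database>"))
```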

...

There should already be a Redshift (Waze) connection set up. Just edit the username and password and you should be able to connect. If you cannot connect, please open a Service Desk ticket so that we can assist you.

Connecting to Waze Data in Redshift Using Python

Here are a few helpful links on Python

NOTE: When you are granted access to Waze data, the SDC support team creates a new
Redshift user for you, assigns it a Redshift password, and emails you information on
the Redshift host you will connect to. This email provides the redshiftHost,
userName, and userPassword values shown below. Your Redshift credentials are used only
for connecting to Redshift and NOT for accessing the portal, which uses your separate SDC
credentials.
Important: The default version of Python installed on the SDC Windows Workstations is
v3.8. Two Python modules must be installed before attempting to
connect to Redshift with Python using the example code below. We recommend using a virtual environment, but you can also install these modules into the system interpreter. To install them into the system interpreter, open a
Windows Command Prompt and enter the following two commands:

Code Block
C:\Users\username> pip install psycopg2

...


C:\Users\username> pip install numpy

The above "pip install …" command(s) only need to be run ONCE on the SDC Windows
Workstation. Once the Python modules are installed, they remain available, even across reboots
of the workstation.
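
To confirm that the modules are available before running any connection code, a quick check along these lines can help. This is a minimal sketch; the helper name missing_modules is hypothetical and uses only the Python standard library.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both modules are installed for this interpreter.
print(missing_modules(["psycopg2", "numpy"]))
```

If the list is not empty, re-run the corresponding pip install command in the same interpreter or virtual environment.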
To test Python connectivity to Redshift, open the IDLE Python editor and execute the following:

...
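
For orientation, a connectivity test with psycopg2 could look like the sketch below. It is not the guide's original example. The function names are hypothetical, the database name "dev" is an assumption, and 5439 is the Redshift default port. Substitute the redshiftHost, userName, and userPassword values from your support email.

```python
def redshift_connection_args(redshiftHost, userName, userPassword,
                             dbName="dev", port=5439):
    """Collect the keyword arguments that psycopg2.connect() accepts.

    dbName="dev" is an assumed placeholder; use the database named in your email.
    """
    return {
        "host": redshiftHost,
        "port": port,
        "dbname": dbName,
        "user": userName,
        "password": userPassword,
    }


def check_redshift_connection(conn_args):
    """Open a connection, run a trivial query, and return the server version."""
    import psycopg2  # imported here so the helper above works without it

    conn = psycopg2.connect(**conn_args)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version();")
            return cur.fetchone()[0]
    finally:
        conn.close()  # always release the connection


# Example with placeholders (replace with the values from your email):
# args = redshift_connection_args("<redshiftHost>", "<userName>", "<userPassword>")
# print(check_redshift_connection(args))
```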

Connecting to the Hadoop Hive Metastore

Launch SQL Workbench by double-clicking on the SQL Workbench shortcut on the desktop:

...

Create a new connection profile by selecting the top left corner icon on the “Select
Connection Profile” window.

...

Select “Hive JDBC” from the Driver drop-down.

...

Update the URL section with the Hive URL.

...

Provide your username and password received in the welcome email. NOTE: You are
not required to enter the “@internal.sdc.dot.gov” portion of your username to log on.

Click on the Test button at the bottom to validate your connection. A pop-up dialog will
appear confirming a successful or failed connection. If the connection continues to fail,
contact the SDC support desk for assistance at sdc-support@dot.gov.

...

SQL Workbench can also be launched by finding it on the Start menu.

There should already be a Hive (CVP) connection set up. Just edit the username and password and you should be able to connect. If you cannot connect, please open a Service Desk ticket so that we can assist you.

Update Data Formatting Settings in SQL Workbench

...

Connecting to the SDC Hadoop Data Warehouse Using Python

Here are a few helpful links on Python

Important: The default version of Python installed on the SDC Windows Workstations is
v3.8. Two Python modules must be installed before attempting to
connect to Hadoop/Hive with Python using the example code below. We recommend using a virtual environment, but you can also install these modules into the system interpreter. To install them into the system interpreter,
open a Windows Command Prompt and enter the following two commands:

Code Block
C:\Users\username> pip install impyla

...


C:\Users\username> pip install numpy

The above "pip install …" commands only need to be run ONCE on the SDC Windows
Workstation. Once the Python modules are installed, they remain available, even across reboots
of the workstation.
To test Python connectivity to the data warehouse, open the IDLE Python editor and execute:

...

This should result in an array of tables displayed to the user.
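
A table-listing test of this kind can be sketched with impyla as shown below. It is not the guide's original example. The function names are hypothetical, and port 10000, the "default" database, and the PLAIN authentication mechanism are assumptions about the HiveServer2 setup. Use the Hive host and credentials supplied by SDC support.

```python
def hive_connection_args(hiveHost, userName, userPassword,
                         port=10000, database="default"):
    """Collect keyword arguments for impala.dbapi.connect().

    port, database, and auth_mechanism are assumed values; adjust as needed.
    """
    return {
        "host": hiveHost,
        "port": port,
        "database": database,
        "user": userName,
        "password": userPassword,
        "auth_mechanism": "PLAIN",
    }


def list_hive_tables(conn_args):
    """Connect to Hive, run SHOW TABLES, and return the table names."""
    from impala.dbapi import connect  # provided by the impyla package

    conn = connect(**conn_args)
    try:
        cur = conn.cursor()
        cur.execute("SHOW TABLES")
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()  # always release the connection


# Example with placeholders (replace with your own values):
# args = hive_connection_args("<hiveHost>", "<userName>", "<userPassword>")
# print(list_hive_tables(args))
```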

Connecting to Redshift from Linux Environment

Credentials for accessing the Waze Redshift database are provided by SDC Support (sdc-support@dot.gov).

  • In R, it is possible to connect to Redshift using several packages. The RPostgreSQL
    package provides a simple method. It requires the PostgreSQL library to be
    installed at the system level; if it is not installed, install it as root from
    the terminal:

$ sudo yum install postgresql-devel

...

In R, you may need to run install.packages("RPostgreSQL", dep=T) if you
do not already have the package installed.

...

  • A database can then be queried using the dbGetQuery() function.

Accessing Jupyter Notebook and RStudio Server

Linux users can access Jupyter Notebook and RStudio Server from the Firefox web
browser on their Windows workstation, using the URLs below.

Windows users can click the “RStudio” shortcut icon on the desktop to open
the RStudio console.

Manage Workstations

After launching a workstation, users can resize its CPU/RAM and schedule its uptime by clicking its Manage button, as shown below.

...