Accessing and Launching Workstations

Users are assigned cloud-based workstations for research and analysis. This section
describes how to launch and use these workstations.

Launch Workstations

  1. To see your assigned workstations, click WORKSTATIONS in the top
    menu. By default, all workstations are in a stopped state.

  2. Click on Start to start the workstation.

  3. The workstation should become available within five minutes; you may not see any change immediately. A message will appear when the workstation has been successfully started.

  4. Click Launch for the workstation. Note: the Launch button does not start your workstation; you must start it (step 2) before you can log in.

  5. The workstation opens within your browser. It may take a few minutes to initialize; when it is ready, a login screen appears and you are prompted to re-enter your SDC username and password.

Installed Software

By default, users will have the following installed on their workstations:

  • Python 3.8

  • Cyberduck

  • 7-zip

  • Javelin PDF Reader

  • Notepad++

  • Git

  • Git Extensions

  • Meld

  • Google Chrome

  • JDK (Java)

  • LibreOffice

  • PuTTY

  • R

  • RStudio Desktop

  • SQL Workbench

  • Anaconda3

  • KeePass

Connecting to the Data Warehouse

The following sections illustrate how to connect to the data stores available to the SDC.

Connecting to Waze Data in Redshift Using SQL Workbench

Launch SQL Workbench by finding it on the Start menu.

A Redshift (Waze) connection should already be set up. Edit the username and password, and you should be able to connect. If you cannot connect, please open a Service Desk ticket so that we can assist you.

Connecting to Waze Data in Redshift Using Python

Here are a few helpful links on Python:

NOTE: When you are granted access to Waze data, the SDC support team creates a new Redshift user for you, assigns it a Redshift password, and emails you information about the Redshift host you will connect to. This email provides the redshiftHost, userName, and userPassword values shown below. Your Redshift credentials are used only for connecting to Redshift and NOT for accessing the portal, which uses your separate SDC credentials.
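To avoid saving your Redshift password in a script file, you may prefer to store the values from that email in environment variables and read them at run time. The sketch below assumes hypothetical variable names REDSHIFT_HOST, REDSHIFT_USER and REDSHIFT_PASSWORD, which the SDC does not create for you (you would define them yourself, for example with the Windows setx command); the database name and port are the ones used in the example further down this page.

```python
import os

# Hypothetical environment variable names: define them yourself, e.g. with the
# Windows "setx" command. The SDC does not create them.
ENV_VARS = {
    "host": "REDSHIFT_HOST",
    "user": "REDSHIFT_USER",
    "password": "REDSHIFT_PASSWORD",
}


def redshift_settings(environ=os.environ):
    """Collect psycopg2.connect() keyword arguments from environment variables.

    Raises KeyError naming the first variable that is missing or empty.
    """
    # Database name and port as used in the example on this page.
    settings = {"dbname": "dot_sdc_redshift_db", "port": 5439}
    for key, var in ENV_VARS.items():
        value = environ.get(var)
        if not value:
            raise KeyError(f"environment variable {var} is not set")
        settings[key] = value
    return settings


# Usage (requires psycopg2 and network access to Redshift):
#   import psycopg2
#   conn = psycopg2.connect(**redshift_settings())
```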
Important: The default version of Python installed on the SDC Windows Workstations is v3.8. Two Python modules must be installed before you can connect to Redshift with Python using the example code below. We recommend installing them into a virtual environment, but you can also install them into the system interpreter. To install these modules into the system interpreter, open a Windows Command Prompt and enter the following two commands:

C:\Users\username> pip install psycopg2
C:\Users\username> pip install numpy

These "pip install …" commands only need to be run ONCE on the SDC Windows Workstation. Once the Python modules are installed, they remain available, even across reboots of the workstation.
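If you are unsure whether the modules are already installed for the interpreter you are using (for example, after switching to a virtual environment), a quick check like the following may help. It is a minimal sketch built on the standard library's importlib.util.find_spec; the function name missing_modules is hypothetical, and the check only reports what the interpreter running it can find.

```python
import importlib.util

# Modules imported by the Redshift example on this page.
REQUIRED_MODULES = ["psycopg2", "numpy"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```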
To test Python connectivity to Redshift, open the IDLE Python editor and run the following:

from __future__ import print_function
import psycopg2
import numpy

# Replace the bracketed placeholders with the values from your Redshift email.
dbName = 'dot_sdc_redshift_db'
redshiftHost = '[host address]'
redshiftPort = 5439
userName = '[username]'
userPassword = '[password]'

# query = 'select * from dw_waze.alert limit 10;'
query = ("select * from dw_waze.alert "
         "where alert_type='ACCIDENT' and city = 'Severance, CO'")

conn = psycopg2.connect(
    dbname=dbName,
    host=redshiftHost,
    port=redshiftPort,
    user=userName,
    password=userPassword)
cursor = conn.cursor()
cursor.execute(query)
result = cursor.fetchall()
result = numpy.array(result)
# print(result)

# Print selected columns (by position) of each row, tab-separated.
for r in result:
    print(r[1], r[8], r[18], r[19], r[22], sep='\t')

cursor.close()
conn.close()
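The example above selects columns by position (r[1], r[8], and so on) and writes the filter values directly into the SQL string. A variant you may find easier to maintain passes the filter values as query parameters, which psycopg2 substitutes for the %s placeholders, and uses the standard DB-API cursor.description attribute to label each value with its column name. This is a sketch rather than an SDC-provided utility: the function names are hypothetical, and the connection settings are the same placeholders used above.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Same query as the example above, with the filter values as placeholders.
ALERT_QUERY = (
    "select * from dw_waze.alert "
    "where alert_type = %s and city = %s"
)


def build_alert_query(alert_type: str, city: str) -> Tuple[str, Tuple[str, str]]:
    """Return (sql, params) for cursor.execute(); psycopg2 fills in each %s."""
    return ALERT_QUERY, (alert_type, city)


def rows_as_dicts(description: Sequence[Sequence[Any]],
                  rows: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Pair each value with its column name (item 0 of each description entry)."""
    names = [col[0] for col in description]
    return [dict(zip(names, row)) for row in rows]


def fetch_alerts(connect_kwargs: Dict[str, Any], alert_type: str,
                 city: str) -> List[Dict[str, Any]]:
    """Run the alert query and return each row as a dict keyed by column name."""
    import psycopg2  # imported here so the helpers above work without it

    conn = psycopg2.connect(**connect_kwargs)
    try:
        with conn.cursor() as cursor:  # the cursor is closed on exit
            sql, params = build_alert_query(alert_type, city)
            cursor.execute(sql, params)
            return rows_as_dicts(cursor.description, cursor.fetchall())
    finally:
        conn.close()


# Usage (placeholders as in the example above):
#   rows = fetch_alerts(
#       {"dbname": "dot_sdc_redshift_db", "host": "[host address]",
#        "port": 5439, "user": "[username]", "password": "[password]"},
#       "ACCIDENT", "Severance, CO")
#   for row in rows[:5]:
#       print(row)
```

Passing values as parameters also avoids quoting mistakes when a value itself contains a quote character.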

For further information and examples, refer to the internal SDC GitLab collaboration site.

Connecting to the Hadoop Hive Metastore

Launch SQL Workbench by finding it on the Start menu.

A Hive (CVP) connection should already be set up. Edit the username and password, and you should be able to connect. If you cannot connect, please open a Service Desk ticket so that we can assist you.

Update Data Formatting Settings in SQL Workbench

Once the connection has been established, navigate to Tools | Options | Data formatting and update the Decimal digits value to 0.

Connecting to the SDC Hadoop Data Warehouse Using Python

Here are a few helpful links on Python:

Important: The default version of Python installed on the SDC Windows Workstations is v3.8. Two Python modules must be installed before you can connect to Hadoop/Hive with Python using the example code below. We recommend installing them into a virtual environment, but you can also install them into the system interpreter. To install these modules into the system interpreter, open a Windows Command Prompt and enter the following two commands:

C:\Users\username> pip install impyla
C:\Users\username> pip install numpy

These "pip install …" commands only need to be run ONCE on the SDC Windows Workstation. Once the Python modules are installed, they remain available, even across reboots of the workstation.
To test Python connectivity to the data warehouse, open the IDLE Python editor and run:

from __future__ import print_function
from impala.dbapi import connect
import numpy

# Replace the bracketed placeholders with your own values.
conn = connect(
    host='[host address]',
    port=10000,
    auth_mechanism='PLAIN',
    user='[your_username]',
    password='[your_password]')
cursor = conn.cursor()
cursor.execute('SHOW TABLES')
result = cursor.fetchall()
result = numpy.array(result)
# print(result)
for r in result:
    print(r)

cursor.close()
conn.close()

This should print the list of tables available to you, one per line.
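Because SHOW TABLES returns one single-column row per table, the loop above prints each table name wrapped in a one-element array. If you would rather see plain names, optionally filtered, a small helper such as the following flattens the rows first. It is a hypothetical function, not part of impyla, and it assumes each row holds just the table name, which is what Hive's SHOW TABLES normally returns.

```python
from typing import Any, List, Sequence


def table_names(rows: Sequence[Sequence[Any]], contains: str = "") -> List[str]:
    """Flatten SHOW TABLES rows into a sorted list of table names.

    Assumes each row holds a single value, the table name, and keeps only
    names containing `contains` (case-insensitive); the default keeps all.
    """
    names = (str(row[0]) for row in rows)
    return sorted(name for name in names if contains.lower() in name.lower())


# Usage, after cursor.execute('SHOW TABLES') in the example above:
#     for name in table_names(cursor.fetchall()):
#         print(name)
```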

Manage Workstations

After launching their workstations, users can resize a workstation's CPU/RAM and schedule its uptime by clicking the workstation's Manage button.

A dialog window appears with two checkbox options: Resize Workstation and Schedule Workstation Uptime.

Selecting an option displays the corresponding tabs in the dialog window. The icon next to each option shows a tooltip describing what it does.

Resize Workstation

  1. To resize the workstation, select the Resize Workstation checkbox and then select Next to continue.

  2. A message is shown at the bottom of the screen indicating that the workstation will be stopped before applying the resize.

  3. The Resize Workstation tab allows users to select the desired CPU/RAM for their
    workstation. The current configuration is grayed out and unavailable. Users can also
    view pricing details using the "click here" link.

  4. Select the “Please start my workstation after resizing to the new configuration” checkbox
    to automatically start the workstation with the new configuration after saving changes.

  5. Select Submit after all details are entered.

  6. A Recommended List of instances will appear. Select the desired instance and then the
    Next button.

  7. On the Schedule Date tab, users are prompted to enter the date range during which the
    resize should remain in effect for the workstation instance. Enter the From and To dates
    and then select Submit.

  8. Users are returned to the Workstations tab, which shows the updated CPU and memory information. They also receive a confirmation email from the system stating the resize expiration date.

Schedule/Extend Uptime

  1. By default, all workstations are shut down at 11 PM EST. To keep your workstation
    running longer to accommodate analysis runs, select the Schedule Workstation Uptime
    checkbox and then select Next to continue.

  2. The Schedule Workstation Uptime tab allows users to enter the date range during which
    the workstation should stay up and skip the nightly shutdown. Enter the From and To dates
    and then select Submit.

  3. To extend a currently scheduled uptime, select the Workstations tab and then select
    Manage again for the workstation. Hovering over the Schedule Workstation Uptime checkbox
    now shows a tooltip with the previously scheduled uptime.

  4. Repeat steps 1-2. In step 2, the From date is already populated with the date from the
    previously scheduled uptime. Add a later To date and then submit the update. The
    previously scheduled uptime becomes inactive and the new one becomes active.

  5. After selecting Submit, return to the Workstations tab and then select Manage for the
    workstation. The tooltip shown on hover for the Schedule Workstation Uptime checkbox
    now displays the extended uptime.
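Whether you need to schedule uptime comes down to simple arithmetic: a run that starts at time t and takes d hours needs extended uptime if t + d falls after the next 11 PM shutdown. The sketch below encodes that check. It is not part of the SDC portal, the function names are hypothetical, and it assumes the times you pass in are already expressed in Eastern time (it does no time-zone conversion).

```python
from datetime import datetime, timedelta

# Assumption: all times are local Eastern time, matching the nightly
# 11 PM shutdown described above. No time-zone handling is done.
SHUTDOWN_HOUR = 23


def next_shutdown(now: datetime, hour: int = SHUTDOWN_HOUR) -> datetime:
    """Return the first nightly shutdown time strictly after `now`."""
    shutdown = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if shutdown <= now:
        shutdown += timedelta(days=1)
    return shutdown


def needs_uptime_extension(start: datetime, duration: timedelta) -> bool:
    """True if a run starting at `start` would still be running at the next shutdown."""
    return start + duration > next_shutdown(start)


# Example: a 4-hour run starting at 8 PM would be cut off at 11 PM.
#   needs_uptime_extension(datetime(2024, 3, 1, 20, 0), timedelta(hours=4))
```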

Stop Workstations

To see your assigned workstations, click the Workstations tab in the top right corner of the page. By default, all workstations are scheduled to stop every day at 11 PM EST. Users can also stop a workstation manually by clicking its Stop button. A message appears when the instance has stopped successfully.
