...
Python 3.9
VSCodium
Cyberduck
7-zip
Notepad++
Git
Git Extensions
Meld
Google Chrome
JRE (Java)
LibreOffice
Putty
R
RStudio Desktop
DBeaver
Anaconda3
KeePass
Connecting to the Data Warehouse
The following sections illustrate how the user can connect to the data stores available to the
SDC.
Connecting to Waze Data in Redshift Using SQL Workbench
Launch SQL Workbench by finding it on the Start menu.
A Redshift (Waze) connection should already be set up. Edit the username and password, and you should be able to connect. If you are not able to connect, please open a Service Desk ticket so that we can assist you.
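Once connected, you can verify access by running a simple query in the SQL Workbench editor, for example the same Waze query used in the Python example below:

select * from dw_waze.alert limit 10;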
Connecting to Waze Data in Redshift Using Python
Here are a few helpful links on Python:
NOTE: When you are granted access to Waze data, the SDC support team creates a new
Redshift user for you, assigns it with a Redshift password, and emails you with information on
the Redshift host you will connect to. This email message provides the redshiftHost,
userName, and userPassword values shown below. Your Redshift credentials are only used
for connecting to Redshift and NOT for accessing the portal, which uses your separate SDC
credentials.
Important: The default version of Python installed on the SDC Windows Workstations is
v3.8. Two required Python modules must be installed before attempting to connect to
Redshift with Python using the example code below. Although we recommend using a virtual environment (a brief sketch appears after the install commands below), you can also install these modules into the system interpreter. To install them into the system interpreter, open a Windows Command Prompt and enter the following two commands:
C:\Users\username> pip install psycopg2
C:\Users\username> pip install numpy
The above "pip install …" command(s) only need to be run ONCE on the SDC Windows
Workstation. Once the Python modules are installed, they remain available, even across reboots
of the workstation.
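If you prefer the recommended virtual-environment approach instead of the system interpreter, a minimal sketch is shown below; the environment name redshift-env is only an example:

C:\Users\username> python -m venv redshift-env
C:\Users\username> redshift-env\Scripts\activate
(redshift-env) C:\Users\username> pip install psycopg2 numpy

Packages installed this way live inside the environment folder and must be reinstalled if you create a new environment.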
To test Python connectivity to Redshift, open the IDLE Python editor and execute the following:

from __future__ import print_function
import psycopg2
import numpy

# Connection settings provided in the email from the SDC support team
dbName = 'dot_sdc_redshift_db'
redshiftHost = '[host address]'
redshiftPort = 5439
userName = '[username]'
userPassword = '[password]'

# Example queries against the Waze alert table
# query = 'select * from dw_waze.alert limit 10;'
query = "select * from dw_waze.alert where alert_type='ACCIDENT' and city = 'Severance, CO'"

# Open the connection and run the query
conn = psycopg2.connect(
    dbname=dbName,
    host=redshiftHost,
    port=redshiftPort,
    user=userName,
    password=userPassword)
cursor = conn.cursor()
cursor.execute(query)

# Fetch all rows and convert them to a numpy array
result = cursor.fetchall()
result = numpy.array(result)
# print(result)
for r in result:
    print(r[1], r[8], r[18], r[19], r[22], sep='\t')
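As an additional illustration (not part of the original example), psycopg2 also supports parameterized queries, which keep filter values out of the SQL text; the sketch below reuses the conn object from above and closes the connection when finished:

# Pass the filter values separately rather than embedding them in the SQL string
query = "select * from dw_waze.alert where alert_type = %s and city = %s"
cursor = conn.cursor()
cursor.execute(query, ('ACCIDENT', 'Severance, CO'))
rows = cursor.fetchall()
print(len(rows), 'rows returned')

# Release database resources when done
cursor.close()
conn.close()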
For further information and examples, refer to the internal SDC GitLab collaboration site.
Connecting to the Hadoop Hive Metastore
Launch SQL Workbench by finding it on the Start menu.
A Hive (CVP) connection should already be set up. Edit the username and password, and you should be able to connect. If you are not able to connect, please open a Service Desk ticket so that we can assist you.
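Once connected, you can verify access by running a simple statement in the SQL Workbench editor, for example the same statement used in the Python example below:

SHOW TABLES;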
Update Data Formatting Settings in SQL Workbench
Once the connection has been established, navigate to Tools | Options | Data formatting and update the Decimal digits value to 0.
...
Connecting to the SDC Hadoop Data Warehouse Using Python
Here are a few helpful links on Python:
Important: The default version of Python installed on the SDC Windows Workstations is
v3.8. Two required Python modules must be installed before attempting to connect to
Hadoop/Hive with Python using the example code below. Although we recommend using a virtual environment, you can also install these modules into the system interpreter. To install them into the system interpreter, open a Windows Command Prompt and enter the following two commands:
C:\Users\username> pip install impyla
C:\Users\username> pip install numpy
The above "pip install …" command(s)s only need to be run ONCE on the SDC Windows
Workstation. Once the Python modules are installed, they remain available, even across reboots
of the workstation.
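If you want to confirm the modules are still present (for example, after a reboot), one quick check, shown here only as an illustration, is:

C:\Users\username> pip show impyla numpy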
To test Python connectivity to the data warehouse, open the IDLE Python editor and execute:

from __future__ import print_function
from impala.dbapi import connect
import numpy

# Connect to the Hive server using the credentials provided by the SDC support team
conn = connect(
    host='[host address]',
    port=10000,
    auth_mechanism='PLAIN',
    user='[your_username]',
    password='[your_password]')

# List the tables visible to your user
cursor = conn.cursor()
cursor.execute('SHOW TABLES')
result = cursor.fetchall()
result = numpy.array(result)
# print(result)
for r in result:
    print(r)
This should result in an array of tables displayed to the user.
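Because impyla follows the standard Python DB-API, the same cursor can also run ordinary HiveQL queries; in the sketch below, '[table_name]' is only a placeholder for one of the tables returned by SHOW TABLES:

# '[table_name]' is a placeholder; substitute a table returned by SHOW TABLES
cursor.execute('SELECT * FROM [table_name] LIMIT 10')

# Column names come from the standard DB-API cursor description
columns = [col[0] for col in cursor.description]
print(columns)

# Print each returned row
for row in cursor.fetchall():
    print(row)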
Manage Workstations
After launching a workstation, users can resize its CPU/RAM and schedule its uptime by clicking the Manage button, as shown below.
...