How to: Get Started with Python on your SDC Workstation
Using Python
SDC Data Analysts use the Python programming language to analyze SDC data. The Data Analysts need to be able to pull data from the SDC Data Lake into Python for analysis.
SDC workstations come with Python version 3.11.x pre-installed.
SDC workstations have the AWS CLI pre-installed, and researchers can use it from the Command Prompt.
SDC workstations come with the VSCodium text editor pre-installed.
For more information, see BeginnersGuide - Python Wiki.
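As a quick check of the setup described above, the following minimal sketch reports the Python version in use and whether the AWS CLI can be found on the PATH. The function name workstation_report is hypothetical, and the executable name aws is an assumption about how the AWS CLI is exposed on the workstation.

```python
import shutil
import sys


def workstation_report(tools=("aws",)):
    """Return the running Python version and the resolved path of each tool."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    # shutil.which returns None when a tool is not found on PATH.
    found = {tool: shutil.which(tool) for tool in tools}
    return {"python": version, "tools": found}


if __name__ == "__main__":
    report = workstation_report()
    print("Python version:", report["python"])
    for tool, path in report["tools"].items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If a tool is reported as not found, it may still be installed under a different executable name or outside the PATH.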
Creating your virtual environments
The venv module supports creating lightweight “virtual environments”, each with their own independent set of Python packages installed in their site directories.
For more information, see the venv documentation, “Creation of virtual environments”.
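A virtual environment can also be created from Python itself. The sketch below uses the standard-library venv.EnvBuilder class; the function name create_environment and the folder name in the example comment are hypothetical, and with_pip=True assumes pip can be bootstrapped on the workstation.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its path."""
    env_path = Path(env_dir)
    # EnvBuilder is the standard-library class behind "python -m venv".
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_path)
    return env_path


# Example (hypothetical folder name, created in the current directory):
# create_environment("my-project-env")
```

The same result is normally obtained from the Command Prompt with python -m venv followed by the folder name. On Windows, the new environment is then activated by running the activate script in its Scripts folder.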
To create and run your first Python program using VS Code, follow the tutorial at https://code.visualstudio.com/docs/python/python-tutorial
Working with Version control
For more information on version control, see RT Guide: Chapter 5, Using GitLab.
Accessing SDC data from your Workstation - Using Python to access data.
SDC researchers can access the data in two ways:
Using the AWS CLI, which is installed on every SDC machine.
Using Cyberduck, which lets users download data to their SDC machine. For a complete description of how to use Cyberduck, see https://securedatacommons.atlassian.net/wiki/spaces/DESK/pages/910622733
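One way to bring the AWS CLI route into a Python script is to call the CLI through the subprocess module. The sketch below is an illustration under assumptions, not the official SDC procedure: the helper names are hypothetical, the bucket and file names in the example comments are placeholders, and it assumes the aws executable is on the PATH with credentials already configured.

```python
import subprocess


def build_s3_list_command(bucket, prefix=""):
    """Build the "aws s3 ls" command for a bucket and optional prefix."""
    return ["aws", "s3", "ls", f"s3://{bucket}/{prefix}"]


def build_s3_copy_command(bucket, key, destination):
    """Build the "aws s3 cp" command that downloads one object."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", str(destination)]


def run_aws(command):
    """Run an AWS CLI command and return its standard output as text."""
    # check=True raises CalledProcessError if the CLI exits with an error.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


# Example with placeholder names (replace with your team's bucket and file):
# print(run_aws(build_s3_list_command("example-bucket", "example-folder/")))
# run_aws(build_s3_copy_command("example-bucket", "example-folder/data.csv", "data.csv"))
```

Building the command as a list, rather than a single string, avoids quoting problems when bucket or file names contain spaces.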
For additional questions please email sdc-support@dot.gov.
Using Anaconda
Anaconda Navigator is a desktop graphical user interface (GUI) included in Anaconda® Distribution that allows you to launch applications and manage conda packages, environments, and channels without using command line interface (CLI) commands. Navigator can search for packages on http://Anaconda.org or in a local Anaconda Repository.
How to start an Anaconda session.
After logging into your SDC Workstation, open the Anaconda Navigator app.
By default, all applications available to launch or install within Navigator are displayed on the Home page.
Working with Notebooks.
The Jupyter Notebook application allows you to create and edit documents that display the input and output of a Python or R language script.
Click on Jupyter Notebook, click “Launch”, and open it with Chrome.
Now click the “New” dropdown in the upper right and select Python 3 (ipykernel), as shown in the screenshot below.
You can now run your Python 3 code and commands in the notebook.
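As an illustration of what a first notebook cell might contain, the sketch below summarizes one numeric column of a small CSV using only the standard library. The sample data, column names, and the function name summarize_column are all hypothetical; real SDC data would be read from a file instead.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a file pulled from the data lake.
SAMPLE_CSV = """trip_id,speed_mph
1,32.5
2,41.0
3,28.5
"""


def summarize_column(csv_text, column):
    """Return the count, mean, and maximum of one numeric CSV column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {"count": len(values), "mean": statistics.mean(values), "max": max(values)}


summary = summarize_column(SAMPLE_CSV, "speed_mph")
print(summary)
```

Pasting this into a cell and pressing Shift+Enter runs it and shows the printed summary directly beneath the cell.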
Working with Virtual environments
With Anaconda Navigator, you can create, export, list, remove, and update environments that have different versions of Python and/or other packages installed. Switching or moving between environments is called activating the environment. Only one environment is active at any point in time.
Go to the application homepage, click Environments in the left panel as shown in the screenshot below, and select Create at the bottom of the environments list.
Name your environment, select a Python version, and click “Create” as shown below.
You have successfully created your own environment. You can install libraries and packages specific to that environment. This gives each project its own isolated setup for the scripts you are running.
For more documentation on managing environments, visit https://docs.anaconda.com/free/navigator/tutorials/manage-environments/
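Because only one environment is active at a time, it can help to confirm from Python which one a script or notebook is actually using. The sketch below is one way to do that; the function name active_environment is hypothetical, and it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, which will be missing if the interpreter was started outside an activated conda environment.

```python
import os
import sys


def active_environment():
    """Describe the Python environment this interpreter is running in."""
    return {
        # conda sets CONDA_DEFAULT_ENV when an environment is activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # sys.prefix is the root folder of the environment in use.
        "prefix": sys.prefix,
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for name, value in active_environment().items():
        print(f"{name}: {value}")
```

If the prefix does not point at the environment you created, the script is running in a different environment than intended.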
Benefits of Anaconda for the SDC Data Analysts
A major advantage of Anaconda is that it comes with many pre-installed packages commonly used in machine learning and data science. This saves time and effort because you do not need to install each package separately.
With Anaconda you can create separate environments for different projects.
You can create notebooks within a virtual environment.
How to create and run your first Python program (Hello World, for example):
https://docs.anaconda.com/free/anaconda/getting-started/hello-world/
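For reference, a minimal Hello World program might look like the sketch below. The function name greeting is only an illustration and is not taken from the linked tutorial.

```python
def greeting(name="World"):
    """Return the greeting text for the given name."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(greeting())
```

Saved as a file ending in .py, it can be run from the Command Prompt by typing python followed by the file name, or pasted into a notebook cell.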