Welcome to Software Carpentry Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

Use of this service is restricted to members of the Software Carpentry and Data Carpentry community; this is not for general purpose use (for that, try etherpad.wikimedia.org).

Users are expected to follow our code of conduct: http://software-carpentry.org/conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

Instructor emails:


POST WORKSHOP SURVEY https://www.surveymonkey.com/r/swc_post_workshop_v1?workshop_id=2018-05-24-LBNL

2018-05-24-LBNL
sudo pip install requests
python --version
[Should print something with "Anaconda" in it.]

If not:
    If on Windows:
        PATH=/c/Users/<USERNAME>/Anaconda3:$PATH
Day 1 - AM:        
## UNIX Shell

Bash terminal (emulate UNIX)
ls -F (show the difference between folders and files); many other display options for ls (-l, -t, -h, -R, -a)
pwd (show working directory)
cd .. (go up one level); cd ~ (back to home directory)
mkdir (make directory)
nano (text editor, either open an existing or create a new)
touch filename.txt (another way of creating a file. It's empty)
rm filename.txt (remove file, no way back!); rm -i (gives a way to confirm); rm -r (remove directory, no confirmation); rm -r -i (confirm file by file)
up arrow allows recovering previous commands; tab to finish typing a file name.
mv directory/filename1.txt directory/filename2.txt (rename a file, from the directory above, also can be done in the directory)
cp directory/filename1.txt directory/filename2.txt (copy a file and give the copy a new name, from the directory above)
wc *.pdb (count lines, words, and characters in every file with a certain extension; the "ls |" in "ls | wc *.pdb" is unnecessary because wc reads the named files directly. The result has 3 columns: # of lines, # of words, # of characters)
cat ("concatenate"; with just one file it will print the contents to the screen.  With multiple arguments it will print all of those files back-to-back.)
^C to get out
* and ? (wild cards: * matches any number of characters, including none; ? matches exactly one character)
(under "molecules") wc -l * > lengths.txt
sort -n (treat as numbers) lengths.txt > sorted-lengths.txt (use ">" to output to a file)
echo hello > testfile01.txt (output a string to a file, overwriting anything already in it)
echo hello >> testfile02.txt (append a string to the end of a file)
(under "data", using animals.txt) head -n 3 animals.txt > animalsUpd.txt; tail -n 2 animals.txt >> animalsUpd.txt (">>" appends to the file instead of overwriting it)
wc -l *.txt | sort -n | head -n 5 (piping: count the lines in all txt files, sort as numbers, then show the first 5)
(under "creatures") (for loop over a list of files: "$" in front of the loop variable means "the value of"; used to get the first 3 lines of both files)

Day 1 - PM:

## Python 3
Running Jupyter notebook does NOT require internet connection (for data privacy). 
help(max) 
to show the full function arguments (Shift + Tab)

Day 2 - AM:
Keep the data folder in the same directory as the Jupyter notebook file so the notebook can always reach it with a relative path.
change a cell to markdown: Ctrl + M (release then) M
to look up shortcuts in Jupyter notebook: click on the "keyboard" button.
$$ ... $$: centers an equation in a markdown cell
% is a jupyter notebook magic command, not a python command.
%matplotlib notebook: interactive figure (save button does not work with Chrome, but with firefox).
glob: match file names against a pattern so you can loop through several files.
Create python file in Jupyter notebook: "new" txt file, then change name to "xxx.py".
Use seaborn to create regression and other plots by copying and adapting the example code in the tutorial. https://seaborn.pydata.org/tutorial.html

Day 2 - PM:
In bash terminal: (one-time configurations)
first create a folder on desktop using "mkdir guacamole" command, then go in there "cd guacamole/"
$ git config --global user.name "your alias" (recorded with each commit to show who made the change).
$ git config --global user.email "your email address" (people might contact you)
$ git config --global color.ui "auto" 
$ git config --global core.editor "nano -w"
$ git init (initiate an empty repository) - everything in the "guacamole" directory.
$ git status (state of the repository at the moment)
$ nano instructions.txt
^o to save, enter, ^x to exit
$ git status
$ git add instructions.txt (stage the file)
$ git commit (add some text for the log, then)
^o to save, enter, ^x to exit
$ git log (list of commits made so far)
$ git diff (show differences made)
(after making some changes in myfile.txt)
git add myfile.txt
git commit -m "my recent changes" ("my recent changes" will be in the log)
git commit --amend (to change the most recent commit, e.g. its log text)
git checkout <commit> <filename> (grab an older version of a file)
HEAD^^ (same as HEAD~2) means 2 commits before the most recent one, i.e. the third-newest version.
Or you can use the unique commit ID shown in the log (the first 5 or 6 characters are usually enough) to refer to the commit that you want to restore.
By default, git reset does not restore the file contents; it only drops the commit(s) and leaves your edits in the working directory.
(not going to cover branching)

## Use GitHub for collaboration:
Bash terminal assumes UNIX system.
Add a collaborator: get the repository URL, clone it in the terminal (git clone <URL>), have the owner add the collaborator (in the browser), then accept the invitation sent by email.
Resolve conflict (both parties changed the same line): manually keep the line you want to keep and then remove the rest.
To view command history in terminal: history
Make a commit for every logical change. View the changes in the browser, or in the terminal with git diff followed by the unique IDs of the two commits.

### List of commands



### Checkpoint questions



# Python - Day I
Lesson Materials: http://swcarpentry.github.io/python-novice-gapminder/
Exported Notebook: https://bsmith89.github.io/2018-05-24-LBNL/files/python-day-1.html




jupyter notebook
Python 3 notebook
rename the newly created notebook; it is saved as an .ipynb file

import numpy as np
np.mean() etc

import pandas
pandas.read_csv("filename")

pandas.read_csv then Shift+Tab for help

pandas typically as pd
numpy as np

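xx.loc['rowname'] to select a row by name (square brackets, not parentheses)
xx.columnname.rowname for element access (only works when both names are valid Python identifiers; xx.loc['rowname', 'columnname'] always works)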
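
To make the row and element access above concrete, here is a minimal sketch. It assumes pandas is installed; the table is built from an in-memory string with made-up numbers (not real gapminder values), and the variable name europe simply mirrors the one used in the histogram notes.

```python
import io
import pandas as pd

# Made-up numbers for illustration only; these are NOT real gapminder values.
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Albania,1600.0,1900.0
Austria,6100.0,8800.0
"""

# index_col=0 turns the first column (country) into the row labels.
europe = pd.read_csv(io.StringIO(csv_text), index_col=0)

row = europe.loc['Albania']                       # one whole row, by row label
value = europe.loc['Albania', 'gdpPercap_1952']   # one element: row label, column label
same_value = europe.gdpPercap_1952.Albania        # attribute-style access to the same element

print(row)
print(value, same_value)
```

The .loc form is usually the safer habit, since attribute-style access breaks as soon as a label contains a space or clashes with a method name.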

import matplotlib.pyplot for plots
matplotlib.pyplot.hist(europe.gdpPercap_1952,bins=20) for histogram

Histogram final:
plt.hist(europe.gdpPercap_1952,bins = 20,alpha=0.5,label='1952')
plt.hist(europe.gdpPercap_1957,bins = 20,alpha=0.5,label='1957')
plt.xlabel('GDP Per-capita (US$2007)')
plt.ylabel('Number of European countries')
plt.legend()
plt.title('Economic shifts in post-war Europe')

print()

Scatter plot:
    
Jupyter: prefix a shell command with ! to run it from a cell (e.g. !ls)
Ctrl-M -> command mode
Ctrl-M M -> markdown (enter the cell by hitting Enter)
equation in markdown with $ ... $ (inline) or $$ ... $$ (centered)

show plots inside the Jupyter notebook (inline) rather than separately
%matplotlib inline
then plt.plot

or 
%matplotlib notebook (will start an interactive figure)
then plt.figure() and plt.plot()

a=range(0,40,2)
b=list(a)   # list is a built-in Python type; numpy uses arrays for vectors
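
A small sketch of what the two lines above produce, using only built-in Python. The extra variable repeated is added here for illustration and was not part of the session.

```python
a = range(0, 40, 2)   # lazy sequence: start at 0, stop before 40, step by 2
b = list(a)           # turn it into a built-in Python list

print(b[:5])          # the first five values
print(len(b))         # how many values there are

# Multiplying a list repeats it; it does not multiply each element.
repeated = b * 2
print(len(repeated))
```

This is the practical difference hinted at in the comment: a numpy array multiplied by 2 would double every element instead of repeating the sequence.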

globbing = pattern matching on file names, used to read multiple files
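
A minimal sketch of globbing with the standard-library glob module. The file names are hypothetical placeholders created in a temporary directory so the example runs anywhere; in the workshop the pattern would point at the real data folder.

```python
import glob
import os
import tempfile

# Create a throwaway directory with hypothetical file names for the demo.
workdir = tempfile.mkdtemp()
for name in ['gapminder_gdp_africa.csv', 'gapminder_gdp_europe.csv', 'notes.txt']:
    with open(os.path.join(workdir, name), 'w') as handle:
        handle.write('placeholder\n')

# * matches any run of characters, so only the two gdp CSV files match.
pattern = os.path.join(workdir, 'gapminder_gdp_*.csv')
matches = sorted(glob.glob(pattern))   # glob returns paths in arbitrary order

for path in matches:
    print(os.path.basename(path))
```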


import pandas as pd

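def avg_gdpPercap(country,continent,year):
    '''
    Return the average GDP per capita of a country over the two survey
    years (ending in 2 and 7) of the decade that contains `year`.
    '''
    # Load the continent's table; the first column becomes the row index.
    df = pd.read_csv('gapminder-data/gapminder_gdp_'+continent+'.csv',index_col=0)
    c = df.loc[country]  # the row for this country
    # Integer division drops the last digit, e.g. 1983 // 10 gives 198.
    gdp_decade = 'gdpPercap_'+str(year//10)
    # Average the two columns of that decade, e.g. gdpPercap_1982 and gdpPercap_1987.
    avg=(c.loc[gdp_decade+'2']+c.loc[gdp_decade+'7'])/2
    return avg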
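
The least obvious step in the function is how the column names are built from the year. The sketch below isolates just that step with plain Python; the year 1983 is an arbitrary example value, and early_column and late_column are names introduced here for illustration.

```python
year = 1983   # arbitrary example value

# Integer division by 10 drops the last digit of the year.
gdp_decade = 'gdpPercap_' + str(year // 10)

# Appending '2' and '7' gives the two column names for that decade.
early_column = gdp_decade + '2'
late_column = gdp_decade + '7'

print(gdp_decade)
print(early_column, late_column)
```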

for continent in df['continent']:
    if continent == 'Africa':
        print(1)
    elif continent == 'Europe':
        print(1/2)
    else:
        print(0)
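
The loop above needs a df with a 'continent' column to be defined in the notebook. As a self-contained sketch of the same if / elif / else pattern, the version below loops over a plain list; the list contents and the scores variable are stand-ins introduced here.

```python
continents = ['Africa', 'Europe', 'Asia']   # hypothetical stand-in for df['continent']

scores = []
for continent in continents:
    if continent == 'Africa':
        score = 1
    elif continent == 'Europe':
        score = 1 / 2   # in Python 3, / is true division and gives a float
    else:
        score = 0       # every other continent ends up here
    scores.append(score)
    print(continent, score)
```

Only the first matching branch runs for each value, so the order of the conditions matters when they can overlap.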
        
        
import seaborn as sns
sns.get_dataset_names()


To leave a space when concatenating strings, you can add ' '; for example physicist + ' ' + chemist
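
A short sketch of the concatenation tip above. The two string values are placeholders chosen here; only the variable names come from the notes.

```python
physicist = 'Curie'       # placeholder value
chemist = 'Lavoisier'     # placeholder value

no_space = physicist + chemist            # strings are joined with nothing between them
with_space = physicist + ' ' + chemist    # the ' ' adds the separating space

print(no_space)
print(with_space)
```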
A mean from numpy would be:
    import numpy
    numpy.mean(....)
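
As a minimal sketch of the numpy.mean call above, assuming numpy is installed. The numbers are made up, and the axis argument is shown only as one common variation.

```python
import numpy

values = [2.0, 4.0, 6.0, 8.0]   # made-up numbers
average = numpy.mean(values)    # mean of all the values
print(average)

table = numpy.array([[1.0, 2.0],
                     [3.0, 4.0]])
column_means = numpy.mean(table, axis=0)   # axis=0 averages down each column
print(column_means)
```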
    
    
Standard abbreviations are:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
Bad practice (most of the time...) is 
from numpy import *


# Python - Day II
Helpful visualization tool: http://pythontutor.com/
Seaborn Tutorial: https://seaborn.pydata.org/tutorial.html
https://seaborn.pydata.org/examples/anscombes_quartet.html

Play Data: kaggle.com

# Version Control with Git and GitHub



What does  git init do ? One-time. Adds a .git folder (Turns "folder" into a "repository")
What does  git status do ? Tells you more info about your "repository"
What does  git add do ?  Stages any changes since the last "commit"
What does  git commit do ? Git saves a snapshot of changes that are staged
What does  git log do ? Shows history 

Notebook from day 2: https://github.com/jfkoehler/swc-berkeley