Welcome to Software Carpentry Etherpad!
Use of this service is restricted to members of the Software Carpentry and Data Carpentry community; this is not for general purpose use (for that, try etherpad.wikimedia.org).
Users are expected to follow our code of conduct: http://software-carpentry.org/conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
Instructor emails:
- Jacob email: koehlerj@newschool.edu
- Byron email: bsmith89@gmail.com
POST WORKSHOP SURVEY https://www.surveymonkey.com/r/swc_post_workshop_v1?workshop_id=2018-05-24-LBNL
2018-05-24-LBNL
sudo pip install requests
python --version
[Should print something with "Anaconda" in it.]
If not:
If on Windows:
PATH=/c/Users/<USERNAME>/Anaconda3:$PATH
Day 1 - AM:
## UNIX Shell
Bash terminal (emulate UNIX)
ls -F (show difference btw folders and files); many other display options for ls (-l, -t, -h, -R, -a)
pwd (show working directory)
cd .. (go up one level); cd ~ (back to home directory)
mkdir (make directory)
nano (text editor, either open an existing or create a new)
touch filename.txt (another way of creating a file. It's empty)
rm filename.txt (remove file, no way back!); rm -i (gives a way to confirm); rm -r (remove directory, no confirmation); rm -r -i (confirm file by file)
up arrow allows recovering previous commands; tab to finish typing a file name.
mv directory/filename1.txt directory/filename2.txt (rename a file from the directory above; can also be done from inside the directory)
cp directory/filename1.txt directory/filename2.txt (copy a file and give the copy a new name, from the directory above)
wc *.pdb (count every file with a certain extension; "ls |" in front is not needed. The result has 3 columns: # of lines, # of words, # of characters)
cat ("concatenate"; with just one file it will print the contents to the screen. With multiple arguments it will print all of those files back-to-back.)
^C to get out
* and ? (wildcards: * matches any number of characters, ? matches exactly one character)
(under "molecules") wc -l * > lengths.txt
sort -n (treat as numbers) lengths.txt > sorted-lengths.txt (use ">" to output to a file)
echo hello > testfile01.txt (output a string to a file)
echo hello >> testfile02.txt
(under "data", using animals.txt) head -n 3 animals.txt > animalsUpd.txt, then tail -n 2 animals.txt >> animalsUpd.txt (">>" appends to the existing file instead of overwriting it)
wc -l *.txt | sort -n | head -n 5 (piping: count lines in all txt files, sort as numbers, then show the first 5)
(under "creatures") (for loop over a list of files: "$" before a variable name means "the value of"; the loop gets the first 3 lines of both files)
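For comparison with the Python half of the workshop, the `wc -l *.txt | sort -n | head -n 5` pipeline can be sketched in Python. This is only an illustration, not something covered in the lesson: the function name `shortest_files` and the three throwaway files are made up, and unlike `wc` it does not produce a "total" line.

```python
import tempfile
from pathlib import Path


def shortest_files(directory, pattern="*.txt", n=5):
    """Return (line_count, filename) pairs for the n files with the fewest lines."""
    counts = []
    for path in Path(directory).glob(pattern):   # like the *.txt wildcard
        with path.open() as handle:
            line_count = sum(1 for _ in handle)  # like wc -l
        counts.append((line_count, path.name))
    return sorted(counts)[:n]                    # like sort -n | head -n 5


# Build a few throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name, lines in [("a.txt", 3), ("b.txt", 1), ("c.txt", 2)]:
        Path(tmp, name).write_text("line\n" * lines)
    result = shortest_files(tmp, n=2)

print(result)
```

Each step in the function corresponds to one stage of the shell pipe, which is why the shell version fits on a single line.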
Day 1 - PM:
## Python 3
Running Jupyter notebook does NOT require internet connection (for data privacy).
help(max)
to show full function arguments (Shift + Tab)
Day 2 - AM:
Keep the data folder underneath the folder that holds the Jupyter notebook file so the data can be accessed anytime.
change a cell to markdown: Ctrl + M (release then) M
to look up shortcuts in Jupyter notebook: click on the "keyboard" button.
$$: to center
% is a jupyter notebook magic command, not a python command.
%matplotlib notebook: interactive figure (the save button does not work with Chrome, but does with Firefox).
Glob: loop through different files.
Create a Python file in Jupyter notebook: "new" txt file, then change the name to "xxx.py".
Use seaborn to create regression and other plots by simply copying the example code. https://seaborn.pydata.org/tutorial.html
Day 2 - PM:
In bash terminal: (one-time configurations)
first create a folder on desktop using "mkdir guacamole" command, then go in there "cd guacamole/"
$ git config --global user.name "your alias" (changed/saved by who).
$ git config --global user.email "your email address" (people might contact you)
$ git config --global color.ui "auto"
$ git config --global core.editor "nano -w"
$ git init (initiate an empty repository) - everything in the "guacamole" directory.
$ git status (state of the git at the moment)
$ nano instructions.txt
^o to save, enter, ^x to exit
$ git status
$ git add (stage)
$ git commit (add some text for the log, then)
^o to save, enter, ^x to exit
$ git log (list of commits made so far)
$ git diff (show differences made)
(after making some changes in myfile.txt)
git add myfile.txt
git commit -m "my recent changes" ("my recent changes" will be in the log)
git commit --amend (to change log text)
git checkout (grab an older version of file)
HEAD^^ (same as HEAD~2) means 2 commits before the most recent one (so the third most recent version).
Or you can use the unique ID shown in the log (the first 5 or 6 characters are usually enough) to refer to the commit that you want to restore.
checkout command: grabs the older version of the file.
reset command: does not restore the changes in the file; by default it only drops the commit.
(not going to cover branching)
## Use GitHub for collaboration:
Bash terminal assumes UNIX system.
Add collaborator: get their URL, terminal clone, add in collaborator (in browser), accept in email.
Resolve conflict (both parties changed the same line): manually keep the line you want to keep and then remove the rest.
To view command history in terminal: history
save a commit for every logical change and view change in browser or in terminal providing unique codes of two commits.
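To make the conflict-resolution note concrete: when both parties change the same line, Git writes both versions into the file between `<<<<<<<`, `=======` and `>>>>>>>` marker lines. In the workshop this was fixed by hand in nano; the sketch below only illustrates what "keep the line you want and remove the rest" means. The function name `keep_side`, the recipe text and the commit ID are all made up for the example.

```python
def keep_side(conflicted_text, side="ours"):
    """Keep one side of every conflict block and drop the marker lines.

    side="ours" keeps the lines above =======, side="theirs" the lines below.
    """
    kept = []
    state = "normal"
    for line in conflicted_text.splitlines():
        if line.startswith("<<<<<<<"):
            state = "ours"            # start of our version
        elif line.startswith("=======") and state == "ours":
            state = "theirs"          # start of the other version
        elif line.startswith(">>>>>>>") and state == "theirs":
            state = "normal"          # end of the conflict block
        elif state in ("normal", side):
            kept.append(line)
    return "\n".join(kept)


# Hypothetical file contents after a conflicting pull.
conflicted = "\n".join([
    "# Guacamole",
    "<<<<<<< HEAD",
    "Add two limes",
    "=======",
    "Add one lemon",
    ">>>>>>> dabb4c8c450e8475aee9b14b4383acc99f42af1d",
    "Mash the avocados",
])

resolved = keep_side(conflicted, side="ours")
print(resolved)
```

After editing the file (by hand or otherwise), the resolution still has to be staged with git add and committed.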
### List of commands
- pwd ---> Print Working Directory
- clear ---> clear screen
- ls ---> list files
- -F ---> flag that marks what is a directory and what is a file
- -l ---> include this flag to list files with details like date and file size
- -h ---> include with -l to list file sizes in "human readable" format
- -t ---> displays by the time (last modified)
- -a ---> "all": include hidden files (which include "." and "..")
- ls <directory> will list the contents of a different directory than the one we're currently in.
- cd ---> move to a new directory
- Use .. (two periods) to indicate the directory "above" where we are now (the parent directory)
- Use . (one period) to indicate the current directory (Q: why would you want this?)
- mkdir ---> creates a directory (takes a single argument with the name of the new directory)
- nano ---> a text editor inside the terminal
- Once you're inside the text editor, Ctrl-X will exit out and Ctrl-O will save what you've written to the file.
- touch ---> (Q: what does this do?)
- rm ---> remove a file (BE VERY VERY CAREFUL; YOU CANNOT UNDO THIS)
- rm -r ---> allows you to remove a directory (BE EVEN MORE CAREFUL; NOT ONLY CAN YOU NOT UNDO THIS, BUT YOU CAN DELETE YOUR ENTIRE COMPUTER)
- rm -i ---> confirm every file that you delete (very useful with rm -r when you weren't completely sure.)
- wc ---> count lines/words/characters from files (counts every file in the arguments)
- wc *.pdb ---> counts for all of the files in the directory that end with .pdb
- cat ---> "concatenate"; with just one file it will print the contents to the screen. With multiple arguments it will print all of those files back-to-back.
- Ctrl-C to get back to the command prompt (you will need to do this if you didn't give `cat` a file argument.)
- sort ----> sort contents of a file alphabetically (or numerically with -n)
- head
- tail
- echo
### Checkpoint questions
# Python - Day I
Lesson Materials: http://swcarpentry.github.io/python-novice-gapminder/
Exported Notebook: https://bsmith89.github.io/2018-05-24-LBNL/files/python-day-1.html
- Please download the data file at http://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip
- Unzip this file and put the output directory on your desktop.
- Open a NEW terminal (git bash on Windows)
- `python --version` should output "Python 3.<something> Anaconda <something>" or similar
- IF you get an error put up a pink sticky
- We'll have you add the Anaconda3 directory to your path just like before
- PATH="/c/Users/<USER NAME>/Anaconda3":$PATH
- python --version
jupyter notebook
Python 3 notebook
rename the newly created notebook. saved as .ipynb file
import numpy as np
np.mean() etc
import pandas
pandas.read_csv("filename")
pandas.read_csv then Shift+Tab for help
pandas typically as pd
numpy as np
xx.loc['rowname'] for row names
xx.columnname.rowname for element access
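A small sketch of the two access styles noted above, assuming pandas is installed. The table is built in memory so no data file is needed; the column names follow the gapminder pattern, but the numbers are made-up placeholders, not real GDP figures.

```python
import pandas as pd

# Tiny stand-in table; the values are placeholders, not real data.
df = pd.DataFrame(
    {"gdpPercap_1952": [1601.0, 6137.0], "gdpPercap_1957": [1942.0, 8843.0]},
    index=["Albania", "Austria"],
)

row = df.loc["Albania"]                       # whole row, selected by row name
value = df.loc["Albania", "gdpPercap_1952"]   # single element: row name, column name
same_value = df.gdpPercap_1952.Albania        # attribute style: column, then row

print(value, same_value)
```

The attribute style only works when the column and row names are valid Python identifiers, so `.loc[...]` with square brackets is the more general form.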
import matplotlib.pyplot for plots
matplotlib.pyplot.hist(europe.gdpPercap_1952,bins=20) for histogram
Histogram final:
plt.hist(europe.gdpPercap_1952,bins = 20,alpha=0.5,label='1952')
plt.hist(europe.gdpPercap_1957,bins = 20,alpha=0.5,label='1957')
plt.xlabel('GDP Per-capita (US$2007)')
plt.ylabel('Number of European countries')
plt.legend()
plt.title('Economic shifts in post-war Europe')
print()
Scatter plot:
Jupyter: use !ls for linux command
Ctrl-M -> command mode
Ctrl-M M -> markdown (enter cell by hitting enter)
equation in markdown with $ $ (inline) or $$ $$ (centered)
show plots in jupyter itself (inline), not in an extra file
%matplotlib inline
then plt.plot
or
%matplotlib notebook (will start an interactive figure)
then plt.figure() and plt.plot()
a=range(0,40,2)
b=list(a)  # list is specific to python; vectors in numpy
globbing = pattern matching to read multiple files
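A quick check of what those two lines produce, as a minimal sketch: `range` only describes the sequence (start, stop, step), and `list` turns it into actual values.

```python
a = range(0, 40, 2)   # start at 0, stop before 40, step by 2
b = list(a)           # materialize the range into a list

print(len(b))   # number of elements
print(b[:3])    # first three values
print(b[-1])    # last value (40 itself is excluded)
```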
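A minimal globbing sketch using the standard-library `glob` module. It creates empty placeholder files in a temporary folder so it runs anywhere; the file names only imitate the gapminder naming pattern and are assumptions for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files to match against.
    for name in ["gapminder_gdp_africa.csv", "gapminder_gdp_europe.csv", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()

    # * matches any run of characters, just like in the shell.
    matches = sorted(glob.glob(os.path.join(tmp, "gapminder_gdp_*.csv")))
    names = [os.path.basename(m) for m in matches]

print(names)
```

In a real analysis the loop body would read each matched file, for example with pd.read_csv, instead of just collecting names.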
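import pandas as pd
def avg_gdpPercap(country,continent,year):
    '''
    Return the average of the two gdpPercap values (years ending in 2 and 7)
    recorded for country in the decade that contains year.
    '''
    df = pd.read_csv('gapminder-data/gapminder_gdp_'+continent+'.csv',index_col=0)
    c = df.loc[country]
    # e.g. year 1983 -> 'gdpPercap_198', completed below with '2' and '7'
    gdp_decade = 'gdpPercap_'+str(year//10)
    avg=(c.loc[gdp_decade+'2']+c.loc[gdp_decade+'7'])/2
    return avg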
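The column-name trick in that function can be tried on its own, without any data file. A minimal sketch; the helper name `decade_columns` is made up for illustration.

```python
def decade_columns(year):
    """Return the two gapminder-style column names for the decade containing year."""
    # Integer division drops the last digit: 1983 // 10 == 198
    prefix = "gdpPercap_" + str(year // 10)
    # The data has one column for the year ending in 2 and one for the year ending in 7.
    return prefix + "2", prefix + "7"


print(decade_columns(1983))
```

For 1983 this gives the columns for 1982 and 1987, which are the two values the function averages.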
for continent in df['continent']:
    if continent == 'Africa':
        print(1)
    elif continent == 'Europe':
        print(1/2)
    else:
        print(0)
import seaborn as sns
sns.get_dataset_names()
To leave a space when concatenating strings, you can add ' '; for example physicist + ' ' + chemist
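For example (the variable names come from the note above; the values are placeholders):

```python
physicist = "Curie"
chemist = "Pauling"

glued = physicist + chemist           # no space is added automatically
spaced = physicist + " " + chemist    # add the space yourself

print(glued)
print(spaced)
```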
A mean from numpy would be:
import numpy
numpy.mean(....)
Standard abbreviations are:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Bad practice (most of the time...) is
from numpy import *
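One reason the star import is risky: it can silently replace built-in names. The sketch below uses the standard-library `math` module as a stand-in so it runs without NumPy, and it only lists the clashes rather than doing the star import. NumPy likewise defines names such as `sum`, `min` and `max` that also exist as Python built-ins.

```python
import builtins
import math

# Names a star import of math would bring in (everything without a leading underscore).
public_math = {name for name in dir(math) if not name.startswith("_")}
public_builtins = {name for name in dir(builtins) if not name.startswith("_")}

# Built-ins that "from math import *" would overwrite.
clashes = sorted(public_math & public_builtins)
print(clashes)

# The two versions of pow do not behave the same:
print(type(pow(2, 3)))        # built-in pow keeps integers as integers
print(type(math.pow(2, 3)))   # math.pow always returns a float
```

With `import numpy as np` every NumPy function stays behind the `np.` prefix, so nothing is overwritten.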
# Python - Day II
Helpful visualization tool: http://pythontutor.com/
Seaborn Tutorial: https://seaborn.pydata.org/tutorial.html
https://seaborn.pydata.org/examples/anscombes_quartet.html
Play Data: kaggle.com
# Version Control with Git and Github
- Please navigate to Socrative.com
- Pick student login
- Type in "SMITHSWC" as the room name.
What does git init do ? One-time. Adds a .git folder (Turns "folder" into a "repository")
What does git status do ? Tells you more info about your "repository"
What does git add do ? Stages any changes since the last "commit"
What does git commit do ? Git saves a snapshot of changes that are staged
What does git log do ? Shows history
Notebook from day 2: https://github.com/jfkoehler/swc-berkeley