ADDITIONAL RESOURCES

SWC Python lessons online: http://swcarpentry.github.io/python-novice-inflammation/
SWC Shell lessons online: http://swcarpentry.github.io/shell-novice/
SWC Git lessons online: http://swcarpentry.github.io/git-novice/

Becca's semester-long programming course:
http://ccbbatut.github.io/Biocomputing_Spring2016/
Amazing book for beginning to program: http://practicalcomputing.org/

Currently this book is not available at the UCSF Library but I will put in a purchase order for it! - Ariel

Other programming resources: http://ccbbatut.github.io/Biocomputing_Spring2016/resources/

Becca says: One of the best things about writing scripts for your analyses is that if your data file changes (maybe there was an error during data entry), it takes almost no time to rerun everything. The same goes for reviewer comments: if a reviewer asks you to transform your data before running the analyses, you just change the input file and rerun the scripts.

-------------------------------

Potential workflow:
    DATA ---(Excel/bash)---> datafile ---(python/R script)---> cleaned data ---(python/R script)---> analyzed data
    all of these can be run in a bash script ("the glue")

    example bash script
    # run concatenation and cleaning script on data files, write output to a csv file
    python concat_and_clean_files.py data*.csv > clean_data.csv
    # run analysis on clean data file, write output to new text file
    python analyze_data.py clean_data.csv > analyzed_data.txt

    more complicated bash script:
    # run concatenation and cleaning script on data files, write output to a csv file
    python concat_and_clean_files.py data*.csv > clean_data.csv
    # cycle through 3 param1 options
    for param1 in 4 5 10
    do
        # cycle through 3 param2 options
        for param2 in 1 100 10000
        do
            # run analysis using sys.argv to read parameter permutations into python
            # write the output to files with names that reflect each parameter permutation
            # (the braces are needed: $param1_ would be read as a variable named "param1_")
            python analyze_data.py clean_data.csv $param1 $param2 > analyzed_data_${param1}_${param2}.txt
        done
    done
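
For the loop above to work, analyze_data.py has to read its inputs from the command line. A minimal sketch of that part, assuming the script takes the data file followed by two whole-number parameters (the function name parse_args and the int conversion are illustrative assumptions, not the workshop's actual script):

```python
import sys

def parse_args(argv):
    """Split a sys.argv-style list into (datafile, param1, param2)."""
    if len(argv) != 4:
        raise SystemExit("usage: python analyze_data.py <datafile> <param1> <param2>")
    # argv[0] is the script name; command-line arguments always arrive as strings
    datafile = argv[1]
    param1 = int(argv[2])
    param2 = int(argv[3])
    return datafile, param1, param2

# In the real script this would be parse_args(sys.argv); a fixed list is used
# here so the sketch runs on its own.
datafile, param1, param2 = parse_args(["analyze_data.py", "clean_data.csv", "4", "100"])
print(datafile, param1, param2)
```

The parameters come in as text, so they need converting before any arithmetic is done with them.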

Welcome to Software Carpentry
This is the pad for the 2016-04-29 Software Carpentry Workshop at UCSF.
The website for the workshop can be found at <https://bsmith89.github.io/2016-04-29-ucsf>.
We will use this Etherpad to share links and snippets of code, take notes, ask and answer questions, and whatever else comes to mind.
The page displays a screen with three major parts:
* The left side holds today's notes: please edit these as we go along.
* The top right side shows the names of users who are logged in: please add your name and pick the color that best reflects your mood and personality.
* The bottom right is a real time chat window for asking questions of the instructor and your fellow learners.
To start, please add yourself to the attendee list below:
- *Byron Smith (microbial ecology, University of Michigan)
- *Becca Tarvin (Evolutionary Biology, University of Texas at Austin)
- Alex Williams (Gladstone / UCSF, SF)
- Manasi Mayekar (UCSF)
- Juhi Ojha (UCSF)
- Ariel Deardorff (UCSF Library)
- Diego Castaneda (Psychiatry, UCSF) :)
- Harpreet Zoglauer (Social Sciences, UCB)
- Beverly Piggott (Neuroscience, UCSF)
- Kathleen Cho (Neuroscience, UCSF)
- Louisa Holmes (Center for Tobacco Control, UCSF)
- Jessica Nielson (Neurosurgery, UCSF)
- An Nguyen (Developmental Biology, UCSF)
- Regina Lutz (Cell Biology, UCSF)
- Victoria Wang (UCSF)
- Ajit Shah (UCSF)
- Youjin Lee (Medicine, UCSF)
- Selim Boudoukha (Biochemistry, UCSF)
- Marin Vujic (Dermatology, UCSF)
- Zhongsheng Yu (Biochem & Biophys, UCSF)
- Daniel Linnen (UCSF, PhD Student)
- Elena Minones-Moyano (Neurology, UCSF)
- Serah Choi (Radiation Oncology, UCSF)
- Hsin Chen (Immunology, UCSF)
- Dang Dao (Ophthalmology, UCSF)
- Sierra Niblett (Neurology, UCSF)
- Phillip Dumesic (Biochemistry, UCSF)
- Stella Tran (Diabetes Center, UCSF)
- Swetha Mohan (Neurology, UCSF)
- Vivi Tolani (Surgery, UCSF)
- Carolina Alquézar (UCSF, Neurology)
- Simon Wang (Human Genetics, UCSF)
- Sasha Skinner (Neurology, UCSF)
- Nirupama Krishnamurthi (Medicine, UCSF)
- Rami JAAFAR (Diabetes Center)
- Vijay Natarajan
- Kai Zhao (Lab Medicine, UCSF)
- Johannes Thrul (Center for Tobacco Control, UCSF)
- Tracy Chow (Biochemistry)
- En Cai (Pathology)
- Chenling Xiong (BTS, UCSF)
- Shyam Srivats (Medicine, UCSF)
(discipline, institution)
(* denotes an instructor)
Users are expected to follow our code of conduct: http://software-carpentry.org/conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
Installation Questions:

Hi Guys! I tried to run the test scripts here: http://bsmith89.github.io/2016-04-29-ucsf/setup/index.html but when I downloaded the files and typed the command into my terminal it said "no such file or directory." Here is the whole message: "LIB-142FVH6-LT:~ arieldeardorff$ python swc-installation-test-1.py python: can't open file 'swc-installation-test-1.py': [Errno 2] No such file or directory." Does the file need to be saved somewhere in particular in order to be run? Thanks!

Hi Ariel - are you in the folder with the python scripts when you typed the command? If not, you need to cd (e.g. cd ~/Downloads/) into it first or type the full file path (python ~/Downloads/swc-installation-test-1.py). Not sure if this is the error or not, let me know if you get it to work. -B
That was it! Thanks Becca! I forgot about cd'ing my way into the correct directory :)

Log into etherpad: http://pad.software-carpentry.org/2016-04-29-ucsf-room1
Shell (Day 1)

Please open up a tab with <https://b.socrative.com/login/student/> and type in "SMITHSWC" where it asks for the Room Name.
Download the files here: http://swcarpentry.github.io/shell-novice/shell-novice-data.zip and unpack them on your Desktop

whoami: program that prints your username
PS1="$ " sets the prompt to a plain "$ "
pwd = print working directory (shows the full path from the root directory)
cd = change directory (with no argument it goes to the home directory; otherwise it goes to the specified directory)
ls = lists everything in the current directory (ls -F marks which ones are directories)
ls /Users/us/Desktop lists the contents of a directory somewhere else
a command is a single word with no spaces
a space tells the computer where the command ends and its arguments begin
/ is used to separate directories
Control L clears the screen (the prompt moves to the top)
cd /Users/us/Desktop/data-shell uses an absolute path
cd Desktop/data-shell uses a relative path (no leading forward slash)
ls -F: identifies directories with a / at the end
cd .. moves up one level to the parent directory
cd ../.. moves up two levels
. is the current directory (so cd . stays where you are)
ls -F -a lists all files, including hidden ones
cd ~ goes back to the home directory
cd ~/directory goes to a directory inside the home directory
tab will autocomplete (shortcut)
mkdir name will create a new directory
nano draft.txt will open a text editor

Control O will save the text file
Enter to confirm the file name
Control X to exit the text editor

cat to look at the content of a file
less to look at a file one screenful at a time
Q to exit less view
head -n 5 aldrin.pdb will show the first 5 lines of aldrin.pdb
rm draft.txt to remove a file (this is irreversible)
rmdir thesis to remove a directory, but only if the directory is empty
rm -r thesis to remove a directory even if it is not empty (-r means recursive)
control c cancels the current command (kind of like the "Escape" key on Windows / Mac)
cp quotes/shakespeare.txt . (copy file to current directory)
cp file1 file2 directory/ (copy several files into a directory)
mv shakespeare.txt quotes/thebard.txt (move to directory quotes and rename)
mv file directory moves file to directory
mv can also be used to rename a file, e.g. mv quotes/shakespeare.txt quotes/thebard.txt will rename the txt file

Usually, if there is no output, then the command worked.
If there is an error, the error will display.
Ctrl+c at the prompt throws away whatever is typed without running it.
With a spanish keyboard and the Spanish (ISO) layout I get the ~ character with alt + ñ.
less aldrin.pdb (show what will fit on the screen) Q to exit
head -n 5 aldrin.pdb (allows one to see top of the few lines of the file)
wc aldrin.pdb (word count: lines, words, characters)
wc -l will just show number of lines
* is a wildcard: wc -l *.pdb will show the number of lines of each .pdb file in the directory
wc -l *.pdb (show line count for all .pdb files)
man command (manual)
Q or control C to get out
wc -l *.pdb > lines.txt (write the line counts to lines.txt instead of the screen)
wc --help prints the help info for the wc command. You can replace wc with another command to get that command's help info.
wc -l *.pdb > lines.txt writes the number of lines for all the files to a text file named "lines.txt" instead of printing them in the window. If lines.txt already exists, this will overwrite it, so it helps to check first (e.g. with tab completion) whether the file exists.
sort -n lines.txt "n" flag indicates numerical ordering
sort -n lines.txt > sorted_lines.txt write to a sorted file
wc --help works for git-bash instead of man wc
have a hanging line error (no $ prompt and no response from shell)? hit Ctrl+c to reset
wc -l *.pdb | sort -n | head -n 1 pipes the commands together: they run one after the other, each using the output of the earlier command as its input
the -n 1 flag on the head command specifies that you want to list 1 line (can change to 2, 3, etc.)
tail is the opposite of head: it gives you the end of the file
piping
wc -l *.pdb | sort -n | tail -n 2 |head -n 1
uniq salmon.txt // uniq works by lines: it collapses adjacent lines that are identical
sort salmon.txt | uniq // sorting first puts identical lines next to each other, so every duplicate is collapsed
sort salmon.txt | uniq > unique_salmon.txt
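
Why sorting first matters can be seen in a small Python sketch. The function below mimics what uniq does (it only compares each line with the one before it); the fish names are made-up sample data, not the contents of salmon.txt:

```python
from itertools import groupby

def uniq(lines):
    """Collapse runs of adjacent identical lines, like the shell's uniq."""
    return [line for line, _ in groupby(lines)]

lines = ["coho", "coho", "steelhead", "coho", "steelhead", "steelhead"]

without_sort = uniq(lines)        # duplicates that are not adjacent survive
with_sort = uniq(sorted(lines))   # sorting first groups identical lines together

print(without_sort)
print(with_sort)
```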
loop
for filename in basilisk.dat unicorn.dat
> do
> head -n 3 $filename
> done
for filename in *.dat // will loop over all .dat files in directory
> do
> head -n 3 $filename
> done

Things you learned (maybe):

What is up with "ls -F"? (this flags the items that are folders with a /)
How about "ls -a?" (shows hidden files)
Q: why does "ls" work but "LS" doesn't? Answer: the shell is case-sensitive (and ornery and annoying).
If you had a computer in like 1993: ls is the same as "dir" on old DOS.

pwd: "print where we are" ("print working directory")

cd: "change directory", i.e. move where we are.

MKDIR
RM

Note: please don't accidentally remove all your files! (be careful with rm -r: -r means recursive, so it deletes a directory and everything inside it)

MV ("move a file")
CAT ("catastrophically print an entire text file onto my terminal, causing it to scroll many pages probably." Actually short for "concatenate")
wc
sort

Note: sort doesn't normally understand numbers! It compares text character by character, so 2 comes after 111 (use sort -n for numerical order).
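
The same thing happens in Python when numbers are stored as text, which makes the behavior easy to try out. A small sketch with made-up values:

```python
values = ["2", "111", "30"]

# compared as text, character by character: "1" < "2" < "3"
as_text = sorted(values)
# compared as numbers, like sort -n
as_numbers = sorted(values, key=int)

print(as_text)     # ['111', '2', '30']
print(as_numbers)  # ['2', '30', '111']
```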

head (and its opposite, "tail")
less (this is like a really basic text viewer. Press 'q' to exit it)

Python (Day 1 afternoon)
Find materials for today's lesson here https://github.com/rdtarvin/swc_UCSF_python
Download the directory with data and scripts to your computer. Here's how:

Way #1 (requires that Git be installed):

cd ~/Desktop # or wherever you want this folder to be
git clone https://github.com/rdtarvin/swc_UCSF_python

OR Way #2:

Just click "Download ZIP" on https://github.com/rdtarvin/swc_UCSF_python

If you want to access my python notebook, which I will update during the lesson, just navigate to https://github.com/rdtarvin/swc_UCSF_python/blob/master/RDT_notebook.ipynb and refresh the page. The notebook from yesterday can be accessed from the same repository.

unzip the file
cd into the swc_UCSF_python-master directory
ipython notebook RDT_notebook.ipynb # opens notebook in browser

Shift-Enter runs the current cell

In notebook

import numpy
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',') # loads the dataset into the variable data

variable types
number=5.5 # float
number2=4 # integer
word="AGTC" # string
word[0] # first letter (start counting at zero)
word[0:2] # gives first two letters (positions 0 and 1; the end position 2 is excluded)
word[-1] # gives last item

len(word) # gives length of word
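
A short runnable sketch of these indexing rules, using the same example string:

```python
word = "AGTC"

first = word[0]        # 'A': counting starts at zero
first_two = word[0:2]  # 'AG': position 2 is not included
last = word[-1]        # 'C': negative indices count from the end
length = len(word)     # 4

print(first, first_two, last, length)
```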

indexing
Bracket notation
word[1] # Gives you second letter of word
# Python starts counting at 0
data[row, column]
data.shape --> number of rows and columns
data.mean() --> mean of the whole data set
data[1].max() --> max value of row 1 (a single index selects a row)
data[:,20] --> all data in column 20
data[:,20].mean() --> mean of all numbers in column 20

axis=0 --> axis running vertically downward across rows
axis=1 --> axis running horizontally across columns
data.mean(axis=0) --> mean of each column, meaning we average down the rows
data.mean(axis=1) --> mean of each row, meaning we average across the columns
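
Because the axis argument is easy to mix up, here is a tiny sketch on a made-up 2x3 array (not the inflammation data) where the results can be checked by hand:

```python
import numpy

# 2 rows, 3 columns
small = numpy.array([[1.0, 2.0, 3.0],
                     [5.0, 6.0, 7.0]])

column_means = small.mean(axis=0)  # collapses the rows: one value per column
row_means = small.mean(axis=1)     # collapses the columns: one value per row

print(small.shape)    # (2, 3)
print(column_means)   # [3. 4. 5.]
print(row_means)      # [2. 6.]
```

A quick check is the length of the result: axis=0 gives as many values as there are columns, axis=1 as many as there are rows.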

plotting
%matplotlib inline
import matplotlib.pyplot
image=matplotlib.pyplot.imshow(data)

ave_inflammation = data.mean(axis=0) # mean of each column
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
max_plot = matplotlib.pyplot.plot(data.max(axis=0))
min_plot = matplotlib.pyplot.plot(data.min(axis=0))

import numpy
import matplotlib.pyplot

data = numpy.loadtxt(fname='data/inflammation-01.csv',delimiter=',')

# set up graphing space
fig = matplotlib.pyplot.figure(figsize=(10.0,3.0))

# position separate graphs
axes1 = fig.add_subplot(1,3,1)
axes2 = fig.add_subplot(1,3,2)
axes3 = fig.add_subplot(1,3,3)

axes1.set_ylabel('average')
axes1.plot(data.mean(axis=0))

axes2.set_ylabel('max')
axes2.plot(data.max(axis=0))

axes3.set_ylabel('min')
axes3.plot(data.min(axis=0))

fig.tight_layout()

matplotlib.pyplot.show()

each plot needs its own axes (axes1, axes2, axes3) to appear on a separate graph; if all three use axes1, everything is drawn on the first graph

Loops in Python
Python 2: print does not need parentheses
Python 3: print needs parentheses
word = "AGTC"
superword=word*10

for letter in superword:
    print(letter)

for letter in word:
    print(letter)


for letter in word:
    print letter # Python 2 syntax

word[position] gives the letter at that position

for letter in range(len(word)):
    print(word[letter])

for letter in word:
    print(letter)

# both of the two loops above will produce the same results

for letter in range(len(word)):
    print("The index is", letter)
    print("The letter at this index is", word[letter])
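
A common shortcut, not covered in the notes above, is the built-in enumerate, which hands back the index and the letter together. A small sketch:

```python
word = "AGTC"

pairs = []
for index, letter in enumerate(word):
    print("The index is", index, "and the letter is", letter)
    pairs.append((index, letter))
```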


Lists in Python

nonsense=[]
nonsense.append('This is my first list addition')
# notice the difference in brackets: use [] to create a list, and () to call append on it
nonsense.append(5)
nonsense.append(sequences)

nonsense[-1] # index list
nonsense[-1][0] # index list within a list
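
A self-contained version of the list steps above. The variable sequences is not defined in these notes, so the list of DNA strings used here is a made-up stand-in:

```python
sequences = ["AGTC", "GGCA"]  # hypothetical stand-in for the workshop variable

nonsense = []
nonsense.append("This is my first list addition")
nonsense.append(5)
nonsense.append(sequences)    # a list can hold another list

last_item = nonsense[-1]        # the whole inner list
inner_first = nonsense[-1][0]   # first element of the inner list

# lists can be changed in place; strings cannot
nonsense[1] = 6
print(nonsense, last_item, inner_first)
```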

Loops exercise

wholeword=''
for item in newword:
    print("wholeword before addition:", wholeword)
    wholeword=wholeword+item
    print("wholeword after addition:", wholeword)

wholeword=''
for item in range(0,len(newword),2):
    print(newword[item])


a list can be changed in place; a string cannot be changed
***the body of a loop must be indented, otherwise the loop does not work
wholeword=''
for item in range(len(newword)-1,-1,-1):
    wholeword=wholeword+newword[item]
print(wholeword)

newword[::-1] # slicing with a step of -1 also reverses the string

Let's reverse a string! (Without checking stackoverflow):

Try #1:
    for some letter in a word:
        somehow print it in reverse order??

Try #2

Ok, let's break the word up first. Actually, we don't have to, because we can access each element number "n" as word[ n ]
So:
    word = "Newton"
    reversed = "" # ok, start it with nothing
    n = length of word
    for (number i goes from (n-1) down to 0):
        reversed = reversed + word[ i ]
    print("Ok, the reversed thing is: " + reversed)
Probably that will work! But you have to figure out how to do the for loop!

Try #3: # working solution

word1="Newton"
word2=""
for item in range(len(word1)-1,-1,-1):
    word2=word2+word1[item]
print(word2)
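
For comparison, two shorter ways to get the same result, sketched here as an aside: slicing with a step of -1 (as in newword[::-1] above) and the built-in reversed combined with join.

```python
word1 = "Newton"

# the loop from Try #3
word2 = ""
for item in range(len(word1) - 1, -1, -1):
    word2 = word2 + word1[item]

by_slicing = word1[::-1]               # a step of -1 walks the string backwards
by_joining = "".join(reversed(word1))  # reversed() yields the letters last to first

print(word2, by_slicing, by_joining)   # all three are 'notweN'
```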

Decision rules in Python
if ... :
    ## Remember that the indent (4 spaces) is important. This denotes a "block" of code, which we execute *if* the boolean is True.
elif ... :
    ##
else:
    ##

# the chain stops at the first condition that is true; else takes no condition and runs only if none of them were true
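
A concrete sketch of the pattern, using a made-up function and thresholds:

```python
def describe(number):
    """Return a label for number; only the first matching branch runs."""
    if number > 100:
        return "big"
    elif number > 10:
        return "medium"   # reached only when the first test was False
    else:
        return "small"    # reached only when every test above was False

print(describe(500), describe(50), describe(5))
```

Note that 500 is also greater than 10, but the elif branch is never tested because the first condition already matched.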

Python Day 2 (morning)

Notebook for today can be found at: https://github.com/rdtarvin/swc_UCSF_python/blob/master/SWC_Day2_notebook.ipynb
You can also download the notebook from yesterday by clicking "Download ZIP" on https://github.com/rdtarvin/swc_UCSF_python

I'll be working in python 3 now!

Functions!

def <function_name>(<arguments>, <default_value_names>=<default_values>):
    <code_to_execute>
    return <what_to_return>

def is_animal_a_snake(animal):
    if animal == "snake":
        return "yep it totally is"
    else:
        return "guess not"

Python 2 vs 3 "fun" fact #1:
Are you annoyed that "2 / 3" is 0 (instead of 0.66667) in python 2? Fix that by putting this in the top of your python 2 file:

from __future__ import division
This was added to python in 2001
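
In Python 3 no import is needed: / always gives true division and // gives the old floor-division behavior. A quick sketch:

```python
true_division = 2 / 3     # 0.666... in Python 3
floor_division = 2 // 3   # 0: the result is rounded down to a whole number

print(true_division, floor_division)
```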

Python 2 vs 3 "fun" fact #2:

Why are there both python 2 and python 3? Isn't one an upgrade to the other?
Python 3 has new features and cleans up the language, but it is not backward compatible, so Python 2 code does not always run unchanged in Python 3. (It also lets nerds fight a holy war over which is better.)

# Add function
def add(a,b):
    result = a+b
    return result

# need return statement to be able to save output value into variable
# return will stop function, if you have another line with print below return, it will not print value
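
A small sketch of both points, using made-up function names: a function without return hands back None, and nothing after return runs.

```python
def add_and_print(a, b):
    print(a + b)          # shows the value but does not hand it back

def add(a, b):
    result = a + b
    return result
    print("never shown")  # unreachable: return already ended the function

printed_only = add_and_print(2, 3)   # prints 5, but the variable holds None
saved = add(2, 3)                    # the variable holds 5

print(printed_only, saved)
```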

# Default arguments
def display(a=1,b=2,c=3):
    print("a:", a, "b:", b, "c:", c)
    return (a,b,c) # returns a tuple

# display(x) replaces the default value of a; display(c=x) replaces a default by name
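
A few calls that show how the defaults get replaced (the argument values are arbitrary):

```python
def display(a=1, b=2, c=3):
    print("a:", a, "b:", b, "c:", c)
    return (a, b, c)

all_defaults = display()       # (1, 2, 3)
first_replaced = display(55)   # positional values fill a, then b, then c
by_name = display(c=77)        # a keyword picks the argument directly
mixed = display(10, c=30)      # positional first, then keywords
```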

### CHALLENGE:
    write a function called fence that takes 2 arguments called original and wrapper and returns a new string
    with the wrapper added at the beginning and end of the original, e.g. fence('name','*') returns '*name*'
#solution
def fence(original, wrapper):
    answer=wrapper+original+wrapper
    print(answer)
    return answer

## Script from yesterday
%matplotlib inline

import numpy
import matplotlib.pyplot
import glob

filenames = glob.glob('data/inflammation*.csv')
bad_maxes = 0
bad_mins = 0
ok_graphs = 0
for my_file in filenames:
    print("File being analyzed is", my_file)
    data = numpy.loadtxt(fname=my_file,delimiter=',')

    ## check for suspicious data
    if data.max(axis=0)[0] == 0 and data.max(axis=0)[20] == 20:
        print("suspicious looking maxima")
        bad_maxes += 1
    elif data.min(axis=0).sum() == 0:
        print("Minima add up to zero!")
        bad_mins += 1
    else:
        print("seems ok!")
        ok_graphs += 1

    # set up graphing space
    fig = matplotlib.pyplot.figure(figsize=(10.0,3.0))

    # position separate graphs
    axes1 = fig.add_subplot(1,3,1)
    axes2 = fig.add_subplot(1,3,2)
    axes3 = fig.add_subplot(1,3,3)

    # set y axis labels and tell it what to plot
    axes1.set_ylabel('average')
    axes1.plot(data.mean(axis=0))

    axes2.set_ylabel('max')
    axes2.plot(data.max(axis=0))

    axes3.set_ylabel('min')
    axes3.plot(data.min(axis=0))

    fig.tight_layout()

Challenge #2
def fence(a,b):
    result = b+a+b
    print(result)
    return result

fence("name","*")

## Break up scripts into separate functions

%matplotlib inline
import numpy
import matplotlib.pyplot
import glob

def analyze(filename):
        data = numpy.loadtxt(fname=filename,delimiter=',')

        # set up graphing space
        fig = matplotlib.pyplot.figure(figsize=(10.0,3.0))

        # position separate graphs
        axes1 = fig.add_subplot(1,3,1)
        axes2 = fig.add_subplot(1,3,2)
        axes3 = fig.add_subplot(1,3,3)

        # set y axis labels and tell it what to plot
        axes1.set_ylabel('average')
        axes1.plot(data.mean(axis=0))

        axes2.set_ylabel('max')
        axes2.plot(data.max(axis=0))

        axes3.set_ylabel('min')
        axes3.plot(data.min(axis=0))

        fig.tight_layout()

        matplotlib.pyplot.show()


def detect_problems(filename):
    bad_maxes=0
    bad_mins=0
    ok_graphs=0
    data = numpy.loadtxt(fname=filename,delimiter=',')
    ## check for suspicious data
    if data.max(axis=0)[0] == 0 and data.max(axis=0)[20] == 20:
        print ("suspicious looking maxima")
        bad_maxes+=1
    elif data.min(axis=0).sum() == 0:
        print ("Minima add up to zero!")
        bad_mins+=1
    else:
        print ("seems ok!")
        ok_graphs+=1
    return(bad_maxes, bad_mins, ok_graphs)

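def main():
    filenames=glob.glob('data/inflammation*.csv')
    totals=[0,0,0] # running totals: bad_maxes, bad_mins, ok_graphs
    for my_file in filenames:
        print("file being analyzed is:", my_file)
        analyze(my_file)
        # detect_problems returns (bad_maxes, bad_mins, ok_graphs) for one file
        counts=detect_problems(my_file)
        totals=[t+c for t,c in zip(totals,counts)]
    print("bad maxes, bad mins, ok graphs:", totals)

main()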

## Build in assertion errors
numbers=[3,5,6,-1,-7,0,10]
total = 0
for n in numbers:
    print(n)
    assert n>0, "data should only contain positive values"
    total += n
print('total is:', total)
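
When this loop runs, it prints 3, 5, 6 and -1 and then stops with an AssertionError at -1, so the total is never printed. A sketch of the same check wrapped in a function (the name total_of_positives is made up for illustration), with the error caught so the message can be inspected:

```python
def total_of_positives(numbers):
    """Add up numbers, refusing to continue if any value is not positive."""
    total = 0
    for n in numbers:
        assert n > 0, "data should only contain positive values"
        total += n
    return total

good_total = total_of_positives([3, 5, 6])

try:
    total_of_positives([3, 5, 6, -1, -7, 0, 10])
    message = None
except AssertionError as error:
    message = str(error)   # the text given after the comma in the assert

print(good_total, message)
```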

## Assertion statements
def normalize_rectangle(rect):
    '''
    Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    '''
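
The notes stop after the docstring. One possible body, written here as a sketch and not copied from the lesson, assuming rect is a tuple (x0, y0, x1, y1) with the lower-left corner first:

```python
def normalize_rectangle(rect):
    '''
    Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Assumes rect is (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
    '''
    # preconditions: catch bad input before doing any arithmetic
    assert len(rect) == 4, "rectangles must contain 4 coordinates"
    x0, y0, x1, y1 = rect
    assert x0 < x1, "invalid X coordinates"
    assert y0 < y1, "invalid Y coordinates"

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        # wider than tall: width becomes 1.0, height shrinks in proportion
        upper_x, upper_y = 1.0, dy / dx
    else:
        # taller than wide (or square): height becomes 1.0
        upper_x, upper_y = dx / dy, 1.0

    # postconditions: the result must fit inside the unit square
    assert 0 < upper_x <= 1.0, "calculated upper X coordinate invalid"
    assert 0 < upper_y <= 1.0, "calculated upper Y coordinate invalid"
    return (0, 0, upper_x, upper_y)

print(normalize_rectangle((0.0, 0.0, 4.0, 2.0)))   # (0, 0, 1.0, 0.5)
print(normalize_rectangle((1.0, 1.0, 2.0, 5.0)))   # (0, 0, 0.25, 1.0)
```

The assertions at the top check the input and the ones at the bottom check the result, so a mistake in either place fails loudly instead of producing a wrong rectangle.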

## Git

$ git config --global user.name ""
$ git config --global user.email ""
$ git config --global color.ui "auto"
$ git config --global core.editor "nano -W"
$ git status

nano instructions.txt # add some lines to the file
git add instructions.txt # stage the file so git starts tracking it
git commit # opens the editor: write a message saying the file was created

nano instructions.txt # change some lines in the file
git add instructions.txt # stage the file again to include the new changes
git commit -m "message describing change"

git diff <filename> will show the changes in a file that have not been added (staged) yet

## if you've already committed changes
$ git diff HEAD~1 instructions.txt # compare the file with the version one commit before the most recent

$ git checkout HEAD~1 instructions.txt # restore the version of the file from one commit back