ADDITIONAL RESOURCES


SWC Python lessons online: http://swcarpentry.github.io/python-novice-inflammation/
SWC Shell lessons online: http://swcarpentry.github.io/shell-novice/
SWC Git lessons online: http://swcarpentry.github.io/git-novice/


Becca's semester-long programming course:
http://ccbbatut.github.io/Biocomputing_Spring2016/
Amazing book for beginning to program: http://practicalcomputing.org/
Other programming resources: http://ccbbatut.github.io/Biocomputing_Spring2016/resources/

Becca says: I also wanted to point out that one of the best things about writing scripts for your analyses is that if you change your data file (maybe there was an error during input), it takes almost no time to rerun your analyses. This is also great if a reviewer asks you to transform your data before running your analyses: once you change your input file, redoing everything takes almost no time.




-------------------------------




Potential workflow:
    DATA ---(Excel/bash)---> datafile ---(python/R script)---> cleaned data ---(python/R script)---> analyzed data
    all of these can be run in a bash script ("the glue")
    
    example bash script
    # run concatenation and cleaning script on data files, write output to a csv file
    python concat_and_clean_files.py data*.csv > clean_data.csv
    # run analysis on clean data file, write output to new text file
    python analyze_data.py clean_data.csv > analyzed_data.txt
    
    more complicated bash script:
     # run concatenation and cleaning script on data files, write output to a csv file
Things you learned (maybe):
  1. LS
    1. What is up with "ls -F"? (this flags the items that are folders with a /)
    2. How about "ls -a"? (shows hidden files)
    3. Q: why does "ls" work but "LS" doesn't? Answer: the shell is ornery and annoying.
    4. If you had a computer in like 1993: it's the same as "dir" on old DOS.
  2. PWD
    1. "print where we are" ("print working directory"? "present working directory?")
  3. CD
    1. "change directory" — move where we are.
  4. MKDIR
  5. RM
    1. Note: please don't accidentally remove all your files! (don't use -r as this means recursive)
  6. MV ("move a file")
  7. CAT ("catastrophically print an entire text file onto my terminal, causing it to scroll many pages probably." Actually short for "concatenate")
  8. wc
  9. sort
    1. Note: sort doesn't normally understand numbers! 2 comes after 111, because lines are compared as text (use "sort -n" to sort numerically).
  10. head (and its opposite, "tail")
  11. less (this is like a really basic text viewer. Press 'q' to exit it)
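
The sort note above (2 coming after 111) is not special to the shell: any comparison done on text goes character by character. A minimal Python sketch of the same effect, using made-up values:

```python
# Values read from a text file arrive as strings, not numbers.
values = ["111", "2", "30"]

# Text comparison goes character by character, so "111" < "2" < "30".
as_text = sorted(values)

# Converting to int for the comparison gives the numeric order.
as_numbers = sorted(values, key=int)

print(as_text)
print(as_numbers)
```

Passing key=int here plays the same role as the -n flag of the shell's sort.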



















unzip file
go to directory with swc_UCSF_python-master
ipython notebook RDT_notebook.ipynb # opens notebook in browser

Shift-enter

In notebook

import numpy
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',') # imports dataset

variable types
number=5.5 # float
number2=4 # integer
word="AGTC"  # string
word[0] # gives first letter; Python starts counting at zero
word[0:2] # gives first two letters (letter 0 to 2, excluding 2)
word[-1]  # gives last item

len(word) # gives length of word
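
A short sketch that collects the indexing and slicing notes above in one runnable cell. The variable names on the left are made up for illustration.

```python
word = "AGTC"

first = word[0]        # index 0 is the first letter
first_two = word[0:2]  # letters at index 0 and 1; index 2 is excluded
last = word[-1]        # negative indices count from the end
length = len(word)     # number of letters in the string

print(first, first_two, last, length)
```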



indexing
Bracket notation
word[1] # Gives you second letter of word
# Python starts counting at 0
data[row,column]
data.shape --> number of rows and columns
data.mean() --> mean of the whole data set
data[1].max() --> max value of row 1 (a single index selects a row)
data[:,20] --> all data in column 20
data[:,20].mean() --> mean of all numbers in column 20

axis=0 --> axis running vertically downward across rows
axis=1 --> axis running horizontally across columns
data.mean(axis=0) --> mean of each column, meaning we operate down the rows
data.mean(axis=1) --> mean of each row, meaning we operate across the columns
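
The axis argument is easy to mix up, so here is a small sketch on a made-up 2 x 3 array (not the inflammation data) where the results can be checked by hand. The axis you name is the one that gets collapsed.

```python
import numpy

# A made-up array with 2 rows and 3 columns.
small = numpy.array([[1.0, 2.0, 3.0],
                     [3.0, 4.0, 5.0]])

col_means = small.mean(axis=0)  # collapse the rows: one mean per column
row_means = small.mean(axis=1)  # collapse the columns: one mean per row

print(small.shape)
print(col_means)
print(row_means)
```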

plotting
%matplotlib inline
import matplotlib.pyplot
image=matplotlib.pyplot.imshow(data)

ave_inflammation = data.mean(axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
max_plot = matplotlib.pyplot.plot(data.max(axis=0))
min_plot = matplotlib.pyplot.plot(data.min(axis=0))

import numpy
import matplotlib.pyplot

data = numpy.loadtxt(fname='data/inflammation-01.csv',delimiter=',')

# set up graphing space
fig = matplotlib.pyplot.figure(figsize=(10.0,3.0))

# position separate graphs
axes1 = fig.add_subplot(1,3,1)
axes2 = fig.add_subplot(1,3,2)
axes3 = fig.add_subplot(1,3,3)

axes1.set_ylabel('average')
axes1.plot(data.mean(axis=0))

axes1.set_ylabel('max')
axes1.plot(data.max(axis=0))

axes1.set_ylabel('min')
axes1.plot(data.min(axis=0))

fig.tight_layout()

matplotlib.pyplot.show()

If you change axes1 to axes2 and axes3 in the second and third blocks above, the max and min are drawn on their own separate graphs.


Loops in Python
Python 2: print needs no parentheses (print letter)
Python 3: print needs parentheses (print(letter))
word = "AGTC"
superword=word*10

for letter in superword:
    print(letter)
    
for letter in word:
    print(letter)
    

for letter in word:
    print letter  # Python 2 syntax

[] will print letter at position

for letter in range(len(word)):
    print(word[letter])
    
for letter in word:
    print(letter)
    
# both of the two loops above will produce the same results

for letter in range(len(word)):
    print "The index is", letter 
    print "The letter at this index is", word[letter]
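
When both the index and the letter are needed, as in the last loop above, Python's built-in enumerate hands back both at once. This is a sketch in Python 3 syntax; the list named pairs is only there to make the result easy to inspect.

```python
word = "AGTC"

pairs = []
for index, letter in enumerate(word):
    # enumerate yields (position, item) for each item in the sequence
    print("The index is", index, "and the letter at this index is", letter)
    pairs.append((index, letter))
```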
    

Lists in Python

nonsense=[]
nonsense.append('This is my first list addition')
# notice the difference in brackets: use [] to create a list, use () to call append on a list
nonsense.append(5)
nonsense.append(sequences)

nonsense[-1] # index list
nonsense[-1][0] # index list within a list
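
A runnable version of the list notes above. The contents of sequences are not recorded in these notes, so the list below is a hypothetical stand-in.

```python
# "sequences" is a hypothetical list, standing in for the one used in class.
sequences = ["AGTC", "TTGA"]

nonsense = []
nonsense.append("This is my first list addition")
nonsense.append(5)
nonsense.append(sequences)   # the whole list is added as a single item

last_item = nonsense[-1]         # the nested list
first_of_last = nonsense[-1][0]  # first item inside the nested list

print(len(nonsense), last_item, first_of_last)
```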


Loops exercise

wholeword=''
for item in newword:
    print("wholeword before addition:", wholeword)
    wholeword=wholeword+item
    print("wholeword after addition:", wholeword)
    
wholeword=''
for item in range(0,len(newword),2):
    print(newword[item])
    
    
lists can be changed; strings cannot be changed
***the body of a loop must be indented, otherwise the loop does not work
wholeword=''
for item in range(len(newword)-1,-1,-1):
    wholeword=wholeword+newword[item]
print(wholeword)

newword[::-1]

Let's reverse a string! (Without checking stackoverflow):
    
    Try #1:
    for some letter in a word:

Try #2
Try #3: # working solution
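
The attempts from class are not recorded here. As a sketch only, this is one possible loop-based way to reverse a string, checked against the slice shortcut newword[::-1]. The function name and the input value are made up.

```python
def reverse_with_loop(text):
    """Build the reversed string one letter at a time."""
    result = ""
    for letter in text:
        result = letter + result  # put each new letter in front
    return result

newword = "AGTC"  # made-up input
print(reverse_with_loop(newword))
print(newword[::-1])
```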

Decision rules in Python
if ... :
    ##
elif ... :
    ##
else:
    ##
    
# will exit as soon as it hits the first condition that is true
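
A small sketch of the "first true condition wins" rule. The function name and thresholds are invented for illustration.

```python
def describe(number):
    """Return a label chosen by the first condition that is true."""
    if number > 100:
        label = "greater than 100"
    elif number > 10:
        label = "greater than 10"   # skipped for 500, even though 500 > 10
    else:
        label = "10 or less"
    return label

print(describe(500))
print(describe(50))
print(describe(3))
```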





Python Day 2 (morning)


Notebook for today can be found at: https://github.com/rdtarvin/swc_UCSF_python/blob/master/SWC_Day2_notebook.ipynb
You can also download the notebook from yesterday by clicking 
I'll be working in python 3 now!



Functions!


def <function_name>(<arguments>, <default_value_names>=<default_values>):
    <code_to_execute>
    return <what_to_return>


def is_animal_a_snake(animal):
    if animal == "snake":
        return True
    return False

Python 2 vs 3 "fun" fact #1:
Are you annoyed that "2 / 3" is 0 (instead of 0.66667) in Python 2? Fix that by putting this at the top of your Python 2 file:
from __future__ import division

Python 2 vs 3 "fun" fact #2:

# Add function
def add(a,b):
    result = a+b
    return result
    
# need return statement to be able to save output value into variable
# return will stop function, if you have another line with print below return, it will not print value
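
A sketch of both notes above: code placed after return never runs, and a function without return hands back None. add_no_return is a made-up name for contrast.

```python
def add(a, b):
    result = a + b
    return result
    print("never printed")  # unreachable: return has already ended the function

def add_no_return(a, b):
    result = a + b  # computed, but never handed back to the caller

total = add(2, 3)              # the returned value is saved in a variable
nothing = add_no_return(2, 3)  # nothing was returned, so this is None

print(total, nothing)
```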

# Default arguments (the function returns a tuple)
def display(a=1,b=2,c=3):
    print("a:", a, "b:", b, "c:", c)
    return (a,b,c)
    
# display(x) replaces the default value of a with x; b and c keep their defaults
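
A sketch of three ways to call display. The variable names holding the results are made up.

```python
def display(a=1, b=2, c=3):
    print("a:", a, "b:", b, "c:", c)
    return (a, b, c)

defaults = display()       # no arguments: every default is used
first_only = display(10)   # positional argument: replaces a
by_name = display(c=30)    # keyword argument: replaces only c
```

Naming the argument (c=30) is the way to override a later default without having to spell out the earlier ones.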

### CHALLENGE:
    write a function called fence that takes 2 arguments called original and wrapper and returns a new string
    with the wrapper added at the beginning and end of the original, e.g. fence('name','*') returns '*name*'
#solution
def fence(original, wrapper):
    answer=wrapper+original+wrapper
    print(answer)
    return answer



## Script from yesterday
%matplotlib inline

import numpy
import matplotlib.pyplot
import glob

filenames = glob.glob('data/inflammation*.csv')
bad_maxes = 0
bad_mins = 0
ok_graphs = 0
for my_file in filenames:
    print("File being analyzed is", my_file)
    data = numpy.loadtxt(fname=my_file,delimiter=',')
    
    ## check for suspicious data
    if data.max(axis=0)[0] == 0 and data.max(axis=0)[20] == 20:
        print("suspicious looking maxima")
        bad_maxes += 1
    elif data.min(axis=0).sum() == 0:
        print("Minima add up to zero!")
        bad_mins += 1
    else:
        print("seems ok!")
        ok_graphs += 1

    # set up graphing space
    fig = matplotlib.pyplot.figure(figsize=(10.0,3.0))

    # position separate graphs
    axes1 = fig.add_subplot(1,3,1)
    axes2 = fig.add_subplot(1,3,2)
    axes3 = fig.add_subplot(1,3,3)

    # set y axis labels and tell it what to plot
    axes1.set_ylabel('average')
    axes1.plot(data.mean(axis=0))

    axes2.set_ylabel('max')
    axes2.plot(data.max(axis=0))

    axes3.set_ylabel('min')
    axes3.plot(data.min(axis=0))

    fig.tight_layout()

Challenge #2
def fence(a,b):
    result = b+a+b
    print(result)
    return result

fence("name","*")


## Break up scripts into separate functions

%matplotlib inline
import numpy
import matplotlib.pyplot
import glob


def analyze(filename):
        data = numpy.loadtxt(fname=filename,delimiter=',')

        # set up graphing space
        fig = matplotlib.pyplot.figure(figsize=(10.0,3.0))

        # position separate graphs
        axes1 = fig.add_subplot(1,3,1)
        axes2 = fig.add_subplot(1,3,2)
        axes3 = fig.add_subplot(1,3,3)

        # set y axis labels and tell it what to plot
        axes1.set_ylabel('average')
        axes1.plot(data.mean(axis=0))

        axes2.set_ylabel('max')
        axes2.plot(data.max(axis=0))

        axes3.set_ylabel('min')
        axes3.plot(data.min(axis=0))

        fig.tight_layout()

        matplotlib.pyplot.show()

    
def detect_problems(filename):
    bad_maxes=0
    bad_mins=0
    ok_graphs=0
    data = numpy.loadtxt(fname=filename,delimiter=',')
    ## check for suspicious data
    if data.max(axis=0)[0] == 0 and data.max(axis=0)[20] == 20:
        print ("suspicious looking maxima")
        bad_maxes+=1
    elif data.min(axis=0).sum() == 0:
        print ("Minima add up to zero!")
        bad_mins+=1
    else:
        print ("seems ok!")
        ok_graphs+=1
    return(bad_maxes, bad_mins, ok_graphs)

def main():
    filenames=glob.glob('data/inflammation*.csv')
    for my_file in filenames:
        print("file being analyzed is:", my_file)
        analyze(my_file)
        detect_problems(my_file)

main()
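
In the version above, detect_problems starts its counters at zero on every call and main() does not keep the returned tuple, so no running totals survive the loop. One possible way to accumulate them is sketched below. detect_problems_stub and fake_files are hypothetical stand-ins that take column maxima and minima directly, so the sketch runs without the data files.

```python
def detect_problems_stub(maxima, minima):
    """Hypothetical stand-in for detect_problems: same checks, no file loading."""
    if maxima[0] == 0 and maxima[20] == 20:
        return (1, 0, 0)   # suspicious looking maxima
    elif sum(minima) == 0:
        return (0, 1, 0)   # minima add up to zero
    else:
        return (0, 0, 1)   # seems ok

# Made-up inputs, one (maxima, minima) pair per imaginary file.
fake_files = [
    (list(range(40)), [1] * 40),   # maxima[0] == 0 and maxima[20] == 20
    ([5] * 40, [0] * 40),          # minima add up to zero
    ([5] * 40, [1] * 40),          # nothing suspicious
]

bad_maxes = bad_mins = ok_graphs = 0
for maxima, minima in fake_files:
    new_max, new_min, new_ok = detect_problems_stub(maxima, minima)
    bad_maxes += new_max   # add this file's result to the running totals
    bad_mins += new_min
    ok_graphs += new_ok

print(bad_maxes, bad_mins, ok_graphs)
```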


## Build in assertion errors
numbers=[3,5,6,-1,-7,0,10]
total = 0
for n in numbers:
    print(n)
    assert n>0, "data should only contain positive values"
    total += n
print('total is:', total)
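
The loop above stops with an AssertionError at the first value that is not positive (here -1), so the total is never printed. A sketch of the same check wrapped in a function, with the error caught so the message can be inspected. The function name is made up.

```python
def total_of_positives(numbers):
    """Sum the numbers, refusing any value that is not positive."""
    total = 0
    for n in numbers:
        assert n > 0, "data should only contain positive values"
        total += n
    return total

try:
    total_of_positives([3, 5, 6, -1, -7, 0, 10])
    message = None
except AssertionError as error:
    message = str(error)   # the text given after the comma in the assert

print(message)
print(total_of_positives([3, 5, 6]))
```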


## Assertion statements
def normalize_rectangle(rect):
    '''
    Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    '''
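
The body of normalize_rectangle is not recorded in these notes. The sketch below is one possible completion, not necessarily the version written in class. It assumes rect is (x0, y0, x1, y1), with the lower-left corner first and the upper-right corner second, and uses assertions to check the input and the result.

```python
def normalize_rectangle(rect):
    '''
    Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Assumption: rect is (x0, y0, x1, y1), lower-left corner then upper-right corner.
    '''
    # Preconditions: check the input before doing any work.
    assert len(rect) == 4, "rectangle must have 4 coordinates"
    x0, y0, x1, y1 = rect
    assert x0 < x1, "invalid X coordinates"
    assert y0 < y1, "invalid Y coordinates"

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        upper_x, upper_y = 1.0, float(dy) / dx   # width is the longest axis
    else:
        upper_x, upper_y = float(dx) / dy, 1.0   # height is the longest axis

    # Postconditions: check the result before handing it back.
    assert 0 < upper_x <= 1.0, "calculated upper X coordinate invalid"
    assert 0 < upper_y <= 1.0, "calculated upper Y coordinate invalid"
    return (0, 0, upper_x, upper_y)

print(normalize_rectangle((0, 0, 2, 1)))
```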
    
## Git

$ git config --global user.name ""
$ git config --global user.email ""
$ git config --global color.ui "auto"
$ git config --global core.editor "nano -W" 
$ git status

nano instructions.txt # add some lines to the file
git add instructions.txt # add to be tracked by git
git commit # add a message saying file was created

nano instructions.txt # change some lines in the file
git add instructions.txt # add again so the new changes are staged
git commit -m "message describing change"

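git diff <filename> shows the differences between your working copy and the last staged or committed version (changes that are not committed yet)

## if you've already committed changes
$ git diff HEAD~1 instructions.txt # compares the current file with the version one commit before the latest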

$ git checkout HEAD~1 ingredients.txt # restores the file as it was one commit before the latest