## IntroductionOver the past 2 decades, new biological tools and additional computing power have allowed scientist to collect and examine more data than ever before. A scientist in Eve Marder's lab here at Brandeis created 20 million different models of a 3-cell pacemaker circuit and discovered that a variety of combinations of cell intrinsic properties and synaptic properties can give rise to circuits that perform the same action (Prinz and Marder, 2004). Scientists in Sacha Nelson's lab used microarrays to examine the RNA expression patterns of 12,556 genes in 12 different cell types in the brain in order to understand which patterns of genes are expressed in different cell types (Sugino et al. 2006). And scientists in Larry Wangh's lab are using innovative polymerase chain reaction (PCR) techniques and pattern recognition methods to identify which DNA sequences are present in diseases such as tuberculosis and cancer. In this unit we will study vector and matrix representations of data. We have been doing this all along, but in this lab we will look specifically at how we can take advantage of this structure to examine our data. By paying attention to the structure of our data and arranging it in certain ways, we will be able to apply some tools from linear algebra to apply transformations such as scalings, rotations, reflections, and other interesting transforms. Further, these tools will allow us to analyze very high-dimensional data sets (in Lab 5.2) to pull out the most informative dimensions. ## Vector and matrix representationsLet's think a bit more actively about data points as vectors or matrix values. Let's consider the following set of ordered pairs (you can cut and paste): v=[ 0.45 1.0; 0.65 1.0; 0.65 0.8; 0.6 0.8; 0.6 0.75; 0.85 0.75; 1.05 1; 1.10 0.95; 0.9 0.65; 0.75 0.65; 0.75 0.1; 0.35 0.1; 0.35 0.65; 0.25 0.3; 0.25 0.1; 0.2 0.1; 0.2 0.35; 0.3 0.75; 0.5 0.75; 0.5 0.8; 0.45 0.8; 0.45 1.0 ]; Here, each row is a different ordered pair with 2 dimensions. We can plot these points: figure; plot(v(:,1),v(:,2),'b-o'); We could also represent the same data in column form, using 1 data point per column, using the transpose operator (a single apostrophe '):vt = v' We can also re-arrange dimensions when the dimension of the matrix is larger than 2 using the permute function, which is the N-dimensional generalization of the transpose operation:v = permute(v,[1 2]) % does nothing vt = permute(v,[2 1]) % does transpose a = rand(1,2,3) % makes a 3 dimensional matrix, 1x2x3 with random elements at = permute(a,[3 2 1]) % re-arranges order to be 3 x 2 x 1 Performing common operations across a dimension Most of the common Matlab routines, such as mean, median, or sum, allow you to specify the dimension along which the computation is to be performed. For example: mean(v) % computes across the 1st dimension; that is, across columns mean(v,1) % tells mean to use 1st dimension, same as mean(v) mean(v,2) % tells mean to use 2nd dimension mean(vt,1) mean(vt,2) sum(v) sum(v,1) sum(v,2) Q1: How does mean(v,2) differ in dimension from mean(v,1)? How does mean(vt,1) differ in arrangement from mean(v,2)?## Linear transformations and Affine transformationsWhen we have our data in the form of a vector or a matrix, it is very easy to perform linear transformations or affine transformations on the data. A linear transformation (also known as a linear projection or a linear mapping) is a change-of-coordinates with a linear function. Suppose we have a set of ordered pairs (xi, yi). We can perform a linear transformation by using the equations: xi* = a*xi + b*yi yi* = c*xi + d*yi The act of performing the transform is usually noted mathematically as T(x,y). A linear transformation has to obey the following rules: T(x1+x2,y1+y2) = T(x1,y1) + T(x2,y2) (additivity) T(a*x1, a*y1) = a*T(x1,y1) (homogeneity) An affine transformation is related to a linear transformation except that translations are allowed before or after the linear transformation: xi* = a*xi + b*yi + e yi* = c*xi + d*yi + f When we write our data points in the form of column vectors, so that each column corresponds to a new data point (as the variable vt above), then we can express linear transformations very easily in the form of matrix multiplication: A = [ a b ; c d] then by matrix multiplication A * [ x1 ; y1] = [ a*xi + b*yi; c*xi + d*yi] Here we'll examine several common and useful affine and linear transformations: ## TranslationThe first common operation we often want to perform is to translate (or shift) all of our points by a particular value. If we are lucky enough that we want to shift all dimensions in the same way, then we can simply write: vt_shift = vt + 5; figure(100); plot(vt(1,:),vt(2,:),'b-o'); hold on; plot(vt_shift(1,:),vt_shift(2,:),'r-x'); title('Original in blue, change in red'); (Leave this figure open.) But often we are not so lucky. We might want to shift each dimension in a different way. In this case, we have to build our own matrix to subtract using the repmat command that we first saw in Lab 2.3. Suppose we want to shift the points by 5 in x, and -5 in y. Then we can create a matrix that is the same size as vt: shift = repmat([5; -5], 1, size(vt,2)) % 1 row, same number of columns as vt vt_shift2 = vt + shift; figure(100); hold on; plot(vt_shift2(1,:),vt_shift2(2,:),'k-^'); title('Original in blue, change1 in red, change2 in black'); (Again, leave the figure open.) One common translation we want to make with experimental data is to remove the mean. The usefulness of this will be apparent very soon, when we look at scaling and rotation: vt_submean = vt - repmat(mean(vt,2),1,size(vt,2)); figure(100); hold on; plot(vt_submean(1,:),vt_submean(2,:),'m-d'); ## ScalingScaling is a very simple linear transformation we often want to apply to the data. vt_scale = 5*vt; figure(101); plot(vt(1,:),vt(2,:),'b-o'); hold on; plot(vt_scale(1,:),vt_scale(2,:),'r-x'); title('Original in blue, scale1 in red'); This scales all of the points. We can also scale all the points but remove the mean first, so that we scale with respect to the center of the points: vt_scale2 = 5*vt_submean; figure(101); hold on; plot(vt_scale2(1,:),vt_scale2(2,:),'k-^'); title('Original blue, scale1 red, scale2 black'); This again scales all of the points. Sometimes we might want to scale the dimensions differently. For example, we might want to scale x by 5 but only scale y by 2. To do this, we can multiply by the matrix [5 0; 0 2]. Think about the matrix multiplication; maybe write it out on a sheet of paper. vt_scale3 = [5 0; 0 2]*vt_submean; figure(101); hold on; plot(vt_scale3(1,:),vt_scale3(2,:),'m-d'); title('Original blue, scale1 red, scale2 black, scale3 magenta'); ## RotationRotation by an angle is also a linear transformation. The rotation matrix in 2-D for an angle theta in radians is as follows: [cos(theta) -sin(theta) ; sin(theta) cos(theta) ] For this transformation, I have written a function (because I can never remember where the negative sign goes and which ones are cosines and which are signs) (save in a folder in tools called math, be sure to add math to your path): The rotation is around the origin, so there is a big difference whether or not you rotate the points in place, or first translate the points to be centered on the origin: vt_rot1 = rot2d(45*pi/180)*vt; % 45 degrees, converted to radians vt_rot2 = rot2d(45*pi/180)*vt_submean; figure(102); plot(vt(1,:),vt(2,:),'b-o'); hold on; plot(vt_rot1(1,:),vt_rot1(2,:),'r-x'); plot(vt_rot2(1,:),vt_rot2(2,:),'k-^'); title('Original blue, rot1 red, rot2 black'); We can create a fun animation to rotate this around the clock. angles = 0:5:360; h1 = []; h2 = []; for i=1:length(angles),
end; ## ReflectionThe last major geometrical linear transformation is reflection about an angle theta: The matrix for a 2-dimensional reflection about theta is: [cos(2* theta) sin(2* theta) ; sin(2* theta) -cos(2* theta) ]; Once again, I've written a function for this transformation (save in the folder in tools called math): We can explore this transformation in the same way we did for rotation: vt_refl1 = refl2d(45*pi/180)*vt; % 45 degrees, converted to radians vt_refl2 = refl2d(45*pi/180)*vt_submean; figure(104); plot(vt(1,:),vt(2,:),'b-o'); hold on; plot(vt_refl1(1,:),vt_refl1(2,:),'r-x'); plot(vt_refl2(1,:),vt_refl2(2,:),'k-^'); title('Original blue, refl1 red, refl2 black'); And again we can create a fun animation to reflect this around the clock. angles = 0:5:360; h1 = []; h2 = []; for i=1:length(angles),
end; ## Compositions of transformationsA beautiful thing about linear transformations is that they can be composed super easily: it's just matrix multiplication. Here's an example of a scaling, rotation, and reflection in 1 line: vt_trans = [5 0 ; 0 2] * rot2d(45*pi/180) * refl2d(0) * vt_submean figure(106); plot(vt(1,:),vt(2,:),'b-o'); hold on; plot(vt_trans(1,:),vt_trans(2,:),'r-x'); title('Original blue, trans red'); axis equal % use same size of x and y axis Now that we are armed with these linear and affine transformations (which have many many applications) we will be equipped for dimensionality reduction in Lab 5.2. Let's now turn to looking at higher dimensional spaces (that is, when the dimension is greater than 2). ## Plotting in higher dimensional spacesIt is quite common in science to generate data that has 3 or dimensions, but making sense out of such data and showing the results to others presents special challenges. ## Continuously sampled data data (such as time series); surfacesConsider the zebra finch song that we analyzed in Lab 3.2 (attached below). The spectrogram of the zebra finch song is essentially a 3-d piece of information; at each point in time (1), we have a power/intensity value (2) at each frequency (3). Recall: [wave,fs] = audioread('song.wav'); figure; spectrogram(wave,256,[],[],fs,'yaxis'); (Leave this figure open.) The spectrogram method demonstrates a powerful way to display 3d information by essentially reducing it down to 2 dimensions in the form of an image (using color to code for power). One could look at a spectrogram in a paper and interpret it easily. The spectrogram actually function plots a surface, which is a 3-dimensional structure. To illustrate how to do this in Matlab, let's provide output arguments to the spectrogram function, so we can get the data and plot it ourselves using the same functions that spectrogram calls.[s,f,t,p] = spectrogram(wave,256,[],[],fs,'yaxis'); size(f) % the frequency dimension size(t) % the time dimension size(p) % the power value for each time, frequency The Matlab function that plots surfaces is called surf (see help surf).figure; surf(t,f,10*log10(abs(p)),'EdgeColor','none'); axis xy; axis tight; colormap(jet); xlabel('Time'); ylabel('Frequency (Hz)'); Q2: Look at the 3D view. You can use the "Rotate 3-D" tool (the box with the circular arrow around it). Which view do you prefer, the 2-D view or the 3-D view? If you were going to put an image in a paper, which do you think would be easier for readers to digest? It is possible to adjust the angle of view on the command line using the view command (see help view):view(0,90); % changes to 2d perspective view(-37.5, 30); % changes to default 3-d view There are a lot of options in Matlab for viewing data in 3-D. A 3-d surface plot can be combined with a contour plot: figure; surfc(t,f,10*log10(abs(p)),'EdgeColor','none'); axis xy; axis tight; colormap(jet); xlabel('Time'); ylabel('Frequency (Hz)'); or to use another bit of example data (derived from the peaks function in Matlab, which is an example surface for demoing the suface commands):figure; [X,Y,Z] = peaks(30); surfc(X,Y,Z) colormap hsv axis([-3 3 -3 3 -10 5]) ## High-dimensional data pointsFinally, often we have individual data points with many parameters. As an example, there is a famous set of data of 3 different Iris species ( Iris setosa, Iris versicolor, and Iris virginica) that was collected by Edgar Anderson and made famous by RA Fisher, who used the data to demonstrate linear discriminate analysis. (Check the link to see pictures of the 3 flower species.) Anderson measured 4 parameters of several flowers of each species: sepal length, sepal width, petal length, and petal width. Therefore, each flower is a 4-dimensional data point.iris_setosa = load('iris_setosa.txt','-ascii') iris_versicolor = load('iris_versicolor.txt','-ascii') iris_virginica = load('iris_virginica.txt','-ascii') How should we plot this data? We could use the built-in function plot3d (see help plot3d) and examine 3-dimensions at a time. figure; plot3(iris_setosa(:,1),iris_setosa(:,2),iris_setosa(:,3),'bo'); hold on; plot3(iris_versicolor(:,1), iris_versicolor(:,2), iris_versicolor(:,3),'rx'); plot3(iris_virginica(:,1), iris_virginica(:,2), iris_virginica(:,3),'k^'); xlabel('Sepal length'); ylabel('Sepal width'); zlabel('Petal length'); Q3: How do these 3 species differ? (You might need to use the Rotate 3-D tool to appreciate the different parameters.)But, you complain, this is lame, we are leaving out one of the dimensions of the data. How can we represent all data points at once? ## High-dimensional scatterplotWe can do this with a multidimensional scatter plot of the data. We first encountered scatter plots of 2-dimensions in Lab 2.1. Recall we can plot figure; plot(iris_setosa(:,1),iris_setosa(:,2),'bo'); hold on; plot(iris_versicolor(:,1), iris_versicolor(:,2),'rx'); plot(iris_virginica(:,1), iris_virginica(:,2),'k^'); xlabel('Sepal length'); ylabel('Sepal width'); To perform an N dimensional scatterplot in a Matlab figure, I use a routine written by a former colleague Maneesh Sahani called scatterplot.m. It relies on another function called assign.m. Both are attached at the end of this lab. Place scatterplot.m in the plotting folder inside your tools folder.help scatterplot % spend a minute reading the documentation % label groups, just like for ANOVA G = [repmat(1,size(iris_setosa,1),1); repmat(2,size(iris_versicolor,1),1); repmat(3,size(iris_virginica,1),1);] data = [iris_setosa;iris_versicolor;iris_virginica]; figure; scatterplot(data,'class',G,'markersize',8); legend('Setosa','Versacolor','Virginica','Location','NorthOutside') Take your mouse and move the legend out of the way so all the plots can be full size. The scatterplot plots every dimension against every other dimension. The upper left graph has dimension 2 (sepal width, on the y axis) plotted against dimension 1 (sepal length, on the x axis). If you look down the left hand column of graphs, you see that the dimension 1 (sepal length) is consistent across the row, allowing you to see how the other dimensions (2, 3, and 4) co-vary with dimension 1. In the second column, dimension 2 is on the x axis, and dimensions 3 and 4 are on the y axis. By comparing the points across the graph rows and columns, you can get a sense of how the data looks in multi-dimensional space. Q4: Describe how iris setosa differs from iris versicolor and iris virginica in all 4 dimensions. How do iris versicolor and iris virginica differ (in all 4 dimensions)? |

Labs >