Monthly Archives: August 2013

The Quoracast

I started a podcast. It’s called “The Quoracast”, an unofficial podcast dedicated to profiling members of the Quora.com community. Episodes can be found on the iTunes Store or at Quoracast.com.

Stock Similarities

Stock Similarities is a tool I wrote for comparing equities using cosine similarity. The source code can be found on GitHub.

Usage

Upon starting the program, the user is presented with the following:

restrict : limits the parsed metrics to a stricter set than default
ld <ticker> : loads all information about a stock into memory
ld <sector> : load tech, pharm, food, or finance
ld all : loads several NASDAQ stocks from various sectors
list : list all loaded companies
print_vect <ticker> : print the formatted stock vector for a ticker which has been loaded into memory
print_atts <ticker> : print all raw attributes of a stock which is in memory
sim <ticker> <ticker> : print the cosine similarity of two vectors
vis : enter visualization mode
sr : perform SageRank
q : quit the system

A standard series of commands can be found here. It was generated from an older version of the code, so the similarity command appears as measure_similarity rather than the sim listed above. Several key lines are:

measure_similarity MSFT AAPL
0.9487773944602984
measure_similarity AAPL AMZN
0.7766864280064972
measure_similarity MSFT AMZN
0.9341696778035647

The output is code that can be copied into a Processing file to get the following visualization:

[Image: visualization]

The similarity scores above suggest that, of the three companies, AAPL and AMZN are the most disparate: their score is about 0.78, against 0.95 and 0.93 for the other two pairs. As a result, AAPL and AMZN are connected by the hypotenuse (the longest line). The other meaningful component of the visualization is the radius of each circle, which is dictated by the price/earnings ratio.

Implementation

Stock data is pulled from Yahoo Finance, formatted, parsed, and mapped to vectors. After this process, a stock is summarized by a vector such as AAPL -> {contracts traded yesterday = 1000000000, last traded price = 520, short ratio = .5 …}. Vectors are then compared using cosine similarity:

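	// Cosine of the angle between the two vectors: their dot product
	// divided by the product of their magnitudes.
	public static double cosineSimilarity(AttributeVector v1, AttributeVector v2)
	{
		return dotProduct(v1, v2) / (v1.magnitude() * v2.magnitude());
	}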
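
The snippet above relies on dotProduct and magnitude, which are not shown. A minimal sketch of how they might look follows, assuming an attribute vector is simply a map from attribute name to a number. SketchVector and its put method are hypothetical stand-ins for illustration, not the actual AttributeVector class from the repository.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AttributeVector: attribute name -> numeric value.
final class SketchVector {
    private final Map<String, Double> values = new HashMap<>();

    // Sets one attribute and returns this vector so calls can be chained.
    SketchVector put(String attribute, double value) {
        values.put(attribute, value);
        return this;
    }

    // Euclidean length: square root of the sum of squared components.
    double magnitude() {
        double sumOfSquares = 0.0;
        for (double v : values.values()) {
            sumOfSquares += v * v;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Sum of products over shared attributes; an attribute missing
    // from either vector contributes nothing (assumed to be 0).
    static double dotProduct(SketchVector a, SketchVector b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.values.entrySet()) {
            Double other = b.values.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Same formula as in the post; yields NaN if either vector has zero magnitude.
    static double cosineSimilarity(SketchVector a, SketchVector b) {
        return dotProduct(a, b) / (a.magnitude() * b.magnitude());
    }
}
```

Under these assumptions, two vectors that point in the same direction (for example {price = 3, shortRatio = 4} and {price = 6, shortRatio = 8}) score 1, and two vectors with no attributes in common score 0.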

This produces one similarity score for each pair of stocks. GraphFactory turns these scores into edge lengths, so that the stocks form a fully connected graph.
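
The post does not say how GraphFactory converts a similarity score into an edge length. One plausible mapping, shown here purely as an assumption, is length = 1 - similarity, so that more similar stocks sit closer together. EdgeLengthSketch and its methods are hypothetical names, not part of the actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EdgeLengthSketch {
    // Assumed mapping (not confirmed by the post): higher similarity, shorter edge.
    static double edgeLength(double similarity) {
        return 1.0 - similarity;
    }

    // Returns the label of the pair with the longest edge, i.e. the least similar pair.
    static String longestEdge(Map<String, Double> similarityByPair) {
        String longest = null;
        double maxLength = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : similarityByPair.entrySet()) {
            double length = edgeLength(e.getValue());
            if (length > maxLength) {
                maxLength = length;
                longest = e.getKey();
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        // Scores taken from the sample session in the post.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("MSFT-AAPL", 0.9487773944602984);
        scores.put("AAPL-AMZN", 0.7766864280064972);
        scores.put("MSFT-AMZN", 0.9341696778035647);
        System.out.println(longestEdge(scores)); // prints AAPL-AMZN
    }
}
```

With the three scores from the sample session, this mapping makes AAPL-AMZN the longest edge, which matches the description of the visualization.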

The nodes can then be printed in order of importance, where a node’s importance is the sum of its incoming edges.
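
The ranking step can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the graph is held as a square matrix of edge values, the ranking runs from highest to lowest total, and the example feeds in the similarity scores from the sample session, since the post does not say whether the summed edge values are similarities or lengths. ImportanceSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ImportanceSketch {
    // totals[i] = sum of the values on edges pointing into node i (column i).
    static double[] importance(double[][] weights) {
        int n = weights.length;
        double[] totals = new double[n];
        for (int from = 0; from < n; from++) {
            for (int to = 0; to < n; to++) {
                if (from != to) {
                    totals[to] += weights[from][to];
                }
            }
        }
        return totals;
    }

    // Node names ordered from highest to lowest total (descending order is an assumption).
    static List<String> rank(String[] names, double[][] weights) {
        double[] totals = importance(weights);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> -totals[i]));
        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(names[i]);
        }
        return ranked;
    }

    public static void main(String[] args) {
        String[] names = {"MSFT", "AAPL", "AMZN"};
        // Symmetric matrix built from the three scores in the sample session.
        double[][] w = {
            {0.0, 0.9487773944602984, 0.9341696778035647},
            {0.9487773944602984, 0.0, 0.7766864280064972},
            {0.9341696778035647, 0.7766864280064972, 0.0},
        };
        System.out.println(rank(names, w)); // prints [MSFT, AAPL, AMZN]
    }
}
```

Because the graph is fully connected and the matrix is symmetric, every edge touching a node counts as incoming, so MSFT (similar to both other stocks) ends up with the largest total in this example.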