Monthly Archives: August 2013

The Quoracast

I started a podcast. It’s called “The Quoracast” and is an unofficial podcast dedicated to profiling members of the community. Episodes can be found on the iTunes store or on

Stock Similarities

Stock Similarities is a tool I wrote for comparing equities using cosine similarity. The source code can be found on GitHub.


Upon starting the program, the user is presented with the following:

restrict : limits the parsed metrics to a stricter set than default
ld <ticker> : loads all information about a stock into memory
ld <sector> : load tech, pharm, food, or finance
ld all : loads several NASDAQ stocks from various sectors
list : list all loaded companies
print_vect <ticker> : print the formatted stock vector for a ticker which has been loaded into memory
print_atts <ticker> : print all raw attributes of a stock which is in memory
sim <ticker> <ticker> : print the cosine similarity of two vectors
vis : enter visualization mode
sr : perform SageRank
q : quit the system

A standard series of commands can be found here.  It was generated from an older version of the code.  Several key lines are:

measure_similarity MSFT AAPL
measure_similarity AAPL AMZN
measure_similarity MSFT AMZN

The output is code that can be copied into a Processing file to get the following visualization:

[Image: visualization]

The lines of output suggest that, of the three companies, AAPL and AMZN are the most disparate.  As a result, AAPL and AMZN are connected by the hypotenuse (the longest line).  The other meaningful component of the visualization is the radius of the circle, which is dictated by price/earnings ratio.


Stock data is pulled from Yahoo Finance, formatted, parsed, and mapped to vectors.  After this process, a stock can be summarized by a vector such as AAPL -> {contracts traded yesterday = 1000000000, last traded price = 520, short ratio = 0.5 …}.  Each pair of vectors is then compared using cosine similarity.

	public static double cosineSimilarity(AttributeVector v1, AttributeVector v2) {
		return dotProduct(v1, v2) / (v1.magnitude() * v2.magnitude());
	}
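
For readers who want to try the calculation outside the tool, here is a minimal, self-contained sketch. It does not use the project's AttributeVector class; a plain Map from metric name to value stands in for it, and the class name CosineSketch, the metric names, and the zero-vector handling are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the tool's AttributeVector: metric name -> value.
final class CosineSketch {

    // Dot product over the metrics that both vectors contain.
    static double dotProduct(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Euclidean length of the vector.
    static double magnitude(Map<String, Double> v) {
        double sumSq = 0.0;
        for (double x : v.values()) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    static double cosineSimilarity(Map<String, Double> a, Map<String, Double> b) {
        double denom = magnitude(a) * magnitude(b);
        if (denom == 0.0) {
            return 0.0; // assumption: an all-zero vector is treated as dissimilar
        }
        return dotProduct(a, b) / denom;
    }

    public static void main(String[] args) {
        // Made-up values; the second vector is the first one scaled by 2.
        Map<String, Double> first = new HashMap<>();
        first.put("lastPrice", 3.0);
        first.put("shortRatio", 4.0);
        Map<String, Double> second = new HashMap<>();
        second.put("lastPrice", 6.0);
        second.put("shortRatio", 8.0);
        System.out.println(cosineSimilarity(first, second)); // prints 1.0
    }
}
```

Because cosine similarity depends only on the angle between two vectors, scaling one of them does not change the result. That also means metrics with very large raw values (such as traded volume) can dominate the angle unless the vectors are normalized first.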

This yields one similarity score for each pair of stocks.  GraphFactory turns these scores into edge lengths, so that the stocks form a fully connected graph.
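
One way such a graph could be built is sketched below. This is not GraphFactory itself: the mapping from similarity to length (1 minus the similarity, so that less similar stocks sit farther apart), the class name EdgeLengthSketch, and the placeholder tickers and scores are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EdgeLengthSketch {

    // Assumed mapping: the higher the similarity, the shorter the edge.
    static double edgeLength(double similarity) {
        return 1.0 - similarity;
    }

    // One edge per unordered pair of tickers, keyed as "X-Y".
    static Map<String, Double> completeGraph(String[] tickers, double[][] similarity) {
        Map<String, Double> edges = new LinkedHashMap<>();
        for (int i = 0; i < tickers.length; i++) {
            for (int j = i + 1; j < tickers.length; j++) {
                edges.put(tickers[i] + "-" + tickers[j], edgeLength(similarity[i][j]));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        String[] tickers = {"A", "B", "C"}; // placeholder tickers
        // Made-up similarity scores, not real output from the tool.
        double[][] similarity = {
            {1.0, 0.9, 0.8},
            {0.9, 1.0, 0.5},
            {0.8, 0.5, 1.0}
        };
        System.out.println(completeGraph(tickers, similarity));
    }
}
```

With three stocks the graph has three edges, which is why the visualization above is a triangle; n stocks would give n(n-1)/2 edges.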

The nodes can then be printed in order of importance.  A node’s importance is the sum of the weights of its incoming edges.
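
The ranking can be sketched as follows, assuming the edge weights are held in a square matrix where weights[j][i] is the edge from node j into node i, and assuming nodes are listed from highest score to lowest. The class name ImportanceSketch and the sample weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ImportanceSketch {

    // Importance of node i = sum of the weights of the edges coming into i.
    static double[] importance(double[][] weights) {
        double[] scores = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            for (int j = 0; j < weights.length; j++) {
                if (i != j) {
                    scores[i] += weights[j][i];
                }
            }
        }
        return scores;
    }

    // Node indices ordered from highest to lowest score (assumed ordering).
    static List<Integer> rank(double[] scores) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return order;
    }

    public static void main(String[] args) {
        // Made-up symmetric weights for three nodes.
        double[][] weights = {
            {0.0, 1.0, 2.0},
            {1.0, 0.0, 4.0},
            {2.0, 4.0, 0.0}
        };
        System.out.println(rank(importance(weights))); // prints [2, 1, 0]
    }
}
```

In a fully connected, undirected graph every edge touching a node counts as incoming, so the score is simply the sum of that node's row (or column) in the weight matrix.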