Tag Archives: Finance

Stock Similarities

Stock Similarities is a tool I wrote for comparing equities using cosine similarity. The source code can be found on GitHub.


Upon starting the program, the user is presented with the following:

restrict : limits the parsed metrics to a stricter set than default
ld <ticker> : loads all information about a stock into memory
ld <sector> : load tech, pharm, food, or finance
ld all : loads several NASDAQ stocks from various sectors
list : list all loaded companies
print_vect <ticker> : print the formatted stock vector for a ticker which has been loaded into memory
print_atts <ticker> : print all raw attributes of a stock which is in memory
sim <ticker> <ticker> : print the cosine similarity of two vectors
vis : enter visualization mode
sr : perform SageRank
q : quit the system

A standard series of commands can be found here.  It was generated from an older version of the code.  Several key lines are:

measure_similarity MSFT AAPL
measure_similarity AAPL AMZN
measure_similarity MSFT AMZN

The output is code that can be copied into a Processing file to get the following visualization:

[Image: visualization]

The lines of output suggest that, of the three companies, AAPL and AMZN are the most disparate. As a result, AAPL and AMZN are connected by the hypotenuse (the longest line). The other meaningful component of the visualization is the radius of the circle, which is dictated by price/earnings ratio.


Stock data is pulled from Yahoo Finance, formatted, parsed, and mapped to vectors. After this process, a stock can be summarized by a vector such as AAPL -> {contracts traded yesterday = 1000000000, last traded price = 520, short ratio = 0.5 …}. Vectors are compared using cosine similarity.

	public static double cosineSimilarity(AttributeVector v1, AttributeVector v2) {
		// cos(theta) = (v1 . v2) / (|v1| * |v2|)
		return dotProduct(v1, v2) / (v1.magnitude() * v2.magnitude());
	}

This yields one similarity score for each pair of stocks. GraphFactory turns these scores into edge lengths, so that the stocks form a fully connected graph.
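The comparison described above can be sketched in a self-contained form. This is only an illustration, not the project's code: it assumes a plain Map<String, Double> as a stand-in for AttributeVector, treats a missing attribute as 0, and returns 0 for a zero-length vector. The class name CosineSketch, the pairwise helper, and all attribute values are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CosineSketch {
    // Dot product over shared attributes; an attribute missing from v2 contributes 0.
    static double dotProduct(Map<String, Double> v1, Map<String, Double> v2) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : v1.entrySet()) {
            Double other = v2.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Euclidean length of the vector.
    static double magnitude(Map<String, Double> v) {
        double sumSq = 0.0;
        for (double x : v.values()) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    // Assumption: a zero-length vector is scored 0 rather than dividing by zero.
    static double cosineSimilarity(Map<String, Double> v1, Map<String, Double> v2) {
        double denom = magnitude(v1) * magnitude(v2);
        return denom == 0.0 ? 0.0 : dotProduct(v1, v2) / denom;
    }

    // One score per unordered pair of tickers, keyed "A-B" (hypothetical key format).
    static Map<String, Double> pairwise(Map<String, Map<String, Double>> stocks) {
        List<String> tickers = new ArrayList<>(stocks.keySet());
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < tickers.size(); i++) {
            for (int j = i + 1; j < tickers.size(); j++) {
                String a = tickers.get(i);
                String b = tickers.get(j);
                scores.put(a + "-" + b, cosineSimilarity(stocks.get(a), stocks.get(b)));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up attribute values, for illustration only.
        Map<String, Map<String, Double>> stocks = new LinkedHashMap<>();
        stocks.put("MSFT", Map.of("last traded price", 40.0, "short ratio", 1.5));
        stocks.put("AAPL", Map.of("last traded price", 520.0, "short ratio", 0.5));
        stocks.put("AMZN", Map.of("last traded price", 330.0, "short ratio", 2.0));
        pairwise(stocks).forEach((pair, score) -> System.out.println(pair + " " + score));
    }
}
```

Three stocks produce three scores, one per edge of the triangle in the visualization. How a score is converted into an edge length is left out here, since the post does not say how GraphFactory does it.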

The nodes can be printed in order of importance. A node’s importance is the sum of the lengths of the edges coming into that node.
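A minimal sketch of that ranking follows, under stated assumptions: the graph is undirected, so every edge counts toward both of its endpoints, and nodes are printed from the largest sum to the smallest. The Edge class, the method names, and the edge weights are hypothetical and not taken from the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ImportanceSketch {
    // Hypothetical undirected edge between two tickers with a numeric weight (length).
    static final class Edge {
        final String a;
        final String b;
        final double weight;

        Edge(String a, String b, double weight) {
            this.a = a;
            this.b = b;
            this.weight = weight;
        }
    }

    // Importance of a node = sum of the weights of the edges touching it.
    static Map<String, Double> importance(List<Edge> edges) {
        Map<String, Double> totals = new HashMap<>();
        for (Edge e : edges) {
            totals.merge(e.a, e.weight, Double::sum);
            totals.merge(e.b, e.weight, Double::sum);
        }
        return totals;
    }

    // Assumption: "ranked" means sorted by importance, largest first.
    static List<String> ranked(List<Edge> edges) {
        Map<String, Double> totals = importance(edges);
        List<String> nodes = new ArrayList<>(totals.keySet());
        nodes.sort((x, y) -> Double.compare(totals.get(y), totals.get(x)));
        return nodes;
    }

    public static void main(String[] args) {
        // Made-up weights, for illustration only.
        List<Edge> edges = List.of(
                new Edge("MSFT", "AAPL", 1.0),
                new Edge("AAPL", "AMZN", 3.0),
                new Edge("MSFT", "AMZN", 2.0));
        Map<String, Double> totals = importance(edges);
        for (String node : ranked(edges)) {
            System.out.println(node + " " + totals.get(node));
        }
    }
}
```

With these made-up weights the sums are 5.0 for AMZN, 4.0 for AAPL, and 3.0 for MSFT, so the nodes print in that order.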

MoonStocks Works Better Than I Thought

While staring at MoonStocks today, I noticed that my algorithm for converting the dominant frequency of a stock’s song into that stock’s price was working better than I thought. The predictable patterns it generates are hard to see, because the prices update every 100 ms and each conversion from a dominant-frequency measurement to a price carries its own variance. Over enough iterations of the song -> price-series conversion, though, the price at every point in time converges on a value.
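One way to make that convergence visible is to average the price seen at each tick across repeated plays of the same song. The sketch below is not the MoonStocks code. It assumes each measured price is an underlying per-tick price plus independent Gaussian noise, and it skips the frequency-to-price conversion entirely, since the post does not describe it. All names and numbers are hypothetical.

```java
import java.util.Random;

final class ConvergenceSketch {
    // Averages noisy per-tick prices over several plays of the same song.
    // underlying[t] is the assumed noise-free price at tick t.
    static double[] averageOverIterations(double[] underlying, int iterations,
                                          double noiseStdDev, long seed) {
        Random rng = new Random(seed);
        double[] mean = new double[underlying.length];
        for (int k = 1; k <= iterations; k++) {
            for (int t = 0; t < underlying.length; t++) {
                // Assumed noise model: Gaussian, independent per measurement.
                double sample = underlying[t] + noiseStdDev * rng.nextGaussian();
                // Incremental update of the running mean for tick t.
                mean[t] += (sample - mean[t]) / k;
            }
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] underlying = {100.0, 102.0, 101.0, 105.0}; // made-up prices
        for (int iterations : new int[] {1, 10, 1000}) {
            double[] mean = averageOverIterations(underlying, iterations, 2.0, 42L);
            System.out.println(iterations + " iterations: " + java.util.Arrays.toString(mean));
        }
    }
}
```

A single play looks noisy, while the averages over many plays settle close to the underlying series, which matches the pattern described above if the variance really is independent from one conversion to the next.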

Black Swan

Nassim Nicholas Taleb outlines his black swan theory as:

  1. The disproportionate role of high-profile, hard-to-predict, and rare events that are beyond the realm of normal expectations in history, science, finance, and technology
  2. The non-computability of the probability of the consequential rare events using scientific methods (owing to the very nature of small probabilities)
  3. The psychological biases that make people individually and collectively blind to uncertainty and unaware of the massive role of the rare event in historical affairs

The high potential upside of an event can offset the low probability of its occurrence.
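As a worked illustration with purely hypothetical numbers, suppose a bet costs c = 1 if it fails and pays U = 1000 if it succeeds, and suppose the probability of success were known to be p = 0.01 (which point 2 above argues is generally not possible for real rare events). The expected payoff would then be positive despite the low probability:

```latex
\[
\mathbb{E}[X] = p \cdot U - (1 - p) \cdot c = 0.01 \times 1000 - 0.99 \times 1 = 9.01
\]
```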