Summer School is a course in algorithms and data structures. The course is run via the WellingtonRuby Meetup and features weekly assignments and review sessions.
- Array, Enumerable & Enumerator
- Source
- Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- CircularBuffer & LinkedList
- Source
- Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- Hash
- Source
- Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- Binary Search Tree
- Source
- Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- Binary Heap & Priority Queue
- Source
- (Partial) Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- Matrix & Page Rank
- Source
- Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- Graph
- Source
- (Partial) Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- Algorithms
- Source
- Implementation
- Correction
- Diff between my implementation and the correction
- Notes
- Weighted Graph Algorithms
When the backing memory reaches capacity, it needs to be extended: a new, larger memory chunk is allocated and all the elements are copied into it.
If it were only extended to exactly what is needed, the same costly copy would happen again on the very next insertion, which is why implementations typically grow the capacity geometrically (e.g. by doubling).
arithmetic series: 1 + 2 + ... + n = (n^2 + n) / 2
geometric series: 1 + 2 + 4 + 8 + ... + 2^n = 2^(n+1) - 1
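These series are why growth is usually geometric: doubling the capacity makes the total number of copies over n pushes a geometric series (O(n) overall, amortized O(1) per push), whereas growing by one element each time yields the arithmetic series (O(n^2) overall). A minimal sketch; the class and its copy counter are made up for illustration:

```ruby
# Hypothetical growable array: capacity doubles when full, so the total
# number of element copies over n pushes follows the geometric series.
class GrowableArray
  attr_reader :capacity, :size, :copies

  def initialize
    @capacity = 1
    @store = Array.new(@capacity) # stand-in for a raw memory chunk
    @size = 0
    @copies = 0 # counts elements moved during re-allocations
  end

  def push(value)
    grow if @size == @capacity
    @store[@size] = value
    @size += 1
    self
  end

  private

  def grow
    @capacity *= 2
    new_store = Array.new(@capacity)
    @size.times { |i| new_store[i] = @store[i]; @copies += 1 }
    @store = new_store
  end
end
```

Pushing 1024 elements triggers copies of 1 + 2 + 4 + ... + 512 = 1023 elements in total, i.e. fewer than one copy per push on average.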
Array lookup is faster than linked list lookup if we assume:
- Locality of reference
- Random access machine model where access time is constant
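For illustration, under the RAM model an array lookup is a single indexed access, while a linked list must follow one pointer per position; the `Node` struct and `list_lookup` helper below are made up for this sketch:

```ruby
# A hand-rolled singly linked list node (made up for this sketch).
Node = Struct.new(:value, :next_node)

# Looking up index i in a list means following i pointers: O(n).
def list_lookup(head, index)
  node = head
  index.times { node = node.next_node }
  node.value
end

head = Node.new(2, Node.new(4, Node.new(6, nil)))
list_lookup(head, 2) # => 6, after following 2 links

# Looking up index 2 in an array is a single indexed access: O(1).
[2, 4, 6][2] # => 6
```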
Use a hash when keys are sparse, or not even integers.
For non-integer keys, Ruby calls `hash` on the key to obtain an integer.
Ruby uses SipHash, which is collision resistant but collisions can still happen; see collision resolution.
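For example, a table can reduce any key's integer hash to a bucket index; `NUM_BUCKETS` is a hypothetical table size, and actual `hash` values are seeded per process, so only the range is predictable:

```ruby
# Hypothetical bucket count for a small hash table.
NUM_BUCKETS = 11

# Any Ruby object responds to `hash`, returning an Integer; the table
# reduces it to a bucket index with a modulo.
def bucket_index(key)
  key.hash % NUM_BUCKETS
end

bucket_index("name") # some index in 0...NUM_BUCKETS
bucket_index(42)     # integer keys hash too
```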
Binary Tree: each node has at most 2 children. Binary Search Tree: same, but each node has a value and every value in the left subtree is smaller.
The shape of the tree depends on the order in which elements are put in, although self-adjusting/balancing trees keep the tree balanced regardless of insertion order.
Insert and search are O(n) in the worst case if the tree is not balanced (it degenerates into a linked list), and O(log n) if it is balanced.
In-order traversal is easy with a recursive function (see each).
A normal implementation of eql? would only verify that the container holds the same values, but for the sake of this exercise it also checks the structure of the tree.
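A minimal sketch of these ideas (not the exercise's implementation): insertion keeps smaller values to the left, and the recursive each yields values in sorted order:

```ruby
# Minimal binary search tree: smaller values go left, others go right.
class BST
  Node = Struct.new(:value, :left, :right)

  def initialize
    @root = nil
  end

  def insert(value)
    @root = insert_node(@root, value)
    self
  end

  def include?(value)
    node = @root
    until node.nil?
      return true if value == node.value
      node = value < node.value ? node.left : node.right
    end
    false
  end

  # In-order traversal: left subtree, node, right subtree, which
  # yields the values in sorted order.
  def each(node = @root, &block)
    return if node.nil?
    each(node.left, &block)
    block.call(node.value)
    each(node.right, &block)
  end

  private

  def insert_node(node, value)
    return Node.new(value) if node.nil?
    if value < node.value
      node.left = insert_node(node.left, value)
    else
      node.right = insert_node(node.right, value)
    end
    node
  end
end
```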
Index calculations for Binary Heaps:
- In general, the children of the value at index `i` are at indices `2 * i + 1` and `2 * i + 2`.
- The parent of the value at index `i` is at index `(i - 1) / 2`.
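These formulas as small helper methods (0-based indices, as above):

```ruby
# 0-based binary heap index arithmetic.
def left_child(i)
  2 * i + 1
end

def right_child(i)
  2 * i + 2
end

def parent(i)
  (i - 1) / 2
end

# For the heap [9, 7, 8, 3, 6] stored in an array:
# the children of 9 (index 0) are 7 (index 1) and 8 (index 2),
# and the parent of 3 (index 3) is 7 (index 1).
```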
The utility methods have a few preconditions: `sift_up` assumes that the element at index `j` has a parent (or that `j >= 1`); `sift_down` assumes that the element at index `i` has 2 children.
Heapsort (O(n log n) runtime) usually uses a max-heap: the maximum is repeatedly shifted off the front of the heap and added at the end of the array.
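A sketch of that approach (a standalone version, not the exercise's code; this `sift_down` takes the heap size explicitly):

```ruby
# In-place heapsort using a max-heap: build the heap, then repeatedly
# swap the max (index 0) into the last unsorted slot and sift down.
def heapsort(array)
  n = array.length
  # Build the max-heap: sift down every non-leaf node, last parent first.
  ((n / 2) - 1).downto(0) { |i| sift_down(array, i, n) }
  (n - 1).downto(1) do |last|
    array[0], array[last] = array[last], array[0]
    sift_down(array, 0, last)
  end
  array
end

# Restore the heap property for index i within the first `size` slots.
def sift_down(array, i, size)
  loop do
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    largest = left if left < size && array[left] > array[largest]
    largest = right if right < size && array[right] > array[largest]
    return if largest == i
    array[i], array[largest] = array[largest], array[i]
    i = largest
  end
end
```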
The inner product of two vectors is the sum of the products of corresponding elements, such as a•b = a1b1 + a2b2 + a3b3 + ... (e.g. with v1 = Vector[1, 2] and v2 = Vector[2, 3], v1 • v2 = 8).
The straightforward algorithm for calculating a floating-point dot product of vectors can suffer from catastrophic cancellation. To avoid this, approaches such as the Kahan summation algorithm are used. – Wikipedia
The product of two matrices is a matrix of the inner product of each row (of the first matrix) with each column (of the second matrix).
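Both definitions in runnable form, using plain arrays of arrays (Ruby's standard matrix library provides the same operations via Vector#inner_product and Matrix#*):

```ruby
# Inner product: sum of the products of corresponding elements.
def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Matrix product: entry (i, j) is the inner product of row i of `a`
# with column j of `b`.
def matrix_product(a, b)
  b_columns = b.transpose
  a.map { |row| b_columns.map { |col| inner_product(row, col) } }
end

inner_product([1, 2], [2, 3]) # => 8 (1*2 + 2*3)
matrix_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# => [[19, 22], [43, 50]]
```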
Given the pages and links:
- A links to B
- B links to C
- C links to B and D
- D links to A
An adjacency matrix can be created to represent the links between the pages. It takes n^2 space.
|   | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 1 | 0 | 0 |
| B | 0 | 0 | 1 | 0 |
| C | 0 | 1 | 0 | 1 |
| D | 1 | 0 | 0 | 0 |
Then it is transformed into a (right) stochastic matrix (a.k.a. probability matrix), where each value is a non-negative real number representing a probability and each row sums to 1. It gives the probability of moving from one page to another.
|   | A | B   | C | D   |
|---|---|-----|---|-----|
| A | 0 | 1   | 0 | 0   |
| B | 0 | 0   | 1 | 0   |
| C | 0 | 0.5 | 0 | 0.5 |
| D | 1 | 0   | 0 | 0   |
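The transformation can be sketched as follows (plain arrays of arrays; it assumes every page has at least one outgoing link, otherwise the row sum would be zero):

```ruby
# Turn an adjacency matrix (rows of 0/1 links) into a right stochastic
# matrix by dividing each row by its number of outgoing links.
def to_stochastic(adjacency)
  adjacency.map do |row|
    links = row.sum
    row.map { |cell| cell / links.to_f }
  end
end

adjacency = [
  [0, 1, 0, 0], # A links to B
  [0, 0, 1, 0], # B links to C
  [0, 1, 0, 1], # C links to B and D
  [1, 0, 0, 0], # D links to A
]
to_stochastic(adjacency)
# => [[0.0, 1.0, 0.0, 0.0],
#     [0.0, 0.0, 1.0, 0.0],
#     [0.0, 0.5, 0.0, 0.5],
#     [1.0, 0.0, 0.0, 0.0]]
```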
The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor
d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85. – Wikipedia
The algorithm then multiplies the stochastic matrix by the damping factor and saves the result.
Separately, it creates another stochastic matrix of the same size, filled with 1/n, representing the probability of picking each page when the surfer stops following links. This second matrix is multiplied by the complement of the damping factor, (1 - d).
Both results are summed: follow_links_matrix * d + pick_new_page_matrix * (1 - d).
In addition to being stochastic, the resulting matrix is also irreducible (can get anywhere from anywhere) and aperiodic (it's possible to be on any page at any given step).
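The combination of the two matrices can be sketched element-wise; the method name `google_matrix` is the conventional name for the result, not a name from the exercise:

```ruby
# Combine the link-following matrix with the uniform jump matrix,
# weighted by the damping factor d. Each entry of the jump matrix is
# 1/n, so the result stays stochastic (rows still sum to 1).
def google_matrix(stochastic, d = 0.85)
  n = stochastic.length
  stochastic.map do |row|
    row.map { |p| d * p + (1 - d) * (1.0 / n) }
  end
end

# A two-page example: with d = 0.85 each entry becomes
# 0.85 * p + 0.15 * 0.5.
google_matrix([[0.0, 1.0], [1.0, 0.0]])
```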
The PageRank is finally computed using power iteration (an eigenvalue algorithm) which, in short, produces a non-zero vector, multiplies it by the matrix, saves the resulting vector and repeats the process with that new vector until the probability of being on any page reaches a stationary distribution.
```ruby
require "matrix"

# With `m` the matrix and `n` the number of pages in the matrix.
v0 = Vector.elements(Array.new(n, 1.0 / n))
while true
  v1 = m * v0
  # Stop once the vector barely moves between iterations.
  break if (v1 - v0).magnitude < 0.0001
  v0 = v1
end
# PageRank is the value of v1
```

Note: In this exercise, every link is given the same weight, but the importance of a link could be weighted. In addition, when the surfer stops following links and jumps to another page, all the pages have an equal chance of being selected; in reality, the surfer probably goes to a well-known page.
Representing page links with a matrix takes n^2 space and doesn't have edges per se.
A more efficient way to represent the page relations is to use a graph of size n + e (nodes + edges, a.k.a. links).
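For example, with a hash of adjacency lists, the four pages above take 4 node entries plus 5 edge entries:

```ruby
# Adjacency-list representation: one hash entry per node, one array
# element per edge, so the total size is O(n + e).
graph = {
  "A" => ["B"],
  "B" => ["C"],
  "C" => ["B", "D"],
  "D" => ["A"],
}

graph["C"]                 # => ["B", "D"]  (outgoing links of C)
graph.values.sum(&:length) # => 5 edges
```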
Recursive methods are simpler but might cause a stack overflow. Iterative methods can be used instead, but they then need an explicit stack to keep track of where you are conceptually.
It is not possible to produce a topological sort if the graph is cyclic (or if a portion of it is).
Note: In this exercise, an array of size n is used to keep track of the visited nodes, which might not be practical when the nodes are not simple integers. Other solutions might involve using the hash of the object and checking whether it is in the array, or using a binary search tree.
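A sketch of a DFS-based topological sort that uses a Hash for visited tracking, so nodes need not be integers; here the graph maps each node to its prerequisites, and the clothing example is made up:

```ruby
# Depth-first topological sort of an acyclic graph. Each node maps to
# the nodes that must come before it; a node is appended to the order
# only once all of its prerequisites have been appended.
def topological_sort(graph)
  visited = {}
  order = []
  visit = lambda do |node|
    next if visited[node]
    visited[node] = true
    (graph[node] || []).each { |prerequisite| visit.call(prerequisite) }
    order << node
  end
  graph.each_key { |node| visit.call(node) }
  order
end

graph = { "jacket" => ["tie"], "tie" => ["shirt"], "shirt" => [] }
topological_sort(graph) # => ["shirt", "tie", "jacket"]
```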
The Fisher–Yates shuffle is an algorithm for generating a random permutation of a finite set. [...] The algorithm effectively puts all the elements into a hat; it continually determines the next element by randomly drawing an element from the hat until no elements remain. The algorithm produces an unbiased permutation: every permutation is equally likely. – Wikipedia
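A sketch of the algorithm (non-mutating for convenience; the classic version shuffles in place):

```ruby
# Fisher–Yates: walk from the end, swapping each slot with a randomly
# chosen slot at or before it. Every permutation is equally likely.
def fisher_yates_shuffle(array)
  result = array.dup
  (result.length - 1).downto(1) do |i|
    j = rand(i + 1) # random index in 0..i
    result[i], result[j] = result[j], result[i]
  end
  result
end

fisher_yates_shuffle([1, 2, 3, 4, 5]) # a random permutation of 1..5
```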
Binary search runs in at worst O(log n) (i.e. logarithmic time).
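A sketch over a sorted array, halving the search window on each comparison:

```ruby
# Binary search over a sorted array: at most log2(n) + 1 comparisons.
# Returns the index of `target`, or nil when it is absent.
def binary_search(array, target)
  low = 0
  high = array.length - 1
  while low <= high
    mid = (low + high) / 2
    case array[mid] <=> target
    when 0  then return mid
    when -1 then low = mid + 1  # target is in the upper half
    else         high = mid - 1 # target is in the lower half
    end
  end
  nil
end

binary_search([1, 3, 5, 7, 9, 11], 7) # => 3
```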
In computer science, the Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a word within a main text string by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. – Wikipedia
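A sketch of the idea: the failure table records, for each prefix of the word, the length of its longest proper prefix that is also a suffix, so after a mismatch the search resumes without re-reading matched text characters:

```ruby
# KMP search: returns the index of the first occurrence of `word` in
# `text`, or nil. `k` tracks how many word characters are matched.
def kmp_index(text, word)
  return 0 if word.empty?
  failure = failure_table(word)
  k = 0
  text.each_char.with_index do |char, i|
    # On mismatch, fall back along the failure table instead of
    # rewinding the text pointer.
    k = failure[k - 1] while k > 0 && word[k] != char
    k += 1 if word[k] == char
    return i - word.length + 1 if k == word.length
  end
  nil
end

# failure[i]: length of the longest proper prefix of word[0..i] that
# is also a suffix of it.
def failure_table(word)
  failure = [0]
  k = 0
  (1...word.length).each do |i|
    k = failure[k - 1] while k > 0 && word[i] != word[k]
    k += 1 if word[i] == word[k]
    failure << k
  end
  failure
end

kmp_index("abxabcabcaby", "abcaby") # => 6
```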
The Floyd–Warshall algorithm is an algorithm for finding shortest paths in a weighted graph – Wikipedia
Every combination of edges is tested by incrementally improving an estimate on the shortest path between two vertices, until the estimate is optimal.
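A sketch over a weight matrix, using Float::INFINITY for missing edges:

```ruby
# Floyd–Warshall: dist[i][j] starts as the direct edge weight and is
# improved by allowing each vertex k in turn as an intermediate stop.
def floyd_warshall(weights)
  n = weights.length
  dist = weights.map(&:dup) # don't mutate the input
  n.times do |k|
    n.times do |i|
      n.times do |j|
        through_k = dist[i][k] + dist[k][j]
        dist[i][j] = through_k if through_k < dist[i][j]
      end
    end
  end
  dist
end

inf = Float::INFINITY
floyd_warshall([[0, 3, inf], [inf, 0, 1], [2, inf, 0]])
# => [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
```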
Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a graph – Wikipedia
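A distance-only sketch on an adjacency-list graph with non-negative weights; a linear scan picks the next node here, whereas a priority queue (as built earlier in the course) would make this O((n + e) log n). The example graph is made up:

```ruby
# Dijkstra: repeatedly settle the closest unvisited node and relax the
# edges leaving it. Returns a hash of shortest distances from `source`.
def dijkstra(graph, source)
  dist = Hash.new(Float::INFINITY)
  dist[source] = 0
  unvisited = graph.keys
  until unvisited.empty?
    node = unvisited.min_by { |n| dist[n] }
    unvisited.delete(node)
    (graph[node] || {}).each do |neighbor, weight|
      candidate = dist[node] + weight
      dist[neighbor] = candidate if candidate < dist[neighbor]
    end
  end
  dist
end

graph = {
  "A" => { "B" => 1, "C" => 4 },
  "B" => { "C" => 2, "D" => 6 },
  "C" => { "D" => 3 },
  "D" => {},
}
dijkstra(graph, "A") # shortest distances from A to every page
```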
Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex – Wikipedia
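A sketch on a small undirected graph (each edge listed in both directions; the example weights are made up, and a connected graph is assumed):

```ruby
# Prim: grow the tree from an arbitrary start, always taking the
# cheapest edge from a tree vertex to a vertex outside the tree.
# Returns the chosen edges as [from, to, weight] triples.
def prim(graph, start = graph.keys.first)
  in_tree = { start => true }
  edges = []
  while in_tree.length < graph.length
    # Cheapest edge crossing from the tree to the rest of the graph.
    from, to, weight = in_tree.keys.flat_map { |u|
      graph[u].reject { |v, _| in_tree[v] }.map { |v, w| [u, v, w] }
    }.min_by { |_, _, w| w }
    in_tree[to] = true
    edges << [from, to, weight]
  end
  edges
end

graph = {
  "A" => { "B" => 2, "C" => 3 },
  "B" => { "A" => 2, "C" => 1 },
  "C" => { "A" => 3, "B" => 1 },
}
prim(graph) # => [["A", "B", 2], ["B", "C", 1]]
```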
In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows. – Wikipedia
| Running time | Name |
|---|---|
| O(1) | constant time |
| O(log n) | logarithmic time |
| O(n) | linear time |
| O(n log n) | linearithmic time |
| O(n^2) | quadratic time |
| O(n^3) | cubic time |
Most default implementations of random are only pseudo-random, and Ruby's Random is no different, although Ruby also provides SecureRandom.