[低端筆記] MIT算法導論 2. Models of Computation, Document Distance

MIT的算法導論公開課有兩種版本, 我選較新版的
MIT 6.006 Introduction to Algorithms, Fall 2011

官方筆記

Typed notes (PDF)

什麼是演算法? (2:16)

解決問題的計算程序

input → algo → output

Model of computation

演算法允許的運算行為
運算花費成本(時間, 空間...)
cost of algorithm = sum of operation costs

Random Access Machine (RAM)

Random Access Memory (RAM) modeled by a big array
在Θ(1)時間內, 可以完成

載入O(1) words
計算O(1) 算式
儲存O(1) words

Θ(1) registers (each 1 word)
word: w(log(size of memory)) bit

Pointer machine

dynamically allocated objects
object has O(1) fields
field = word (e.g.,int) or pointer to object/null
weaker than (can be implemented on) RAM

Pointer 資料結構

Python Model (18:09)

“list” = array → RAM

L[i] = L[j] + 5 → O(1) time (原本的link list結構為O(n))

object with O(1) attributes → pointer machine

x = x.next → O(1) time

list

L.append(x) → via table doubling → O(1) time
L = L1 + L2 ≡ L = [ ] → O(1) time
len(L) → θ(1) time
L.sort() → θ(|L|log|L|)

dict

D[key] = value → O(1) time (via hashing)

long

x+y → O(|x|+|y|)
x∗y → O((|x|+|y|)^lg(3))

heapq

heaps

Document distance problem (33:12)

d(D1, D2)

Document = sequence of words
Word = sequence of alphanumeric characters
idea: 共同words
把Document假想成一個vector

D[w] = #occurrences of word W in D
For example

D1=“the cat”, D2=“the dog”

d'(D1,D2) = D1·D2 = Σ D1[W]·D2[W]
d''(D1,D2) = D1·D2 / |D1|·|D2|
d(D1,D2) = arccos(d''(D1,D2))

Document Distance Algorithm (42:35)

將document切成words
計算word出現頻率
計算D1·D2

re.findall(r“\w+”,doc)

Θ(|doc|)

#2

for word in doc;
  count[word] += 1;

O(Σ|word|) = O(|doc|)

#3

for word, count1 in doc1:  //Θ(k1)
 if word, count2 in doc2:  //Θ(k2)
  total += count1 * count2 //Θ(1)

O(k1·k2)

#3'

start at first word of each list
if words equal:   //O(|word|)
 total += count1*count2
if word1 ≤ word2: //O(|word|)
 advancelist1
else:
 advancelist2
repeat either until list done

O(Σ|word|) = O(|doc|)

11.3.19
Sean Chaox

About Sean Chaox

I'm soulless, so I'm recompiling my soul
I'm lifeless, so I'm enriching my life
I'm homeless, so I build this House
I am Sean, welcome to my House