[低端筆記] MIT算法導論 2. Models of Computation, Document Distance

MIT的算法導論公開課有兩種版本, 我選較新版的
MIT 6.006 Introduction to Algorithms, Fall 2011

官方筆記


什麼是演算法? (2:16)

  • 解決問題的計算程序
    • input → algo → output

Model of computation

  • 演算法允許的運算行為
  • 運算花費成本(時間, 空間...)
  • cost of algorithm = sum of operation costs

Random Access Machine (RAM)

  • Random Access Memory (RAM) modeled by a big array
  • 在Θ(1)時間內, 可以完成
    • 載入O(1) words
    • 計算O(1) 算式
    • 儲存O(1) words
  • Θ(1) registers (each 1 word)
  • word: w(log(size of memory)) bit

Pointer machine

  • dynamically allocated objects
  • object has O(1) fields
  • field = word (e.g.,int) or pointer to object/null
  • weaker than (can be implemented on) RAM
Pointer 資料結構

Python Model (18:09)

  1. “list” = array → RAM
    • L[i] = L[j] + 5 → O(1) time (原本的link list結構為O(n))
  2. object with O(1) attributes → pointer machine
    • x = x.next → O(1) time
  3. list
    • L.append(x) → via table doubling → O(1) time
    • L = L1 + L2 ≡ L = [ ] → O(1) time
    • len(L) → θ(1) time
    • L.sort() → θ(|L|log|L|)
  4. dict
    • D[key] = value → O(1) time (via hashing)
  5. long
    • x+y → O(|x|+|y|)
    • x∗y → O((|x|+|y|)^lg(3))
  6. heapq
    • heaps

Document distance problem (33:12)

d(D1, D2)
  • Document = sequence of words
  • Word = sequence of alphanumeric characters
  • idea: 共同words
  • 把Document假想成一個vector
    • D[w] = #occurrences of word W in D
    • For example
      • D1=“the cat”, D2=“the dog”
    • d'(D1,D2) = D1·D2 =  Σ D1[W]·D2[W]
    • d''(D1,D2) = D1·D2 / |D1|·|D2|
    • d(D1,D2) = arccos(d''(D1,D2))

Document Distance Algorithm (42:35)


  1. 將document切成words
  2. 計算word出現頻率
  3. 計算D1·D2
#1
re.findall(r“\w+”,doc)
Θ(|doc|)

#2
for word in doc;
  count[word] += 1;
O(Σ|word|) = O(|doc|)

#3
for word, count1 in doc1:  //Θ(k1)
 if word, count2 in doc2:  //Θ(k2)
  total += count1 * count2 //Θ(1)
O(k1·k2)

#3'
start at first word of each list
if words equal:   //O(|word|)
 total += count1*count2
if word1 ≤ word2: //O(|word|)
 advancelist1
else:
 advancelist2
repeat either until list done
O(Σ|word|) = O(|doc|)

延伸閱讀
About Sean Chaox
Me

I'm soulless, so I'm recompiling my soul
I'm lifeless, so I'm enriching my life
I'm homeless, so I build this House
I am Sean, welcome to my House

沒有留言:

張貼留言