[低端筆記] MIT算法導論 2. Models of Computation, Document Distance
MIT的算法導論公開課有兩種版本, 我選較新版的
MIT 6.006 Introduction to Algorithms, Fall 2011
官方筆記
什麼是演算法? (2:16)
- 解決問題的計算程序
- input → algo → output
Model of computation
- 演算法允許的運算行為
- 運算花費成本(時間, 空間...)
- cost of algorithm = sum of operation costs
Random Access Machine (RAM)
- Random Access Memory (RAM) modeled by a big array
- 在Θ(1)時間內, 可以完成
- 載入O(1) words
- 計算O(1) 算式
- 儲存O(1) words
- Θ(1) registers (each 1 word)
- word: w(log(size of memory)) bit
Pointer machine
- dynamically allocated objects
- object has O(1) fields
- field = word (e.g.,int) or pointer to object/null
- weaker than (can be implemented on) RAM
Pointer 資料結構 |
Python Model (18:09)
- “list” = array → RAM
- L[i] = L[j] + 5 → O(1) time (原本的link list結構為O(n))
- object with O(1) attributes → pointer machine
- x = x.next → O(1) time
- list
- L.append(x) → via table doubling → O(1) time
- L = L1 + L2 ≡ L = [ ] → O(1) time
- len(L) → θ(1) time
- L.sort() → θ(|L|log|L|)
- dict
- D[key] = value → O(1) time (via hashing)
- long
- x+y → O(|x|+|y|)
- x∗y → O((|x|+|y|)^lg(3))
- heapq
- heaps
Document distance problem (33:12)
d(D1, D2)
Document Distance Algorithm (42:35)
- 將document切成words
- 計算word出現頻率
- 計算D1·D2
re.findall(r“\w+”,doc)Θ(|doc|)
#2
for word in doc; count[word] += 1;O(Σ|word|) = O(|doc|)
#3
for word, count1 in doc1: //Θ(k1) if word, count2 in doc2: //Θ(k2) total += count1 * count2 //Θ(1)O(k1·k2)
#3'
start at first word of each list if words equal: //O(|word|) total += count1*count2 if word1 ≤ word2: //O(|word|) advancelist1 else: advancelist2 repeat either until list doneO(Σ|word|) = O(|doc|)
沒有留言:
張貼留言