Monday, May 6, 2019

Leetcode : Employee Free Time

We are given a list schedule of employees, which represents the working time for each employee.
Each employee has a list of non-overlapping Intervals, and these intervals are in sorted order.
Return the list of finite intervals representing common, positive-length free timefor all employees, also in sorted order.
Example 1:
Input: schedule = [[[1,2],[5,6]],[[1,3]],[[4,10]]]
Output: [[3,4]]
Explanation:
There are a total of three employees, and all common
free time intervals would be [-inf, 1], [3, 4], [10, inf].
We discard any intervals that contain inf as they aren't finite. 
Analysis:
The thoughts of how to tackle this problem is very straight forward, however, choosing which kind of data structure to present and get the answer efficiently and also easy to understand is the difficult part. 
List all the intervals on a timeline, then it will look like this 
b1(b1)      e2      e3     b4     b5     e6                          e10
________________________________________________________________________
So we can see the answer that we need is [e3, b4], and we can easily tell the condition is when the time interval meets situation curEnd < curBegin. 
In the very first place, I thought of using PriorityQueue. Using two priority queues, then push startTime and endTime separately, then compare each pq.top() value. However, this meets a lot of corner cases, especially hard to decide what is the loop end condition.
Then I found out, why not just pair the start and end time up, by increasing order? 
For example: 
Begin: 1, 2, 2, 6
End:    3, 4, 5, 7
Because their sizes are the same, the condition for free time slot is simple, end[i] < begin[i+1]. 
Time complexity: O(nlogn) for sorting. Space complex: O(n). 

class Solution {
public:
    vector<vector<int>> employeeFreeTime(vector<vector<vector<int>>>& schedule) {
        vector<vector<int>> res;
        vector<int> tmp;
        vector<int> begin, end;
        
        for (int i = 0; i<schedule.size(); i++)
        {
            for (int j = 0; j<schedule[i].size(); j++)
            {
                begin.push_back(schedule[i][j][0]);
                end.push_back(schedule[i][j][1]);
            }
        }
        
       sort(begin.begin(), begin.end());
       sort(end.begin(), end.end());
       for (int i = 0; i<begin.size(); i++)
       {
           if (i<begin.size()-1 && end[i] < begin[i+1])
            {
                tmp.push_back(end[i]);
                tmp.push_back(begin[i+1]);
                res.push_back(tmp);
                tmp.clear();
            }
       }
            
        return res;
    }
};

Saturday, May 4, 2019

System Design Questions to ask

  • Will users of our service be able to post tweets and follow other people?
  • Should we also design to create and display the user’s timeline?
  • Will tweets contain photos and videos?
  • Are we focusing on the backend only or are we developing the front-end too?
  • Will users be able to search tweets?
  • Do we need to display hot trending topics?
  • Will there be any push notification for new (or important) tweets?

  • What scale is expected from the system (e.g., number of new tweets, number of tweet views, number of timeline generations per sec., etc.)?
  • How much storage will we need? We will have different numbers if users can have photos and videos in their tweets.
  • What network bandwidth usage are we expecting? This will be crucial in deciding how we will manage traffic and balance load between servers.

  • Since we will be storing a massive amount of data, how should we partition our data to distribute it to multiple databases? Should we try to store all the data of a user on the same database? What issue could it cause?
  • How will we handle hot users who tweet a lot or follow lots of people?
  • Since users’ timeline will contain the most recent (and relevant) tweets, should we try to store our data in such a way that is optimized for scanning the latest tweets?
  • How much and at which layer should we introduce cache to speed things up?
  • What components need better load balancing?

  • Is there any single point of failure in our system? What are we doing to mitigate it?
  • Do we have enough replicas of the data so that if we lose a few servers we can still serve our users?
  • Similarly, do we have enough copies of different services running such that a few failures will not cause total system shutdown?
  • How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades?


Monday, April 29, 2019

Leetcode: Top K Frequent Words

Given a non-empty list of words, return the k most frequent elements.
Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.
Example 1:
Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words.
    Note that "i" comes before "love" due to a lower alphabetical order.
Example 2:
Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    with the number of occurrence being 4, 3, 2 and 1 respectively.
Note:
  1. You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
  2. Input words contain only lowercase letters.
Follow up:
  1. Try to solve it in O(n log k) time and O(n) extra space.

Solution:

There are different ways to solve this problem. 

1. hashmap + priority queue
    
First use a hashmap to scan the whole string vector, use key value pair to record the frequency of each string.
Then use priority queue to insert the objects (pari<string, int>)in hashmap with an order which defined by
comparison function. 
   
Priority Queue :  A list of objects which always have largest value on top in given compare function. Has push(), emplace(), top(), pop().

Emplace() vs Push():

To fully understand what emplace_back does, one must first understand variadic templates and rvalue references.
This is a fairly advanced, and deep concept in modern C++. On a map, it would be labeled "there be dragons".
You say that you're new to C++ and trying to learn this stuff. This may not be the answer you might be looking for, but you should skip this detail for now, and come back later after you've wrapped your brain around variadic templates and rvalue references. Then it should all make sense.
But if you insist: for a container containing simple, elementary types like integers, there's little, if any difference. The difference comes when the container's type is some large, sophisticated class, with a complicated constructor, and/or copy-constructor.
The end result of either push or emplace is exactly, 100%, the same. The container gets another element appended to it. The difference is where the element comes from:
1) push takes an existing element, and appends a copy of it to the container. Simple, straightforward. push always takes exactly one argument, the element to copy to the container.

2) emplace creates another instance of the class in the container, that's already appended to the container. The arguments to emplace are forwarded as arguments to the container's class's constructor. Emplace can have either one argument, more than one argument, or no argument at all, if the class has a default constructor.

Time complexity: push() - O(logn) , emplace() -, top() - O(1), pop() - .
Time and space complexity could be different based on different container given for priority queue. 

Time complexity: O(n) + O(nlogk)
Space: O(n)

2. hashmap + Sorting

use hashmap do the same thing as 1). Then sort by frequency and by the alphabetic order. Then take the last k elements out of the sorted vector as result. 


class Solution {
    struct comp
    {
        comp() {}
        ~comp() {}
        bool operator() (const pair<string, int>& a, const pair<string, int>& b)
        {
            return a.second > b.second ||(a.second == b.second && a.first < b.first);
        }
    };
    
public:
    
    vector<string> topKFrequent(vector<string>& words, int k) {
        vector<string> res;
        
        if (words.size() == 0 || k == 0)
        {
            return res;
        }
        
        unordered_map<string, int> dict;
        
        for (string& s : words)
        {
            dict[s]++;
        }
        
        priority_queue<pair<string, int>, vector<pair<string, int>>, comp> pq;
        
        for (auto it : dict)
        {
            pq.emplace(it.first, it.second);
            if (pq.size() > k) pq.pop();
        }
        
        while(!pq.empty())
        {
            res.push_back(pq.top().first);
            pq.pop();
        }
        reverse(res.begin(), res.end());
        return res;
    }
};

Leetcode : Employee Free Time

We are given a list  schedule  of employees, which represents the working time for each employee. Each employee has a list of non-overlap...