三颗猫饼干系统设计: May 2019

Monday, May 6, 2019

Leetcode : Employee Free Time

We are given a list schedule of employees, which represents the working time for each employee.

Each employee has a list of non-overlapping Intervals, and these intervals are in sorted order.

Return the list of finite intervals representing common, positive-length free timefor all employees, also in sorted order.

Example 1:

Input: schedule = [[[1,2],[5,6]],[[1,3]],[[4,10]]]
Output: [[3,4]]
Explanation:
There are a total of three employees, and all common
free time intervals would be [-inf, 1], [3, 4], [10, inf].
We discard any intervals that contain inf as they aren't finite.

Analysis:

The thoughts of how to tackle this problem is very straight forward, however, choosing which kind of data structure to present and get the answer efficiently and also easy to understand is the difficult part.

List all the intervals on a timeline, then it will look like this

b1(b1) e2 e3 b4 b5 e6 e10

________________________________________________________________________

So we can see the answer that we need is [e3, b4], and we can easily tell the condition is when the time interval meets situation curEnd < curBegin.

In the very first place, I thought of using PriorityQueue. Using two priority queues, then push startTime and endTime separately, then compare each pq.top() value. However, this meets a lot of corner cases, especially hard to decide what is the loop end condition.

Then I found out, why not just pair the start and end time up, by increasing order?

For example:

Begin: 1, 2, 2, 6

End: 3, 4, 5, 7

Because their sizes are the same, the condition for free time slot is simple, end[i] < begin[i+1].

Time complexity: O(nlogn) for sorting. Space complex: O(n).

class Solution {
public:
    vector<vector<int>> employeeFreeTime(vector<vector<vector<int>>>& schedule) {
        vector<vector<int>> res;
        vector<int> tmp;
        vector<int> begin, end;
        
        for (int i = 0; i<schedule.size(); i++)
        {
            for (int j = 0; j<schedule[i].size(); j++)
            {
                begin.push_back(schedule[i][j][0]);
                end.push_back(schedule[i][j][1]);
            }
        }
        
       sort(begin.begin(), begin.end());
       sort(end.begin(), end.end());
       for (int i = 0; i<begin.size(); i++)
       {
           if (i<begin.size()-1 && end[i] < begin[i+1])
            {
                tmp.push_back(end[i]);
                tmp.push_back(begin[i+1]);
                res.push_back(tmp);
                tmp.clear();
            }
       }
            
        return res;
    }
};

Saturday, May 4, 2019

System Design Questions to ask

Will users of our service be able to post tweets and follow other people?
Should we also design to create and display the user’s timeline?
Will tweets contain photos and videos?
Are we focusing on the backend only or are we developing the front-end too?
Will users be able to search tweets?
Do we need to display hot trending topics?
Will there be any push notification for new (or important) tweets?

What scale is expected from the system (e.g., number of new tweets, number of tweet views, number of timeline generations per sec., etc.)?
How much storage will we need? We will have different numbers if users can have photos and videos in their tweets.
What network bandwidth usage are we expecting? This will be crucial in deciding how we will manage traffic and balance load between servers.

Since we will be storing a massive amount of data, how should we partition our data to distribute it to multiple databases? Should we try to store all the data of a user on the same database? What issue could it cause?
How will we handle hot users who tweet a lot or follow lots of people?
Since users’ timeline will contain the most recent (and relevant) tweets, should we try to store our data in such a way that is optimized for scanning the latest tweets?
How much and at which layer should we introduce cache to speed things up?
What components need better load balancing?

Is there any single point of failure in our system? What are we doing to mitigate it?
Do we have enough replicas of the data so that if we lose a few servers we can still serve our users?
Similarly, do we have enough copies of different services running such that a few failures will not cause total system shutdown?
How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades?