Integrate backoff mechanism into the scheduling queue and remove the …#75497
Conversation
|
Hi @goodluckbot. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/ok-to-test |
|
I think unit test for |
denkensk
left a comment
There was a problem hiding this comment.
Thanks @goodluckbot I think podStartTime and podLastUpdateTime aren't needed. We can use podInfo.timestamp as the lastUpdateTime to calculate BackoffDuration. podInfo.timestamp is the timestamp that a pod is added to the scheduling queue.
|
Thanks @denkensk , i am thinking we need currently i am using |
|
/retest |
49740e0 to
a9496ea
Compare
|
/retest |
Huang-Wei
left a comment
There was a problem hiding this comment.
@goodluckbot Thanks for the PR. Please check my comments below.
Huang-Wei
left a comment
There was a problem hiding this comment.
@goodluckbot Thanks for the PR. Please check my comments below.
0545661 to
ea882aa
Compare
aa2f0fa to
8be79ef
Compare
|
thanks @Huang-Wei for reviewing, squashed the commits |
bsalamat
left a comment
There was a problem hiding this comment.
Thanks, @goodluckbot! Looks generally good. A few minor comments.
There was a problem hiding this comment.
If we used PodBackoffMap only for the PriorityQueue, we wouldn't need this here. podInfo already has a timestamp that can be used. For now, we can have it here until we remove the FIFO queue in the next step.
There was a problem hiding this comment.
this PR is using podLastUpdateTime + backoffDuration --> pod completes backoff time
where backoffDuration is calculated from numberOfAttempts
shall we change to podInfo.timestamp + backoffDuration --> pod completes backoff time?
i'm think podInfo.timestamp is the time pod added to this scheduling queue, while podLastUpdateTime is the time pod starts backoff. please correct me if i misunderstood sth. thanks.
There was a problem hiding this comment.
ok i see, in AddUnschedulableIfNotPresent we have pInfo.timestamp equal to podLastUpdateTime
func (p *PriorityQueue) AddUnschedulableIfNotPresent(pod *v1.Pod, podSchedulingCycle int64) error {
......
pInfo := p.newPodInfo(pod)
......
// Every unschedulable pod is subject to backoff timers.
p.backoffPod(pod)
......
}
maybe we could update the comment below from The time pod added to the scheduling queue. to The time pod last updated in the scheduling queue?
// podInfo is minimum cell in the scheduling queue.
type podInfo struct {
pod *v1.Pod
// The time pod added to the scheduling queue.
timestamp time.Time
}
There was a problem hiding this comment.
Yes. podInfo.timestamp can represent the time pod starts backoff in AddUnschedulableIfNotPresent . But the comment is not needed to change. Because timestamp represents the time pod added to the scheduling queue(no matter activeQ、podBackoffQ、unschedulableQ)
There was a problem hiding this comment.
I see.
shall we change this PR to use podInfo here?
i am thinking for PodBackoffMap to perform backoff, we only need namespace_pod as the key of the map, instead of all the info from *v1.Pod in podInfo.
There was a problem hiding this comment.
shall we change this PR to use podInfo here?
We cannot do it in this PR. The FIFO scheduling queue does not use podInfo. The changes you made in the factory.go apply to FIFO queue only.
In a follow-up PR, we can remove the FIFO queue and switch to podInfo and delete this podLastUpdateTime map.
597f676 to
82a5370
Compare
There was a problem hiding this comment.
Many of the functions here are thread safe, but this one is not. It is left to the caller to acquire the lock. I guess the reason is that CleanupPodsCompletesBackingoff uses this function. Please create a thread safe version of this function which is public (starts with a capital letter) and use the new function in other modules.
There was a problem hiding this comment.
the body of this function is non-trivial. Please use defer pbm.lock.Unlock() in the beginning of the function.
82a5370 to
151649d
Compare
bsalamat
left a comment
There was a problem hiding this comment.
/lgtm
/approve
Thanks, @goodluckbot!
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bsalamat, goodluckbot The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind cleanup
/kind feature
What this PR does / why we need it:
Integrate backoff mechanism into the scheduling queue and remove the Backoff util
Which issue(s) this PR fixes:
Fixes #75419
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NO