
CA: refactor ClusterSnapshot methods #7466

Open · wants to merge 5 commits into base: master
Conversation

@towca (Collaborator) commented Nov 5, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. The ClusterSnapshot interface is cleaned up to facilitate later changes needed for DRA:

  • There were multiple methods for adding Nodes to the snapshot. This causes redundancy in ClusterSnapshot implementations for no clear reason. Instead of adding DRA handling to all these methods, they're replaced with AddNodeInfo which is DRA-aware already.
  • RemoveNode is renamed to RemoveNodeInfo for consistency with AddNodeInfo.
  • AddPod and RemovePod are renamed to SchedulePod and UnschedulePod. These names are more in line with the method behavior once DRA is considered (a pod is not "removed" from the snapshot altogether, since we have to keep tracking its DRA objects).
  • An Initialize method is added. All other methods were Node or Pod specific, while for DRA the snapshot will also need to track DRA objects that are not bound to Nodes or Pods. Initialize() will be used to set these "global" DRA objects in later commits.
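The reshaped interface and its semantics can be sketched as follows. This is a toy, self-contained Go sketch: the method set follows the PR description, but the types (Node, Pod, NodeInfo) and signatures are simplified stand-ins for the real k8s.io ones, and the in-memory implementation is for illustration only, not the actual BasicClusterSnapshot code.

```go
package main

import "fmt"

// Simplified stand-ins for *apiv1.Node, *apiv1.Pod and the scheduler
// framework's NodeInfo; the real types live in the k8s.io packages.
type Node struct{ Name string }
type Pod struct{ Name, NodeName string }
type NodeInfo struct {
	Node *Node
	Pods []*Pod
}

// ClusterSnapshot after the cleanup described above. The method set
// follows the PR description; the signatures are assumptions.
type ClusterSnapshot interface {
	AddNodeInfo(ni *NodeInfo) error
	RemoveNodeInfo(nodeName string) error
	SchedulePod(pod *Pod, nodeName string) error
	UnschedulePod(namespace, podName, nodeName string) error
	Initialize(nodes []*Node, scheduledPods []*Pod) error
}

// basicSnapshot is a toy in-memory implementation for illustration only.
type basicSnapshot struct {
	nodeInfos map[string]*NodeInfo
}

func newBasicSnapshot() *basicSnapshot {
	return &basicSnapshot{nodeInfos: map[string]*NodeInfo{}}
}

func (s *basicSnapshot) AddNodeInfo(ni *NodeInfo) error {
	if _, ok := s.nodeInfos[ni.Node.Name]; ok {
		return fmt.Errorf("node %q already in snapshot", ni.Node.Name)
	}
	s.nodeInfos[ni.Node.Name] = ni
	return nil
}

func (s *basicSnapshot) RemoveNodeInfo(nodeName string) error {
	if _, ok := s.nodeInfos[nodeName]; !ok {
		return fmt.Errorf("node %q not in snapshot", nodeName)
	}
	delete(s.nodeInfos, nodeName)
	return nil
}

func (s *basicSnapshot) SchedulePod(pod *Pod, nodeName string) error {
	ni, ok := s.nodeInfos[nodeName]
	if !ok {
		return fmt.Errorf("node %q not in snapshot", nodeName)
	}
	pod.NodeName = nodeName
	ni.Pods = append(ni.Pods, pod)
	return nil
}

func (s *basicSnapshot) UnschedulePod(namespace, podName, nodeName string) error {
	ni, ok := s.nodeInfos[nodeName]
	if !ok {
		return fmt.Errorf("node %q not in snapshot", nodeName)
	}
	for i, p := range ni.Pods {
		if p.Name == podName {
			ni.Pods = append(ni.Pods[:i], ni.Pods[i+1:]...)
			return nil
		}
	}
	return fmt.Errorf("pod %s/%s not found on node %q", namespace, podName, nodeName)
}

// Initialize drops current contents and rebuilds the snapshot from scratch.
func (s *basicSnapshot) Initialize(nodes []*Node, scheduledPods []*Pod) error {
	s.nodeInfos = map[string]*NodeInfo{}
	for _, n := range nodes {
		if err := s.AddNodeInfo(&NodeInfo{Node: n}); err != nil {
			return err
		}
	}
	for _, p := range scheduledPods {
		if err := s.SchedulePod(p, p.NodeName); err != nil {
			return err
		}
	}
	return nil
}
```

In the real PR, Initialize will additionally take "global" DRA objects not bound to any Node or Pod; this sketch omits that to stay minimal.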

Which issue(s) this PR fixes:

The CA/DRA integration is tracked in kubernetes/kubernetes#118612; this PR is just part of the implementation.

Special notes for your reviewer:

This is intended to be a no-op refactor. It was extracted from #7350 after #7447.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/9de7f62e16fc5c1ea3bd40689487c9edc7fa5057/keps/sig-node/4381-dra-structured-parameters/README.md

/assign @MaciekPytel
/assign @jackfrancis

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/cluster-autoscaler area/provider/alicloud Issues or PRs related to the AliCloud cloud provider implementation approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider labels Nov 5, 2024
@k8s-ci-robot k8s-ci-robot added area/provider/cluster-api Issues or PRs related to Cluster API provider area/provider/digitalocean Issues or PRs related to digitalocean provider area/provider/equinixmetal Issues or PRs related to the Equinix Metal cloud provider for Cluster Autoscaler size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/provider/externalgrpc Issues or PRs related to the External gRPC provider area/provider/gce area/provider/hetzner Issues or PRs related to Hetzner provider area/provider/ionoscloud area/provider/kwok Issues or PRs related to the kwok cloud provider for Cluster Autoscaler area/provider/linode Issues or PRs related to linode provider area/provider/magnum Issues or PRs related to the Magnum cloud provider for Cluster Autoscaler area/provider/oci Issues or PRs related to oci provider area/provider/rancher labels Nov 5, 2024
@towca towca force-pushed the jtuznik/dra-snapshot-cleanup branch from c7d18df to e0d1e60 Compare November 5, 2024 16:07
@pohly (Contributor) commented Nov 6, 2024

/cc

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2024
  	pods = append(pods, podInfo.Pod)
  }
- err := a.ClusterSnapshot.AddNodeWithPods(upcomingNode.Node(), pods)
+ err := a.ClusterSnapshot.AddNodeInfo(upcomingNode)
Contributor:

this would be a good opportunity to rename vars as follows:

  • upcomingNodes --> upcomingNodeInfos
  • upcomingNode --> upcomingNodeInfo

Collaborator Author:

Done


knownNodes := make(map[string]bool)
for _, node := range nodes {
	if err := snapshot.AddNode(node); err != nil {
Contributor:

The only error condition for adding a node is if your []*apiv1.Node set has a duplicate. I wonder if there's a more efficient way of doing that targeted error handling earlier in the execution flow so we don't have to do so much error handling at this point. It would also have the side-benefit of allowing us to ditch this knownNodes accounting overhead in this function.

Collaborator Author:

IMO this is the perfect place to validate this:

  • The alternative is for Initialize() to assume that some validation happened earlier and that its input is correct. This doesn't seem safe, as it relies on every Initialize() user properly validating data first.
  • There are multiple places that call Initialize(), so ideally we'd want to extract the validation logic to remove redundancy anyway. If we're extracting it outside of Initialize(), we essentially have 2 functions that always need to be called in sequence.

Keep in mind that this should be called once per snapshot per loop, so the knownNodes overhead should be trivial compared to the rest of the loop.
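The knownNodes bookkeeping discussed above amounts to a duplicate-name check over the input. A minimal sketch of validating inside Initialize() rather than trusting callers (names here are illustrative, not the actual cluster-autoscaler code):

```go
package main

import "fmt"

type Node struct{ Name string }

// validateNoDuplicateNodes mirrors the knownNodes accounting: Initialize()
// validates its own input instead of assuming every caller de-duplicated
// the node list first. O(n) time and space, trivial compared to the rest
// of an autoscaler loop.
func validateNoDuplicateNodes(nodes []*Node) error {
	known := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		if known[n.Name] {
			return fmt.Errorf("duplicate node %q in snapshot input", n.Name)
		}
		known[n.Name] = true
	}
	return nil
}
```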

@@ -41,8 +41,6 @@ type ClusterSnapshot interface {
  	AddPod(pod *apiv1.Pod, nodeName string) error
  	// RemovePod removes pod from the snapshot.
  	RemovePod(namespace string, podName string, nodeName string) error
- 	// IsPVCUsedByPods returns if the pvc is used by any pod, key = <namespace>/<pvc_name>
- 	IsPVCUsedByPods(key string) bool
Contributor:

There appear to be some other implementations of this interface and usages across the codebase (e.g., cluster-autoscaler/simulator/clustersnapshot/basic.go), sorry if those are in another commit that I didn't see! But I think we'll need to clean this up everywhere.

Contributor:

Why do we no longer need this? IIRC this was never directly used in CA, but we needed to be able to satisfy the scheduler NodeInfo lister interface. Is that no longer the case? I feel like I'm missing some important part of CA/scheduler integration

Contributor:

After talking with Kuba - the functionality still exists and scheduler was accessing it via the lister interface anyway. This makes sense to me.

@@ -164,7 +164,7 @@ func (data *internalBasicSnapshotData) removeNode(nodeName string) error {
  	return nil
  }

- func (data *internalBasicSnapshotData) addPod(pod *apiv1.Pod, nodeName string) error {
+ func (data *internalBasicSnapshotData) schedulePod(pod *apiv1.Pod, nodeName string) error {
Contributor:

are these name changes really meaningful, given that we are simply wrapping the k/k scheduler's NodeInfo methods which will have the existing names?

Collaborator Author:

Yeah, the point is that when we introduce the DRA logic this will no longer be just a wrapper around the schedulerframework.NodeInfo.AddPod. There will be additional DRA processing, as well as interacting with the scheduler framework plugins.

I just want to get the interface names changed in one go to minimize conflicts later.

@towca towca force-pushed the jtuznik/dra-snapshot-cleanup branch from e0d1e60 to 4a7d702 Compare November 7, 2024 15:04
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 7, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 7, 2024
…ddNodeInfo

We need AddNodeInfo in order to propagate DRA objects through the
snapshot, which makes AddNodeWithPods redundant.
AddNodes() is redundant - it was intended for batch-adding nodes, probably with batch-specific optimizations in mind. However, it has always been implemented as just iterating over AddNode(), and is only used in test code.

Most of the uses in the test code were initialization - they are
replaced with Initialize(), which will later be needed for handling
DRA anyway. The other uses are replaced with inline loops over
AddNode().
The method's functionality is already accessible via StorageInfos(), so it's redundant.
AddNodeInfo already provides the same functionality, and has to be used
in production code in order to propagate DRA objects correctly.

Uses in production are replaced with Initialize(), which will later
take DRA objects into account. Uses in the test code are replaced with
AddNodeInfo().
AddPod is renamed to SchedulePod, RemovePod to UnschedulePod. This makes
more sense in the DRA world as for DRA we're not only adding/removing
the pod, but also modifying its ResourceClaims - but not adding/removing
them (the ResourceClaims need to be tracked even for pods that aren't
scheduled).

RemoveNode is renamed to RemoveNodeInfo for consistency with other
NodeInfo methods.
@towca towca force-pushed the jtuznik/dra-snapshot-cleanup branch from 4a7d702 to 3556f27 Compare November 13, 2024 12:18
@towca (Collaborator Author) commented Nov 13, 2024

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 13, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 14, 2024
Initialize(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod) error

// SchedulePod schedules the given Pod onto the Node with the given nodeName inside the snapshot.
SchedulePod(pod *apiv1.Pod, nodeName string) error
Contributor:

I wonder if in DRA world we need a separate method to schedule an existing pod (this one) and one to inject a completely new, in-memory pod? Basically - do we want separate method for "create pod" and "schedule pod"?

Contributor:

To close the loop on this - we discussed with @towca offline and we both think such method would be needed (potentially with separate version for injecting a completely new pod for cases like provisioningrequest and one for duplicating existing pods e.g. for replica count optimization -> this is TBD).
However, it makes sense to do it in a separate PR, together with the logic that will cover storing DRA objects for unschedulable pods.

@MaciekPytel (Contributor) left a comment:

Overall looks quite good, but I left a few comments that I feel are significant.
/approve

I assume I'll be unavailable when the comments are resolved. Please feel free to assume lgtm from me once you resolve the comments.

@@ -286,15 +282,10 @@ func BenchmarkFilterOutSchedulable(b *testing.B) {
  	assert.NoError(b, err)

  	clusterSnapshot := snapshotFactory()
- 	if err := clusterSnapshot.AddNodes(nodes); err != nil {
+ 	if err := clusterSnapshot.Initialize(nodes, scheduledPods); err != nil {
Contributor:

It feels somewhat inconsistent to use Initialize here and use AddNodeInfo in a loop in TestFilterOutSchedulable above. I assume you do it because here you have a list of all pods and above it's grouped by node, but that leaks out an assumption that Initialize is just adding NodeInfos in a loop. Is that an assumption we want? Or would we tell users that ClusterSnapshot must be initialized to be used?
Maybe this boils down to my other comment about the naming of Initialize and this would be fine if we just renamed the method?

// IsPVCUsedByPods returns if the pvc is used by any pod, key = <namespace>/<pvc_name>
IsPVCUsedByPods(key string) bool

// Initialize clears the snapshot and initializes it with real objects from the cluster - Nodes,
Contributor:

nit: "initializes it with real objects from the cluster" - it doesn't, it initializes snapshot with whatever you passed. Based on the wording I'd expect this method to actually pull stuff out of an informer. Maybe "Initialize replaces current contents of the snapshot with the provided objects (nodes, scheduled pods)"?


// Initialize clears the snapshot and initializes it with real objects from the cluster - Nodes,
// scheduled pods.
Initialize(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod) error
Contributor:

I don't like the name "Initialize" here. I think it's misleading:

  • I'd expect "Initialize" to be something that happens one on the start of object's lifetime.
  • I'd expect "Initialize" to be required before using the object in any other way - this isn't the case here, see my comment in one of the tests for example.
  • It's very non-specific, literally anything can happen here. It's fine if it encapsulates a bunch of internal stuff I don't care about as a user, but here it performs a very specific function that is both understandable and important for the user.

Naming is hard obviously, but I'd suggest a name along the lines of "ReplaceContentWith()" or "SetContent" or "SetClusterState" or similar. I think any of these would imply that the current contents of the snapshot are dropped and replaced with whatever is provided - which is exactly what the comment above is saying.

			return caerrors.ToAutoscalerError(caerrors.InternalError, err)
		}
	}
	if err := a.ClusterSnapshot.Initialize(nodes, scheduledPods); err != nil {
Contributor:

  1. Is Clear() call needed here? Based on the docstring for Initialize(), I'd expect Clear() to be a part of Initialize().
  2. Does it make sense to wrap a single method in a helper function? Feels like at this point we should just inline it back into RunOnce.

}

// Initialize initializes the snapshot.
func (snapshot *BasicClusterSnapshot) Initialize(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod) error {
	snapshot.Clear()
Contributor:

Shouldn't we just remove Clear() from interface at this point? It feels completely redundant. Initialize(nil, nil) should do exactly the same thing and, after renaming it, should also be a very obvious and natural way of doing that (not to mention the fact that IIRC we never Clear() a snapshot without immediately re-initializing it, so the Clear() functionality is probably never needed).
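A minimal illustration of the point above: once Initialize replaces the snapshot's entire contents, a separate Clear() is redundant, because Initialize with empty inputs leaves the snapshot empty. The type and names below are illustrative only, not the actual BasicClusterSnapshot code.

```go
package main

// snapshot is a toy stand-in for a cluster snapshot whose Initialize
// replaces all current contents with the provided objects.
type snapshot struct {
	nodes map[string]bool
}

// Initialize drops whatever the snapshot held and stores the new inputs.
// The pods parameter is kept only to mirror the real signature's shape.
func (s *snapshot) Initialize(nodes []string, scheduledPods []string) error {
	s.nodes = map[string]bool{}
	for _, n := range nodes {
		s.nodes[n] = true
	}
	return nil
}

// Clear is exactly Initialize with empty inputs, so it need not exist
// as a separate interface method.
func (s *snapshot) Clear() {
	_ = s.Initialize(nil, nil)
}
```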

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
