CURATOR-688. SharedCount will be never updated successful when version of ZNode is overflow. #478
Sorry, there is no new unit test here: I wanted to initialize a znode with version Integer.MAX_VALUE but found no interface to do so. It would be helpful if someone could give some guidance on covering this case.
With this change, are you able to manually reproduce the problem?
@@ -196,7 +196,7 @@ public boolean trySetValue(VersionedValue<byte[]> previous, byte[] newValue) thr
     private void updateValue(int version, byte[] bytes) {
         while (true) {
             VersionedValue<byte[]> current = currentValue.get();
-            if (current.getVersion() >= version) {
+            if (current.getVersion() >= version && version != Integer.MIN_VALUE) {
What happens after we reach this new condition? Is the system stuck?
Also, as "version" is a constant here, could we exit early at the beginning of the method?
Thanks @eolivelli. We built a temporary ZooKeeper version and forcibly created a znode with version Integer.MAX_VALUE, and the issue reproduces when calling trySetCount on this znode. When this issue occurs, SharedCount#trySetCount(VersionedValue<java.lang.Integer>, int) always returns false, and the application gets stuck.
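To illustrate why the cached value stops advancing, here is a minimal sketch of the guard in isolation. It assumes the pre-patch check in updateValue is current.getVersion() >= version (as in the diff above) and that the data version wraps like a Java int. The class and method names are hypothetical and are not part of Curator or ZooKeeper.

```java
/** Hypothetical model of the version guard; not Curator code. */
class VersionOverflowDemo {
    /** Next data version as a 32-bit counter would produce it (wraps on overflow). */
    static int nextVersion(int version) {
        return version + 1;
    }

    /** Pre-patch guard: an update is skipped when the cached version is not older. */
    static boolean isSkipped(int cachedVersion, int incomingVersion) {
        return cachedVersion >= incomingVersion;
    }

    public static void main(String[] args) {
        int cached = Integer.MAX_VALUE;
        int afterWrite = nextVersion(cached); // wraps to Integer.MIN_VALUE
        System.out.println("version after write: " + afterWrite);
        // No int is greater than Integer.MAX_VALUE, so the guard
        // skips every update from this point on.
        System.out.println("update skipped: " + isSkipped(cached, afterWrite));
    }
}
```

Under these assumptions the cache keeps reporting Integer.MAX_VALUE, so a caller that passes the cached version to the server would keep hitting a version mismatch.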
Addendum: I am still digging into whether it hits a wrong version before updateValue.
It seems that VersionedValue.version is Stat.version from ZooKeeper. It is then "known" to overflow in finite time under a sufficiently high setData frequency. Given the default visibility of VersionedValue, I think we can probably enhance this a bit with an extra zxid. But the crucial problem comes from ZooKeeper: it only does an atomic check against Stat.version, which is a 32-bit integer and eventually overflows for long-running, frequently modified data. And we will finally run into the corner case that checkAndIncVersion exposes: -1 is not an atomic condition, which means we could write wrong node data in case of wraparound and contention.
I really hope for a check_zxid style setData. I am always afraid of a setData with a version from an old incarnation.
There is a similar case in https://lists.apache.org/thread/4o3rl49rdj5y0134df922zgc8clyt86s, which overflows Stat.cversion. I believe ZooKeeper was not designed for, and is not suitable for, certain usages; performance degradation is desirable under wrong usage. But these overflows sound more like bugs.
I really hope a check_zxid style setData.
Totally +1. That should push ZooKeeper to change, which is another matter. If needed, we should start another thread to discuss it.
Addendum: I have not checked whether cversion and aclversion have the same issue on the Curator side.
But when current.getVersion() reach -1, the next trySetCount is indeterminate.
Sorry, I don't get the meaning here. The server side does indeed check whether version is -1. But if both version and currentVersion are -1, it will increase as expected. Do you mean there is some other logic I missed on the Curator side? Thanks.
It is checkAndIncVersion on the ZooKeeper side. When the expectedVersion is -1, it means a blind setData.
On the Curator side, when getVersionedValue reports -1, the next trySetValue will ship -1 as expectedVersion to the server, which is apparently not what we want.
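The point about -1 can be sketched with a simplified model of the server-side check. This is an assumption-laden illustration, not the ZooKeeper source: the class, the unchecked exception type, and the exact method signature are hypothetical, and only the rule "an expected version of -1 matches any current version" is taken from the comment above.

```java
/** Hypothetical model of a conditional write; not ZooKeeper code. */
class BlindWriteDemo {
    /** Raised by this model when the expected version does not match. */
    static class BadVersionException extends RuntimeException {
    }

    /**
     * Returns the new version if the write is accepted.
     * An expectedVersion of -1 disables the comparison (a blind write).
     */
    static int checkAndIncVersion(int currentVersion, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != currentVersion) {
            throw new BadVersionException();
        }
        return currentVersion + 1;
    }

    public static void main(String[] args) {
        // The server is at version 7. A client holding a stale version 6 is rejected.
        try {
            checkAndIncVersion(7, 6);
        } catch (BadVersionException e) {
            System.out.println("stale version rejected");
        }
        // A client whose cached version happens to be -1 is accepted without
        // any comparison, which overwrites whatever was written concurrently.
        System.out.println("new version: " + checkAndIncVersion(7, -1));
    }
}
```

In this model, a cached version of -1 turns a compare-and-set into an unconditional write, which is the behavior the comment warns about.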
Yes, my bad. We were both talking about the same code segment, but I misunderstood it. I will add some javadoc for these changes a moment later. Thanks again.
One corner case I noticed when reviewing the code again: if the version has overflowed on the server side and is now a negative value such as -100, then after the application restarts, currentValue will be initialized on the Curator side with the version set to UNINITIALIZED_VERSION, which is -1 now. Then SharedCount will never be updated because current.getVersion() >= version && version != Integer.MIN_VALUE is always true.
cc @kezhuw @eolivelli What do you think?
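The restart corner case can be checked with the patched condition alone. This is a minimal sketch that assumes the guard from the first revision of the patch and an initial cached version of -1; the class and method names are hypothetical.

```java
/** Hypothetical model of the patched guard after a client restart; not Curator code. */
class RestartCornerCaseDemo {
    /** Initial cached version, as described in the comment above. */
    static final int UNINITIALIZED_VERSION = -1;

    /** Patched guard: skip unless the incoming version is exactly Integer.MIN_VALUE. */
    static boolean isSkipped(int cachedVersion, int incomingVersion) {
        return cachedVersion >= incomingVersion && incomingVersion != Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        // The server-side version has wrapped and is now -100.
        // A freshly started client still holds -1, and -1 >= -100.
        System.out.println("update skipped: " + isSkipped(UNINITIALIZED_VERSION, -100));
    }
}
```

Any server version in the range from Integer.MIN_VALUE + 1 to -1 is skipped by this condition when the cache starts at -1.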
I guess -1 was chosen as UNINITIALIZED_VERSION because it is the minimum in the absence of overflow. So, now we can have Integer.MIN_VALUE as UNINITIALIZED_VERSION 😮💨.
Some conclusion, |
74deeb7 to 052450f
One corner case was not considered; I have updated this PR to trigger CI again.
052450f to 7173103
Fix checkstyle and trigger CI again.
Hi @eolivelli @kezhuw, any more suggestions here?
@@ -196,8 +196,12 @@ public boolean trySetValue(VersionedValue<byte[]> previous, byte[] newValue) thr
     private void updateValue(int version, byte[] bytes) {
         while (true) {
             VersionedValue<byte[]> current = currentValue.get();
             if (current.getVersion() >= version) {
                 // A newer version was concurrently set.
I am not sure whether SharedValue was designed to work with multiple owners, but I saw there is a background watcher to update the value, and there is no rule forbidding concurrent usage. So, I assume it should work well under concurrency.
Then, let me assume a situation:
- current.getVersion is Integer.MAX_VALUE.
- Thread1 calls trySetValue and succeeds, getting the overflowed version Integer.MIN_VALUE, but the call to updateValue is somewhat delayed.
- Thread2 (assume the watcher, which runs in the ZooKeeper thread if I am not wrong) calls updateVersion with version Integer.MIN_VALUE + 1. According to the code, this will be ignored.
- Thread1 calls updateValue to continue its task with version Integer.MIN_VALUE. It succeeds.
- That is all; assume no further changes. I know it may not be realistic.
current stores a dated version while the javadoc says "All clients watching the same path will have the up-to-date value (considering ZK's normal consistency guarantees)".
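The interleaving above can be replayed sequentially against a small model of the cache. This sketch assumes the guard from the first revision of the patch (current.getVersion() >= version && version != Integer.MIN_VALUE), since the later diff is truncated here, and it tracks only the version rather than a full VersionedValue. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the patched cache update; not Curator code. */
class PatchedGuardRaceDemo {
    private final AtomicInteger cachedVersion;

    PatchedGuardRaceDemo(int initialVersion) {
        this.cachedVersion = new AtomicInteger(initialVersion);
    }

    /** Applies the patched guard, then swaps in the new version with a CAS loop. */
    void updateValue(int version) {
        while (true) {
            int current = cachedVersion.get();
            if (current >= version && version != Integer.MIN_VALUE) {
                return; // treated as stale and ignored
            }
            if (cachedVersion.compareAndSet(current, version)) {
                return;
            }
        }
    }

    int cachedVersion() {
        return cachedVersion.get();
    }

    public static void main(String[] args) {
        PatchedGuardRaceDemo cache = new PatchedGuardRaceDemo(Integer.MAX_VALUE);
        // Thread2 (watcher) arrives first with the newer version; it is ignored.
        cache.updateValue(Integer.MIN_VALUE + 1);
        // Thread1's delayed call arrives with the older version; it is accepted.
        cache.updateValue(Integer.MIN_VALUE);
        System.out.println("cache holds older version: "
                + (cache.cachedVersion() == Integer.MIN_VALUE));
    }
}
```

In this replay the cache ends at Integer.MIN_VALUE even though the server is already at Integer.MIN_VALUE + 1, which matches the stale-value outcome described in the list.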
I am not convinced in the overflow case either. @tisonkun @Hexiaoqiao
For the overflow case,
- +1 to deprecate VersionedValue#getVersion so as to warn clients about the "ordering assumption", if any, about version.
- +1 to a viable workaround, if any, and/or an exception.
I think we should refactor updateValue a bit to compare ordering using Stat.mzxid. This way we need not fear this overflow issue. I was careless in reviewing without deep thought, sorry for that. So my final points are:
- Deprecate VersionedValue#getVersion to warn clients about "ordering assumptions" and "overflow behavior".
- Refactor updateValue to order using Stat.mzxid.
- Throw an exception in case of -1 Stat.version in trySetValue. I am positive about ZOOKEEPER-4743.
- Document somehow the "overflow" and exception cases in trySetValue.
Besides the above, should we expose a VersionedValue#getZxid for client usage?
Any thoughts @tisonkun @eolivelli @Hexiaoqiao ?
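A minimal sketch of the zxid-ordered refactoring proposed above follows, under stated assumptions: the cache keeps the modification zxid next to the version, zxids are 64-bit and only grow, and the Snapshot type, constructor, and method names are hypothetical rather than the Curator API.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of a cache ordered by zxid instead of version; not Curator code. */
class ZxidOrderedCacheDemo {
    /** Immutable pair of the modification zxid and the data version. */
    static final class Snapshot {
        final long zxid;
        final int version;

        Snapshot(long zxid, int version) {
            this.zxid = zxid;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> current;

    ZxidOrderedCacheDemo(long zxid, int version) {
        this.current = new AtomicReference<>(new Snapshot(zxid, version));
    }

    /** Returns true if the update was applied, false if it was older or equal by zxid. */
    boolean update(long zxid, int version) {
        while (true) {
            Snapshot seen = current.get();
            if (seen.zxid >= zxid) {
                return false; // a newer or equal change is already cached
            }
            if (current.compareAndSet(seen, new Snapshot(zxid, version))) {
                return true;
            }
        }
    }

    int version() {
        return current.get().version;
    }

    public static void main(String[] args) {
        ZxidOrderedCacheDemo cache = new ZxidOrderedCacheDemo(100L, Integer.MAX_VALUE);
        // The version wraps, but the zxid still increases, so the update is applied.
        System.out.println("applied: " + cache.update(101L, Integer.MIN_VALUE));
        // A late update carrying an older zxid is ignored regardless of its version.
        System.out.println("applied: " + cache.update(100L, 5));
    }
}
```

The version is stored but never compared, so a wrapped version does not affect ordering in this model.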
I have pushed new commits that go in the above direction. Could you please take a look @Hexiaoqiao @tisonkun @eolivelli ?
@Hexiaoqiao Sorry for the delay. I am stuck on, and also afraid of, these overflow issues. These are not simple bugs; they are limitations. They are hard to fix, with only fragile best-effort workarounds. I have some scattered thoughts on this issue:
I am OK with a workaround, but I also think an exception (e.g. letting the caller know they hit an implementation limitation) is a good fallback, except for the background watcher. Though, I am not optimistic about finding a good workaround. I think there are few choices when encountering this hard limitation.
All these suggestions were made in https://lists.apache.org/thread/4o3rl49rdj5y0134df922zgc8clyt86s. cc @li4wang Besides the above, did you find this in production? What is your use case, @Hexiaoqiao ?
@kezhuw Thanks for your detailed comments. I totally agree that this issue is not easy to fix perfectly. But in my case, this improvement fixes it when I try to reproduce.
Of course YES. The corresponding code snippet is shown as follows. I would like to give a brief explanation (which depends on the Hadoop project). At first, I wanted to fix it on the Hadoop side, but that was not smooth and could not fix the root cause. For this PR, my first thought is to fix it on the Curator side first, then try to improve it on both the ZooKeeper and Curator sides with the solution you mentioned above. Any thoughts? Thanks. [1] https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html
@Hexiaoqiao Thanks for sharing, that is important! I found some potential problems in the usage, though I haven't taken a deep look.
So, for your use case, I suggest:
|
@kezhuw Great response. Some concerns here.
Actually, a sequence number is compatible with 32-bit integer overflow; it only needs to be a distinguishable number, without other conditions. IMO it is OK even on overflow; I tested it in our test environment and it works fine. So it is not hiding an issue here.
@Hexiaoqiao Glad to hear, that is good!
I am open to a workaround as long as it fits. For a workaround to work, I think we should guarantee:
I feel pessimistic about a viable workaround, but I am not against one. Anyway, please go ahead; I would be happy to hear good news.
Thanks @kezhuw for your suggestions, and sorry for the late response.
I am not sure I understood this point clearly: 'we have to throw exception if previous.getVersion() equals to -1 in call chain of trySetCount'. If we need to guarantee this rule, will it never update successfully once the data version is back at -1?
Yes, and no. When
The later behavior is a bug due to the contract what For the "no" part, callers can risk themselves by doing a blind update through |
+1, agreed. So how about exposing the choice to the end user and offering some configuration items? IMO, some sensitive data should not be updated blindly, but in some cases we should offer a solution to bypass this bug, such as |
You can always copy the shared count implementation. It's not a lot of code. Since this requirement is quite customized, I'm afraid that it's not suitable to hack into Curator. The root cause is an integer overflow that originates in ZK. ( Also, I don't actually understand how this patch "fixes" the issue. You add a stricter condition to exit, but the original issue is a hang?
I am -0 on this approach. An exception is the default in any case. Clients can easily work around it themselves on exception if they encounter or are aware of this. I don't think it is worth Curator doing that for this limitation (whether you see it as a ZooKeeper or a Curator implementation limitation) when clients are aware of it. From my side, the important things for Curator here are documentation and an exception in code. Anything beyond that is probably overkill.
I understand the condition now and agree that this patch should work. It would be better for ZK to jump from version -2 to 0 so that we don't have to handle the -1 hole in an overflow loop. Still, the case of restarting and getting versioned value = -1 remains, and setting data concurrently doesn't hurt as long as we eventually read the updated data.
@eolivelli The tricky point here is that ZK doesn't expose an API to manually edit the node version, so you would have to change the data Int.MAX times, which can be time-consuming.
Good idea! I just saw ZOOKEEPER-4743. |
@tisonkun Glad to see you here.
It would be smoother for ZK to skip version -1. Strong +1. My first thought was that it would be the next step after we fix the Curator side. Now it is great to see that others are already pushing this forward.
I suggest we use The current approach does not sound correct to me(https://github.com/apache/curator/pull/478/files#r1313839760). And I think it is error-prone. When |
Those are unrelated things. For this patch, I may take another look, along with the comments above. @Hexiaoqiao you can try to address @kezhuw's comments.
Hi @Hexiaoqiao, are you still working on this? Or should I take it over? I plan to resort to |
Sorry for the late response; I took a long vacation. Please feel free to take it over if interested.
Thank you @Hexiaoqiao ! I have pushed a commit to using |
…n of ZNode is overflow.
@Hexiaoqiao @kezhuw @eolivelli I'll review this patch this week. I'd like to try to include this patch in 5.7.0.
Signed-off-by: tison <[email protected]>
        while (true) {
            VersionedValue<byte[]> current = currentValue.get();
            if (current.getVersion() >= version) {
                // A newer version was concurrently set.
            if (current.getZxid() >= zxid) {
What if there is only one client here and it crosses the overflow boundary?
Say current.getZxid() == MAX_VALUE and zxid == MIN_VALUE: the update will be skipped and the value will never be updated.
Or do we have different assumptions about zxid?
This is paranoid -:).
ZooKeeper guarantees a total order of messages, and it also guarantees a total order of proposals. ZooKeeper exposes the total ordering using a ZooKeeper transaction id (zxid). All proposals will be stamped with a zxid when it is proposed and exactly reflects the total ordering. -- https://zookeeper.apache.org/doc/r3.9.0/zookeeperInternals.html
Every change to the ZooKeeper state receives a stamp in the form of a zxid (ZooKeeper Transaction Id). This exposes the total ordering of all changes to ZooKeeper. Each change will have a unique zxid and if zxid1 is smaller than zxid2 then zxid1 happened before zxid2. -- https://zookeeper.apache.org/doc/r3.9.0/zookeeperProgrammers.html
In the above situation, I believe ZooKeeper is doomed to fail. The "never updated" case should be negligible in the face of such a disaster.
I met an issue where the znode value is never updated successfully once the znode data version overflows (-2147483648) when using Curator to invoke SharedCount#trySetCount(VersionedValue, int).
After digging into the logic, I found that the following could be the root cause.
https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/shared/SharedValue.java#L196-L209
My environment: Curator version 2.10.0, ZooKeeper version 3.4.6.
Ref - CURATOR-688