Issues with Topic-Level Policies after Cluster Restart in Pulsar 3.0.5 #24930
We are experiencing an issue with Pulsar 3.0.5 in our environment and would like to seek your advice.

Environment Details

Pulsar Version: 3.0.5

Bookie Configuration:
journalSyncData: "true"
journalWriteData: "true"

Broker Configuration:
allowAutoTopicCreation: "true"
brokerDeleteInactiveTopicsEnabled: "false"
defaultNumPartitions: "3"
defaultRetentionSizeInMB: "103424"
defaultRetentionTimeInMinutes: "4320"
managedLedgerDefaultAckQuorum: "2"
managedLedgerDefaultEnsembleSize: "2"
managedLedgerDefaultWriteQuorum: "2"
managedLedgerMaxEntriesPerLedger: "50000"
managedLedgerMaxLedgerRolloverTimeMinutes: "240"
managedLedgerMinLedgerRolloverTimeMinutes: "10"
systemTopicEnabled: "true"
topicLevelPoliciesEnabled: "true"

Problem Description

After a cluster restart (including after a power outage), some topics may occasionally encounter the following error, causing them to be unable to produce or consume:

Error 1: BrokerService Exception

2025-10-31T14:08:29,658+0000 [pulsar-io-5-8] ERROR org.apache.pulsar.broker.service.BrokerService - Topic creation encountered an exception by initialize topic policies service. topic_name=persistent://10001001/default/log-partition-4 error_message=The subscription multiTopicsReader-f5fb22e226 of the topic persistent://10001001/default/__change_events-partition-0 gets the last message id was failed
{"errorMsg":"Failed to read last entry of the compacted Ledger Error while reading ledger","reqId":4227693217171430891, "remote":"pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local/22.25.102.149:6650", "local":"/22.25.102.149:59422"}
org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: The subscription multiTopicsReader-f5fb22e226 of the topic persistent://10001001/default/__change_events-partition-0 gets the last message id was failed
{"errorMsg":"Failed to read last entry of the compacted Ledger Error while reading ledger","reqId":4227693217171430891, "remote":"pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local/22.25.102.149:6650", "local":"/22.25.102.149:59422"}
at org.apache.pulsar.client.api.PulsarClientException.wrap(PulsarClientException.java:993) ~[org.apache.pulsar-pulsar-client-api-v3.0.5-v1.0.1.jar:v3.0.5-v1.0.1]
at org.apache.pulsar.client.impl.ConsumerImpl.lambda$internalGetLastMessageIdAsync$64(ConsumerImpl.java:2566) ~[org.apache.pulsar-pulsar-client-original-v3.0.5-v1.0.1.jar:v3.0.5-v1.0.1]
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990) ~[?:?]
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:792) ~[org.apache.pulsar-pulsar-client-original-v3.0.5-v1.0.1.jar:v3.0.5-v1.0.1]
at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:192) ~[org.apache.pulsar-pulsar-common-v3.0.5-v1.0.1.jar:v3.0.5-v1.0.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[io.netty-netty-codec-4.1.115.Final.jar:4.1.115.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[io.netty-netty-codec-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[io.netty-netty-handler-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868) ~[io.netty-netty-transport-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) ~[io.netty-netty-transport-classes-epoll-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501) ~[io.netty-netty-transport-classes-epoll-4.1.115.Final.jar:4.1.115.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399) ~[io.netty-netty-transport-classes-epoll-4.1.115.Final.jar:4.1.115.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[io.netty-netty-common-4.1.115.Final.jar:4.1.115.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty-netty-common-4.1.115.Final.jar:4.1.115.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.115.Final.jar:4.1.115.Final]
at java.lang.Thread.run(Thread.java:840) ~[?:?]

Error 2: Admin API Failures

When running pulsar-admin topics stats:

pulsar-admin topics stats persistent://business/notification/entry-partition-0
--- An unexpected error occurred in the server ---
Message: Topic creation encountered an exception by initialize topic policies service. topic_name=persistent://business/notification/entry-partition-0 error_message={"errorMsg":"Error while reading ledger - ledger=684 - operation=Failed to read entry - entry=0","reqId":3147838028978352523, "remote":"pulsar-broker-2.pulsar-broker.pulsar.svc.cluster.local/22.25.106.148:6650", "local":"/22.25.102.143:43158"}
Stacktrace:
org.apache.pulsar.broker.service.BrokerServiceException$ServiceUnitNotReadyException: Topic creation encountered an exception by initialize topic policies service. topic_name=persistent://business/notification/entry-partition-0 error_message={"errorMsg":"Error while reading ledger - ledger=684 - operation=Failed to read entry - entry=0","reqId":3147838028978352523, "remote":"pulsar-broker-2.pulsar-broker.pulsar.svc.cluster.local/22.25.106.148:6650", "local":"/22.25.102.143:43158"}
at org.apache.pulsar.broker.service.BrokerService.lambda$getTopic$28(BrokerService.java:1080)
at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at org.apache.pulsar.client.impl.PulsarClientImpl.lambda$createSingleTopicReaderAsync$14(PulsarClientImpl.java:689)
at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.lambda$new$2(MultiTopicsConsumerImpl.java:193)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
at org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.lambda$closeAsync$24(MultiTopicsConsumerImpl.java:634)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
at org.apache.pulsar.client.impl.ConsumerBase.lambda$failPendingReceive$1(ConsumerBase.java:349)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:840)

Attempted Recovery Methods

Recreating __change_events topics: we tried this, but the errors persisted.

Questions for the Community

Is this a known bug in Pulsar? If so, in which version was it fixed?
Are there any temporary workarounds besides disabling topic-level policies entirely?
Currently, we're considering reverting to namespace-level policies. Are there other recommended solutions to maintain topic-level Policy requirements while avoiding this issue?

Any insights or suggestions would be greatly appreciated. Thank you!
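For triage, the ledger and entry IDs embedded in an errorMsg string such as the one in Error 2 show which BookKeeper ledger failed to read. The following is a minimal sketch, assuming the message always follows the "ledger=<id> ... entry=<id>" shape seen in that log; the function name extract_ledger_entry is hypothetical and not part of Pulsar.

```python
import re
from typing import Optional, Tuple

# Assumption: the broker error text contains "ledger=<digits>" followed
# later by "entry=<digits>", as in the Error 2 message quoted above.
LEDGER_ERROR = re.compile(r"ledger=(?P<ledger>\d+).*?entry=(?P<entry>\d+)")


def extract_ledger_entry(error_msg: str) -> Optional[Tuple[int, int]]:
    """Return (ledger_id, entry_id), or None if the message has no such pair."""
    match = LEDGER_ERROR.search(error_msg)
    if match is None:
        return None
    return int(match["ledger"]), int(match["entry"])


msg = ("Error while reading ledger - ledger=684 - "
       "operation=Failed to read entry - entry=0")
print(extract_ledger_entry(msg))  # (684, 0)
```

Messages like the one in Error 1, which mention the compacted ledger without an ID, yield None, so they would need to be traced through the broker and bookie logs instead.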
Replies: 1 comment 1 reply
Yes.
The problem might be caused by a BookKeeper data loss issue (apache/bookkeeper#4607), which was fixed in BookKeeper releases 4.16.7 and 4.17.2. The fix has been included since Pulsar 3.0.13, 3.3.8 and 4.0.6. The latest supported Pulsar releases are 3.0.15, 4.0.9 and 4.1.2, so it's recommended to upgrade to one of the supported versions.
Reverting to namespace-level policies might be a viable option in certain cases. Please check the issue tracker for possible workarounds.
Make sure to run an up-to-date version of Pulsar. There have been hundreds of bug fixes since the 3.0.5 release. If you run an old version, you might be running into issues that have already been addressed.
https://pulsar.apache.org/download/
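To check whether a given broker version already bundles the BookKeeper fix, the version thresholds from the reply (3.0.13, 3.3.8, 4.0.6) can be encoded in a small helper. This is only a sketch: the function names are hypothetical, the handling of vendor suffixes such as "-v1.0.1" is a guess based on the jar names in the logs, and the treatment of release lines newer than 4.0 is an assumption rather than something stated in the reply.

```python
from typing import Optional, Tuple

# Minimum patch release per Pulsar release line that includes the
# BookKeeper fix, as listed in the reply (3.0.13, 3.3.8, 4.0.6).
FIXED_IN = {(3, 0): 13, (3, 3): 8, (4, 0): 6}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch', dropping a leading 'v' and any '-suffix'."""
    core = version.lstrip("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def includes_bookkeeper_fix(version: str) -> Optional[bool]:
    """True/False for known release lines, None when the reply gives no data."""
    major, minor, patch = parse_version(version)
    if (major, minor) in FIXED_IN:
        return patch >= FIXED_IN[(major, minor)]
    # Assumption: release lines newer than 4.0 (e.g. 4.1.x) ship the fix.
    if (major, minor) > (4, 0):
        return True
    return None  # e.g. 3.1.x or 3.2.x: not covered by the reply


for v in ("3.0.5", "3.0.15", "4.0.9", "4.1.2", "3.2.4"):
    print(v, includes_bookkeeper_fix(v))
```

Returning None rather than False for unlisted release lines keeps the helper from claiming more than the reply states.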