
Make actualDataNodes expose SPI that can define expression with custom rules and add GraalVM Truffle implementation #22899

Closed
linghengqian opened this issue Dec 15, 2022 · 8 comments · Fixed by #28610 or #29309

Comments

@linghengqian
Member

linghengqian commented Dec 15, 2022

Feature Request


Is your feature request related to a problem?

Describe the feature you would like.

  • Since most people cannot be expected to understand Groovy syntax up front, issues like "The table generated by time fragments does not conform to the configuration logic" #22884 keep being raised: their initiators misunderstand the final results produced by certain Groovy expressions. Such results can actually be verified at https://groovyconsole.appspot.com/ .
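As a concrete illustration of the kind of misunderstanding referred to here (the exact scenario of #22884 is assumed, not quoted), a Groovy range such as `{20221129..20221201}` expands arithmetically, not calendar-aware, so date-like bounds that cross a month boundary generate many non-existent "dates". The snippet below mimics Groovy's `IntRange` expansion in plain Java; it is illustrative only and is not ShardingSphere code:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NumericRangePitfall {

    // Expands a date-like numeric range the way a Groovy IntRange does:
    // purely arithmetically, with no calendar awareness.
    public static List<String> expand(String prefix, int lower, int upper) {
        return IntStream.rangeClosed(lower, upper)
                .mapToObj(i -> prefix + i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Within one month the result matches intuition: 3 tables.
        System.out.println(expand("t_order_", 20221123, 20221125));
        // Crossing a month boundary yields 73 "tables", most of them
        // impossible dates such as t_order_20221131.
        System.out.println(expand("t_order_", 20221129, 20221201).size());
    }
}
```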

  • If we expose an SPI for actualDataNodes, users can define special rules for expressions by implementing it, which should help reduce such misunderstandings.

  • The significance of this issue for "Make ShardingSphere Proxy in GraalVM Native Image form available" #21347 is that using GroovyShell directly implies an additional class loader, which means that any unit test involving GroovyShell is bound to fail when executed directly under GraalVM Native Image. I've opened "[GR-43010] Using Groovy classes under native-image results in UnsupportedFeatureError" oracle/graal#5522 and provided common unit tests from ShardingSphere that use GroovyShell.

  • Once the SPI for actualDataNodes is exposed, we can introduce a GraalVM Truffle implementation in the master branch of ShardingSphere, which should allow us to use JavaScript, Python, R, Ruby, and LLVM languages in actualDataNodes. And because GraalVM Truffle is used, we can envision completing ShardingSphere's nativeTest under GraalVM Native Image.

  • For Groovy, I think it can be used unchanged, but we could also try to transfer Groovy method calls to Espresso, Truffle's Java implementation, which separates the host JVM process from the guest JVM process to ensure that nativeTest passes under GraalVM Native Image. It is worth mentioning, though, that Espresso is more limited than other Truffle language implementations.

  • I assume the YAML would then be configured as follows.

rules:
  - !SHARDING
    tables:
      t_order:
        actualDataNodes: 
          type: ORIGIN_GROOVY
          props:  
           expression: ds-0.t_order_$->{20221123..20221125}
        tableStrategy:
          standard:
            shardingColumn: create_time
            shardingAlgorithmName: lingh-interval
    shardingAlgorithms:
      lingh-interval:
        type: INTERVAL
        props:
          datetime-pattern: "yyyy-MM-dd HH:mm:ss.SSS"
          datetime-lower: "2022-11-23 00:00:00.000"
          datetime-upper: "2022-11-26 00:00:00.000"
          sharding-suffix-pattern: "_yyyyMMdd"
          datetime-interval-amount: 1
          datetime-interval-unit: "DAYS"
  • The interface corresponding to the SPI should be similar to the following.
import org.apache.shardingsphere.infra.util.spi.lifecycle.SPIPostProcessor;
import org.apache.shardingsphere.infra.util.spi.type.typed.TypedSPI;

import java.util.List;
import java.util.Properties;

public interface ShardingSphereExpressionParser extends TypedSPI, SPIPostProcessor {
    
    /**
     * Replace all inline expression placeholders.
     *
     * @return inline expression with placeholders replaced
     */
    String handlePlaceHolder();
    
    /**
     * Split and evaluate inline expression.
     *
     * @return result list
     */
    List<String> splitAndEvaluate();
    
    /**
     * Get properties.
     *
     * @return properties
     */
    Properties getProps();
}
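To make the idea concrete, here is a self-contained sketch of one possible implementation of such an SPI. `TypedSPI` and `SPIPostProcessor` are omitted so the example compiles on its own, and the `ExpressionParser`/`LiteralExpressionParser` names are hypothetical, not ShardingSphere API; the real implementation would be discovered through ShardingSphere's service loader by its `getType()` value:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, trimmed-down stand-in for the proposed SPI.
interface ExpressionParser {
    
    String getType();
    
    List<String> splitAndEvaluate(String expression);
}

// A "LITERAL" parser: no Groovy involved, the expression is just a
// comma-separated list of data nodes.
final class LiteralExpressionParser implements ExpressionParser {
    
    @Override
    public String getType() {
        return "LITERAL";
    }
    
    @Override
    public List<String> splitAndEvaluate(String expression) {
        return Arrays.stream(expression.split(","))
                .map(String::trim)
                .filter(each -> !each.isEmpty())
                .toList();
    }
}
```

A user could then configure `type: LITERAL` to bypass Groovy entirely, which matters under GraalVM Native Image where GroovyShell's extra class loader is a problem.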
@RaigorJiang
Contributor

When sharding by interval, it's really not easy for new users to understand the meaning of Groovy expressions.
Looking forward to further discussion!

@linghengqian
Member Author

  • For a long time no one has given an opinion, so let me expand on this issue a bit. Considering sharding by date, we should not limit the expression, but directly build an algorithmic SPI.
import org.apache.shardingsphere.infra.util.spi.lifecycle.SPIPostProcessor;
import org.apache.shardingsphere.infra.util.spi.type.typed.TypedSPI;

import java.util.List;
import java.util.Properties;

public interface AbstractActualdataNodes extends TypedSPI, SPIPostProcessor {
    
    /**
     * Get actual data nodes.
     *
     * @return result real table list
     */
    List<String> getActualDataNodes();
    
    /**
     * Get properties.
     *
     * @return properties
     */
    Properties getProps();
}
  • We should only consider the java.util.List of the final real tables. For the simplest case of sharding by date, the configuration of a minimal implementation class using JSR 310 should be similar to the following.
rules:
   - !SHARDING
     tables:
       t_order:
         actualDataNodes:
           type: SINGLE_TABLE
           props:
             table-prefix: t_order # prefix of the real tables
             datetime-lower: 2022-10-01 # lower bound of the time range
             datetime-upper: 2022-12-31 # upper bound of the time range
             datetime-pattern: yyyy-MM-dd # java.time.format.DateTimeFormatter pattern used to parse datetime-lower and datetime-upper
             table-suffix-pattern: _yyyyMM # suffix of the real tables, also a java.time.format.DateTimeFormatter pattern
             datetime-interval-amount: 1 # time interval amount
             datetime-interval-unit: MONTHS # a java.time.temporal.ChronoUnit constant
  • It should end up producing a List containing [t_order_202210, t_order_202211, t_order_202212].
  • Let's assume a simple function that does this conversion.
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAccessor;
import java.util.List;
import java.util.stream.LongStream;

public class DateTest {
    
    @Test
    void testDate() {
        DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        TemporalAccessor start = dateTimeFormatter.parse("2022-10-01");
        TemporalAccessor end = dateTimeFormatter.parse("2022-12-31");
        if (!start.isSupported(ChronoField.NANO_OF_DAY) && start.isSupported(ChronoField.EPOCH_DAY)) {
            LocalDate startTime = start.query(LocalDate::from);
            LocalDate endTime = end.query(LocalDate::from);
            List<String> actualDataNodes = LongStream.range(0, ChronoUnit.MONTHS.between(startTime, endTime.plusMonths(1)))
                    .mapToObj(startTime::plusMonths)
                    .map(localDate -> "t_order" + localDate.format(DateTimeFormatter.ofPattern("_yyyyMM")))
                    .toList();
            // Use a JUnit assertion rather than the `assert` keyword, which is a no-op unless -ea is enabled.
            assertEquals(List.of("t_order_202210", "t_order_202211", "t_order_202212"), actualDataNodes);
        }
    }
}

@linghengqian
Member Author

  • I'm working on this issue. Since I am not familiar with ANTLR, I did not start with the DistSQL syntax, but modified the execution logic in the Java code instead.

@zhfeng
Contributor

zhfeng commented Jan 7, 2023

I think this could also be helpful for quarkus-shardingsphere-jdbc to work in native mode. @linghengqian, what DistSQL syntax do we need to modify? I can take a look, since I have some prior experience with ANTLR.

@linghengqian
Member Author

@zhfeng

CREATE SHARDING TABLE RULE t_order_item (
DATANODES("ds_${0..1}.t_order_item_${0..1}"),
DATABASE_STRATEGY(TYPE="standard",SHARDING_COLUMN=user_id,SHARDING_ALGORITHM(TYPE(NAME="inline",PROPERTIES("algorithm-expression"="ds_${user_id % 2}")))),
TABLE_STRATEGY(TYPE="standard",SHARDING_COLUMN=order_id,SHARDING_ALGORITHM(TYPE(NAME="inline",PROPERTIES("algorithm-expression"="t_order_item_${order_id % 2}")))),
KEY_GENERATE_STRATEGY(COLUMN=another_id,TYPE(NAME="snowflake")),
AUDIT_STRATEGY (TYPE(NAME="DML_SHARDING_CONDITIONS"),ALLOW_HINT_DISABLE=true)
);

CREATE SHARDING TABLE RULE IF NOT EXISTS t_order_item (
DATANODES("ds_${0..1}.t_order_item_${0..1}"),
DATABASE_STRATEGY(TYPE="standard",SHARDING_COLUMN=user_id,SHARDING_ALGORITHM(TYPE(NAME="inline",PROPERTIES("algorithm-expression"="ds_${user_id % 2}")))),
TABLE_STRATEGY(TYPE="standard",SHARDING_COLUMN=order_id,SHARDING_ALGORITHM(TYPE(NAME="inline",PROPERTIES("algorithm-expression"="t_order_item_${order_id % 2}")))),
KEY_GENERATE_STRATEGY(COLUMN=another_id,TYPE(NAME="snowflake")),
AUDIT_STRATEGY (TYPE(NAME="DML_SHARDING_CONDITIONS"),ALLOW_HINT_DISABLE=true)
);

ALTER SHARDING TABLE RULE t_order_item (
DATANODES("ds_${0..3}.t_order_item${0..3}"),
DATABASE_STRATEGY(TYPE="standard",SHARDING_COLUMN=user_id,SHARDING_ALGORITHM(TYPE(NAME="inline",PROPERTIES("algorithm-expression"="ds_${user_id % 4}")))),
TABLE_STRATEGY(TYPE="standard",SHARDING_COLUMN=order_id,SHARDING_ALGORITHM(TYPE(NAME="inline",PROPERTIES("algorithm-expression"="t_order_item_${order_id % 4}")))),
KEY_GENERATE_STRATEGY(COLUMN=another_id,TYPE(NAME="snowflake")),
AUDIT_STRATEGY(TYPE(NAME="dml_sharding_conditions"),ALLOW_HINT_DISABLE=true)
);

SHOW SHARDING TABLE RULES;

SHOW SHARDING TABLE RULES FROM sharding_db;

@zhfeng
Contributor

zhfeng commented Jan 7, 2023

Oh, I'm sorry to hear that. Take care yourself in this tough time and hope you recover soon!

@linghengqian
Member Author

linghengqian commented Aug 30, 2023

  • It may not be reasonable to make such changes in non-major versions. I now tend to introduce specific identifier symbols into the actualDataNodes expression, so that when parsing it, the implementation can look up an SPI implementation by Type. If resolution fails or no such identifier exists, the module should fall back to the default Groovy-based SPI implementation to resolve actualDataNodes. All the work should only affect shardingsphere-infra-expr and its submodules.

  • With GraalVM CE Dev 23.1.0 deprecating GraalVM Updater, I think the ShardingSphere master branch must drop Truffle, because its minimum JDK version has been raised to JDK 21, and Truffle's Espresso is under the GPL license; similar implementations will have to be provided on the user side.
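The identifier-plus-fallback scheme described above could be sketched as follows. The `<TYPE>` prefix syntax, the class name, and the returned pair are assumptions for illustration, not the final ShardingSphere design:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical router: extract an optional "<TYPE>" identifier from an
// actualDataNodes expression; when no identifier is present, fall back to
// the default Groovy-based parser.
public final class TypedExpressionRouter {
    
    private static final Pattern TYPE_PREFIX = Pattern.compile("^<(\\w+)>(.*)$", Pattern.DOTALL);
    
    /**
     * @return a two-element array: [parser type, remaining expression]
     */
    public static String[] route(String expression) {
        Matcher matcher = TYPE_PREFIX.matcher(expression);
        if (matcher.matches()) {
            return new String[]{matcher.group(1), matcher.group(2)};
        }
        // No identifier found: default to the Groovy SPI implementation.
        return new String[]{"GROOVY", expression};
    }
}
```

Under this sketch, `<LITERAL>t_order_0,t_order_1` would route to a literal parser, while an unprefixed `ds_${0..1}.t_order_${0..1}` keeps today's Groovy behavior, so existing configurations stay valid.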

@linghengqian
Member Author

linghengqian commented Sep 3, 2023
