Release 3.3.3 #26

Merged
tovbinm merged 3 commits into master from mt/3.3.3-release on Jun 22, 2018

@tovbinm (Collaborator) commented Jun 22, 2018

Change List

  1. Convert some more stages tests to use OP stages specs (Pivot with max cardinality percentage #241)
  2. Changed error to occur only when all labels are removed (Outputting Raw Feature Filter information: Part 1 #237)
  3. Fixes for writing/reading stages in OpPipelineStageSpec tests (Logo for TransmogrifAI #235)
  4. Add files via upload (Tweaks to OpBinScoreEvaluator #233)
    workflow description figures
  5. Stop words changes to text analyzers and bug fixes (Cleanup Helloworld examples #230)
  6. Update README.md (Fix rawPrediction of OpXGBoostClassifcationModel for binary classification #229)
  7. Remove null leakage checks for text features from sanity checker (Unable to create features dynamically. #228)
    Update sanity checker shortcut with protectTextSharedHash param (build failed #234)
  8. Remove JSD check for date + datetime features in RFF (Reverse geocoder with Lucene #227)
  9. Introduced FeatureBuilder.fromDataFrame function allowing materializing features from a DataFrame (Cleanup helloworld example + decrease logging verbosity to ERROR #226)
  10. Get rid of ClassTags in OP models (Make decision tree numeric bucketizer tests less flaky #225)
  11. Test if transformer transforms the data correctly after being loaded (Scaler and descaler transformers #223)
  12. Update to BSD-3 license (Upgrade to Gradle 5.2 #218)
    Some more licenses (Make tests a little less flaky #221)
  13. Changed from extending to wrapping Spark models.
    Wrapped Spark model classes using reflection (part 1 of 2) (Fix indices in LOCO for record-level insights and add more robust tests #216)
    Wrapped Spark estimators so that they return OP-wrapped models with the prediction return type (part 2a of 2) (Release 0.5.1 #222)
    Wrapped Spark estimators for newly added models (part 2b of 2) (Random param builder for random hyperparameter search in model selectors #238)
    Moved code out of the Spark ML workspace and added comments - no code changes after tickets (TransmogrifAI on Apache Zeppelin #239)
  14. Change ootb transformers to use OPTransformerSpec for tests (Scaler and descaler transformers #215)
  15. Move base stages to features sub project + test classes and specs (Cl/rff metrics #214)
  16. Better clues when asserting stages (Fix sorting in Prediction type for multiclass classification and add stronger tests #213)
  17. Implement multi-class threshold metrics (Integrate helloworld project with Travis CI #212)
  18. NameEntityRecognizer (NER) transformer (Is it possible to prevent fields from being used as features but keep them as output fields? #209)
  19. Allow customizing feature type equality in op test transformer/estimator specs (test cases for RichListFeature #207)
  20. Threshold metrics bug fix (Use class.getName & update splitter meta parsing #204)
    use prediction rather than raw prediction
  21. Added an extra OpEstimatorBaseSpec base class with loosened model type bounds to allow testing Spark wrapped estimators (Add package which gives ability compile check and execute code provided in documentation #203)
    Fixed package access level on OpEstimatorBaseSpec (Error in transmogrifai gen when field has an underscore #205)
    Internal OP test base class
  22. Fast materializer method FeatureTypeSparkConverter by full feature type name (Correct some syntax/compilation errors in Titanic Binary Classification Docs Example #202)
  23. Added UID.reset() before tests so that all workflows will generate the same feature names (Syntax/Compilation errors in Titanic Binary Classification Docs Example #201)
  24. Added add/subtract operations for Spark ml Vector types (Regression error = 0.0 - looking for suggestions #200)
  25. workflow cleanup (Export model selector defaults + metadata fixes #199)
  26. Fix TextMapNullEstimator to count a null when text is entirely removed by the tokenizer (The problem of Xgboost #198)
    Certain text strings can be removed entirely by our tokenizers, but the null tracking step for text map vectorizers only checked for the presence of a key, so such values were not counted as null.
  27. Workflow CV Fixes (Possible solution for issue #154 (Geolocation to Country transformer) #196)
    Fixed a deadlock in OpCrossValidation.findBestModel. It occurred because the threads processing splits in parallel would try to access Spark stage params on the same stages.
  28. Update ternary, quaternary and sequence transformer/estimator bases tests (Adds options for tracking text length in text vectorizers #195)
  29. Enabling null-label leakage detection in RawFeatureFilter (Error: Could not find or load main class com.salesforce.op.cli.CLI #191, Illegal character in path at index 2: ..\test-data\PassengerData.avro #192, Use OS specific path separator #193)
  30. Feature Type values docs (Add transformer / estimator for text length calculation #190)
  31. Bump up lucene version and add lucene-opennlp package (Allow convertion from Date and Timestamp Spark types to Date and DateTime TransmogrifAI types #188)
  32. Minor README cleanup (Add length of the text as default features for text fields #187, TransmogirfAI build issues  #189)
  33. Test specs for OP stages (Release 0.5.0 #186)
  34. Adding pr_curve, roc_curve metrics (Upgrade Apache Spark to 2.4 #184)
  35. Create hash space strategy param (Can't use the cloned project #182)
  36. Make new Cross Validation (XGBoost error code 255 #181)
  37. Avoid resetting UID in every test; only do it when necessary (Upgrade XGBoost to 0.81 #180)
  38. Upgrade to gradle 4.7 (Integrate Streaming Histogram into RawFeatureFilter. #179)
  39. Added OpTransformer.transformKeyValue to allow transforming Map and any other key/value types, in preparation for sparkless scoring (Evaluators check for empty data #178)
  40. Adding autoBucketize to transmogrify for numerics & numeric maps + pass in optional label (Replace assert with require #159)
  41. Autobucketizing for numeric maps should not fail if the map is empty; instead we generate an empty column for an empty numeric map (jupyter notebooks for transmogrify samples #231)
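
To illustrate the null tracking fix in item 26, here is a minimal, self-contained sketch. It does not use the actual TextMapNullEstimator code: the object name, the `tokenize` helper, and the stop word list are hypothetical stand-ins, chosen only to show the difference between checking for a key and checking what survives tokenization.

```scala
object TextMapNullTracking {
  // Hypothetical stop word list, for illustration only.
  val stopWords: Set[String] = Set("a", "an", "the", "of")

  // Hypothetical stand-in tokenizer: lowercases, splits on non-letters,
  // and drops empty tokens and stop words.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase
      .split("[^a-z]+")
      .toSeq
      .filter(t => t.nonEmpty && !stopWords.contains(t))

  // Behavior described as the bug: a value counts as null only if the key is absent.
  def isNullByKeyOnly(m: Map[String, String], key: String): Boolean =
    !m.contains(key)

  // Behavior described as the fix: a value also counts as null
  // when no tokens survive tokenization.
  def isNullAfterTokenizing(m: Map[String, String], key: String): Boolean =
    m.get(key).forall(v => tokenize(v).isEmpty)
}
```

With a map such as `Map("title" -> "The")`, the key-only check reports the value as present, while the token-aware check reports it as null because the tokenizer removes the only word.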

Migration Guide

  1. OpLogisticRegression() is still in progress (its evaluator needs updates).
    BinaryClassificationModelSelector() may be used instead in the meantime.

  2. Add .setProbabilityCol($probCol) to the evaluator in the workflow definition so that the evaluator uses the correct probability column for its calculation.

@tovbinm requested a review from sxd929 on June 22, 2018 19:46

@sxd929 (Contributor) left a comment

LGTM

@tovbinm merged commit b49d81c into master on Jun 22, 2018
@tovbinm deleted the mt/3.3.3-release branch on June 22, 2018 20:37
ericwayman pushed a commit that referenced this pull request Feb 8, 2019