Cluster Tool

A script for analysing clusters in n-body simulations.

Clusters are identified using DBSCAN, INDICATE and the stars' energies. The mass distribution of each cluster is then compared to a standard Maschberger IMF model with the given lower and upper mass bounds, using 1-way Kolmogorov-Smirnov (KS) tests and Cramér-von Mises (CvM) tests.

Chi-squared tests are also used to compare the histogram of the mass distribution, but their results are not statistically valid because of low bin counts; they are included only for curiosity.
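As an illustration of the comparison described above, the following is a minimal sketch of testing a set of masses against a truncated Maschberger IMF with SciPy. It is not the tool's own code. The function names are hypothetical, the parameter values (alpha = 2.3, beta = 1.4, mu = 0.2 solar masses) are the commonly quoted Maschberger (2013) values and are assumed here, and the tests use SciPy's default settings, which may differ from the settings the tool uses.

```python
import numpy as np
from scipy import stats

# Assumed Maschberger (2013) L3 parameters; masses in solar masses.
ALPHA, BETA, MU = 2.3, 1.4, 0.2


def _aux(m):
    """Auxiliary function G(m) of the L3 distribution."""
    m = np.asarray(m, dtype=float)
    return (1.0 + (m / MU) ** (1.0 - ALPHA)) ** (1.0 - BETA)


def maschberger_cdf(m, m_lo=0.1, m_hi=50.0):
    """CDF of the IMF truncated to [m_lo, m_hi] (hypothetical helper)."""
    g_lo, g_hi = _aux(m_lo), _aux(m_hi)
    return np.clip((_aux(m) - g_lo) / (g_hi - g_lo), 0.0, 1.0)


def maschberger_sample(n, m_lo=0.1, m_hi=50.0, rng=None):
    """Draw n masses by inverting the truncated CDF (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    g_lo, g_hi = _aux(m_lo), _aux(m_hi)
    g = rng.uniform(size=n) * (g_hi - g_lo) + g_lo
    return MU * (g ** (1.0 / (1.0 - BETA)) - 1.0) ** (1.0 / (1.0 - ALPHA))


# Stand-in for the masses of one cluster: a synthetic sample from the model itself.
masses = maschberger_sample(200, rng=42)

# One-sample tests of the masses against the model CDF.
ks = stats.kstest(masses, maschberger_cdf)
cvm = stats.cramervonmises(masses, maschberger_cdf)
print(f"KS:  statistic={ks.statistic:.3f}, p-value={ks.pvalue:.3f}")
print(f"CvM: statistic={cvm.statistic:.3f}, p-value={cvm.pvalue:.3f}")
```

The default bounds of 0.1 and 50 match the defaults of --min_mass and --max_mass. A small p-value suggests that the masses are unlikely to have been drawn from the model.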

Usage

python3 ./cluster_tool.py data_file output_dir [options]

Required parameters

| Option | Description |
| --- | --- |
| data_file | Path to input data file |
| output_dir | Path to output directory |

Optional parameters

| Option | Description | Default |
| --- | --- | --- |
| --data_sep [DATA_SEP] | Separator used in simulation data | ',' |
| --data_header [DATA_HEADER] | Path to a text file containing headers for simulation data | None (read header from data file) |
| -d [DIMENSIONS], --dimensions [DIMENSIONS] | No. of dimensions to run the analysis in | 2 |
| -e [EPS], --eps [EPS] | DBSCAN: Maximum distance between stars in a cluster | 3.0 |
| -m [MIN_SAMPLES], --min_samples [MIN_SAMPLES] | DBSCAN: Minimum no. of stars per cluster | 10 |
| -u [N_DIST], --n_dist [N_DIST] | INDICATE: No. of uniform distributions | 1 |
| -n [NEAREST_NN], --nearest_nn [NEAREST_NN] | INDICATE: No. of nearest neighbours | 5 |
| --pos_axes [POS_AXES] | Column names of position axes | 'x,y,z' |
| --vel_axes [VEL_AXES] | Column names of velocity axes | 'v_x,v_y,v_z' |
| --min_mass [MIN_MASS] | Minimum mass | 0.1 |
| --max_mass [MAX_MASS] | Maximum mass | 50 |
| -f, --force-all-steps | Force all steps to run, even if they have already been run before | False |
| -h, --help | Show a help message and exit | N/A |
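To give a feel for what the --eps and --min_samples options control, here is a minimal sketch that runs scikit-learn's DBSCAN (scikit-learn is in the dependency list) with the default values from the table. The positions are synthetic and purely hypothetical, and this is not necessarily how the tool calls DBSCAN internally.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical 2D positions: two tight groups plus sparse background stars.
group_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(40, 2))
group_b = rng.normal(loc=(30.0, 30.0), scale=0.5, size=(40, 2))
background = rng.uniform(-100.0, 100.0, size=(20, 2))
positions = np.vstack([group_a, group_b, background])

# eps and min_samples mirror the defaults of --eps and --min_samples.
labels = DBSCAN(eps=3.0, min_samples=10).fit_predict(positions)

# DBSCAN labels stars that belong to no cluster as -1.
n_clusters = len(set(labels) - {-1})
n_unclustered = int(np.sum(labels == -1))
print(f"clusters found: {n_clusters}, unclustered stars: {n_unclustered}")
```

In scikit-learn, min_samples counts the star itself when deciding whether a star is a core point, so a larger value demands denser groups, while a larger eps merges groups that are further apart.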

Dependencies

This program requires Python 3 (ideally >= 3.11), and pip is required to install other Python libraries.

Python libraries

  • matplotlib ~= 3.7
  • numpy ~= 1.25
  • scipy ~= 1.11
  • scikit-learn ~= 1.3
  • pandas ~= 2.0

You can quickly install these in either of two ways:

  • Install using pip and requirements.txt (this installs them into your system or user packages)

    python3 -m pip install -r requirements.txt
  • Install using pipenv and Pipfile.lock (this installs them separately from system or user packages)

    python3 -m pip install pipenv
    pipenv install

    Note that using this method, you will have to run the script using pipenv like so:

    pipenv run ./cluster_tool.py [...]

Input data format

The following columns are required:

| Name | Description |
| --- | --- |
| snapshot | Snapshot number (must start from 0) |
| star_id | Star's unique id number |
| mass | Mass of the star |
| x | x co-ordinate of the star's position |
| y | y co-ordinate of the star's position |
| z | z co-ordinate of the star's position |
| v_x | x component of the star's velocity |
| v_y | y component of the star's velocity |
| v_z | z component of the star's velocity |
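As a quick illustration of this layout, the sketch below loads a small comma-separated sample with pandas and checks it against the required columns. The helper check_input and the sample values are hypothetical and are not part of the tool.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["snapshot", "star_id", "mass", "x", "y", "z", "v_x", "v_y", "v_z"]


def check_input(df):
    """Return a list of problems found in the data frame (hypothetical helper)."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in df.columns]
    if "snapshot" in df.columns and len(df) > 0 and df["snapshot"].min() != 0:
        problems.append("snapshot numbering must start from 0")
    return problems


# Made-up data: two stars followed over two snapshots.
sample = io.StringIO(
    "snapshot,star_id,mass,x,y,z,v_x,v_y,v_z\n"
    "0,1,0.5,0.0,0.0,0.0,0.1,0.0,0.0\n"
    "0,2,1.2,1.0,2.0,0.5,0.0,-0.2,0.1\n"
    "1,1,0.5,0.1,0.0,0.0,0.1,0.0,0.0\n"
    "1,2,1.2,1.0,1.8,0.6,0.0,-0.2,0.1\n"
)
df = pd.read_csv(sample, sep=",")
print(check_input(df))  # an empty list means no problems were found
```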

Output data format

Main output files

The main output file will have the columns included in the input file plus the following columns:

| Name | Description |
| --- | --- |
| dbscan_cluster_id | Initial cluster determined by DBSCAN |
| indicate_index | Star's INDICATE index |
| indicate_sig_index | Significant INDICATE index for that snapshot |
| ke | Star's kinetic energy |
| pe | Star's potential energy |
| indicate_clustered | Whether or not the star's INDICATE index is above the significant index |
| bound_clustered | Whether or not the star is gravitationally bound |
| closest_cluster_id | The nearest cluster to the star as found by DBSCAN |
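To make the ke and pe columns concrete, here is a minimal sketch of per-star kinetic and potential energies from direct pairwise summation, followed by the (ke + pe) < 0 test for boundness. It assumes G = 1 (N-body units) and uses velocities exactly as given; the tool's own unit system, reference frame and choice of which stars enter the potential sum may differ. The function name is hypothetical.

```python
import numpy as np

G = 1.0  # assumed N-body units; change to match the simulation's unit system


def energies(mass, pos, vel):
    """Per-star kinetic and potential energy (hypothetical helper)."""
    mass = np.asarray(mass, dtype=float)
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    # Kinetic energy: 0.5 * m * |v|^2 for each star.
    ke = 0.5 * mass * np.sum(vel**2, axis=1)
    # Pairwise separations between all stars.
    sep = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(sep, np.inf)  # exclude self-interaction
    # Potential energy of star i: -G * m_i * sum_j (m_j / r_ij).
    pe = -G * mass * np.sum(mass[None, :] / sep, axis=1)
    return ke, pe


# Two unit-mass stars one length unit apart: one slow, one fast.
mass = [1.0, 1.0]
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vel = [[0.5, 0.0, 0.0], [2.0, 0.0, 0.0]]
ke, pe = energies(mass, pos, vel)
bound = (ke + pe) < 0
print(ke, pe, bound)
```

Here the slow star has a negative total energy and counts as bound, while the fast star does not.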

Statistics files (*_stats.csv)

The first two columns are the snapshot and the cluster id:

| Name | Description |
| --- | --- |
| snapshot | Snapshot number |
| cluster | Cluster id |

After that come the results of the statistical tests performed on each cluster:

| Name | Description |
| --- | --- |
| no-of-stars | Number of stars in the cluster |
| ks-statistic | KS test statistic |
| ks-pvalue | KS test p-value |
| cs-statistic | Chi-squared test statistic (do not use) |
| cs-pvalue | Chi-squared test p-value (do not use) |
| cvm-statistic | CvM test statistic |
| cvm-pvalue | CvM test p-value |

Clusters are determined by multiple methods, and the column names for the results of each method have a prefix added:

| Method | Prefix | Logic |
| --- | --- | --- |
| DBSCAN | None | Just use dbscan_cluster_id |
| DBSCAN + INDICATE | indicate_ | If indicate_index > indicate_sig_index: use closest_cluster_id; else use -1 |
| DBSCAN + gravitationally bound | bound_ | If (ke + pe) < 0: use closest_cluster_id; else use -1 |
| DBSCAN + INDICATE + gravitationally bound | indicate+bound_ | If (indicate_index > indicate_sig_index) and ((ke + pe) < 0): use closest_cluster_id; else use -1 |
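The logic in this table can be sketched with pandas and NumPy as follows. The input column names come from the main output file; the data values and the names of the derived columns (indicate_cluster, bound_cluster, indicate+bound_cluster) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows using the columns of the main output file.
df = pd.DataFrame({
    "dbscan_cluster_id": [0, -1, 0, -1],
    "closest_cluster_id": [0, 0, 0, 1],
    "indicate_index": [3.0, 2.5, 0.5, 0.2],
    "indicate_sig_index": [2.0, 2.0, 2.0, 2.0],
    "ke": [1.0, 4.0, 1.0, 0.5],
    "pe": [-2.0, -1.0, -3.0, -0.1],
})

is_indicate = df["indicate_index"] > df["indicate_sig_index"]
is_bound = (df["ke"] + df["pe"]) < 0
closest = df["closest_cluster_id"]

# Stars that fail a method's condition are assigned -1 (no cluster).
df["indicate_cluster"] = np.where(is_indicate, closest, -1)
df["bound_cluster"] = np.where(is_bound, closest, -1)
df["indicate+bound_cluster"] = np.where(is_indicate & is_bound, closest, -1)

print(df[["dbscan_cluster_id", "indicate_cluster", "bound_cluster", "indicate+bound_cluster"]])
```

Note that the second star is unclustered according to DBSCAN alone, but joins its closest cluster under the INDICATE method because its index exceeds the significant index.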

Contributing

You can contribute to this project in multiple ways:

  • 🐛 Reporting bugs: If you find any bugs, please let us know by opening an issue.
  • Feature requests: If there's a feature that would be really useful to add, let us know on the discussion board.
  • 📖 Documentation: If there are any errors or missing parts in the documentation, you can make improvements and open a pull request.
  • 🖥️ Code: This project is open source, so anyone is free to modify the code as they wish. If you implement new functionality that may be useful to others, please consider opening a pull request.

Disclaimer

There is no guarantee that the code is free of mistakes! Use at your own risk.

Attribution

References

License

Cluster Tool is licensed under the MIT License.

The INDICATE module is based on code by George Blaylock-Squibbs, which is in turn based on abuckner89/INDICATE by Anne S.M. Buckner (used under the MIT License).