Kristin Berry authored merge commit d4c84106ce1 on 07 Aug 2024

Pull request #1338: PIPE-2094 generate run statistics file for internal use

Merge in PIPE/pipeline from PIPE-2094-generate-run-statistics-file-for-internal-use to main

* commit 'a40dbe6bb769ec8ccc60b62d007792da83fe205b': (23 commits)
  PIPE-2094: Remove non-existent file name from error message and also update source sorting to use source.name as the key.
  PIPE-2094: Remove unused imports, add key sorting to json file, consolidate nested if-statements.
  PIPE-2094: Remove draft stats values from this branch. Will include in a followup branch later.
  PIPE-2094: Fix issue that was preventing stats_extractor results from being added to the pool and update structure of flagdata_percentage to not repeat the MS name
  PIPE-2094: Update output to use TARGET instead of SOURCE and n_targets to n_target.
  PIPE-2094: Add n_pointings to SOURCE layer and update bands to be a list of band names.
  PIPE-2094: Remove RegressionExtractor as a parent to StatsExtractor and relegate potential restructuring of this to a future ticket.
  PIPE-2094: Add a 'SOURCE' level to the enum and output format. Update longdescriptions and two statistics value names based on feedback.
  PIPE-2094: Slight restructure to break up a large function.
  PIPE-2094: Update to use the spw id from the first MS for all top-level SPW attributes. Also add a handful of different statistics values to be included.
  PIPE-2094: Move main stats generating function from pipline_statistics to stats_extractor. Clean up and add documentation to stats_extractor.
  PIPE-2094: Fix virtual spw calculation
  PIPE-2094: Add docstrings, additional comments, and tidy-up code.
  PIPE-2094: Removed temporary flat format option.
  PIPE-2094: Switch to use virtual spws for SPW-level information and move this level to directly under MOUS instead of underneath the EB level
  PIPE-2094: Switch to use PipelineStatisticLevel enum. Add ‘EB’ and ‘SPW’ levels above the actual eb and spw information to the output json file. Update to only output EB information for 1 MS datatype to remove redundant output. Update project ID information to output a string instead of a list.
  PIPE-2094: Update to use nested dict structure for output.
  PIPE-2094: Add nested format output option and generally clean up code.
  PIPE-2094: Clean up and restructure code a bit. Add missing eb and mous information to spw-level stats
  PIPE-2094: Add context into stats extraction. Add some per-EB, per-MOUS, and per-SPW values. Add flat version of format in which level is specified
  ...
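
The commit messages above describe a nested output keyed by level (MOUS, EB, SPW, TARGET) and written as JSON with sorted keys. The following is only a rough sketch of that shape, not the code from this pull request: the enum values, the `build_nested` helper, the record layout, and the identifiers and statistic values in the usage are all assumptions made for illustration.

```python
import json
from enum import Enum


class PipelineStatisticLevel(Enum):
    """Level names taken from the commit messages; the values are assumed."""
    MOUS = "MOUS"
    EB = "EB"
    SPW = "SPW"
    TARGET = "TARGET"


def build_nested(mous_name, records):
    """Group (level, owner, name, value) records under one MOUS entry.

    MOUS-level values sit directly under the MOUS name. Every other
    level gets a sub-dictionary named after the level, keyed by the
    owner id (EB name, virtual spw id, target name).
    """
    mous = {}
    for level, owner, name, value in records:
        if level is PipelineStatisticLevel.MOUS:
            mous[name] = value
        else:
            mous.setdefault(level.value, {}).setdefault(owner, {})[name] = value
    return {mous_name: mous}


def dump_stats(nested):
    """Serialize with sorted keys so repeated runs are easy to diff."""
    return json.dumps(nested, sort_keys=True, indent=4)
```

A possible call, with made-up values:

```python
records = [
    (PipelineStatisticLevel.MOUS, None, "n_eb", 2),
    (PipelineStatisticLevel.EB, "uid_A", "n_ant", 43),
    (PipelineStatisticLevel.SPW, "16", "nchan", 128),
    (PipelineStatisticLevel.TARGET, "M83", "n_pointings", 7),
]
print(dump_stats(build_nested("uid_MOUS", records)))
```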

pipeline/h/tasks/exportdata/exportdata.py

Modified


@@ -507,25 +507,24 @@
         # The keys are the session names
         # The values are a tuple containing the vislist and the caltables
         sessiondict = collections.OrderedDict()
         for i in range(len(session_names)):
             sessiondict[session_names[i]] = \
                 ([os.path.basename(visfile) for visfile in session_vislists[i]], \
                  os.path.basename(caltable_file_list[i]))

         return sessiondict

-    def _do_if_auxiliary_products(self, oussid, output_dir, products_dir, vislist, imaging_products_only):
+    def _do_if_auxiliary_products(self, oussid, output_dir, products_dir, vislist, imaging_products_only, pipeline_stats_file=None):
         """
         Generate the auxiliary products
         """
-
         if imaging_products_only:
             contfile_name = 'cont.dat'
             fluxfile_name = 'Undefined'
             antposfile_name = 'Undefined'
         else:
             fluxfile_name = 'flux.csv'
             antposfile_name = 'antennapos.csv'
             contfile_name = 'cont.dat'
         empty = True


@@ -565,20 +564,24 @@
         if timetracker_file_list:
             empty = False

         # PIPE-1802: look for the selfcal/restore resources
         selfcal_resources_list = []
         if hasattr(self.inputs.context, 'selfcal_resources') and isinstance(self.inputs.context.selfcal_resources, list):
             selfcal_resources_list = self.inputs.context.selfcal_resources
         if selfcal_resources_list:
             empty = False

+        # PIPE-2094: check for the pipeline stats file
+        if pipeline_stats_file and os.path.exists(pipeline_stats_file):
+            empty = False
+
         if empty:
             return None

         # Define the name of the output tarfile
         tarfilename = f'{oussid}.auxproducts.tgz'
         LOG.info('Saving auxiliary data products in %s', tarfilename)

         # Open tarfile
         with tarfile.open(os.path.join(products_dir, tarfilename), 'w:gz') as tar:


@@ -618,20 +621,26 @@
                     LOG.info('Saving auxiliary data product %s in %s', os.path.basename(timetracker_file), tarfilename)
                 else:
                     LOG.info('Auxiliary data product timetracker json report file does not exist')

             # PIPE-1802: Save selfcal restore resources
             for selfcal_resource in selfcal_resources_list:
                 if os.path.exists(selfcal_resource):
                     tar.add(selfcal_resource, arcname=selfcal_resource)
                     LOG.info('Saving auxiliary data product %s in %s', selfcal_resource, tarfilename)

+            # PIPE-2094: Save pipeline statistics file
+            if pipeline_stats_file and os.path.exists(pipeline_stats_file):
+                tar.add(pipeline_stats_file, arcname=pipeline_stats_file)
+                LOG.info('Saving pipeline statistics file %s in %s', pipeline_stats_file, tarfilename)
+            else:
+                LOG.info("Pipeline statistics file does not exist.")
             tar.close()

         return tarfilename

     def _make_pipe_manifest(self, context, oussid, stdfproducts, sessiondict, msvisdict, exportmses, calvisdict,
                             exportcalprods, calimages, calimages_fitskeywords, targetimages, targetimages_fitskeywords):
         """
         Generate the manifest file
         """

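
The change above follows a simple pattern: treat the statistics file as optional, let its presence keep the tarball from being skipped as empty, and add it only if it exists on disk. Here is a minimal standalone sketch of that pattern, assuming a hypothetical helper `pack_aux_products` that is not part of the pipeline code base. Unlike the diff, it stores each file under its base name rather than the path as given, and it relies on the `with` block to close the archive instead of calling `tar.close()` explicitly.

```python
import os
import tarfile


def pack_aux_products(products_dir, oussid, optional_files):
    """Write <oussid>.auxproducts.tgz containing whichever files exist.

    optional_files may contain None or paths that are missing on disk;
    those entries are skipped. Returns the tarball name, or None when
    there is nothing to pack.
    """
    present = [f for f in optional_files if f and os.path.exists(f)]
    if not present:
        return None

    tarfilename = f'{oussid}.auxproducts.tgz'
    # The context manager closes the archive on exit.
    with tarfile.open(os.path.join(products_dir, tarfilename), 'w:gz') as tar:
        for path in present:
            tar.add(path, arcname=os.path.basename(path))
    return tarfilename
```

Defaulting the new argument to `None`, as the diff does with `pipeline_stats_file=None`, keeps existing callers working without passing a statistics file.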
