- Number of slides: 7
Technology Infusion: Text-Mining and Tagging for Software Change Requests
Executive Briefing
Jane T. Malin and David R. Throop, NASA Johnson Space Center (JSC)
Project: Technology Infusion of Text-mining for Problem Trending into Software Change Reports at JSC
Software Assurance Symposium, September 2008
SAS_08_Text-Mining_Tagging_”Software_Change_Requests”_Grissom
The Problem/NASA Relevance
International Space Station generates ~1400 Software Change Requests (SCRs) annually
It is difficult to find trends and recurring anomalies within the large set of SCRs.
• Particularly urgent when trying to find ‘more reports similar to this one’ during flight anomalies
• Typical “manual” analysis uses database searches
• Critical information about software changes is captured in natural-language text fields (English sentences)
  – Text is not well behaved, so keyword search or data mining approaches fail
• Syntactic and semantic variants are used often
Approach
Leverage Text-Mining technology used to:
• Extract model parts for system modeling from requirements
• Find trends in Discrepancy Reports
Semantic Text Mining and Tagging
• Analyzes sets (10,000s) of problem-report records from databases
  – Each record has multiple fields, some of which contain English-language text describing problems, causes, consequences, equipment
• Text-mining approach
  – Performs syntactic parsing of each text field in the data record
  – Uses hierarchical aerospace ontologies of concepts and nomenclature to identify problem-type or equipment-type tags to add to each record
• Searches for word-patterns that match problems or entities of interest
  – Adds additional tag fields to records
  – Uses tags for graphs and other browsing capabilities for analysts
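The tagging step described above can be illustrated with a minimal sketch. It is not the authors' tool: it swaps the syntactic parsing mentioned in the briefing for plain regular-expression phrase matching, and the miniature ontology, its parent links, and the field names are assumptions made only for this example.

```python
import re

# Hypothetical miniature ontology: tag -> (parent tag, mapping words).
# Tag names and words are borrowed from the briefing's hierarchy; the
# parent links and the "Software_Problem" root are assumptions.
ONTOLOGY = {
    "Software_Problem": (None, []),
    "Memory_Error": ("Software_Problem",
                     ["memory leak", "buffer overflow", "corrupted memory"]),
    "Software_Resource_Contention": ("Software_Problem",
                                     ["deadlock", "race condition"]),
}


def _pattern(phrase):
    # Case-insensitive whole-phrase match; words may be separated by
    # any run of whitespace or hyphens ("race-condition", "memory  leak").
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"[\s-]+".join(words) + r"\b", re.IGNORECASE)


PATTERNS = [(tag, _pattern(p))
            for tag, (_, phrases) in ONTOLOGY.items()
            for p in phrases]


def ancestors(tag):
    """Return the tag followed by its parents, most specific first."""
    chain = []
    while tag is not None:
        chain.append(tag)
        tag = ONTOLOGY[tag][0]
    return chain


def tag_record(record, text_fields):
    """Return a copy of the record with an added 'tags' field."""
    tags = []
    for field in text_fields:
        text = record.get(field, "")
        for tag, pattern in PATTERNS:
            if pattern.search(text):
                # Add the matched tag and every ancestor, without duplicates.
                for t in ancestors(tag):
                    if t not in tags:
                        tags.append(t)
    tagged = dict(record)
    tagged["tags"] = tags
    return tagged
```

Adding the ancestor tags alongside the matched tag is one simple way to let an analyst browse at either a specific or a general level of the hierarchy.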
Current Capability
• User: ISS Robotics
• 3200 .html SCR records converted to tab-delimited format
• Text analysis and hierarchical tagging for problem types
  – Capability to limit tagging scope to only software failures
• Analysis of multiple fields
• Improved bar chart formats
[Bar chart: Errors co-occurring with ‘Deactivation’ in one year]
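A co-occurrence chart like the ‘Deactivation’ example could be derived from tagged, tab-delimited records roughly as follows. This is a sketch under assumptions: the column name `tags`, the `;` separator between tags, and the plain-text chart rendering are all hypothetical and not taken from the briefing.

```python
import csv
import io
from collections import Counter


def cooccurrence_counts(tsv_text, target, tag_column="tags", sep=";"):
    """Count tags that appear in the same record as `target`."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        raw = row.get(tag_column) or ""
        tags = {t.strip() for t in raw.split(sep) if t.strip()}
        if target in tags:
            # Every other tag on this record co-occurs with the target.
            counts.update(tags - {target})
    return counts


def text_bar_chart(counts, width=30):
    """Render counts as a plain-text horizontal bar chart, largest first."""
    if not counts:
        return ""
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * max(1, round(width * n / peak))
        lines.append(f"{label.ljust(label_width)} {bar} {n}")
    return "\n".join(lines)
```

Restricting the input to the records of a single year before counting would give the "in one year" view named in the chart title.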
Current Software Problems
• Software problem type hierarchy from Aerospace Ontology, with mapping words
  – Software_Threat: spyware, spam, virus, malware, worm, Trojan horse, Trojan, root kit, exploit, ping, brute force attack, dictionary attack, replay attack, piggybacking, denial of service, sabotage
  – Programmer_Error: programmer error, {Bad} programming practice
  – Software_or_Computer_Error (error, faulty): software error, software problem, BIT error, controller error, computer error, display error, program error, bit count error, check error, not reinitialized, compiler error, bug, phase error, exception, {Programming_Language} exception, page fault, general protection fault, halt failure, crash
  – Software_Security_Anomaly: protocol anomaly, traffic anomaly
  – Software_Sequence_Error: command sequence error, task sequence error, boot sequence error, function sequence error, sequence error
  – Software_Resource_Contention: thrashing, unwanted synchronization, multithread error, deadlock, live lock, lock error, contention, race condition, data race
  – Data_Error: data error, bit error, parity error, missing pointer, i/o error, input/output error, word error, divide by zero
  – Corruption: corrupted packet, corrupt file
  – Memory_Error: corrupted memory, memory write error, memory error, read error, integer overflow, buffer overflow, memory leak, {Insufficient} memory, overwritten memory, overwrite, write over, write on top of
  – Software_Vulnerability: dangling pointer, format string vulnerability, code injection, intrusion, hijack
  – Bad_Software_Structure: {Bad} {Software_Structure}
  – Missing_Software_Structure: {Missing} {Software_Structure}
  – Software_Not_Responding: crash, hang up, lock up, freeze
Note: Brackets expand. For example, {Software_Structure} expands to: comment, code, dictionary, expression, statement, instruction, computation, algorithm, string, thread, pointer, link, hyperlink, reference, command sequence, error log, DLL, load, software load, dump data, segment, use-define chain, call graph, control flow graph, handler
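The bracket expansion in the note can be sketched as a cross product over placeholder classes. This is an illustrative assumption about how such templates might be expanded, not the ontology tool's actual mechanism; the members shown for {Software_Structure} are a subset of the briefing's list, and the members of {Bad} are invented for the example.

```python
import itertools
import re

# Hypothetical class definitions. {Software_Structure} uses a few of the
# members listed in the briefing; the members of {Bad} are made up here.
CLASSES = {
    "Software_Structure": ["pointer", "thread", "call graph"],
    "Bad": ["bad", "faulty"],
}

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(template, classes=CLASSES):
    """Expand every {Class} placeholder into all combinations of its members."""
    # With one capture group, re.split alternates:
    # literal text, class name, literal text, class name, ...
    parts = _PLACEHOLDER.split(template)
    options = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            options.append(classes[part])   # placeholder: all members
        else:
            options.append([part])          # literal text: kept as-is
    return ["".join(combo) for combo in itertools.product(*options)]
```

For instance, `expand("{Bad} {Software_Structure}")` yields one phrase per pairing of a {Bad} member with a {Software_Structure} member, which is how a single template line can stand for many mapping words.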
Technical Challenges
• Software module names identify system failure modes
  – E.g., don’t tag Fire-in-cabin annunciation as ‘FIRE’
  – Handled by tagging only software-related failures
Usability Challenges
• Determining what trends are most useful
  – User interviews
  – Repeated prototypes
• Redesigning user displays to accommodate information overload
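One way to picture "tagging only software-related failures" is to restrict matches to a single branch of the tag hierarchy. The sketch below is an assumption about the idea, not the implemented method: the hierarchy, the "Hazard" and "Software_Problem" roots, and the naive substring rules are hypothetical.

```python
# Hypothetical tag hierarchy (child -> parent). "Memory_Error" and "FIRE"
# appear in the briefing; the parent names are assumed for illustration.
PARENT = {
    "Software_Problem": None,
    "Memory_Error": "Software_Problem",
    "Hazard": None,
    "FIRE": "Hazard",
}

# Hypothetical (tag, phrase) rules, matched by naive substring search.
RULES = [("FIRE", "fire"), ("Memory_Error", "memory leak")]


def descends_from(tag, root):
    """True if `tag` is `root` or lies below it in the hierarchy."""
    while tag is not None:
        if tag == root:
            return True
        tag = PARENT[tag]
    return False


def tag_text(text, scope_root=None):
    """Return matching tags, optionally limited to one branch of the hierarchy."""
    lowered = text.lower()
    return [
        tag
        for tag, phrase in RULES
        if phrase in lowered
        and (scope_root is None or descends_from(tag, scope_root))
    ]
```

With no scope, a module name such as "Fire-in-cabin annunciation" picks up the FIRE tag; limiting the scope to the software branch leaves only the software failure tags.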
Planned Capability
• Additional iteration of suggestions and refinement of requirements
  – More software failure terms and concepts in the tagging ontology
  – Support for identifying and eliminating false positives
• Documented user requirements and capabilities
• Proposal for wider use by many JSC organizations that search and analyze SCR database records
  – Including tighter integration with current SCR database, linking back to it